kanban

Between kanban and pair programming lies the feature brigade

Most of the topics I write about here are principles and practices that I have some real world experience with. This one is a bit more speculative. It is a very specific and practical technique, but I have not yet had the occasion to apply it. Nonetheless, I personally think it is very interesting, and I’m dying to find the opportunity to try it. Perhaps one of you…?

Read this article!


Priority Filter

Planning and prioritizing is a wicked problem that has plagued humankind since time immemorial.

Suppose your next release has 100 features planned. If you ask your product planner to prioritize those features on a scale from 1 (high) to 3 (low), you’re likely to get a response like:

80 x priority 1
15 x priority 2
 5 x priority 3

Maybe your product planners are more helpful than some of the ones I’ve worked with, but the above kind of thinking sounds pretty familiar to me and reminds me of a formula:

If everything is high priority, then nothing is.

…which amounts to an abdication of responsibility for making a decision. There are a couple of ways to approach that. The brute force method, which I like to call “Developer’s Natural Authority,” is:

If you refuse to take responsibility for sequencing the work, then I will.

That’s a pretty aggressive position, but it helps to define a boundary of the problem. A somewhat more subtle approach is to repeat the question: “Of the 80 priority 1 features, please sort them into three equal-sized buckets.” In response, your planning colleague might: a) cooperate with you, b) take the hint and do the right thing, or c) become irritated with your ploy and refuse to cooperate, in which case you are back to “DNA.”

One of the problems here is that “high” or “priority 1” don’t really mean anything unless you define them. As we’ve said here before (and will say again), plans and specifications are meaningless without operational definitions. So, we try to define what we mean by “priority 1.” Unfortunately, the usual approach to such a definition is a categorical description of priority levels: a priority 1 feature has such-and-such value and so-and-so risk. In practice, this rarely works well because it merely shifts the ambiguity out by one layer. Now we have to develop operational definitions of our otherwise subjective categories, and our little exercise is starting to look more and more complicated and expensive.

One way to respond to all of this is to apply some skills and get much more disciplined about prioritization. This is the path of methods like Quality Function Deployment or Analytic Hierarchy Process. Now, I am a real fan of these methods, so it’s tempting to just stop there and insist on bringing more to the game. But that’s not a very realistic or constructive attitude. Because while something like AHP might be appealing to a methodology geek like me, most of the people I work with are just looking for simple solutions to their immediate problems.

Effective prioritization will always define a relatively small number of high priority work items. The usual approach to this is to define some absolute criteria that separate out a few features from the rest, considered together. Then, the sequence of the work will simply be to complete all of the priority 1 features, followed by all of the priority 2 features, and then the priority 3 features, until time runs out.

But why be so absolute? The value added by prioritization is effective sequencing, and any further information created along the way is probably wasted effort. Rather than a monolithic absolute priority ranking, why not use something more incremental and relative? Why should we care what the 37th work item from now will be, when we only need to know what the next one is? I’m pretty sure the answer is that we don’t care. What we really want is a method that allows us to make good sequencing decisions as late as possible and for the lowest incremental cost. Which, of course, sounds like a call for pull and options thinking. The approach I’ve been using lately is a kanban-like method I call “Progressive Priority Filter” or “Priority Sieve.”

Here I have drawn a task planning board with five columns. The columns are labeled backlog, pri 3, pri 2, pri 1, and done. The three priority columns contain work item tickets, and each column has a work-in-process limit. The limit decreases by priority. We define priority according to situational capacity and availability, rather than by some absolute product criteria.

Our priority definitions are something like:

  1. An item which we are currently working on or intend to work on immediately, strictly limited by our currently available capacity
  2. An item which we should work on as soon as possible, but for which we do not have immediate capacity
  3. An item which we should work on soon, but is not immediately pressing

The limit for priority 1 is strictly defined by current capacity, which is how much work can be done today, or a similarly convenient minimum planning interval. If you have a work capacity of 2, then there are always 2 priority 1 work items. Regardless of how much work has been done or how much work remains, at all times exactly 2 work items have priority 1 with respect to the existing backlog.

The limits for priorities 2 and 3 should be defined by some increasing sequence, such as geometric (e.g. 2,4,8) or Fibonacci (e.g. 3,5,8). Such a sequence should match the uncertainty of the decision. Pri 3 tickets are much less certain than pri 1 tickets, so we keep more of them open as “options”. Such an increasing sequence also means that tickets spend more time in the lower priority states than the higher ones, befitting their uncertainty.
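As a small illustration of the two sequences mentioned above, here is a minimal sketch that derives the three WIP limits from a starting capacity. The function names and parameters are my own, chosen for this example only.

```python
def geometric_limits(capacity, levels=3, ratio=2):
    """WIP limits for pri 1..levels, each `ratio` times the previous one."""
    return [capacity * ratio ** i for i in range(levels)]


def fibonacci_limits(first, second, levels=3):
    """WIP limits where each limit is the sum of the two before it."""
    limits = [first, second]
    while len(limits) < levels:
        limits.append(limits[-1] + limits[-2])
    return limits[:levels]


print(geometric_limits(2))     # [2, 4, 8]
print(fibonacci_limits(3, 5))  # [3, 5, 8]
```

The first entry of each list is the pri 1 limit, which is fixed by current capacity; the later entries are the looser pri 2 and pri 3 buffers.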

The priority buffers are followed by a larger backlog of items defined as items which we believe we should do, but do not yet have priority. If a new item appears in the backlog that is obviously more important than something on the board, then that new item may replace an item on the board, which is returned to the backlog. This can be done for any item that is not already in process. The backlog is only limited by space on the board, and backlog items are written directly on the board, without a ticket. A backlog item is only considered serious enough to merit a ticket when it is promoted to priority 3.

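Naturally, any time a work item is completed, it is moved to the “done” column.

Completing a work item makes capacity available, and triggers a round of priority decisions: one item is promoted from priority 2 to priority 1, one from priority 3 to priority 2, and one from the backlog to priority 3.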

Only three items need to be selected per decision, and only one selection constitutes a commitment (pri2->pri1). Each item will be considered at least three times before being promoted from backlog to priority 1, and there is plenty of opportunity to demote it before it makes it that far. The cost of making each scheduling decision is low, and the probability of committing to a poor decision is also low. There is ample opportunity for the team to review and modify the current priorities, as the only negotiated tickets are the small number of priority 1’s, which should only be selected with the consent of the task’s owner.
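The cascade of three selections can be sketched in a few lines. This is only an illustration under my own assumptions: the class and method names are hypothetical, and the `choose` callback stands in for the team's judgement (here it simply takes the first candidate).

```python
# Promotion steps, highest priority first, so that each completed item
# frees one slot that ripples back toward the backlog.
STEPS = (("pri2", "pri1"), ("pri3", "pri2"), ("backlog", "pri3"))


def first(candidates):
    """Placeholder for the team's decision: take the first candidate."""
    return candidates[0]


class PrioritySieve:
    def __init__(self, limits, backlog):
        self.limits = limits  # e.g. {"pri1": 2, "pri2": 4, "pri3": 8}
        self.columns = {"backlog": list(backlog), "pri3": [],
                        "pri2": [], "pri1": [], "done": []}

    def promote(self, source, target, choose=first):
        """Move one chosen item from source to target if the limit allows."""
        candidates = self.columns[source]
        if not candidates or len(self.columns[target]) >= self.limits[target]:
            return None
        item = choose(candidates)
        candidates.remove(item)
        self.columns[target].append(item)
        return item

    def fill(self, choose=first):
        """Initial setup: keep promoting until every buffer is at its limit."""
        moved = True
        while moved:
            moved = any([self.promote(s, t, choose) for s, t in STEPS])

    def complete(self, item, choose=first):
        """Finish a pri 1 item, then make the three selections."""
        self.columns["pri1"].remove(item)
        self.columns["done"].append(item)
        return [self.promote(s, t, choose) for s, t in STEPS]


board = PrioritySieve({"pri1": 2, "pri2": 4, "pri3": 8},
                      backlog=[f"item{i}" for i in range(20)])
board.fill()
print(board.complete("item0"))  # ['item2', 'item6', 'item14']
```

Only the first of the three returned items is a commitment; the other two are still open options that can be demoted later.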

One more element is needed to make this viable as a prioritization scheme. It might be possible for a ticket to languish in the lower priority states without ever being selected for promotion. To prevent this, items should be assigned a creation date. When tasks in one priority bucket are compared in consideration for promotion, aging tickets should be given preferential consideration. Old tickets deserve either immediate promotion or reconsideration, but should not be allowed to languish in a buffer indefinitely.
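One possible way to express the aging rule is sketched below. The 30-day threshold, the `value` score, and the function name are all assumptions made for the example; the only idea taken from the text is that old tickets get considered first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Ticket:
    title: str
    created: date
    value: int  # hypothetical team-assigned score, higher is more attractive


def pick_for_promotion(tickets, today, max_age_days=30):
    """Prefer the oldest ticket past the age threshold, else the best score."""
    aged = [t for t in tickets if (today - t.created).days > max_age_days]
    if aged:
        return min(aged, key=lambda t: t.created)
    return max(tickets, key=lambda t: t.value)


bucket = [Ticket("old but dull", date(2024, 1, 1), value=1),
          Ticket("new and shiny", date(2024, 3, 1), value=5)]
print(pick_for_promotion(bucket, today=date(2024, 3, 10)).title)  # old but dull
```

A team could just as well treat the aged ticket as a prompt to demote it back to the backlog; the point is only that it cannot be silently skipped.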

I like this system because it seems to strike a good balance between overhead and outcome. Like AHP, it breaks the problem into small chunks that are easily considered. Unlike AHP, it defers decisions until they are actually required, and when more information is available. Thus, it’s also a bit like Rolling Wave Planning, in the small. Each decision is limited in scope, and only a portion of each decision makes a commitment. It minimizes the penalty of making suboptimal choices. It broadcasts the current state of understanding and delegates responsibility to the team with minimum overhead. These are all features that I want in a prioritizing process, so if there is a better scheme than this one, it will still have to meet this bar.


Modeling kanban systems as Petri nets

You might think there’s only so much one could say about moving sticky notes around on a whiteboard, but no, there’s still quite a bit left to cover on that topic. One of the nice things about computer science / informatics is that it provides us with such wonderful tools for describing the behavior of complex systems, especially for event-driven systems like our kanban-regulated software development workflows. If we have a lot to say about this subject, it might help to say some of it in more precise language.

This Petri net diagram describes a basic module of the type of kanban system that we’ve been using to manage software development processes. This is a simple pending->working->complete workflow, but this module can be chained together to add additional steps.

The example has a WIP limit of 3. If all 3 kanban tokens are in the busy state, then no tokens are available, so the work item transition from pending to in-process cannot fire. As soon as an in-process work item completes, a busy kanban token returns to the available state and enables the transition of a new work item out of the pending state.

A curious thing about our model here is that it does not impose any queueing order. A manufacturing system might have a FIFO queueing rule, and therefore require some additional model detail. Our system has no such limitation, so our simple little model is pretty representative just like it is.
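For readers who prefer code to diagrams, the module can be approximated with a tiny place/transition simulator. This is a sketch of my own, not a rendering of the original diagram: the place and transition names are assumed, and places hold plain token counts, which matches the absence of any queueing order.

```python
class PetriNet:
    """Minimal place/transition net: each place holds a token count."""

    def __init__(self, marking, transitions):
        self.marking = dict(marking)    # place name -> token count
        self.transitions = transitions  # name -> (input places, output places)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking[place] > 0 for place in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] += 1


def kanban_module(pending, wip_limit=3):
    """One pending -> in-process -> complete module with a WIP limit."""
    marking = {"pending": pending, "in_process": 0, "complete": 0,
               "kanban_available": wip_limit, "kanban_busy": 0}
    transitions = {
        "start": (("pending", "kanban_available"),
                  ("in_process", "kanban_busy")),
        "finish": (("in_process", "kanban_busy"),
                   ("complete", "kanban_available")),
    }
    return PetriNet(marking, transitions)


net = kanban_module(pending=5)
for _ in range(3):
    net.fire("start")
print(net.enabled("start"))  # False: all 3 kanban tokens are busy
net.fire("finish")
print(net.enabled("start"))  # True: one token is available again
```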

We can describe additional features of real-life workflows by using the Color and Hierarchy extensions to Petri nets. There are also other variations of the kanban signaling mechanism that we might tinker with. But we should be able to describe most anything we can think of, and I’ll revisit this topic in a coming article.


Scrum-ban

As more people become interested in Lean ideas and their application to knowledge work and project management, it’s helpful to find ways that make it easier to get started or learn a few basic concepts that can lead to deeper insights later. Among those who are curious about kanban in an office context, it’s not unusual to find people who are either currently using Scrum, or who understand Scrum as representative of Agile thinking. One way or another, Scrum users are an important constituent of the Kanban audience. Since Scrum can be described as a statement in the language we use to describe kanban systems, it is also fairly easy to elaborate on that case in order to describe Scrum/Kanban hybrids…

read this paper…


Completion queue as incremental throttle

In the last two posts, we’ve discussed some useful properties of internal workflow queues:

  • queue states between processes can provide an early warning of process breakdowns
  • local work-in-process limits serve to slow down a malfunctioning workflow and free up resources to fix it
  • queues can sometimes be combined to reduce the total work-in-process while still preserving their buffering function

I gave an example of workflow throttling, and suggested there was another configuration of those internal queues that could respond more smoothly and gracefully than the simple, independent queues given in the example.

In order to pull a work item, there has to be a place to pull it from, and there should be some way to distinguish work that is eligible to be pulled from work that is still in process. At the same time, there has to be a place to put completed work when you are done with it. A completion queue serves both these functions.

In this case, we can have up to 3 items in the “specify” state AND we can have up to 3 items waiting for the next state in the workflow. The team can pull new work into “specify” whenever there are fewer than 3 work items in process. If there are already 3 work items in process then the team will have to wait until something is moved into the completion queue. If there is some kind of blockage downstream, first the completion queue will fill up, THEN the specify queue will fill up, THEN the specify process will stall. And when it stalls, it stalls all at once. The flow is either on or off, there’s no middle speed, and it keeps going until it stalls.
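The on/off behavior can be seen in a toy simulation. The model is deliberately crude and entirely my own assumption: downstream is fully blocked, the team finishes at most one item per tick, and it pulls replacement work whenever the busy limit allows.

```python
def simulate_blocked_independent(ticks, busy_limit=3, done_limit=3):
    """Return (pulls per tick, final WIP) with nothing leaving 'complete'."""
    busy, done = busy_limit, 0
    pulls = []
    for _ in range(ticks):
        pulled = 0
        # Finish one item if the completion queue still has room.
        if busy > 0 and done < done_limit:
            busy -= 1
            done += 1
        # Pull new work into "specify" up to its own, independent limit.
        while busy < busy_limit:
            busy += 1
            pulled += 1
        pulls.append(pulled)
    return pulls, busy + done


print(simulate_blocked_independent(5))  # ([1, 1, 1, 0, 0], 6)
```

Intake runs at full speed for three ticks and then drops straight to zero, with all 6 slots occupied at the moment of the stall.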

In another example, we still have a busy state and a complete state, but the token limit is shared between them. In this case, we can have 4 items in process OR 4 waiting. Or we can have (3 busy + 1 waiting) OR (1 busy + 3 waiting).

In the ideal case of 3 busy and 1 waiting, this queue works just like the first example does. However, if work starts to accumulate in the “complete” state, then the “specify” state will incrementally throttle down. The effective WIP limit for “specify” goes from 4->3->2->1->0 as more items are completed ahead of the rate of downstream intake. So, the process slows before it stops, and it slows much sooner than it would have under the independent queues.

What’s more, even though it operates in the same way in the normal case, it does it with two fewer kanban in the system. Fewer kanban, with gradual throttling and smoother flow, should result in lower lead times.
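The arithmetic of the shared limit is simple enough to write down directly. This is a sketch with names of my own choosing; the limits of 3 + 3 and 4 are the ones from the two examples above.

```python
def specify_limit_shared(waiting, shared_limit=4):
    """Effective WIP limit for 'specify' when the limit is shared with 'complete'."""
    return max(shared_limit - waiting, 0)


# As completed items pile up, the room left for new work shrinks step by step.
print([specify_limit_shared(w) for w in range(5)])  # [4, 3, 2, 1, 0]

# Kanban in the system: independent queues versus one shared limit.
independent_kanban = 3 + 3
shared_kanban = 4
print(independent_kanban - shared_kanban)  # 2
```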

With this in mind, let’s reconsider our scenario from the previous topic:

1. Something is going wrong in the design process, but nobody knows it yet.
2. The specify-complete queue starts to back up, thereby throttling down the WIP limit for specify. One person is freed as a result, and should now investigate the cause of the backup, which may only be random variation. The code process continues to complete work and pull from the existing backlog.
3. Code state begins to starve and specify state throttles down another level. Two more people are released as a result. There are now more than enough free people to either fix the problem or shut down the process.
4. The stall completes by flushing out the specify and code states.

It still takes a while for the system to stall completely. The difference is that it begins stalling immediately, and when it does stall, it stalls with less WIP. For equivalent throughput, this pipeline should operate with fewer kanban and less variation in WIP, and therefore should have smoother flow and shorter lead times. It should respond faster to problems and free up resources earlier to correct those problems.

These shared completion queues might be the most common type of workflow queue. There are a couple of other types that we use, and we’ll take a look at those in a future post.
