kanban

Queue utilization is a leading indicator

I talk a lot about how to apply Lean ideas to software development. Perhaps I sometimes take it for granted that we understand why we should apply them. Mary Poppendieck has already written quite a bit on that rationale, and I try not to rehash things I think she’s already covered adequately. I do think there are a few characteristic scenarios where Lean principles most clearly apply to software development:

  • Any kind of live network service, whether customer-facing (Google.com, Amazon.com) or machine-facing (Bigtable, SimpleDB)
  • Any kind of sustaining engineering process: bug fixing, security patching, incremental enhancement
  • Evolutionary product design (which is to say, effective product design)

That said, there is a very pragmatic reason to adopt a Lean workflow strategy, regardless of what sort of product you are building: Lean scheduling provides crystal clear leading indicators of process health.

I am speaking of kanban limits and andon lights.

Work in process is a leading indicator


For a stable workflow, lead time is a function of both throughput (how much stuff we complete every day) and work-in-process. For a given rate of throughput (with everybody busy at their jobs), an increase in WIP necessarily means an increase in lead time.

It’s simple cause and effect: an increase in WIP today will mean an increase in the time to deliver that work in the future. As far as leading indicators go, this one’s rock solid. You can’t take on more work than you have the capacity to do without taking longer to do it.
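The relationship described here is usually stated as Little's Law: for a stable workflow, average lead time equals average WIP divided by average throughput. A minimal sketch, assuming a hypothetical team that completes 2 items per day (the function name and the numbers are illustrative only):

```python
def lead_time_days(wip_items: float, throughput_per_day: float) -> float:
    """Average lead time for a stable workflow (Little's Law)."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_day


# Hypothetical team: throughput held constant at 2 items per day.
for wip in (10, 20, 40):
    print(f"WIP={wip:>2}  lead time={lead_time_days(wip, 2.0):.1f} days")
```

With throughput held constant, doubling WIP doubles lead time, which is why a WIP count taken today says something about delivery dates in the future.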

A simple management technique is to simplify the problem with policy. If lead time is a function of both throughput and WIP, and you can hold WIP near constant by an act of policy, then you can begin to address the more difficult problem of throughput. WIP is relatively easy to control, because somebody in your business should have the power to approve or deny starting on a new work order. Throttling work orders is a much easier problem than learning how to work faster.

This is effectively the result of a Drum-Buffer-Rope system, or its Lean cousin, a kanban system. Only after you get the simpler variable under control can you begin to make consistent progress on the more difficult one.

If we have a well-defined workflow, then the total work-in-process is the sum of the WIP of all of the parts of that workflow. Limiting the total WIP in the system can still mean quite a bit of variation in the distribution of WIP between the parts of the system. Our next step after limiting total WIP will be managing that component WIP more closely, and it turns out that some parts of that component WIP are more sensitive predictors of lead time than others.

Which is to say that, given the same root cause, some inter-process workflow queue will go from 2 to 4 long before the global WIP, if it were unregulated, would go from 20 to 40. If you set your system up right, one or more of those internal queues will telegraph problems well before they manifest elsewhere.

Development workflows need buffers


The irregularity of requirements and the creative, knowledge-intensive nature of a design activity like software development rule out clocked workflow synchronization. Sometimes the interface to something will be simple, but the algorithm behind it will not. Sometimes the opposite is true. Sometimes an apparently simple design change has wide-reaching effects that require reverification and a lot of thinking about edge cases. Risk and uncertainty are built into the nature of development work. Novelty is what gives software its value, so you can only get so far in reducing this kind of variation before you have to mitigate and adapt to it. Abandoning takt time for development work has been our big concession to the messy reality, although we still look for opportunities to introduce a regular cadence at a higher scale of integration. Of course, we’d be delighted and astounded to hear of anybody making a takt time concept work.

Instead, we have to use small inventory buffers between value-adding processes in order to absorb variation in the duration of each activity across work items. We allocate kanban to those buffers just like anywhere else, and those kanban count towards our total allocation. Making the buffers random-access makes them even more flexible in absorbing process variation.

What is this inventory? Specifications that have not been implemented. Designs that have not been reviewed. Code that has not been tested and deployed. You can measure things like “weeks of specs-on-hand” and “percentage of specs complete.” The higher that first number is, the lower the second one probably is. For orgs that carry months’ worth of specs at a time, that second number can quickly converge on zero. So don’t do that! If you’re carrying more than a few weeks’ worth of detailed specifications at a time, ask yourself…why? What are you going to do with them? Specification inventory is a liability just like any other kind of inventory.

So we’re carrying a few hours or days worth of inventory at a time, because it’s still faster than the alternatives of generalist labor or pipeline congestion. And to be clear, when I’m talking about carrying kanban inventory, I’m talking about hours or days, not weeks or months. And I like hours a whole lot better than days.

The joy of complementary side effects


Agile development has long rallied around the “inspect and adapt” paradigm of process improvement. It is a philosophy that it shares with its Lean cousin. But early Agile methods built their model of feedback around the notion of velocity, and velocity is a trailing indicator. Velocity, and even lead time, can only tell you about things that have already happened.

To be fair, all Agile methods include higher-frequency feedback in the form of the daily standup. But a qualitative assessment is not the same as a quantitative indicator. Done well, the right measure can tell you things that people in a conversational meeting either can’t see, or won’t admit to. An informal, qualitative, Scrum style of issue management leads to confusion between circumstantial and systemic problems, and the obstacle-clearing function of the Scrum Master often leads to one of Deming’s “two mistakes”. But then, Deming might have taken exception to a number of beliefs and practices common to today’s Agile practitioner. That’s okay, we Planned and we Did, and now we are Studying and Acting.

The regulating power of the in-process inventory limit is that it tells you about problems in your process while you are experiencing the problem. You don’t have to extract a belated confession from a stubborn problem-solver or wait for the end of the month to have a review in order to notice that something went wrong. You watch it going wrong in front of your eyes as it happens.

In a kanban workflow system, inter-process queues start backing up immediately following any blockage in their downstream processes. If your team is all working within a line of sight of a visual control representation of that inventory, then you all see the problem together as it manifests. A backed-up queue is not a matter of opinion and the consequences are highly predictable.

Making the indicator work for us


If we’re using a kanban system, we have the WIP limit indicator at our disposal. How can we use this to our advantage?

Under normal conditions of smooth flow, the kanban queues should be operating below their limits. Which is to say, the system has some slack. Slack is good, and optimum flow means “just enough slack.” The limits for the queues are set according to a different rule than the limits for value-added states. Buffer states are non-value-added processing time, so we want to make them as small as we can. The queues are there for the purpose of smooth flow. Make them too big, and they just increase inventory and lead time. Make them too small and they cause traffic jams…which also increases lead time. So there’s a “just right” size for kanban queues, and that is as small as possible while still flowing without a stall X% of the time. Since the queue size is a tradeoff, there is an optimal value for X which is less than 100. The difference between X and 100 is your allowance for the occasional stall event, and that stall event is what triggers process improvement. So our process has slack, but our slack doesn’t. When we run out of slack, we want to stop what we’re doing and try to learn how to operate with less slack in the future.
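One way to read the sizing rule is as a percentile over observed demand on the buffer: pick the smallest limit that would have been enough X% of the time. The sketch below assumes you have sampled how many buffer slots were wanted at regular checks; the sample data, the function name, and the sampling approach are all hypothetical.

```python
import math


def smallest_limit(demand_samples, flow_target_pct):
    """Smallest queue limit that covers at least flow_target_pct percent
    of the observed samples without a stall."""
    ordered = sorted(demand_samples)
    needed = math.ceil(len(ordered) * flow_target_pct / 100)
    return ordered[max(needed, 1) - 1]


# Hypothetical observations: slots wanted in one buffer at 20 daily checks.
samples = [0, 1, 1, 2, 1, 0, 3, 1, 2, 1, 1, 2, 0, 1, 4, 1, 2, 1, 1, 2]

for target in (90, 95, 100):
    print(f"X={target}%  limit={smallest_limit(samples, target)}")
```

In this made-up data set, insisting on never stalling costs twice the inventory of accepting a stall at one check in ten, which is the tradeoff the paragraph describes.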

A healthy state of affairs. A lot of working, not much waiting. When the next analysis task is done, there will be room to store the result, even if design is busy. Design is not under any particular pressure to complete something…yet. But conditions can change quickly, so no excuse to dawdle!

Since our system is a pull system, our process breaks down in a characteristic way. When a queue fills up, there’s nowhere for the output of the process before it to go, so that process will begin to back up itself, and so on, until the entire pipeline in front of the jam eventually stops while the remainder of the pipeline flushes itself out. Good! That’s what we want. Every process in the system serves as a throttle for its predecessor. That means that the system as a whole is regulated by the health of its parts. Shortly after any part of the system starts to go wrong, the entire system responds by slowing down and freeing up resources to fix the problem. That automatic reflection of process health is a powerful mechanism for continuous improvement.

Let’s walk through a typical failure mode:

1. Something is going wrong in the design process, but nobody knows it yet. The senior devs are all sick with the flu. Nobody signals the andon light because they’re at home, or they have other problems on their minds.
2. The analysts, who are in a different hallway, seem immune and continue to complete their assignments. At this point, the process is already signaling that something is amiss.
3. The analysts start up their next tasks anyway. The pipeline to the right of design continues on processing from its own queue.
4. There’s nowhere for the analysts to put their completed work, so now they are also stalled. The right side of the pipeline has flushed out whatever work was already in process and now they are idle as well. The ready queue has backed up, and so the whole pipeline is now stalled.
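The walkthrough above can be sketched as a toy simulation. The stage names, limits, and starting WIP are assumptions made for illustration (the original diagram is not reproduced here), and each step moves as much work as the kanban limits allow.

```python
STAGES = ["analysis", "design-ready queue", "design", "build-ready queue", "build"]


def tick(wip, limits, stalled):
    """Advance the pipeline one step and return the new WIP per stage.

    Work moves from stage i to stage i + 1 only if neither stage is stalled
    and the receiving stage is below its kanban limit.
    """
    wip = list(wip)
    last = len(wip) - 1
    if last not in stalled:
        wip[last] = 0  # finished work leaves the system
    for i in range(last - 1, -1, -1):  # downstream first, so freed slots refill
        if i in stalled or i + 1 in stalled:
            continue
        moved = min(wip[i], limits[i + 1] - wip[i + 1])
        wip[i] -= moved
        wip[i + 1] += moved
    if 0 not in stalled:
        wip[0] = limits[0]  # first stage pulls new work up to its limit
    return wip


limits = [2, 2, 2, 2, 2]   # hypothetical kanban limit per stage
state = [1, 1, 1, 2, 2]    # hypothetical starting WIP
design = STAGES.index("design")
history = [state]
for _ in range(4):
    state = tick(state, limits, stalled={design})
    history.append(state)

for step, snapshot in enumerate(history):
    print(step, dict(zip(STAGES, snapshot)))
```

Under these assumptions the design-ready queue hits its limit on the first step, and within two steps everything upstream of design is full while everything downstream has drained. No one had to report the problem for it to become visible.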

With no intervention other than enforcing the kanban allocation, the system spontaneously responds to problems by shutting itself down. This would be an example of jidoka applied to our development workflow. The people who are idled by this process can and should spend their time looking into the root cause of the problem, either to mitigate it (if it is a special cause) or to prevent it from happening in the future (if it is a common cause). You can’t really predict when the design team will get sick, so in this case, perhaps the analysts and junior devs can work together and complete some of the design tasks until the missing devs get back to health. In this case, it may be an opportunity to discover if the team is sufficiently cross-trained to cover the gap and ask questions about roles and responsibilities.

Even though the problem is self-limiting by step 4, we already know in step 2 that steps 3 and 4 are likely to happen if we don’t intervene. It would have been better if somebody had taken greater notice of the signal in step 2 and begun an investigation. It would also be nice if the system itself could respond both more quickly and more gracefully than in this example.

In the next article, we’ll look at another queueing method that will allow us to simultaneously reduce lead times, smooth out flow, and respond more quickly and gracefully to disruptions.


Pool queue

Manufacturing systems have workflows and knowledge work systems have workflows (and little lambs eat ivy). There are principles that apply to workflows in general, regardless of whether they operate on bits or atoms, and that accounts for much of what we discuss here at Lean Software Engineering. There are also things that are completely different about information workflows. One of those things is the physical space necessary to operate the system. The nature of information space is fundamentally different from any physical process.

Fortunately for us, that often works to our advantage. It means we can manipulate our workflows and work products in ways that would be nonsensical to a traditional industrial engineer. Since most of the literature about Lean is still about moving atoms around, you have to pinch yourself every now and then as a reminder that moving bits around involves a different set of rules.

Bits or atoms, the notion of an inter-process inventory buffer is generally important to our scheduling methodology. Our overall goal is to minimize lead times for new work requests, and a great part of how we do that is by managing our in-process inventory very carefully. But an information inventory is different from a manufacturing inventory, in that it doesn’t occupy exclusive space in a meaningful way. Our information WIP might go into a virtual queue, effectively infinite in size, with no definite order for queuing or dequeuing, and no conflict between objects in the queue. A virtual queue can be random-in-random-out in a way that’s improbable for more spatially-oriented storage.

An issue that seems to come up regularly for development teams is how to distribute multiple work product types across the team’s resources. One approach says dedicate resources to each product type, say, a couple of “feature teams” and some bug fixers. Or a “front end” team and a “back end” team. Another approach says make a prioritization rule and assign all of the work to the common team. A kanban system enables us to use a hybrid approach that dedicates capacity to each work product type, without actually dedicating people.

Suppose we have a fairly simple, generic, 2-stage development process, common to all work product types:

Because it’s knowledge work, there’s too much variation between the two subprocesses to synchronize according to a clock interval, so we make an inter-process queue to absorb the variation:

The queue just holds the kanban; the actual inventory is still sitting in the same document, database, or code repository that it was in when somebody was working on it. It doesn’t matter where the real inventory is because nobody is competing for the storage.

Then we scale that process according to the available resources and demand:

But we can hybridize even further by exploiting some of our “virtual space” advantage. Because our “workcells” and “buffer stocks” don’t actually occupy any spatially constrained floor space, we can arrange them in any logical arrangement that suits us. In this case, we’re going to make a single pooled buffer that straddles both production lines:

Why would we do that? Pooling the variation across the queues for both lines allows us to reduce the total number of kanban in the system, and thereby reduce the lead time for the system as a whole. The dedicated queues each needed a minimum capacity of 2, for a total of 4, to avoid stalling. The combined queue only needs 3 to avoid stalling, because it is rare that both independent queues are simultaneously at their limit of 2. We can reduce the queue further by reducing the variability of either of the surrounding processes. Again, it will be easier to reduce from 3 to 2 than it would be to reduce from 2 to 1.
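A rough way to check the "3 instead of 4" argument is to combine the occupancy distributions of the two dedicated queues. The probabilities below are invented for illustration, and the sketch assumes the two lines vary independently of each other.

```python
from itertools import product


def pooled(a, b):
    """Distribution of total slots wanted by two independent lines."""
    combined = {}
    for (na, pa), (nb, pb) in product(a.items(), b.items()):
        combined[na + nb] = combined.get(na + nb, 0.0) + pa * pb
    return combined


def stall_probability(occupancy, limit):
    """Chance that more slots are wanted than the limit allows."""
    return sum(p for slots, p in occupancy.items() if slots > limit)


# Assumed: each line wants 0, 1, or 2 buffer slots with these probabilities.
line = {0: 0.5, 1: 0.3, 2: 0.2}
both = pooled(line, line)

for limit in (2, 3, 4):
    print(f"pooled limit {limit}: stall chance {stall_probability(both, limit):.2%}")
```

With these assumed numbers, a pooled limit of 3 falls short only when both lines want 2 slots at once, about 4% of the time, while saving one kanban of inventory compared with two dedicated queues of 2.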


Coffee cup kanban

Coffee bars employ a couple of different strategies for taking and filling orders. Each strategy makes different tradeoffs.

Sometimes someone will take your order, ring you up, and then make your drink and give it to you. Other times someone will take your order, mark up a cup with the details of your order, place the cup in a queue to be picked up by a barista who will make the drink and then place it on a shelf and call it out.

That second arrangement is a kanban system, and the cup is the kanban. The cup-ban doubles as an order form that can encode most combinations that a barista should expect.

There are reasons to choose one process over another. The first method is usually applied by small or lower-volume shops with only one employee on shift. The second method is usually applied in larger, higher-volume stores with two or more workers on shift. An advantage to the store of using the kanban method is that they can take your order–and collect your money–quickly. Unfortunately for you, that often means exiting one queue so you can line up in another, more captive queue.

It’s good news for you when the barista asks you, “can I get a drink started for you?” because that should mean he has slack capacity. By the time the cashier finishes collecting your money, the drink should already be under construction. The barista shouldn’t ask you that if he already has a queue of cups to process. On the contrary, once the kanban queue starts backing up, the cashier should start stalling, even if that appears to make the cashier queue back up. The second queue has limited capacity before waiting customers start crowding each other or irritating seated customers.

Some shops get obnoxiously long lines during the morning rush. The solution to that is usually adding an additional espresso machine. The complicating factor is that the rushes don’t last, and then the surplus capacity goes unutilized for most of the day. Still, I know for certain that some shops lose sales, and even customers because of the lines, so I don’t think that strategy is employed as often as it should be.

Watch your coffee cup once your order is taken. It should never stop flowing. If it does, you should ask yourself why and imagine what you might do differently.


Shigeo Shingo on kanban limits

“A gradual decrease in the number of kanban leads to decreases in stock, which terminates the role of stock as a cushion against production instability. This highlights undercapacity processes and those generating abnormalities and simplifies discovery of the major points needing improvement. Overall efficiency is increased by concentrating on the weakest elements.” — Shigeo Shingo, A Study of the Toyota Production System

Decreases in kanban allocation. That is, reduction of the size of the iteration backlog. Therefore reducing the length of the iteration, until it eventually disappears entirely.


Division of Labor in Lean Software Development Workflows

Imagine we have a team of three people, each working as generalists in an agile-style process. They are all qualified and competent workers, and correctly execute the process that they intend to follow. They break their work up into customer-valued features that each take a few days to complete through integration.

One developer is a true generalist. It takes her a couple of days to produce a testable functional specification and high-level design. It takes her a couple of days to produce a detailed design and working code. And it takes her a couple of days to verify and validate everything, from code correctness to functional acceptance.

Another developer is basically competent at all of these things, but he is a more stereotypically geeky programmer who can crank out high-quality code for most product features in a day, on average. It takes him a bit longer than the others to do the customer-facing part, usually about 4 days for analysis and high-level design. He’s also a bit slower with the validation, because again, if it ain’t writing code, he’s just not that excited about it. And for all the time he spends on specs, they are still mediocre, which results in rework in spite of his good coding skills.

The third developer, by contrast, has a sharp eye for design and is very friendly and sympathetic to the business and the customers. He knows the business so well that most of his specs only take a day to write. He’s a competent coder, but a bit old-school in his style and it takes him a bit longer with the current technology. Plus, his heart isn’t quite in it the way it used to be. It takes him three days to develop good code that everybody will feel comfortable with. On the other hand, since his specs are so clean and thorough, and he has a good rapport with the business, the testing usually goes very smoothly in about two days, also (tied for) the best on the team.

The team, on average, produces features with a cycle time of 6.67 days per feature. Overall, each team member produces at a similar rate.

2d + 2d + 2d = 6d
4d + 1d + 3d = 8d
1d + 3d + 2d = 6d
-----------------
               20 days / 3 features = 6.67 days/feature

It is a one-piece flow (per developer), and everybody is always busy with his or her feature. Nobody ever has to wait to start a new feature. Other than the personal slack built into the task times, capacity utilization is high.

But imagine if this team of generalists were allowed to focus only on the skills that they were best at:

1d + 1d + 2d = 4d
1d + 1d + 2d = 4d
1d + 1d + 2d = 4d
-----------------
               12 days / 3 features = 4 days/feature

Same people, same features, 40% improvement in productivity…

…if only it were that simple, because there is also a cost here. If they organize themselves as a pipeline, then that pipeline becomes subject to the Theory of Constraints. If they apply Drum-Buffer-Rope, then the testing task sets the pace at 2 days. That means the total cycle time per feature is:

2d + 2d + 2d = 6d

…still an improvement over the generalists, but only by 10% (!). On the other hand, capacity utilization is now low, because two people now spend half of their time idle, waiting for the drum. Since they are the same people who were cross-trained enough to work as generalists in the first example, is there anything that they can do to speed up the testing process which has been otherwise unimproved? Surely the answer must be yes.

Suppose each of the first two developers spends an extra half of a day doing additional work to optimize the testing process, so that testing only takes 1.5 days to complete instead of two. Introducing a pipeline might also introduce a new communication cost, but imagine that the extra half-day spent by each of the first two developers is in collaboration on the two features they have in process, both communicating and optimizing testability.

The total labor expended is now 4.5 days per feature, but all of the idle time has been stripped out, so that the total cycle time per feature is also 4.5 days. Measured against the generalists’ 6.67 days, that is a real improvement of roughly 33% in cycle time. Same people, same features, same skills, about a third faster. It is only an example, but is it not a realistic example?
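The arithmetic in this example can be checked with a few lines of code. The task times are the ones given above; the function names are illustrative, and the pipeline model simply assumes every stage is paced by the slowest one.

```python
# (spec, code, test) days for each generalist, as given in the example.
generalists = [(2, 2, 2), (4, 1, 3), (1, 3, 2)]


def average_cycle_time(team):
    """Mean days per feature when each person carries a feature end to end."""
    return sum(sum(person) for person in team) / len(team)


def pipeline_cycle_time(stage_days):
    """Days per feature when every stage waits for the slowest one (the drum)."""
    return len(stage_days) * max(stage_days)


def improvement_pct(before, after):
    return 100 * (before - after) / before


baseline = average_cycle_time(generalists)        # 20 / 3, about 6.67 days
paced = pipeline_cycle_time((1, 1, 2))            # drum at 2 days: 6 days
balanced = pipeline_cycle_time((1.5, 1.5, 1.5))   # after rebalancing: 4.5 days

print(f"drum-paced pipeline: {improvement_pct(baseline, paced):.1f}% better")
print(f"balanced pipeline:   {improvement_pct(baseline, balanced):.1f}% better")
```

The drum-paced pipeline comes out 10% better than the generalists, and the rebalanced one 32.5% better, which is the "roughly a third" quoted in the text.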

What about training?

An enthusiastic and observant Agilist might, by this point, object that we could improve the productivity of the team in the first example by improving their skills with training. That is indeed true. We could provide such training, and it may very well yield improvement.

We could also provide training to the team in the second example, which might also yield improvement. What sort of improvement might we expect in each example?

The generalist model suggests that we help each team member improve their weak skills to bring them up to par with the rest of the team. In any model, there is something to be said for cross-training because it facilitates communication and allows the business to adapt to change. But investing in training to overcome weaknesses is a classic management mistake.

In First, Break All the Rules, Buckingham and Coffman make a compelling and well-researched case that the best return on investment in training comes from enhancing a worker’s strengths, rather than overcoming his weaknesses. The geeky coder may have a lack of charm or graphic design sensibility that no amount of training can ever overcome, but picking up a new coding technique or web application framework might pay immediate dividends.

The more specialized team has another built-in advantage in training because they simply get more practice with their currently deployed skills. The analyst gets more training on analysis, which he already has an aptitude for *and* then gets to spend all of his time practicing.

Back to our first team, imagine that we invest in generalist training, so that our

2 + 2 + 2 = 6
1 + 3 + 2 = 6
4 + 1 + 3 = 8

becomes

2 + 2 + 1 = 5
1 + 2 + 2 = 5
3 + 1 + 3 = 7

…for an improved average cycle time of (5+5+7)/3 = 5.67 days per feature. A generous result in training, for a significant outcome.

What about our “invest in strengths” scenario for the first team?

2   + 2   + 1 = 5
0.5 + 3   + 2 = 5.5
4   + 0.5 + 3 = 7.5

…not as good! Even with a very generous 50% improvement for all, we only get 6 days per feature. But what about the specialized team? If our original:

1 + 1 + 2 = 4

becomes

0.5 + 0.5 + 1 = 2

…and then we add back some collaboration overhead:

1 + 1 + 1 = 3

…well, then we are just smoking the generalist team. These are contrived examples, but they should still illustrate some of the advantages that go to small lean teams over small craft teams. The most effective teams have complementary skills and personalities, not homogeneous ones. Otherwise, why organize into teams at all?
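For anyone who wants to vary the numbers, the training scenarios above reduce to averaging rows of task times. The figures are the ones from the tables; the scenario labels are mine.

```python
# Rows are (analysis, coding, testing) days per feature.
scenarios = {
    "generalists, untrained": [(2, 2, 2), (1, 3, 2), (4, 1, 3)],
    "generalists, weaknesses trained": [(2, 2, 1), (1, 2, 2), (3, 1, 3)],
    "generalists, strengths trained": [(2, 2, 1), (0.5, 3, 2), (4, 0.5, 3)],
    "specialists, trained plus overhead": [(1, 1, 1)],
}


def days_per_feature(rows):
    """Average total days across the features in a scenario."""
    return sum(sum(row) for row in rows) / len(rows)


for name, rows in scenarios.items():
    print(f"{name:<36} {days_per_feature(rows):.2f} days/feature")
```

Changing any single task time shows how sensitive each scenario is to one person's improvement.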

Still not that simple? Enter the kanban

A problem with both examples is that they deal with average features. But nobody ever actually works on an average feature. They work on real features that can be averaged over time. Those features have variation, and sometimes a lot of it.

For this reason (and others), we don’t organize our workflow by role. We organize it by task or process, and let team members apply themselves to the workflow in the most efficient manner. They may even hand off the work at different transitions depending on the feature or the state of the pipeline. Such a soft division of labor preserves the efficiency advantage of each worker, while also allowing for variation and changes in circumstance. What matters most is that the work is done in the right order by the best resource available at the time, not who does what.

Pooling work-in-process according to the kind of asynchronous kanban system we’ve been discussing smooths out the flow of variable-duration work items, so that some of the variation in cycle times between processes is traded off for variation in inventory. Such a pooling strategy works better with more people than our examples, and also more people than the current common practice for agile teams. For a pipelined kanban system, we think that about 20 people is the sweet spot.
