The goal of a kanban workflow system is to maximize the throughput of business-valued work orders into deployment. It achieves this by regulating the productivity of its component subprocesses.
I’ve spent the last few weeks bootstrapping such a kanban system for an enterprise software project. It’s a pretty big project, with over 50 people directly participating. Starting up a new project means making guesses about workflow states, resource allocation, productivity, work item size and priority criteria, and so on.
This project is too large for a single pipeline, so we have a nested structure that processes incoming scope in two stages. The first breakdown (green tickets) is by business-valued functional requirements that can be covered by a small functional specification and corresponding test specification. The second stage (yellow tickets) breaks down these “requirement” work packages into individual “features” that can be completed through integration by an individual developer (or pair) within a few days. The outer workflow is fed by a Rolling Wave project plan, but the flow itself is expected to be continuous. Scope decomposition is generally as “just-in-time” as is tolerable to the stakeholders.
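As a rough sketch of that nesting (the names here are mine, not the project's), the two ticket colors can be thought of as a simple parent/child structure:

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """Yellow ticket: completable through integration by one
    developer (or pair) within a few days."""
    name: str

@dataclass
class Requirement:
    """Green ticket: a business-valued requirement small enough to be
    covered by one functional spec and its test spec; decomposed into
    Features as just-in-time as the stakeholders can tolerate."""
    name: str
    features: list[Feature] = field(default_factory=list)

# The outer workflow pulls Requirements from the Rolling Wave plan;
# the inner workflow pulls the Features broken out of each one.
login = Requirement("User login")
login.features = [Feature("Password form"), Feature("Session token")]
```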
Only time and real live performance data can tell you what you need to know to configure such a process correctly. It takes a while to move enough work through the system to obtain sufficient data to set the right process parameter values. Until then, you have to keep a sharp eye on things and engage in a lot of speculation about coming events. A particular challenge is measuring latency. Latency takes much longer to measure than throughput: you can count completions as they happen, but each lead-time sample takes an entire lead time to arrive. Worse, latency at the beginning of a big project is likely to be much worse than its stable value. New people working on a new project using a new process make for abundant sources of variation, with both special and common causes. You have to see through all of this early noise in order to estimate the implied stable latency. Then you can get down to the hard work of making the worst of that variation go away, and buffering for the rest.
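The "implied stable latency" can be estimated without waiting for full lead-time samples, via Little's Law: average latency equals average work-in-progress divided by average throughput. A minimal sketch (the numbers are illustrative, not from this project):

```python
def implied_latency(avg_wip: float, throughput: float) -> float:
    """Little's Law: mean latency = mean WIP / mean throughput.

    avg_wip    -- average number of work items in the system
    throughput -- completed items per unit time (e.g. per week)
    Returns the mean lead time in the same time unit.
    """
    return avg_wip / throughput

# Example: 30 tickets in flight, 5 completed per week
# implies a stable lead time of about 6 weeks.
print(implied_latency(avg_wip=30, throughput=5))  # 6.0
```

The catch, of course, is that early WIP and throughput figures are themselves full of startup noise, so this only bounds the answer until the flow settles.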
In comparison, bandwidth is easy to manipulate. For a stable process, adjusting bandwidth can have a relatively immediate impact on performance. But at the beginning, there’s pretty much nothing you can do but help push that first order through the system as quickly as possible. You have to prime the pump, and that is a different problem than regulating flow. The trouble with estimating bandwidth is that you won’t know if you got it right until you can measure latency. Overshooting bandwidth might result in a traffic jam in a downstream process that will stretch out your lead time. Undershooting bandwidth will result in “air bubbles” flowing through the process that confound your ability to configure downstream resources that are also ramping up.
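Both failure modes fall out of a toy model of a single hand-off, assuming a fixed weekly arrival rate and a fixed downstream capacity (both hypothetical numbers):

```python
def simulate(input_rate: int, capacity: int, weeks: int = 10):
    """Toy model of one hand-off: input_rate tickets arrive per week,
    the downstream stage finishes at most `capacity` per week.
    Returns (final queue length, weeks the stage sat partly idle)."""
    queue, idle_weeks = 0, 0
    for _ in range(weeks):
        queue += input_rate
        done = min(queue, capacity)
        if done < capacity:
            idle_weeks += 1  # an "air bubble": capacity goes unused
        queue -= done
    return queue, idle_weeks

# Overshoot: queue grows by 3 a week -- the traffic jam.
print(simulate(input_rate=8, capacity=5))  # (30, 0)
# Undershoot: queue stays empty but the stage idles every week.
print(simulate(input_rate=3, capacity=5))  # (0, 10)
```

The asymmetry is the point: the jam shows up as a number you can see piling up on the board, while the bubble shows up as downstream people with nothing to pull.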
The pressure is to overshoot. Everybody who’s available to work thinks that they ought to dive in and start hammering away. It’s hard to tell people to wait for the pull when there’s nothing to pull but slack. You have to imagine what the rate of pull is going to be, adjust the input valve accordingly, and try to get people to contribute anything they can towards reducing latency. If there is ever a good time to employ pair programming, this is it. But then, that’s just one more thing you have to try to convince people to do. When they’ve been champing at the bit, everybody wants their own piece of the pie.
Until you have meaningful throughput measurements, you have to make hands-on adjustments to bandwidth based on the live behavior of the workflow. If you see the traffic jam forming, close the valve. If you see the air bubble forming, open it up. It’s only later that you can let a well-sized buffer absorb the random variation without intervention.
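That hands-on rule amounts to a crude threshold controller on downstream queue length. A sketch, with thresholds that are purely illustrative:

```python
def adjust_valve(queue_len: int, low: int = 2, high: int = 8) -> str:
    """Hand-tuned bandwidth control, used before a well-sized buffer
    exists: close the input valve when a downstream queue jams up,
    open it when the queue is about to run dry."""
    if queue_len > high:
        return "close"  # traffic jam forming downstream
    if queue_len < low:
        return "open"   # air bubble: the stage is about to starve
    return "hold"       # within tolerance; leave it alone

print(adjust_valve(12))  # close
print(adjust_valve(1))   # open
print(adjust_valve(5))   # hold
```

A buffer sized from real variation data eventually replaces this kind of manual intervention; the controller is just what you do until then.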
If it were all up to me, I would always start one of these projects with a small pilot team. I’d let the workflow latency stabilize before ratcheting up bandwidth. Otherwise, there’s just too much variation to control without exceptional effort. Alas, it is difficult to explain why you should idle available resources in order to stabilize your process while the cold, hard wind of calendar time is blowing in your face.
But that is a battle for another day.