Between kanban and pair programming lies the feature brigade

It is easy to understand why the Agile community rebelled against some of the traditional roles and responsibilities in software development organizations. Since the phase/gate model of project management aligns itself according to such roles, and Agile thought rightly identifies phase/gate as the primary disease afflicting the profession, it is predictable that anything associated with that model would also be considered suspect. Skepticism about division of labor also promotes an affinity with some folks with a “revolutionary” disposition, and I don’t think it’s controversial to suggest that Agile still holds considerable appeal for the rebellious.

The central management problem of product development remains: how do you coordinate communication between a large number of highly trained individuals working on a common problem that no one individual can fully comprehend? The better part of any development process or philosophy addresses just this issue.

Scrum and XP, like most processes, define a set of roles for project participants. Some of those roles preserve or reinforce conventional divisions of labor (product owner, pigs/chickens), while others seek to break down such divisions (technology specialists, testers). To some degree, the product owner role is a black-box placeholder for further roles that are not visible to the core development team. One classic division (requirements analysis) is strongly reinforced, both through the product owner role and by the transactional barrier of iteration planning and the product backlog.

Other divisions are broken down with new practices like test-driven development and pair programming. TDD automates the discovery of errors, so it has a pretty clear Lean interpretation as an example of Jidoka [1].

Pair programming is a bit more problematic with respect to Lean, which is greatly concerned with the conservation of labor. Slack capacity in a lean system takes the form of facilities and equipment, and labor capacity is highly optimized.

“To improve operations, the Toyota production system focuses on manpower cost reductions. By comparison, relatively little emphasis is placed on raising the operating rates, even though they are, along with man, the primary agents of production. The reason for this is straightforward: For a given period of time, the loss will be about five times greater for idle workers than for idle machines. Moreover, Toyota realized that no matter how low equipment operating rates might be, for the purposes of cost reduction, it was more effective to concentrate on human labor costs. Failure to grasp this point clearly and keep it in mind may well lead to a misunderstanding of the exact role that manpower cost reduction plays in the Toyota production system.” – Shigeo Shingo, A Study of the Toyota Production System

To be sure, the relative proportion of value added by machines vs people is going to be pretty different for knowledge work vs. manufacturing. At the least, you can interpret Shingo’s advice as meaning it’s stupid to cut corners on office space, computer monitors, and software tools–a mistake I see much more often than I would like. But it also means that the kind of labor redundancy of the practice of pair programming is pretty contrary to Lean.

I should point out that I’m only trying to distinguish “lean” vs “not lean” here, rather than good vs bad. Arguing what lean is good for is another discussion. If I haven’t been clear about this before, I have been a great believer in “situational pairing” since about 1995. There are times when the uncertainty or complexity of a problem is sufficiently high that the cost of delay or failure clearly exceeds the cost of redundancy. I also believe that those times are not “all of the time” or even “most of the time,” but there are all sorts of occasions when it is useful.

Kanban systems are a nearly orthogonal solution to the same problem of coordinating work and workers. Both kanban and pairing address the problem of workflow efficiency and loss due to handoffs. Kanban systems seek to exploit labor specialization advantage and optimize transaction efficiency. Pair development seeks to improve individual productivity and prevent errors and rework. Each approach depends on a tradeoff. Does productivity advantage exceed transaction overhead? Is a pair more productive than the sum of the individuals?

Kanban and pairing are not exclusive. The states in a workflow are best understood as processes and not people. Work moves from one process to the next and then people apply themselves to the process. Those people can be pairs of people as well as individuals. They could be a pair of similar skills, like conventional pair programming, or they could be a pair of mixed skills: analyst&programmer, programmer&tester, analyst&tester, etc. An analyst might decide to flow with the kanban to the next state if she feels that there is some uncertainty to the task that requires closer collaboration in design. I would expect a mature team to think in exactly this way. A great thing about pull systems for knowledge work is that they give people so much power over how they want to organize for particular tasks. Situational pairing is a pretty good way to offset some of the risk of handoffs. It’s also something that lends itself to kaizen optimization.

A step towards zero buffer inventory

There may be a third method that combines some of the strengths of kanban and pairing and reduces their respective weaknesses. A bucket brigade is a stockless self-leveling workflow with a dynamic division of labor.

A simple type of bucket brigade retrieves water from one source, like a river, and delivers it somewhere else, where it is needed. Each link in the brigade carries water from the direction of the source toward the destination until they meet another link traveling in the opposite direction. Then they exchange the bucket full of water for an empty bucket and turn around and travel back toward the source.

If they are in the middle of the chain, they will then meet another link carrying water and once again exchange the water for an empty bucket, and turn around to meet the downstream link. When the receiver has gathered enough water, he can retire each carrier as they appear and take their buckets out of circulation. The last carrier will then travel the entire distance of the brigade and then retire.

Since the links in the brigade are likely people of different strength, speed, and endurance, covering different terrain, there will be differences in the amount of ground that each link covers in his circuit. Not only will there be different local capacities, but those capacities may vary considerably over time as each link takes short breaks, stumbles, or slowly wears down due to fatigue.

What’s most interesting about the bucket brigade is that it is almost entirely self-regulating. No inventory buffers are needed to absorb variation in station cycle times. No conscious adjustments are required from the carriers in order to adapt. Handoffs are directly triggered by downstream availability. Capacity can easily be added or subtracted by adding or removing links from the chain and the system will spontaneously redistribute the work load in response.

The value added by the process is the transportation of the water. The water has value and the labor expended to move it downstream adds value. The labor required to move the buckets back upstream, however, does not add value. It is waste, but perhaps necessary waste. Is there anything we could do to reduce this waste?

You could think of this simple bucket brigade as utilizing only 50% of available capacity. We could utilize the other 50% by sending something back upstream. Perhaps we could send greywater back up to be dumped in the river (assume that the upstream users have a stake in the local ecology, and that the greywater does not contaminate the river).

Generalizing this makes the bucket the kanban–an order for more work. There only need to be as many buckets as there are carriers to handle them, and they only stay in circulation as long as there is demand. If there were multiple sources, the bucket-kanban could be marked with instructions about what to put in the bucket by the final picker.

Just as kanban systems can be generalized from supply chains or assembly lines, a bucket brigade can be generalized for arbitrary workflow management. “Bucket brigade” might not sound sufficiently dignified for the matter at hand, so perhaps for our purposes we can call it a feature brigade.

The operation of a Feature Brigade

A kanban system is a more flexible division of labor than any phase/gate system, while still being stable and well regulated. Role definitions can be adjusted by kaizen updates to workflow or procedures. Work orders can be pulled by any available cross-trained team member, and workers can flow downstream to collaborate with the next station. Nonetheless, work transitions still occur only at well defined points, and inventory buffers are necessary to synchronize tasks of variable duration so that downstream workers do not have to wait for new work orders. Inventory can be costly, but idle workers are more so.

A feature brigade has most of the advantages of a discrete workflow, but it can be (and must be) even more flexible. Any two adjacent workers must have overlapping skills, because where and when they meet is not predetermined.
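The overlap requirement is easy to state as a check. Here is a minimal sketch in Python, assuming each link in the brigade is represented as a set of skill names; the function name `is_valid_brigade` and the skill labels are made up for illustration, using the three roles from the example that follows.

```python
from typing import Sequence, Set


def is_valid_brigade(links: Sequence[Set[str]]) -> bool:
    """Return True if every pair of adjacent links shares at least one skill."""
    # zip(links, links[1:]) walks the chain as (upstream, downstream) neighbors.
    return all(upstream & downstream for upstream, downstream in zip(links, links[1:]))


# Hypothetical 3-person brigade: each neighbor pair overlaps on one skill.
brigade = [
    {"analysis", "design"},   # analyst-designer
    {"design", "coding"},     # designer-coder
    {"coding", "testing"},    # coder-tester
]

print(is_valid_brigade(brigade))                                 # True
print(is_valid_brigade([{"analysis"}, {"coding", "testing"}]))   # False
```

Only neighbors need to overlap. The analyst-designer and the coder-tester share nothing in this example, and the chain is still valid.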

A simple case would be a 3-person one-way feature brigade, with an analyst-designer, a designer-coder, and a coder-tester. Any time the coder-tester considers himself to be finished with his current feature, he checks it in as complete and signals the designer-coder. Since the coder-tester is also a coder, he interrupts the designer-coder at any time during coding and takes responsibility for her current feature. If the designer-coder thinks she is at a good transition point, then she may just hand over what she has, with a specification and a walkthrough. It is more likely that they will collaborate for a while on the same feature until she believes that the coder-tester understands it well enough to continue on his own. In other words, they will pair program. Once she hands off, then in turn, the designer-coder will signal the analyst-designer that she is ready to start working on the next feature, resulting in a pair design session.
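To get a feel for how the handoff points move around, here is a toy simulation of the one-way case. It is only a sketch under some loud assumptions that are mine, not part of the scenario above: progress on a feature is a single number from 0.0 to 1.0, each worker advances at a constant speed, handoffs are instantaneous (no pairing period), and a worker cannot pass the worker ahead of them. The function name `run_brigade` and the speeds are invented for the example.

```python
from typing import List


def run_brigade(speeds: List[float], rounds: int = 60) -> List[float]:
    """Simulate a one-way brigade and return each worker's pickup point.

    speeds[i] is the (assumed constant) speed of worker i, with worker 0
    furthest upstream. Each round ends when the last worker finishes.
    """
    n = len(speeds)
    starts = [0.0] * n  # where each worker picks up work at the start of a round
    for _ in range(rounds):
        # Time for the most downstream worker to finish the current feature.
        t = (1.0 - starts[-1]) / speeds[-1]
        reached = [0.0] * n
        reached[-1] = 1.0
        # Everyone else advances for the same time, but may not overtake.
        for i in range(n - 2, -1, -1):
            reached[i] = min(starts[i] + speeds[i] * t, reached[i + 1])
        # Reset: each worker takes over the upstream neighbor's feature,
        # and the first worker starts a new feature from scratch.
        starts = [0.0] + reached[:-1]
    return starts


# Three workers, slowest upstream, all starting with nothing in hand.
print([round(p, 3) for p in run_brigade([1.0, 2.0, 3.0])])  # roughly [0.0, 0.167, 0.5]
```

With these particular speeds the pickup points settle so that each worker covers a share of the feature proportional to their speed (1/6, 2/6, and 3/6), without anyone being told where to hand off. Real features are not a number line, of course, so treat this as an illustration of the self-leveling idea rather than a prediction.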

That’s a pretty interesting scenario, but we’ve introduced a coordination problem between the analyst-designer and the coder-tester. How does the coder-tester know the intent of the analyst-designer? Why would the analyst-designer trust the judgment of the coder-tester to validate the results? In order to make this work, a lot more detail will have to go into the specification in order to convey the intent of the analyst. Most of that additional work is non-value-added process overhead.

It would be better to find a way to include the analyst more directly in validation. Fortunately, this isn’t the first time this question has come up in software engineering methodology. The V Model of software development was invented to address the same problem in the Waterfall Model. While we still aim to implement a pull system, perhaps we can learn from what the V Model did to address this flaw.

The V Model attempts to improve quality and reduce rework by explicitly pairing each value-adding step of the development workflow with a verification or validation step. It requires some kind of coordination activity between the “downstream” operator and the corresponding “upstream” operator before the work can be promoted to the next step in the workflow.

One could argue (and I do) that the Extreme Programming workflow is a modern interpretation of the V Model, where one-piece flow has replaced the old phase/gate packaging of work requests. A similar interpretation applies to the SEI Personal Software Process, which itself can be modified to be more feature-oriented.

At this point, we can see if the symmetry of the V Model and the symmetry of our bi-directional bucket brigade can be combined to address the coordination problem of our first feature brigade.

A simple two-person feature brigade has each link alternating between development and verification activities. Each person verifies the work coming upstream that they had previously passed downstream. At each handoff, there is an opportunity to collaborate with an adjacent link on both the downstream and the upstream exchanges.

To visualise, this system would operate in alternating phases. In the odd phase, the analyst-designer is (oddly enough) analyzing and designing, and the designer-coder is optimizing and verifying. In the even phase, the designer-coder is designing and coding, while the analyst-designer is verifying and validating. Because the analyst-designer and the designer-coder are both designers, it doesn’t much matter when they meet for the handoff. The more skilled the analyst-designer is, the more work he will complete before the handoff.

In this way, the system is self-leveling. The more the skill of the designer-coder improves, the earlier he will be able to take possession of the kanban. Furthermore, the handoff does not have to be a simple exchange. In fact, this is where the synthesis of kanban and pair programming occurs. The handoff can be an extended collaboration, where the analyst-designer and the designer-coder work together on design, until they agree that they both have a common understanding of the problem and the designer-coder can successfully complete the design on his own.

This makes the feature cycle a little bit more elaborate, but not by much. There is now an alternation between working together and working independently.

  1. In the first phase, the analyst-designer is analyzing and designing the ith feature and the designer-coder is optimizing and verifying the i-1th feature.
  2. In the second phase, the analyst-designer and the designer-coder are both verifying the i-1th feature, as a pair.
  3. In the third phase, the analyst-designer and the designer-coder are both designing the ith feature, as a pair.
  4. In the fourth phase, the analyst-designer is verifying and validating the i-1th feature, and the designer-coder is designing and coding the ith feature.

Then the whole cycle starts again with the i+1th feature.
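The four phases above can be written down as a small table. This is just a sketch in Python to make the alternation between solo and paired work explicit; the function name `feature_cycle` and the tuple layout are my own invention for the illustration.

```python
from typing import List, Tuple

# (analyst-designer activity, designer-coder activity, working as a pair?)
Phase = Tuple[str, str, bool]


def feature_cycle(i: int) -> List[Phase]:
    """Return the four phases of the two-person cycle for feature number i."""
    prev, cur = f"feature {i - 1}", f"feature {i}"
    return [
        (f"analyze and design {cur}", f"optimize and verify {prev}", False),
        (f"verify {prev}", f"verify {prev}", True),
        (f"design {cur}", f"design {cur}", True),
        (f"verify and validate {prev}", f"design and code {cur}", False),
    ]


for number, (analyst_designer, designer_coder, paired) in enumerate(feature_cycle(2), start=1):
    mode = "pair" if paired else "solo"
    print(f"phase {number} [{mode}] analyst-designer: {analyst_designer}; "
          f"designer-coder: {designer_coder}")
```

The two paired phases sit in the middle of the cycle, which is where the two kanban change hands.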

The cost of introducing this bidirectional model is a small swap buffer during the collaborative phase. Since each worker is holding one kanban, and they can only work on one at a time while they are collaborating, the other kanban must sit idle for some part of the handoff period.

The “swap buffer” idle period is fairly small relative to the rest of the value stream. That gap is the productivity advantage we have to realize from specialization in order to benefit from the feature brigade strategy.

Since we’ve introduced this notion of an extended collaborative handoff, we can interpret pure pair programming as a special case. We can also interpret a fixed-transition kanban system as a special case of an instant handoff. Finding such a hybrid enables us to apply the benefits of either extreme, or offset the disadvantages of either extreme. We can realize some of the benefits of pairing, like cross-training and continuous peer review, with some of the benefits of kanban, like pull and specialization advantage.

We started with a simple division by job function, but there are other collaborative relationships we can manage by this method. A complex feature may involve collaboration across technical specialties, like user interfaces and database. We can also use the self-leveling mechanism to introduce new people in a project in a way that minimizes their disruption and accelerates their learning.

Like any bucket brigade, the bidirectional feature brigade scales out to three or more links. You can continue to pair at each meeting as well. Three or more links form a 6-phase cycle, where the pairs alternate in an even+odd, odd+even sequence. It makes a lovely graph, but I’ll leave you to draw that as an exercise!

You might begin to worry about synchronizing a 6-phase cycle, but don’t! The feature brigade is entirely self-synchronizing. People meet when they meet, and overlapping skills and pairing absorb all of the variation.

1. Of course, I still think design by contract is a better example.