project management

Multi-site teams: travel and the half life of trust

If you’re working on a distributed creative team, especially one spread across time zones, today’s post from Steve McConnell is a great reminder that you’re not alone in your struggles:

Travel restrictions and offshore development

The article is right: there’s currently no substitute for travel at those kinds of intervals.  “The half-life of trust is 6 weeks” rings true.
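Taken literally, a six-week half-life implies exponential decay. The sketch below is only a reading of the metaphor, not something the article measures; the function name and the idea that trust can be expressed as a fraction are assumptions made for illustration.

```python
def trust_remaining(weeks_since_visit: float, half_life_weeks: float = 6.0) -> float:
    """Fraction of trust left if it halves every `half_life_weeks` (illustrative only)."""
    return 0.5 ** (weeks_since_visit / half_life_weeks)


if __name__ == "__main__":
    # Compare a six-week, quarterly, and half-yearly travel cadence.
    for weeks in (6, 13, 26):
        print(f"{weeks:>2} weeks since last visit: {trust_remaining(weeks):.0%} remaining")
```

On this reading, a team that meets only twice a year is working with roughly a twentieth of the trust it had at the last visit, which is one way to see why the travel interval matters so much.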

Even in normal times, this is a heavy cost to bear, both for the company and for the people on the road. I’ve been at companies that have handled this “ok”, but never seen it be 100% positive and pleasant — this is a very hard problem.  There’s a reason why Microsoft largely kept everyone on the same campus for almost 20 years up to the late 90s — and a reason why you should think about keeping things simple that way as long as you can, too.

If staying single-site as long as possible is the first line of defense, the second line of defense is trying our best to minimize and partition multi-site development in careful ways — e.g. into distinct products, projects, or features — to minimize trust issues and cost of communication.  But this can only go so far in avoiding the issues of trust entirely. At some point (likely sooner rather than later), the worlds must meet, rules must be followed, decisions must align, and work on the product will overlap.

So, then, how do we design our processes and communication to be multi-site friendly? What processes and culture do we insist stay common or allow to be flexible?  How do we maintain the trust to coordinate the things that we must?  How can we hire more forgiving personalities for whom trust and camaraderie come easier?

There are certainly lessons to be learned from highly distributed open source projects (in particular, the tools they use and the ways they use them), but also cautionary tales of the borderline chaos that can ensue when the ties that bind are so loose and light.  And I’m waiting for a good tell-all to be written on Google and some other more recent companies who have embraced highly distributed organizations.

Going back to Steve’s post and how there’s no substitute today for spending time in person: we can only hope someone eventually finds a way to make multipoint video conferencing and techniques of remote socialization and team-building much more effective. It’d be great to not consume the time and energy of flying all over the planet, but that day doesn’t yet appear to be here.

Perhaps we could start with some company-sponsored network gaming to have some fun and get to know each other better? … of course, we then must decide which of the Bangalore, San Francisco, or Cambridge teams we’ll ask to get up at 7am to play.  Hmmm.


Completion queue as incremental throttle

In the last two posts, we’ve discussed some useful properties of internal workflow queues:

  • queue states between processes can provide an early warning of process breakdowns
  • local work-in-process limits serve to slow down a malfunctioning workflow and free up resources to fix it
  • queues can sometimes be combined to reduce the total work-in-process while still preserving their buffering function

I gave an example of workflow throttling, and suggested there was another configuration of those internal queues that could respond more smoothly and gracefully than the simple, independent queues given in the example.

In order to pull a work item, there has to be a place to pull it from, and there should be some way to distinguish work that is eligible to be pulled from work that is still in process. At the same time, there has to be a place to put completed work when you are done with it. A completion queue serves both these functions.

In the first example, the two queues have independent limits: we can have up to 3 items in the “specify” state AND we can have up to 3 items waiting for the next state in the workflow. The team can pull new work into “specify” whenever there are fewer than 3 work items in process. If there are already 3 work items in process, then the team will have to wait until something is moved into the completion queue. If there is some kind of blockage downstream, first the completion queue will fill up, THEN the specify queue will fill up, THEN the specify process will stall. And when it stalls, it stalls all at once. The flow is either on or off: there’s no middle speed, and it keeps going at full rate until it stalls.
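A small simulation can make the on/off behavior concrete. This is a sketch under simplifying assumptions that are not in the post: one item finishes per tick, downstream never pulls, and the function name and tick model are hypothetical.

```python
def run_independent(busy_limit: int = 3, complete_limit: int = 3, ticks: int = 6):
    """Simulate 'specify' with independent limits while downstream is blocked.

    Returns the number of new items started on each tick.
    """
    busy, complete = busy_limit, 0
    started_per_tick = []
    for _ in range(ticks):
        # Finish one item if the completion queue still has room.
        if busy > 0 and complete < complete_limit:
            busy -= 1
            complete += 1
        # Refill 'specify' up to its own limit, regardless of what is waiting.
        started = busy_limit - busy
        busy = busy_limit
        started_per_tick.append(started)
    return started_per_tick


if __name__ == "__main__":
    print(run_independent())
```

Under these assumptions the team keeps starting one new item per tick at full rate until the completion queue is full, then starts nothing at all, with six items stranded in the stage.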

In the second example, we still have a busy state and a complete state, but the token limit is shared between them. In this case, we can have 4 items in process OR 4 waiting, or any split in between, such as (3 busy + 1 waiting) or (1 busy + 3 waiting).

In the ideal case of 3 busy and 1 waiting, this queue works just like the first example does. However, if work starts to accumulate in the “complete” state, then the “specify” state will incrementally throttle down. The effective WIP limit for “specify” goes from 4->3->2->1->0 as more items are completed ahead of the rate of downstream intake. So, the process slows before it stops, and it slows much sooner than it would have under the independent queues.
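The throttling rule reduces to a subtraction. A minimal sketch, with a hypothetical function name, of how the effective limit for “specify” follows from the shared token limit:

```python
def effective_specify_limit(shared_limit: int, waiting: int) -> int:
    """Slots left for 'specify' once completed-but-unpulled items hold tokens."""
    return max(shared_limit - waiting, 0)


if __name__ == "__main__":
    # Shared limit of 4, with 0 through 4 items sitting in the complete state.
    print([effective_specify_limit(4, waiting) for waiting in range(5)])
```

Each item left waiting in “complete” takes one slot away from “specify”, which is the 4->3->2->1->0 sequence described above.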

What’s more, even though it operates in the same way in the normal case, it does it with two fewer kanban in the system. Fewer kanban, with gradual throttling and smoother flow, should result in lower lead times.

With this in mind, let’s reconsider our scenario from the previous topic:

1. Something is going wrong in the design process, but nobody knows it yet.
2. The specify-complete queue starts to back up, thereby throttling down the WIP limit for specify. A resource is freed as a result, who should now inquire into the cause of the backup, which may only be random variation. The code process continues to complete work and pull from the existing backlog.
3. Code state begins to starve and specify state throttles down another level. Two more people are released as a result. There’s more than enough free resources now to either fix the problem or shut down the process.
4. The stall completes by flushing out the specify and code states.

It still takes a while for the system to stall completely. The difference is that it begins stalling immediately, and when it does stall, it stalls with less WIP. For equivalent throughput, this pipeline should operate with fewer kanban and less variation in WIP, and therefore should have smoother flow and shorter lead times. It should respond faster to problems and free up resources earlier to correct those problems.
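To see the gradual stall, here is a sketch of the shared-limit stage during a downstream blockage. The assumptions are mine, not the post’s: a team of three people on “specify”, one item finishing per tick, nothing leaving the completion queue, and hypothetical names throughout.

```python
def run_shared(shared_limit: int = 4, busy: int = 3, waiting: int = 1,
               team_size: int = 3, ticks: int = 4):
    """Simulate a stage whose busy and complete states share one token limit."""
    timeline = []
    for _ in range(ticks):
        # One busy item is completed and moves to the waiting (complete) state.
        if busy > 0:
            busy -= 1
            waiting += 1
        # New work is pulled only while the shared limit has room.
        while busy + waiting < shared_limit and busy < team_size:
            busy += 1
        timeline.append({"busy": busy, "waiting": waiting, "freed": team_size - busy})
    return timeline


if __name__ == "__main__":
    for tick, state in enumerate(run_shared(), start=1):
        print(tick, state)
```

In this toy run someone is freed on the very first tick, and the stage comes to rest holding 4 items rather than the 6 (3 busy + 3 waiting) that the independent queues would hold.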

These shared completion queues might be the most common type of workflow queue. There are a couple of other types that we use, and we’ll take a look at those in a future post.


Boehm’s Spiral Revisited

Twenty years ago this month, in response to the problems associated with waterfall-style approaches to software projects, Barry Boehm proposed his Spiral Model of Software Development, which bore some resemblance to Deming’s “Plan, Do, Check, Act” cycle.

Boehm’s insights have had a huge positive impact on how we think about software development. But the spiral itself lost some of the beauty of Deming’s model: the simplicity, self-similarity at different scales, and the balance between activities in the quadrants. This, perhaps, has caused Boehm’s model to be underused as a tool for introducing people to how creative engineering works. This is unfortunate, because the waterfall model, being more obvious, continues to be where most people start. Only then, after they have personally experienced the pain of struggling projects, do they search for a more appropriate model.

The Simple Spiral

Here is a simple love-child between Boehm’s and Deming’s views which has been very helpful to me in keeping a visual model in mind when thinking about how effective software development (or any creative engineering) really works.

[Figure: the simple spiral, with four quadrants labeled Customer, Plan, Design, and Test]

How does this work?

  • The more iterative your development process, the more times you spiral around
  • You spiral inward from the high-level descriptions down to the lower level implementation details (note: this directionality is inverted from Boehm – this model doesn’t try to convey the amount of cost or work in each loop around the spiral)
  • As you spiral down, the activities change. “Design” at the high level might be on paper. But as you spiral down, design is about turning those paper documents into executable code. Same for the other quadrants.

Quadrants

  • Customer. What does the customer think? In one of the better trends of the last 20 years since Boehm’s paper, agile methodologies have recognized the customer as an essential direct participant of the development process. We can try to guess what the customer ultimately will find valuable. But if we don’t regularly check back with them, we’ll get enough wrong to sink our product and company over time.
  • Plan. What do we plan to do? This includes requirements analysis, priorities, risks, and schedules. At the very high level, it may be corporate goals. At the very low level, it might be writing an automated functional test before writing the code to make that test pass.
  • Design. How will we do it? At the high level, design is done via documents, diagrams, and discussion. At the lowest level, design is expressed as the executable code that constitutes the product.
  • Test. Have we done it right? At the high level, we discuss and review ideas and documents. At the lowest level, we execute tests against the functioning product.

The four quadrants align with how we tend to specialize our people and organizations as we grow. In a Microsoft organizational model, they align with customer, program management, development, and test.

The “Simplistic Spiral”

The simple spiral is useful, because it is flexible enough to encompass many approaches to development. Take the waterfall model from the top of this post, and wrap it into one loop through the spiral, and you get “the simplistic spiral.”

[Figure: the simplistic spiral, a single loop through the Customer, Plan, Design, and Test quadrants]

Wouldn’t it be nice if projects could reliably just work this way?

But we know if we apply this model to a large product, we’re nearly certain to have a disaster on our hands where many assumptions made in planning are discovered to be poor in design or test.

But apply it to a queue of appropriately-sized (small) and well-understood functional requests, and it may be an appropriate model for each kanban in a lean production workflow (or perhaps each kanban should be two or more loops round the spiral). In any case, the model is helpful in visualizing all these cases.

The “Product Spiral”

It also helps us create useful visualization of product lifecycle models at various scales.

[Figure: the product spiral, labeled at the Company, Product, Project, Feature, and Change levels]

At the company level, we are constantly cycling: seeing what the customer/market reaction is to our products, planning new products and enhancements, designing and testing them.

At the product level, it’s the same, but from the perspective of the evolution over a lifecycle. But even when a product enters later lifecycle phases like maintenance, the model is still the same: the customer finds bugs; we prioritize, fix, and test them; and it goes back to the customer for the cycle to start again.

At the feature level, we recognize that each feature has a lifecycle of its own. Before committing a feature to a product, all aspects (including feedback from the customer) should be covered, likely with several iterations. Perhaps one of the secrets of success of many open source ecosystems is that they encourage/allow individual features to evolve independently and iteratively, before committing to integrate them.

At the change level, we have gotten the granularity small enough that each change may appear to be its own mini-waterfall of plan, design, and test. But even that is a simplification — chances are, the person making the change looped through many interrelated planning, design, and test alternatives in their head before committing the change.

Applications

In future posts, we’ll apply this model to take a look at other aspects of the engineering process.

So — is this a useful model for thinking about engineering, particularly software? Or is this model dangerously simplistic for helping to think about how your organization works?


Spiraling Into Control

[This was originally posted March 7, 2005 on my personal blog. Stumbled across it, and felt it might deserve a post here] 

The other week I attended a panel discussion. This was part of a pilot of a larger course for program managers. Before the panel began, the instructor was relating a conversation where a developer on a project was speaking of it as “spiraling into control.” The instructor left open the possibility that he thought the developer was crazy. The room of program managers laughed. Thoughtfully.

What a great phrase.

We want to clamp down to get our projects under control. Enforce rules to get repeatable. We want to keep our teams on the shortest path from A to B.

But the interesting projects haven’t been to B before.

And they’re dealing with shifting human dynamics on the team, discoveries that don’t reveal themselves until uncomfortably late, and a world around them that isn’t standing still.

We can’t leap to control and stay there. We can only spiral close. And, with constant effort and feedback, we hope to stay close.

Short iterations. Small increments. Minimized work-in-progress. Communication and retrospection. Discretion to adjust the process to keep breaking bottlenecks.

Control through feedback, not prescription.


Better estimates with Wideband Delphi

Dates and deadlines are an essential human element of project management. People work better if they have challenging but realistic schedules to work against.

The trick is “challenging but realistic.” In software, we know there is wide variation in our estimates, because we are almost always creating something unique (if it’s been done before, we just copy the bits). And we have a systemic underestimation bias, because there are lots of ways to cut scope or cut corners in software, and the intangible nature of it all makes anything seem possible.

Unfortunately, a schedule that is no longer realistic will quickly destroy a project, causing cynicism, demotivation, short-cuts, bad decisions, unwillingness to respond to new information, loss of honesty and trust, and other problems which will fester. We could avoid these pitfalls if only we could estimate better. There are good books on this, including McConnell’s Software Estimation: Demystifying the Black Art and Wiegers’ Practical Project Initiation.

Out of all the techniques covered in those books, one widely-used technique is particularly effective for those critical early estimates of large, not-yet-well-understood projects: the estimates upon which we base our go/no-go decisions and early expectation-setting for upper management and customers.

It’s called Wideband Delphi, and here is a simple spreadsheet template and guide for the Wideband Delphi technique. Take a look, and let us know if this is useful to you and your groups.
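The post doesn’t spell out the steps, but as commonly described, Wideband Delphi has team members estimate independently, discuss the spread, and re-estimate over several rounds until the numbers converge. The sketch below only summarizes rounds and checks convergence. The function names, the 25% tolerance, and the sample estimates are all made up for illustration; they are not taken from the linked template.

```python
from statistics import median


def summarize_round(estimates):
    """Low, median, and high for one round, plus spread relative to the median."""
    low, high = min(estimates), max(estimates)
    mid = median(estimates)
    return {"low": low, "median": mid, "high": high, "spread": (high - low) / mid}


def has_converged(estimates, tolerance=0.25):
    """Treat a round as converged once the spread is within the chosen tolerance."""
    return summarize_round(estimates)["spread"] <= tolerance


if __name__ == "__main__":
    # Hypothetical estimates in person-weeks from four estimators over three rounds.
    rounds = [
        [10, 25, 40, 15],
        [18, 24, 30, 20],
        [20, 23, 25, 22],
    ]
    for number, estimates in enumerate(rounds, start=1):
        print(number, summarize_round(estimates), has_converged(estimates))
```

The point of tracking the spread rather than just the average is that a wide range early on is useful information: it usually means the estimators are making different assumptions that the discussion between rounds should surface.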
