A Tool for Better Reviews with Capture-Recapture

What is Capture-Recapture Analysis?

A statistical technique for estimating the size of a total population from a series of smaller samples. Its earliest and most common applications have been in population biology, where it is also known as Mark and Recapture Sampling or Band Recovery. See Wikipedia for more background.

In theory, how can this method be applied to improve the effectiveness of reviews or inspections of documents, code, etc.?

Beyond a count of defects found, the statistical magic of Capture-Recapture will also provide an estimate of the number of defects that remain unfound. For more, see this great explanation from Corey on how this works with two reviewers.
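
In its simplest two-reviewer form, this is the classic Lincoln-Petersen estimator. If reviewer A finds nA defects, reviewer B finds nB, and m of those defects are found by both, the total number of defects is estimated as

  N ≈ (nA × nB) / m

For example, if A finds 6 defects, B finds 5, and 3 of them are the same, the estimate is (6 × 5) / 3 = 10 total defects, even though the two reviewers together only turned up 6 + 5 - 3 = 8 unique ones. The less the two sets overlap, the more defects are presumed to still be lurking.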

As rough as the resulting estimate is, it provides powerful feedback on how effectively the review was conducted, both overall and per reviewer. It can help the group decide whether a re-review is called for … along with making it clear who only skimmed the file right before heading into the review meeting (yes, many of us have been that slacker).

Capture-Recapture inspection has a long history — it has been in use on critical software projects since at least the 1970s. It has been widely studied, and is a core quality technique of Watts Humphrey’s TSP methodology. It has been shown to be a powerful but underused group technique not just for finding bugs now, but also for learning and creating a culture of quality for the long term.

But one of the limiting factors for adoption has been a lack of simple, free tools for handling the inspection data. So we’ve created a very simple spreadsheet to use as a starting point. Let’s look at an example with this sheet.

Example

In this contrived sample review, four people participated and found 7 bugs. Based on how much the sets of bugs each reviewer found diverged, an estimated 3 bugs were not found, putting the estimated total at 10. In other words, the group is estimated to have found 70% of the bugs actually present in the code. That result comes with individual reviewer yields between 20 and 30%, which is quite low compared to the 60-75% individual rate that can be achieved by practiced reviewers who pace themselves at around 200 lines/hour for code.

In order for the feedback to be valid, the group must be careful to follow a process which asks the individual reviewers for their best effort, and only then combines the results as a group:

  1. Review the material as individuals, with enough time set aside to review at a very detailed level (the pacing issue just mentioned). Reviewers should try to find defects of all types, and as many as they individually can.
  2. When the actual bug finding is complete, the group gets together to collate the results. Each individual is asked to call out each bug they found, and others chime in if they found the same one. This is all noted on the sheet by marking a 1 in the columns (P1-P5) that correspond to the reviewer(s) who found it. All the other statistics are generated automatically from that simple data (the sketch after this list shows the arithmetic involved).
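
If you'd like to see that arithmetic outside of a spreadsheet, here is a minimal sketch in Python. The 0/1 matrix mirrors the P1-P5 columns (with made-up data for four reviewers), and the pooling of "best reviewer vs. everyone else combined" is one common way to extend the two-reviewer estimator to a group; treat it as an illustration, not the sheet's exact formulas.

    # Each row is one unique defect; each column is one reviewer (P1..P4).
    # A 1 means that reviewer found the defect during individual review.
    # (Hypothetical data, for illustration only.)
    marks = [
        [1, 0, 0, 1],
        [0, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 0, 0],
        [0, 0, 1, 1],
    ]

    reviewers = len(marks[0])
    found = len(marks)  # unique defects found by the whole group

    # How many defects each reviewer found individually.
    per_reviewer = [sum(row[j] for row in marks) for j in range(reviewers)]

    # Pool the group as "best reviewer" (A) vs. "everyone else combined" (B),
    # with C as the defects found by both pools.
    best = max(range(reviewers), key=lambda j: per_reviewer[j])
    rest = [j for j in range(reviewers) if j != best]
    a = per_reviewer[best]
    b = sum(1 for row in marks if any(row[j] for j in rest))
    c = sum(1 for row in marks if row[best] and any(row[j] for j in rest))

    # Lincoln-Petersen style estimate; a real tool should guard against c == 0.
    estimated_total = a * b / c
    print(f"found {found}, estimated total {estimated_total:.1f}, "
          f"group yield {found / estimated_total:.0%}")
    for j, n in enumerate(per_reviewer):
        print(f"  P{j + 1}: {n} found ({n / estimated_total:.0%} yield)")

With this made-up data the sketch prints a group yield of 78% on an estimated 9 total defects; the spreadsheet does the equivalent bookkeeping for you as the 1s are entered.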

Once all data is collected, the group looks at the results and takes a few minutes to reflect on how effective the review was, and what steps they can take to make the next review more effective.

The Spreadsheet Template

So there you go! You can make your own copy for a new review anytime from the live, read-only version of this spreadsheet template at Google Docs, or grab the exported Capture-Recapture Inspection Template for Excel.

The sheet uses no special scripts, only functions that port automatically between Google, Excel, and other spreadsheet tools.

There are a lot of potentially useful tips for applying this technique in different environments, and we’ll have follow-on blog posts to talk about some of them in time. Subscribe if you’re interested.

How does this sheet work?

You can find some great original background in Appendix C of Watts Humphrey’s Introduction to the Team Software Process(sm).
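
For the impatient, here is a rough sketch of the idea (our summary, not a substitute for the appendix): with a group of reviewers, a common pooling is to compare the reviewer who found the most defects (A) against everyone else combined (B), with C being the defects found by both pools:

  estimated total defects ≈ (A × B) / C

As a hypothetical example consistent with the review above: if the best reviewer found 4 defects, the rest of the group found 5 between them, and 2 of those overlap, the estimate is (4 × 5) / 2 = 10; with 7 unique defects actually found, roughly 3 are estimated to remain.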

When does it make sense to use this level of review?

Not every line of code in every software project warrants this kind of attention, although some methodologies like TSP call for it. Rather, the more your software sits on the 'critical' side of the spectrum, the more sense this makes. If you have spare reviewing capacity (e.g. some popular open source projects), it's well worth trying. And if you're in a domain where automated testing is difficult (e.g. device drivers), it's a huge potential win.

But for many teams, this kind of technique is most effective when done in a sampled fashion: when you create a new component, have a new developer join the team, or adopt a chunk of code that keeps causing trouble, run a series of focused inspections on that code to get the team on the same page about what quality looks like, get a detailed sense of how that code rates against the standard, and get out in the open those bugs that might otherwise trickle out in testing.

From a lean perspective, if customer-visible code-level bugs are your team's primary bottleneck, then this kind of inspection is a huge lever you can pull to get quality under control and shift your bottleneck to the next challenge (such as responsiveness to customer demands and feedback, which can suffer when there's too much focus on process).

Spread the word

Please let us know here in the comments if you find any bugs in the sheet itself (perhaps we should gather a group of reviewers!?), along with any other feedback — we’d love to hear if you like it, have trouble with it, or have ideas to improve it. Thank you!