When Agile meets data: a call for longitudinal studies
Laurent Bossavit
We are nearing the 10-year anniversary of the meeting in Snowbird, Utah at which the term "Agile" was coined, and in that time I have seen Agile grow from a small and disparate group of challengers of the software engineering consensus, to a large and vibrant community with a serious claim to representing a new discipline in software development.
Perhaps you now think I need a cold shower. Inevitably, as the community has grown, it has attracted challenges and skepticism. Most recently, Larry Constantine - one of the giants to whom our field is indebted for pioneering work in not one but several areas, such as the concepts of coupling and cohesion or User-Centered Design - threw down the following gauntlet (Real Data) (1):
Capers Jones has been sharing with me some hard data summaries on a variety of development methods and practices. An interesting thing is that agile methods fare better in most measures but are NOT as good as the Rational Unified Process and all three are trumped by CMM level 5. What would it mean to the agile community IF these findings really were valid and true? Would that mean we should switch horses? Or would it mean we should revise agile to incorporate the best parts of other practice traditions? Or does the agile community have the TRUE answers, regardless of the facts?
Epistemic attitude
Now this may date me, but what came to mind as I pondered Constantine's challenge was a scene from a 20-year-old movie. Rob Reiner's "When Harry Met Sally" is one of the defining instances of the genre "romantic comedy". Harry and Sally, in the early stages of courtship, are having lunch at a restaurant, and somehow the conversation turns to whether men can tell the difference between real and simulated pleasure. Harry having expressed skepticism that his previous partners could have faked it convincingly enough to fool him, Sally launches into a loud (and ego-deflating) demonstration. In the shocked silence that ensues, a prim old lady catches a waiter's attention and in her reedy voice says firmly, "I'll have what she's having."
I am retelling this famous scene to provide a short catchphrase for what I think has been a norm of behaviour in sincere Agile practitioners. The Agile movement was largely fueled by people visiting or hearing about other agile projects, seeing loud signs of satisfaction and saying, "We'll have what they're having."
By contrast, the other two items in Constantine's comparison were, not to put too fine a point on it, designed by committees. We can examine the resulting differences in how these different traditions approach knowledge, their epistemic attitudes. My intent is to focus on the Agile tradition mainly, rather than criticize others, and attempt to answer Constantine's questions as best as the evidence allows.
When Agile met community
The Agile body of knowledge is inseparable from the Agile community. Extreme Programming, which in the early days contributed the most material to that body of knowledge, was largely refined and developed on Ward Cunningham's original Wiki, by a diverse group of contributors. These contributors were not all experts; many of them were improving Extreme Programming even as they were learning it. The entire process fit to a tee the "Communities of Practice" model of learning as theorized by Lave and Wenger (CoP).
Later on, Agile practice, concepts and values were further popularized by conferences whose unusual formats played a key role in continuing the Wiki dynamic. The XP Day conference, originating in London but soon spreading worldwide, was a low-cost event where you could expect mostly hands-on, interactive sessions and very few slides. The focus was on "learning by doing" rather than on knowledge transfer. Experience reports played another key role, allowing practitioners with intermediate levels of experience to formalize the insights gained on actual projects by sharing them with more novice attendees. The Open Space format, where the conference agenda was drawn up on the spot rather than fixed in advance, also gained popularity in the agile community.
In the past five years or so, I suspect that the main factor in Agile's popularity soaring to new heights, to the point where no one any longer seriously questions that it has "crossed the chasm" into mainstream acceptance, was the runaway success of the Scrum certification model. That model has been justly criticized and I will not take up that debate here, but I want to observe that this model was nevertheless in line with the community dynamic I have described.
The Scrum model consisted of an elite group of Scrum Trainers, anointed at first by Ken Schwaber himself, Scrum's originator, though that responsibility was later turned over to the Scrum Alliance. These trainers were the sole authorized purveyors of a standardized, experiential Scrum course, typically taught to a 20-person group who received the coveted "Certified Scrum Master" title. Over time these groups became the main constituents of a new cycle of conferences, the Scrum Gatherings. There now seems to be a Scrum community with a distinct identity from, though still a lot of overlap with, the Agile community, whose nominal oversight body is the Agile Alliance.
Fake and real success
With this admittedly too brief recapitulation of Agile history as background, we can turn back to Constantine's questions. We have answered at least one: would an "agilist" be interested in "the best parts of other practice traditions"? Yes. In spite of differences still being worked out among Scrum, XP and Agile, all three share a bottom-up commitment to taking the best of everything, in service of providing greater value to whoever we are developing software for. If it's shown to work, we'll have what they're having.
Would an "agilist" switch horses? No, if by "switch horses" Constantine means renounce Agile as a tradition, bodily, and embrace one of CMM or RUP, the other two items in the comparison. Larry Constantine has indicated that the data set he's seen summarizes various measures in a spreadsheet, "methods" in rows - Agile vs CMM vs RUP - and various measures of project success in columns; the only specific one mentioned was "total cost of ownership". This is too coarse-grained and subject to attribution bias - I'll explain that below.
If I may go back to Rob Reiner for a minute, we shouldn't forget one key aspect of Sally's scene - namely, that she is faking it. Therein lies a whole class of complications.
Software development projects turn out to offer somewhat similar opportunities for faking: irrespective of what has happened "on the inside" - delays, disorganization, low morale, overtime, end user disappointment, underused features - there can be strong incentive on the part of the project's management to declare the project a success anyway, and sweep various inconveniences under the rug.
Any data collection effort should control for sources of bias, and this is a big one. The PERT chart, a popular technique for project management, was developed by the US Navy in the late 1950s and used in the development of the Polaris missile system, a mammoth undertaking. Everyone at the time reckoned Polaris a great success, and it was widely assumed that PERT was a major factor in that success. Upon later review (Sapolsky) it turned out that none of the engineering departments in fact thought their success had anything to do with PERT; the only group who thought they'd benefited from PERT were essentially the project's PR people. But they were the ones most in the position of imposing their favored explanation for the project's success. (2)
On top of attribution bias, there is a possible selection bias if you go around asking people for data about their projects that have recently finished. I don't know that this is how Capers Jones has been collecting data, but it is one possibility that makes me skeptical until I have more information about the study. Because CMM and RUP outwardly present as "serious" outgrowths of the mainstream of Software Engineering, they are likely for the same reasons as PERT to be identified, after the fact, as the success factors in successful projects.
The data we'd like to see
For a data collection effort to be convincing, it has to strongly guard against both of these sources of bias (3). To do that, the best possible approach would seem to be longitudinal studies: you would collect information about projects from inception, and only allow projects into the study that were declared in a very early stage, i.e. at the point where some executive is thinking about committing to the project and well before anyone else starts being involved. We want to collect declarations of the form "we are thinking about starting project X with approach Y to achieve business outcome O".
Further, such a study would have to collect fine-grained data over the course of a project, regarding the actual practices used and their outcomes. "Agile" is too broad a category and covers literally dozens of describable practices, any of which might play a key role in the project's outcomes. Moreover, the project team may take up and abandon practices at various stages in the project - and this, contra a perhaps naive interpretation of "best practice", is not necessarily a mistake. (My colleague Alistair Cockburn would in fact argue that it is to be expected in every project.) The type of declaration needed is "we plan to adopt practice P at time T in the expectation of benefit B".
The useful correlations now show up when you assess the O's and B's at a later time. By then the project may have been abandoned (not necessarily a negative outcome in and of itself) or significantly transformed but it still counts as a data point. The general principle is "calling your shots". I am prepared to argue that to collect data in other ways is to manifest a "data fetish"; it looks serious, but that is only faking a scientific attitude. We should be interested in collecting data for the purposes of testing falsifiable hypotheses, at the practice level: what exactly does TDD do, for instance, and how does it compare in that to other approaches at a similar level. "Agile works" or "CMM works" or "RUP works" are too vague to be testable.
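The "calling your shots" principle can be sketched as a small data model. The Python below is only an illustration under stated assumptions: the record types `PracticeDeclaration` and `Assessment`, their field names, and the function `called_shots` are hypothetical names invented for this sketch, not part of any existing study protocol. It shows one possible way to keep only those assessments that were preceded by a declaration of the form "we plan to adopt practice P at time T in the expectation of benefit B".

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PracticeDeclaration:
    """'We plan to adopt practice P at time T in the expectation of benefit B.'"""
    project: str
    practice: str          # P
    planned_start: date    # T
    expected_benefit: str  # B
    declared_on: date      # when the shot was called (hypothetical field)


@dataclass(frozen=True)
class Assessment:
    """A later look at whether the expected benefit showed up."""
    project: str
    practice: str
    assessed_on: date
    benefit_observed: Optional[bool]  # None: abandoned or not measurable


def called_shots(
    declarations: List[PracticeDeclaration],
    assessments: List[Assessment],
) -> List[Tuple[PracticeDeclaration, Assessment]]:
    """Pair each assessment with a declaration made in advance.

    An assessment with no matching prior declaration is dropped, since it
    would be an after-the-fact attribution rather than a called shot.
    """
    by_key = {(d.project, d.practice): d for d in declarations}
    pairs = []
    for a in assessments:
        d = by_key.get((a.project, a.practice))
        if d is None:
            continue
        # The declaration must precede both the adoption and the assessment.
        if d.declared_on <= d.planned_start and d.declared_on < a.assessed_on:
            pairs.append((d, a))
    return pairs


# Illustrative records only; these are not real project data.
declarations = [
    PracticeDeclaration("X", "TDD", date(2010, 3, 1),
                        "lower defect density", date(2010, 2, 1)),
]
assessments = [
    Assessment("X", "TDD", date(2010, 9, 1), True),
    Assessment("X", "pair programming", date(2010, 9, 1), True),
]
pairs = called_shots(declarations, assessments)
# Only the TDD assessment is kept: pair programming was never declared.
```

Note that an abandoned project still yields a data point in this sketch: its assessment is kept with `benefit_observed` set to `None` rather than being discarded.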
The D word (or, When Harry met Agile)
To finish, let's address Constantine's final question: "Or does the agile community have the TRUE answers, regardless of the facts?" Ah, the accusation of dogma! No serious challenge to Agile would be complete without it.
It seems to me almost inevitable that any successful community should sooner or later suffer such accusations. In fact, we should welcome them, as a useful reminder not to get too caught up in our own enthusiasm. However, it seems only fair to hold challengers to at least as high an epistemic standard as the community itself is showing. "Groupthink", for instance, is a technical term: it refers to something specific. To quote Wikipedia, it is "a type of thought exhibited by group members who try to minimize conflict and reach consensus without critically testing, analyzing, and evaluating ideas".
I'm seeing plenty of conflict, critical testing, analysis, and evaluation going on in the Agile community. We reserve space for the opposition at our conferences, we invite critics to attend; in fact, they sometimes come away convinced. I'm thinking here of Alan Cooper, who attended in 2008 and later wrote "agilists and I are on the exact same wavelength." (Cooper) (Yes, this is a quote taken out of its context: I invite you to go read the full text at the URL in the references.) Alan continues to be one of the voices challenging agilists to do better.
I am, of course, not directing my next few remarks at Constantine, who like Alan is an "insider" skeptic, but rather at the "wannabe critics" I know from experience are out there. It is your responsibility to come forward with documented evidence of agilists' supposed dogmatism or groupthink; the mere voicing of the accusation does not constitute proof. It will cheaply establish your credentials as a skeptic with the outgroup relative to which Agile unfortunately tends to get defined - but that has no benefit other than to your ego, your inner Harry.
I was amused recently to read an entertaining blog rant criticizing Agile - "bashing" is perhaps more appropriate (Gwaredd). The blogger referred to a study of TDD by Nagappan of Microsoft (Nagappan), and made much of the fact that the data showed a 15-35% increase in development costs from TDD. "No way that's a good idea" seemed to be the conclusion. What the blogger conveniently failed to mention - at all - was that the selfsame study reported from 40% up to 90% reduction in defect density from the TDD teams. Ninety freaking percent! Silver bullet territory! This isn't to make too much of a single data point; I know that the plural of "anecdote" isn't "data". All I'm saying is, this strikes me as a prime example of selective reporting and generally low epistemic standards (even though the blog does make several good points). My bar for paying attention to critics is higher than that.
No, of course we don't have the One True answer.
Agile is not a destination: Agile is a journey of inquiry. We welcome challenges, we welcome critics and anyone who shares our passion to find out more about the fascinating puzzles of software development.
Where I fully agree with Larry Constantine is that we need more Real Data, and I have outlined above what I think fits that description. Let's go get it.
Notes:
(1) text somewhat elided, but I didn't want to break up the flow with the conventional [...] - see References for a link to the full text of this posting
(2) I am indebted for the PERT story to Godefroy Beauvallet, CFO at the French National Telecom Institute.
(3) That is, guard against at least these two. The list of possible biases affecting the validity of scientific findings is depressingly long.
References:
Real Data: Constantine, L. 2010. "Real Data".
http://tech.groups.yahoo.com/group/agile-usability/message/6751
CoP: Lave, Jean and Etienne Wenger. 1991. Situated learning: Legitimate peripheral participation. Cambridge University Press
Sapolsky: Sapolsky, Harvey M. 1972. The Polaris system development: Bureaucratic and programmatic success in government. Harvard University Press, Cambridge
Cooper: Cooper, A. 2009. "My vision of agile".
http://www.cooper.com/journal/2009/07/my_vision_of_agile.html
Gwaredd: Mountain, G. 2010. "Game development in a post Agile world."
http://gwaredd.blogspot.com/2010/02/game-development-in-post-agile-world.html
Nagappan: Nagappan, N., Maximilien, E.M., Bhat, T. and Williams, L. 2008. Realizing quality improvement through test driven development. Springer Verlag, also
http://research.microsoft.com/en-us/projects/esm/nagappan_tdd.pdf