
Experiment to Improve

Agile Base Pattern: Adaptively Experiment for Improvement

Context: Plenty of data informs us. We can forecast when things will happen. Our progress metrics are aligned with long-term goals. But externalities impede our progress: competitors emerge, delays harm us. We are passive victims of outside circumstance.

Reacting to events can be too late …

We suspect unknown dangers, economic loss, and growing ineffectiveness. Our friends reassure us, choosing their words carefully. Existing data is eerily stable. We aren’t learning anything new.


Complacency Lulls Us

When we observe passively, allowing ourselves to be buffeted about by circumstance, we may wait a long time to discover new approaches. Competition and disruption motivate us to learn and improve, but when nothing threatens us we enjoy our lead and relax. In times when our organization dominates competitors, we rarely innovate well, in part because nothing challenges us. People continue to buy regardless of what we do, because our brand is strong.

In dominant periods, we often devote more effort to operational (non-creative) work, because it is highly profitable. We worry less and less about competitive threats, we employ non-innovators to lead innovation teams, and we report operational improvements. The world is rife with examples of complacent dinosaurs (Kodak, General Motors, Yahoo) that captured a lead, enjoyed a long period of dominance, and then, suddenly, lost the market to disruptive competitors.

In dominant periods, we can fail to examine measurements or revise our metrics as often as we do in unstable times. “We checked that metric last time, it was the same as before. We don’t need to measure as often.” By the time we notice a problem, we could be in trouble.

Operational work, the predictable stuff we can put on a checklist, is easy to do. But if we don’t limit it, it can consume all our time, leaving none for innovation and creativity. Manual tasks, customer calls and goalless meetings all interrupt creative focus (Amabile et al. 2002).

Loss of Control Scares Us

Risk of failure accompanies all innovation. Many people join dominant organizations because they value stability. They have learned to avoid situations with significant failure risk, and some overreact by avoiding all risky situations, even when failure cost is low. They prefer to let others take risks, so that if disasters arise, they can avoid blame.

“I’ve missed more than 9000 shots in my career. I’ve lost almost 300 games. 26 times, I’ve been trusted to take the game winning shot and missed. I’ve failed over and over and over again in my life. And that is why I succeed.” —Michael Jordan

Manufacturing gurus have told us that controlling variance leads to higher quality. But their definition of quality is predictability; when we try something new, it creates variance. When we follow their advice, innovation becomes harder and eventually stops (Ashkenas 2012).

We Can Easily Measure Operational Work

In rote learning, our teachers teach us and test us, motivating improvement by comparing us to existing experts. We aren’t really being creative. But as we mature, we can increasingly invent our own future, doing things that have not been done exactly this way before. How do we measure these new things?

We can easily measure non-creative activities. Someone else has done it before, so we just compare ourselves to them. Here’s a common metric people use: “How hard does it look like I’m working?” Unfortunately, for creative work, this can be a perverse metric. When we are working hard at non-creative work, we are making noise, sending emails, scheduling or attending meetings, sweating, looking worried. But these activities defocus us and degrade our creativity.

For creative activities, just finding decent metrics can be hard. We want to try new ideas, and we learn the most when about half our hypotheses fail (Reinertsen 2013). Should we measure our validation rate, rewarding ourselves for a hypothesis failure rate close to 50%? But which hypotheses should we validate, and with what measurements? No single metric serves every application, and often no single metric serves throughout the term of a single project.
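The “learn the most when about half fail” idea has a simple information-theoretic reading: a pass/fail experiment yields the most information, measured as Shannon entropy, when both outcomes are equally likely. The sketch below is our illustration of that reading, not Reinertsen’s own formulation; the function name is hypothetical.

```python
import math


def outcome_entropy_bits(p_fail: float) -> float:
    """Shannon entropy (in bits) of a pass/fail experiment that fails with probability p_fail."""
    if p_fail in (0.0, 1.0):
        return 0.0  # a certain outcome teaches us nothing new
    p_pass = 1.0 - p_fail
    return -(p_fail * math.log2(p_fail) + p_pass * math.log2(p_pass))


for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"failure rate {p:.2f}: {outcome_entropy_bits(p):.3f} bits per experiment")
```

The printed values peak at exactly 1 bit at a 50% failure rate and fall off symmetrically on either side, so a team whose hypotheses almost always pass (or almost always fail) learns little from each experiment.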

Uncertainty Confuses Us

Stephen Bungay argues that three gaps interfere with planning in chaotic economies. A knowledge gap separates what we would like to know from what we actually know. An alignment gap separates what we want people to do from what they actually do. An effects gap separates what we expect our actions to achieve from what they actually achieve. These gaps mean that long-range plans are not likely to succeed without adaptation (Bungay 2011).

Despite these obvious planning uncertainties, we often think making more detailed plans produces better results. But over-planning consumes valuable effort and time, particularly when people demand details. Fear causes us to cling to the plan when unexpected things happen. And a detailed plan with confident dates and costs falsely reassures us.

… therefore, run improvement experiments.

If our environment isn’t challenging us to innovate effectively, we have to challenge ourselves. Our economic metrics help inspire innovative thinking. What new things can we do to help more people, more effectively, and get more value from our interactions? How can we work in new ways to produce more rapidly, cause fewer problems, and enjoy our work and collaborations more?

We can challenge ourselves by running proactive “improvement experiments.” Scientists in general examine historical data, brainstorm and hypothesize, choose a hypothesis to test, run a controlled test to validate the hypothesis, compare resulting data with hypothesized outcomes and draw conclusions.

Improvement scientists do all those things, but focus on process changes, creative activities and prioritization to accelerate economic improvement. These improvement experiments test work hypotheses (such as “our lawyer can reliably deliver reviewed contracts within a week”), process hypotheses (such as “pairing helps reduce rework and increase skill elasticity”) and market hypotheses (such as “millennials respond well to ironic ads”); hypotheses that survive experimental testing become theories. In Scrum, for example, production cost experiments are constructed in the Retrospective (Derby 2006).

We need to construct new hypotheses from historical data. Before changing the goals and processes of creative work, brainstorm different options and explore the possible results in your economic model. How will the changes improve your existing economic progress measurements? If you don’t think they will affect progress measurements, but think you should make the change, maybe you are missing an important progress metric. Consider revising your mission and economic progress metrics. In Lean Startup, for example, new hypotheses are created when organizations pivot.

Experiments do not have to be ponderous affairs, with control groups, peer review and science boards. Shorter experiments can improve our understanding incrementally and shape future short experiments, directing exploration and study. These shorter experiments give us a little data with a lot of uncertainty, but that’s better than what we had before. In fact, between publications most scientists work this way, running little experiments to explore related ideas, before contemplating a blockbuster. Creative people and organizations operating this way can become enormously successful.
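To see why “a little data with a lot of uncertainty” still helps, we can treat each short experiment as a handful of pass/fail observations and track our belief about the success rate with a Beta distribution. This is a standard Bayesian technique swapped in here for illustration; the article does not prescribe it, and the class name and observation counts below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Belief about a success rate, modeled as a Beta(alpha, beta) distribution."""
    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, successes: int, failures: int) -> None:
        # Conjugate update: add the observed counts to the pseudo-counts.
        self.alpha += successes
        self.beta += failures

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self) -> float:
        n = self.alpha + self.beta
        return math.sqrt(self.alpha * self.beta / (n * n * (n + 1)))


belief = BetaBelief()
# Hypothetical results of three short experiments: (successes, failures).
for successes, failures in [(3, 2), (4, 1), (3, 2)]:
    belief.update(successes, failures)
    print(f"estimated success rate {belief.mean:.2f} (std {belief.std:.2f})")
```

No single five-observation experiment is conclusive, but the standard deviation shrinks after each one, and the running estimate can steer the design of the next short experiment.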

Experiments can be evolutionary or revolutionary. Evolutionary experiments test low-risk changes, and may validate the hypothesis after a short test. Revolutionary experiments test more radical changes, and benefit from thoughtful risk-mitigation strategies. In the history of science, most experiments are evolutionary, punctuated with revolutionary upheaval when evolution stops working (Kuhn 2012).

Some improvements are only possible through revolutionary change. Many believe that a department shifting from a waterfall process model (trust experts to plan and execute, deferring market risk to the very end) to an agile process model (test assumptions and reduce market risk frequently) requires revolutionary change. This explains why companies often hire “coach armies” to train many people all at once. One risk-mitigation strategy (requiring a smaller, less expensive army) first converts one or two teams at each level in an organizational hierarchy to use agile management (sometimes called a “full stack approach”), then moves laterally through the organization.

Once we establish a hypothesis, we construct an experiment plan that will test the hypothesis. To keep our adaptation rate high, we should test our hypothesis with short iterative experiments. Evolutionary changes can be responsibly tested with just one or two iterations, while revolutionary changes could require more. However, even with revolutionary change, we should try to find results in the earliest iterations that help decide whether to proceed, change or abandon later experiments.
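One way to keep the “proceed, change or abandon” decision honest is to agree, before the experiment starts, on what early results would justify each choice. The sketch below assumes a single positive metric where higher is better; the function name and its thresholds (`min_progress`, `max_regression`) are hypothetical values a team would choose for itself, not part of any published method.

```python
from typing import Sequence


def next_step(baseline: float, target: float, observed: Sequence[float],
              min_progress: float = 0.25, max_regression: float = 0.10) -> str:
    """Return "proceed", "change" or "abandon" after the latest iteration.

    Assumes a positive metric where higher is better, and target > baseline.
    min_progress: fraction of the baseline-to-target gap we hope to have closed.
    max_regression: fractional drop below baseline we are willing to tolerate.
    """
    if target <= baseline:
        raise ValueError("target must exceed baseline")
    if not observed:
        return "proceed"  # no results yet; run the planned iteration
    latest = observed[-1]
    if latest < baseline * (1 - max_regression):
        return "abandon"  # the change is doing real damage
    progress = (latest - baseline) / (target - baseline)  # 0 = no change, 1 = target met
    if progress >= min_progress:
        return "proceed"  # on track; keep the experiment as planned
    return "change"       # little or no effect; revise the experiment


print(next_step(baseline=10, target=14, observed=[11.5]))  # proceed
print(next_step(baseline=10, target=14, observed=[10.2]))  # change
print(next_step(baseline=10, target=14, observed=[8.0]))   # abandon
```

Writing the rule down before the first iteration makes it harder for fear or enthusiasm to reinterpret the results afterward.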

Assigning a person, such as a ScrumMaster (Sutherland 2017), to facilitate brainstorming, establish consensus, manage the experiment and measure the results seems to improve the effectiveness of this approach. Many have found that this type of work distracts from creative production, and thus exempt this person from significant production work.

In teams, process adaptation helps balance perfect communication (expensive and slow) with unrestricted parallel activity (prone to mistakes caused by lack of coordination). I studied this phenomenon in parallel artificial intelligence applications (Greening 1995). Sometimes excellent communication generates better results, sometimes low-restriction parallel activity generates better results. We can’t identify those situations in advance, so we need to inspect and adapt.

One of the first agile techniques, Plan-Do-Study-Adjust (called PDSA or PDCA), was developed by physicist and statistician Walter A. Shewhart to improve the manufacturing quality of telephone equipment. He argued that two sources create variability in measurements: assignable causes and chance causes. PDSA and its successors (Six Sigma, Total Quality Management and Kaizen) focus on quality improvement. Their activities seek to reduce variability.
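Shewhart’s control chart is the classic tool for telling the two kinds of cause apart. The sketch below is a simplified version, assuming control limits at the baseline mean plus or minus three sample standard deviations; production control charts usually estimate spread differently (for example, from moving ranges). The function names and data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def control_limits(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper control limits: baseline mean ± k sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread


def assignable_cause_signals(baseline: Sequence[float],
                             new_points: Sequence[float]) -> List[int]:
    """Indices of new points outside the control limits (likely assignable causes).

    Points inside the limits are treated as chance-cause variation.
    """
    lower, upper = control_limits(baseline)
    return [i for i, x in enumerate(new_points) if x < lower or x > upper]


baseline = [12, 11, 13, 12, 10, 12, 11, 13, 12, 14]  # hypothetical daily defect counts
recent = [12, 13, 25, 11]
print(assignable_cause_signals(baseline, recent))  # [2]
```

In PDSA-style quality work, a signal like the one at index 2 prompts a search for its cause so it can be removed, which steadily drives variability down.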

Innovation causes variability. Variability always accompanies chaos and complex adaptive systems. Low-risk variation can be caused by innovation and can result in significant corporate learning. When the economic metrics do not include metrics related to learning and elasticity, the organization is likely to control itself into stasis, adapting but not innovating as conditions change. Ultimately, this makes the organization fragile.

We can compensate for value and cost variation by adding metrics to our suite: elasticity and learning metrics balance value metrics; risk reduction metrics balance cost metrics. So when value or cost has high variation while learning and risk-reduction are high, it could be good; if value or cost has high variation, but learning and risk-reduction are low, we may have a problem.
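The rule of thumb above can be expressed as a small decision function. Everything specific here is an assumption for illustration: the coefficient of variation as the measure of “high variation,” the 0-to-1 scores for learning and risk reduction, and both thresholds. A real suite would use whatever value, cost, learning and risk-reduction metrics the organization already tracks.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (assumes a positive mean)."""
    return pstdev(values) / mean(values)


def interpret_variation(values: Sequence[float], learning: float, risk_reduction: float,
                        high_variation: float = 0.3, high_score: float = 0.5) -> str:
    """Read variation in a value or cost metric alongside its balancing metrics.

    values: recent observations of one value or cost metric.
    learning, risk_reduction: hypothetical normalized scores in [0, 1].
    high_variation, high_score: hypothetical thresholds a team would tune.
    """
    if coefficient_of_variation(values) < high_variation:
        return "low variation"
    if learning >= high_score and risk_reduction >= high_score:
        return "could be good"     # variation is buying knowledge and safety
    if learning < high_score and risk_reduction < high_score:
        return "may be a problem"  # variation without the balancing benefits
    return "mixed"                 # one balancing metric high, one low: look closer


weekly_value = [100, 60, 150, 90]  # hypothetical weekly value delivered
print(interpret_variation(weekly_value, learning=0.8, risk_reduction=0.7))  # could be good
print(interpret_variation(weekly_value, learning=0.2, risk_reduction=0.1))  # may be a problem
```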

Kaizen emphasizes small improvements. In a sense, PDSA and Kaizen are greedy algorithms: they always accept process changes that improve the metrics and reject all others. In chaotic systems, like markets and product development, greedy algorithms can settle on locally optimal results that are drastically worse than the global optimum. This explains why we should not rule out revolutionary experiments.

If we want optimal results, we must explore the effects of multiple large-scale changes along with limited small-scale changes. We must innovate at multiple scales, not just experiment with the small-scale changes advocated by Kaizen.
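A toy example makes the greedy-algorithm argument concrete. The “landscape” below is a hypothetical list of economic outcomes for process configurations 0 through 11, and the step size of 6 standing in for a large-scale change is equally arbitrary. A Kaizen-style search that only tries neighboring configurations climbs to the nearest peak and stops; adding large steps lets the same accept-only-improvements rule escape that local optimum. This is a deliberately simplified illustration; stochastic methods such as simulated annealing (the subject of Greening 1995) explore differently, by sometimes accepting worse results.

```python
from typing import Sequence, Tuple

# Hypothetical outcomes for process configurations 0..11.
LANDSCAPE = [1, 3, 5, 4, 2, 1, 2, 4, 7, 9, 6, 3]


def improve(landscape: Sequence[int], start: int, step_sizes: Tuple[int, ...]) -> int:
    """Greedy search: repeatedly move to the best configuration reachable by any step size.

    Only strict improvements are accepted, like a Kaizen-style process.
    Returns the configuration where no allowed change improves the outcome.
    """
    current = start
    while True:
        candidates = [current + d * s for s in step_sizes for d in (-1, 1)
                      if 0 <= current + d * s < len(landscape)]
        best = max(candidates, key=lambda i: landscape[i], default=current)
        if landscape[best] <= landscape[current]:
            return current  # no allowed change helps: a (possibly local) optimum
        current = best


small_only = improve(LANDSCAPE, start=0, step_sizes=(1,))
multi_scale = improve(LANDSCAPE, start=0, step_sizes=(1, 6))
print(small_only, LANDSCAPE[small_only])    # 2 5
print(multi_scale, LANDSCAPE[multi_scale])  # 9 9
```

Small steps alone stop at configuration 2 with an outcome of 5; mixing in large steps reaches configuration 9, the best outcome on this landscape.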

Does this mean that we should allow wild, intuitive management to control our organizations? No. It means that we should measure and experiment, but not interpret metrics robotically and thoughtlessly. Sometimes we should run experiments longer than one iteration. Sometimes we should explore how radical changes might produce large returns and, to avoid significant economic damage, find a way to mitigate that risk while testing our big idea.



In Scrum Retrospective meetings, team members

  • assemble and discuss economic progress metrics from the past,
  • validate or invalidate the hypothesis of the last experiment (if there was one),
  • brainstorm and identify process changes for the next experiment,
  • hypothesize how the new process will affect the metrics, and
  • commit to the new process rules for the next experiment.
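These steps can be captured as a simple experiment log, so that each Retrospective closes the previous experiment before opening the next. The sketch assumes one numeric metric where higher is better and a hypothesis stated as a predicted minimum value; brainstorming happens in the room, not in code. The class, field and metric names are hypothetical and are not part of the Scrum Guide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Experiment:
    process_rule: str          # process change the team commits to
    metric: str                # economic progress metric it should move
    predicted_minimum: float   # hypothesis: metric will reach at least this value
    validated: Optional[bool] = None  # set at the next Retrospective


@dataclass
class RetrospectiveLog:
    experiments: List[Experiment] = field(default_factory=list)

    def hold_retrospective(self, observed: Dict[str, float],
                           next_experiment: Experiment) -> Optional[bool]:
        """Validate the previous experiment (if any), then commit to the next one.

        observed: metrics assembled for this Retrospective, keyed by metric name.
        Returns the verdict on the previous experiment, or None if there was none.
        """
        verdict: Optional[bool] = None
        if self.experiments:
            last = self.experiments[-1]
            last.validated = observed[last.metric] >= last.predicted_minimum
            verdict = last.validated
        self.experiments.append(next_experiment)  # commit to the new process rules
        return verdict


log = RetrospectiveLog()
log.hold_retrospective({}, Experiment("pair on every story", "stories_done", 8))
print(log.hold_retrospective({"stories_done": 9},
                             Experiment("limit work in process to 3", "stories_done", 10)))  # True
```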

In short, Scrum Retrospective meetings set up an experiment. Because Scrum focuses on reducing production cost, its experiments are production experiments. During these Retrospective meetings, facilitation can greatly improve the results, and Scrum assigns that role to one person, the ScrumMaster. Once the Retrospective meeting is complete, the ScrumMaster becomes an enforcer, looking for adherence to the experiment’s process rules, protecting the experiment from interference, gathering data and reporting it to the team. This makes for a more controlled experiment and greater insight.

Lean Startup

Lean Startup also includes experimentation in its core prescription: “Build, Measure, Learn”.

The Lean Startup methodology has as a premise that every startup is a grand experiment that attempts to answer a question. The question is not “Can this product be built?” Instead, the questions are “Should this product be built?” and “Can we build a sustainable business around this set of products and services?” This experiment is more than just theoretical inquiry; it is a first product. If it is successful, it allows a manager to get started with his or her campaign: enlisting early adopters, adding employees to each further experiment or iteration, and eventually starting to build a product. By the time that product is ready to be distributed widely, it will already have established customers. It will have solved real problems and offer detailed specifications for what needs to be built (Ries 2017).

Notice here that Lean Startup complements Scrum: where Scrum experiments to reduce production cost, Lean Startup uses experimentation to test and discover features and markets that have significant value. Many organizations combine Scrum and Lean Startup to keep costs low (Scrum) and revenue high (Lean Startup).

Rhythmic Experimentation

Rhythm helps maintain the discipline of experimentation. Although some agile approaches incorporate rhythm, including Scrum and XP, many do not. Early in my agile career, I reinterpreted Scrum for large product organizations, naming it Enterprise Scrum, and applied it to Citrix Systems’ 700-person online engineering department (Greening 2010). We had enterprise-scale cost and value metrics, we managed our work for highest ROI, and initially we held quarterly retrospectives. The process stabilized and started to show a tenfold increase in release frequency (Greening 2013). But after we missed two quarterly retrospectives in a row and lost our agilist engineering Vice President, unhappiness and dysfunction grew in the department, ultimately resulting in the re-introduction of many waterfall management techniques and a slowdown in feature development. I suspect a rhythmic retrospective that incorporated the CEO would have avoided this problem, allowing people to discuss and solve problems through process modification.

Resulting Context

Teams that proactively experiment to improve can become hyperproductive, increasing their productivity and quality by a factor of 2 or more (Jakobsen 2009, Jakobsen 2015). This could seem miraculous, but when observing teams, you’ll find that creative people spend most of their time communicating and learning new techniques (as well as blowing off steam). Many people use inefficient techniques to complete work, but traditional operational management has no reasonable way to monitor and improve this, and its attempts to do so can destroy creativity. When we shine an experimental lens on our creative, collaborative work, we can, for the first time, improve much faster.


Amabile, T., Hadley, C. N., & Kramer, S. J. (2002, August). Creativity Under the Gun. Harvard Business Review.
Ashkenas, R. (2012, May 8). It’s Time to Rethink Continuous Improvement. Retrieved March 28, 2018.
Bungay, S. (2011). The Art of Action: How Leaders Close the Gaps between Plans, Actions, and Results. Nicholas Brealey Publishing.
Derby, E., Larsen, D., & Schwaber, K. (2006). Agile Retrospectives: Making Good Teams Great. Pragmatic Bookshelf.
Greening, D. (1995, January). Simulated Annealing with Errors (PhD). UCLA, Los Angeles.
Greening, D. R. (2010). Enterprise Scrum: Scaling Scrum to the Executive Level. Presented at the Proceedings of the 43rd Hawaii International Conference on System Sciences, Honolulu: IEEE.
Greening, D. R. (2013). Release Duration and Enterprise Agility. In 46th Hawaii International Conference on System Sciences (pp. 4835–4841). IEEE.
Jaffe, E. (2013). Why Wait? The Science Behind Procrastination. APS Observer, 26(4).
Jakobsen, C. R. (2015, May 4). Productivity results from Systematic Corporation [Personal Conversation].
Jakobsen, C. R., & Sutherland, J. (2009). Scrum and CMMI Going from Good to Great. In 2009 Agile Conference (pp. 333–337). IEEE.
Kuhn, T. S. (2012). The Structure of Scientific Revolutions: 50th Anniversary Edition. University of Chicago Press.
Reinertsen, D. (2013, February 1). The Four Impostors: Success, Failure, Knowledge Creation, and Learning. Retrieved March 26, 2018.
Ries, E. (2017). The Lean Startup | Methodology. Retrieved March 28, 2018.
Sutherland, J., & Schwaber, K. (2017, November). Scrum Guide. Retrieved March 28, 2018.
Thomke, S. (2001). Enlightened Experimentation: The New Imperative for Innovation. Harvard Business Review, 66–75.


Related Work

The five Agile Base Patterns are described in detail at Senex Rex. See Measure Economic Progress, Proactively Experiment to Improve, Limit Work in Process, Embrace Collective Responsibility and Collaborate to Solve Systemic Problems. Subsequent posts will explore patterns beyond these basics.


Maxim Kur, Evelyn Crofts, Erik Gibson and Oscar Amelunge provided reviews of an early draft. Dan Greening is solely responsible for errors in this work.


Dan Greening

Senex Rex tackles challenging problems at the agile management frontier. We help companies, teams and leaders accelerate and sustain value and quality production, so we can all live better lives. Contact us for help with agile transformation, turnaround management or performance improvement.

By Dan Greening

Dan Greening is a serial entrepreneur working on his fourth startup, where he leads implementation of two agile practices, Lean Startup and Scrum. Between the third and fourth startup, he was the lead agile coach for Citrix Online, Skype, Overstock, and other companies. He holds a Ph.D. in Computer Science from UCLA. He is a Certified Enterprise Coach with the Scrum Alliance, and a Scrum@Scale Trainer. He has published innovative work on agile management, parallel processing, and chaotic systems.

7 replies on “Experiment to Improve”

@”Retrospective Meetings should be the first meeting in Scrum, not the last.”

I agree fully. I call them ‘Prespectives’.

In a Retrospective you can see what you shouldn’t have done, but the time is already wasted.

In a Prespective you can see what you are going to do that you shouldn’t do, and you still can refrain from spending the time.

It’s so easy to save a lot of time.

Niels…agree with you and like your terminology “Perspective” as outcomes discussed during retrospective turns to be really fruitful inputs to next iteration…and in this way getting real benefit by utilising the glad items and working over the sad items during proceeding iterations leading to continuous improvement…(Y)

@Jagrati: Note I used “Prespective”, not “Perspective”, but that may even be caused by your computer or LinkedIn ‘correcting’ the spelling of the word that strangely enough doesn’t exist, until I made it up. Have a look at for an other important word I had to create: ‘preflection’.

Only if we change our way of working, the result may be different
(-)Hindsight is easy, but reactive
(-)Foresight is less easy, but proactive
(-)Reflection is for hindsight and learning
(-)Preflection is for foresight and prevention

Only with prevention we can save precious time. When thinking along these lines, I thought: That’s what the Deming/Plan-Do-Check-Act cycle is about.

This is in line with my understanding of Evolutionary Strategies for complex adaptive systems (e.g. organisms like humans) to cope with complex dynamic environments.

Yep. And the cool thing is almost all agile methodologies have a placeholder for it. But unfortunately most Retrospectives don’t come from a “science team” perspective, and thus don’t really do this. I would argue those teams are not agile.
