Thursday, 18 August 2016

Post-merge test automation failures

Recently we implemented Selenium Grid for one of our automated suites. I've written about our reasons for this change, but in short we wanted to improve the speed and stability of our automation. Happily, we've seen both of those benefits.

We've also seen a noticeable jump in the number of pull requests that are successfully merged back to our master branch each day. This gives some weight to the idea that our rate of application code change was previously impeded by our test infrastructure.

The increase in volume occasionally causes a problem when two feature branches are merged back to master in quick succession. Our tests fail on the second build of the master branch post-merge.

To illustrate, imagine that there are two open pull requests for two feature branches: orange and purple. We can trigger multiple pull request (PR) builds in parallel, so the two delivery teams who are behind these feature branches can receive feedback about their code simultaneously.

When a PR build passes successfully and the code has been through peer review, it can be merged back to the master branch. Each time the master branch changes it triggers the same test suite that executes for a pull request.

We do not trigger multiple builds against master in parallel. If two pull requests are merged in quick succession the first will build immediately and the second will trigger a build that waits for the first to complete before executing. Sometimes the second build will fail.

1. Failing tests after multiple PR merges to master

As the person who had driven sweeping test infrastructure changes, I assumed the first time this happened that the test automation was somehow faulty. The real issue was that the code changes in orange and purple, while not in conflict with each other at a source code level, caused unexpected problems when put together. The failing tests reflected this.

We hadn't seen this problem previously because our pull requests were rarely merged in such quick succession. They were widely spaced, which meant that when a developer pulled from master to their branch at the beginning of the merge process, these types of failures were discovered and resolved.

I raised this as a topic of conversation during Lean Coffee at CAST2016 to find out how other teams move quickly with continuous integration. Those present offered up some possible options to resolve the problem as I described it.

Trunk based development

Google and Facebook move a lot faster than my organisation. Someone suggested that I research these companies to learn about their branching and merging strategy.

I duly found Google's vs Facebook's Trunk Based Development by Paul Hammant and was slightly surprised to see a relevant visualisation at the very top of the article:


2. Google's vs Facebook's Trunk Based Development by Paul Hammant

It seems that, to move very quickly with a large number of people contributing to a code base, trunk-based development is preferred. As the previous diagram illustrates, we currently use a mainline approach with feature branches. This creates more opportunities for merge conflicts.

I had assumed that all possible solutions to these tests failing on master would be testing-focused. However, a switch to trunk-based development would be a significant change to our practices for every person writing code. I think this solution is too big for the problem.

Sequential build

Someone else suggested that perhaps we were just going faster than we should be. If we weren't running any build requests in parallel and instead triggered everything sequentially, would there still be a problem?

I don't think that switching to sequential builds would fix our issue as the step to trigger the merge is a manual one. A pull request might have successfully passed tests but be waiting on peer review from other developers. In the event that no changes are required by reviewers, the pull request could be merged to master at a time that still creates conflict:

3. Sequential PR build with rapid merge timing

Making the pull request builds sequential would slow our feedback loop to the delivery teams with no certain benefit.

Staged Build

Another suggestion was to look at introducing an interim step to our branching strategy. Instead of feature branches to master, we'd have a staging zone that might work something like this:

4. Introducing a staging area

The staging branch would use sequential builds. If the tests pass there, the change goes to master; if they fail, it doesn't. The theory is that master is always passing.

Where this solution gets a little vague is how the staging branch might automatically rollback a merge. I'm not sure whether it's possible to automatically back changes off a branch based on a test result from continuous integration. If this were possible, why wouldn't we just do this with master instead of introducing an interim step?

I'm relatively sure that the person who suggested this hadn't seen such an approach work in practice.

Do Nothing

After querying the cost of the problem that we're experiencing, the last suggestion that I received was to do nothing. This is the easiest suggestion to implement but one that I find challenging. It feels like I'm leaving a problem unresolved.

However, I know that the build can't always pass successfully. Test automation that is meaningful should fail sometimes and provide information about potential problems in the software. I'm coming to terms with the idea that perhaps the failures we see post-merge are valuable, even though they have become more prevalent since we picked up our pace.

While frustrating, the failures are revealing dependencies between teams that might have been hidden. They also encourage collaboration as people from across the product work together on rapid solutions once the master branch is broken.

While I still feel like there must be a better way, for now it's likely that we will do nothing.




Friday, 12 August 2016

Human centered test automation

The opening keynote at CAST2016 was delivered by Nicholas Carr. Though his talk was live streamed, unfortunately a recording is not going to be published. If you missed it, much of the content is available in the material of a talk he delivered last June titled "Media takes command".

Nicholas spoke about the typology of automation, the substitution myth, automation complacency, automation bias and the automation paradox. His material focused on the application of technology in traditionally non-technical industries, e.g. farming, architecture and personal training.

As he spoke, I started to wonder about the use of automation within software development itself. Specifically, as a tester, I thought about the execution of test automation to determine whether a product is ready to release.

Automation providing answers

Nicholas shared an academic study of a group of young students who were learning about words that are opposite in meaning, e.g. hot and cold. The students were divided into two groups. Half of the students received flashcards to study that stated a word alongside the first letter of its opposite, e.g. hot and c. The other half received flashcards that stated both words in their entirety, e.g. hot and cold.

The students in the first group performed better in their exam than those in the second group. The academics concluded that this was because when we need to generate an answer, rather than simply study one, we are more likely to learn it. This phenomenon is labelled the generation effect.

On the flip side, the degeneration effect is where the answers are simply provided, as in many automated solutions. Nicholas stated that this approach is "a great way to stop humans from developing rich talents".

It's interesting to consider which of these effects is most prevalent when we process the results provided by our continuous integration builds. I believe that the intent of a build result is to provide an answer: the build will pass or fail. However, I think the reality is that the result can rarely be taken at face value.

I like to confirm that a successful build has truly succeeded by checking the execution time and number of tests that were run. When a build fails, there is a lot of investigative work to determine the real root cause. I dig through test results, log files and screenshots.
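As a rough illustration of what that confirmation step involves, here is a minimal Java sketch written under my own assumptions (a JUnit-style XML report, an invented file path, and invented thresholds for the expected test count and duration). It isn't the tooling we actually use, just the shape of the check:

    import java.nio.file.Path;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Element;

    // Sanity-check a "green" build: did it run roughly the expected number of
    // tests, and did it take roughly the usual amount of time? The report path
    // and the thresholds below are invented for illustration.
    public class BuildSanityCheck {
        public static void main(String[] args) throws Exception {
            Path report = Path.of("reports/TEST-acceptance-suite.xml"); // hypothetical path

            Element suite = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(report.toFile())
                    .getDocumentElement();

            int tests = Integer.parseInt(suite.getAttribute("tests"));
            double seconds = Double.parseDouble(suite.getAttribute("time"));

            if (tests < 400) {                   // expected suite size (assumed)
                System.out.println("Suspicious pass: only " + tests + " tests ran");
            }
            if (seconds < 300) {                 // expected minimum duration (assumed)
                System.out.println("Suspicious pass: suite finished in " + seconds + "s");
            }
        }
    }

The specific thresholds don't matter; the point is that a green tick alone doesn't tell you whether the suite actually did the work you expect it to do.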

I have previously thought that this work was annoying, but in the context of the degeneration effect perhaps the need to manually determine an answer is how we continue to learn about our system. If continuous integration were truly hands-off, what would we lose?

Developing human expertise

Nicholas also introduced the idea of human centered automation. This is a set of five principles by which we see the benefits of automation but continue to develop rich human expertise.

  1. Automate after mastery
  2. Transfer control between computer and operator
  3. Allow a professional to assess the situation before providing algorithmic assistance
  4. Don't hide feedback
  5. Allow friction for learning

This list is interesting to consider in the context of test automation. The purpose of having test automation is to get fast feedback, so I think it meets the fourth point listed above. But against every other criteria, I start to question our approach.

When I think about the test automation for our products, there is one suite in particular that has been developed over the past five years. This suite has existed longer than most of our testers have been working within my organisation. There is logic coded into the suite for which we no longer have a depth of human expertise. So we do not automate after mastery.

The suite executes without human interaction; there is no transfer of control between computer and operator. Having no tester involvement is a goal for us. Part of providing rapid feedback is reliably providing results within a set period of time, regardless of whether a tester is available.

The suite provides a result. There is human work to assess that result, as I describe above, but the suite has already provided algorithmic assistance that will bias our investigation. It has decided whether the build has passed or failed.

Finally, the suite is relatively reliable. When the tests pass, as they usually do, there is no friction for learning. When the tests are failing and flaky, that is when testers have the opportunity to deeply understand a particular area of the application and the associated test code. This happens, but ideally not very much.

So, what are the long term impacts of test automation on our testing skills? Are we forfeiting opportunities to develop rich human expertise in testing by prioritising fast, reliable, automated test execution?

I plan to think more about this topic and perhaps experiment with ways to make our automation more human centered. I'd be curious to hear if other organisations are already exploring in this area.




Thursday, 11 August 2016

Fostering professional development

One of the sessions that I attended at CAST2016 was titled "How do I reach the congregation when I'm preaching to the choir?" presented by Rob Bowyer and Erik Davis. One of the main themes of discussion focused on whether people should "sell" professional development to their colleagues or team.

In the introduction to the session, Rob and Erik spoke a little bit about their own contexts. They shared some of the challenges that they've encountered in trying to foster a culture of professional development in both their organisations and their local testing community.

Two particular challenges stuck out for me and I noted them down. Firstly, that "most of the people didn't care" about professional development. Secondly, that "I've been struggling to get people to see the value" in professional development. It struck me that these two challenges in creating a culture of learning could be related.

Do I see value?

I have an ISTQB Foundation certificate. I did this early in my career because I believed that getting this qualification was necessary to find employment in the software testing field. I could see the certificate being mentioned in a lot of job advertisements for testers. 

I saw a clear benefit to me in downloading the syllabus, doing some independent study and taking an exam. This activity was going to open up opportunities in a field of work that I might otherwise be unable to enter. I wanted to be a tester, so I wanted to get the certification.

At that time, I saw the value in this professional development for my career.

On the other hand, I have never completed the BBST Foundation course. I have heard a lot about this qualification and investigated the material that is available online. I have advocated for people in my team to attend this course and published the business benefits I used to argue for this opportunity. But I have not completed the course personally.

I did not learn about BBST Foundation until I had reached a point where I had learned many, but not all, of the concepts in the course via other means. I had heard a lot about the time investment required to complete the course successfully. When given the opportunity to take the class, I decided not to.

At that time, I did not see the value in this professional development for my career.

Do I care?

In the case of ISTQB, a manager might have assumed by my actions that I cared about my professional development. In the case of BBST, a manager might have assumed by my actions that I did not care about my professional development. Both conclusions are reached by assumptions, which are present in any communication.

The Satir Interaction Model describes what happens inside of us as we communicate - the process we go through as we take in information, interpret it, and decide how to respond [Ref].

Ref: "I think we have an issue" -- Delivering unwelcome messages
Fiona Charles

The steps in the Satir Interaction Model between intake and response are hidden. This means that the end result of the process that assigns meaning and significance can be quite surprising to the recipient, which can be a catalyst for conflict.

For example, imagine that I give a manager an input of "I do not want to take the BBST Foundation course". I would be surprised by a response from that manager expressing disappointment that I don't care about my professional development.

We can also climb a Ladder of Inference in our interactions, which refers to the idea that there's "a common mental pathway of increasing abstraction, often leading to misguided beliefs". In essence, this is about leaping to conclusions.

For example, imagine the same manager who received my negative response to the BBST Foundation course receiving a promotional email for the RST course. They might extrapolate from my previous negative response that I will not want to attend RST, that I don't want to take any training courses, and that it would be a waste of time to forward me an email that describes this opportunity. I haven't had any input into this flow of reasoning. The manager has independently climbed a ladder of inference.

I think we need to be aware of both of these communication models when assessing an apparent lack of interest in professional development - particularly when we're labeling what we see as "most of the people didn't care".

Empathy & Understanding

Let's return to the question of whether there is a need to sell professional development. I don't think so. However, I agree with an alternative phrasing suggested in the session: that we should foster professional development.

When I sell, I am trying to be persuasive and articulate the merits of an activity. My communication is broadcast oriented. I want to share my reasoning and rationale. I try to explain why people should participate. My intent is to advertise.

That "I've been struggling to get people to see the value" is a failure to sell.

When I foster, I am seeking to encourage the development of an activity through understanding the obstacles that prevent it from happening. I want to be mindful of the ladder of inference and the judgments that I am applying to the responses of my colleagues and team. I want to be aware of where the significance and meaning I've assigned might have distorted the message that I have been given, particularly when people are saying "no" to an opportunity.

That "most of the people didn't care" is a failure to foster.

I believe that there are relatively few people who truly don't care about their professional development. If there are people around you who you would label in this manner, I'd challenge you to think about how you have communicated and the responses that you've received.

What have they actually said? What meaning have you ascribed? Have you really understood?

I believe that in reflection and inquiry there is opportunity to successfully foster professional development.




Thursday, 14 July 2016

Test-Infected Developers

This article was originally published in the June edition of Testing Trapeze

At my workplace there is a culture of shared ownership in software delivery. We develop our products in cross-functional agile teams who work together to achieve a common business goal. However it’s still relatively rare for specialists to be proactive about picking up work in areas outside of their own discipline. For example, you don’t often see business analysts seeking out test execution tasks and prioritising those above work to refine stories in the product backlog.

That said, I’ve recently noticed an increase in the number of developers who are voluntarily engaging in test-related activities. They’re not jumping forward to think about test planning or getting excited about exploring the application. But they are diving into our automation by helping the testers to improve the coverage it provides, or working to enhance the framework on which our tests run.

As a coach part of my role is to foster cross-discipline collaboration. I confess that I haven’t been putting any active focus on the relationships between developers and testers. It is something that has changed as a byproduct of other activities that I’ve been part of. I’ve been reflecting on what’s behind this shift and the reasons why I believe the developers are getting more involved.

Better Test Code

In the past our test code has occupied a dark corner of our collective psyche. Everyone knows that it is there, but most people don’t want to engage with it directly. In particular, I have felt that developers were reluctant to get involved in a code base that was littered with poor coding practices and questionable implementation decisions. In instances where a developer did contribute, it was often a cause of frustration rather than satisfaction.

The test team have recently undertaken a lot of work to improve the quality of code that runs our automation. In one product we started the year with a major refactoring exercise that allowed us to run tests in parallel, which more than halved our execution time. In another we’ve launched a brand new suite with agreed coding standards from the beginning.
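The article doesn't show what that refactoring looked like, but a common prerequisite for running Selenium tests in parallel is making the driver thread-safe. The sketch below is my own illustration of that pattern, not our actual code: each test thread gets its own WebDriver instance via a ThreadLocal factory.

    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    // Illustrative pattern only: one browser per test thread, so parallel
    // tests never share a WebDriver session or step on each other's state.
    public final class DriverFactory {

        private static final ThreadLocal<WebDriver> DRIVER =
                ThreadLocal.withInitial(FirefoxDriver::new);

        private DriverFactory() {
        }

        // Lazily creates a browser for the calling thread on first use.
        public static WebDriver driver() {
            return DRIVER.get();
        }

        // Call from test teardown so browsers aren't leaked between runs.
        public static void quitDriver() {
            DRIVER.get().quit();
            DRIVER.remove();
        }
    }

With a factory like this in place, the test runner can execute classes or methods on multiple threads without individual tests needing to know anything about each other.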

The experience for a developer who opens our automation is a lot less jarring than perhaps it has been in the past. As the skills of the testers improve, and the approach that we take to writing code becomes closely aligned with the way that developers are used to working, it’s no longer traumatic for a developer to delve into the test suites.

In addition, all of the changes to the test code now go through the same peer review process as the application code. We use pull requests to facilitate discussion on new code. There is a level of expectation: it’s not “just test code”. We want to write automation that is as maintainable as our application.

The developers have started to participate more in peer review of test code. There’s a two-way exchange of information in asking a developer to review the automation. The tester gains a lot of instruction on their coding practices. However the developer will also gain by having to completely understand the test coverage in order to offer their feedback on it.

Imperfect Test Framework

On the flip side of the previous point, there are still a number of very clear opportunities for enhancing our automation frameworks and extending the coverage that they offer. The testers don’t always have the capacity, skills or inclination to undertake this work.

I can think of a few occasions where a developer has been hooked into the test automation by an interesting problem in the supporting structure that needed a solution: specific technical jobs like setting up an automated script for post-release database changes, or tweaking configuration in the continuous integration builds. These tasks improve their understanding of the framework and may mean that the developer ends up contributing to the test code too.

Within the tests, there are application behaviours that are challenging to check automatically. Particularly in our JavaScript-heavy applications we often have to wait for different aspects of the screen to update during a sequence of user actions. Developers who contribute by writing the helper methods required for testing in these areas will often end up having a deeper understanding and closer involvement in all of the associated test code.
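As an example of the kind of helper meant here, the sketch below (my own illustration, not the actual suite code, and assuming a current Selenium client) wraps Selenium's explicit wait so that a test blocks until a dynamically rendered element is visible, rather than sleeping for a fixed time. The ten-second timeout is an assumption.

    import java.time.Duration;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    // Illustrative helper for JavaScript-heavy pages: wait for an element to
    // finish rendering before the test interacts with it.
    public final class Waits {

        private static final Duration TIMEOUT = Duration.ofSeconds(10); // assumed

        private Waits() {
        }

        public static WebElement visible(WebDriver driver, By locator) {
            return new WebDriverWait(driver, TIMEOUT)
                    .until(ExpectedConditions.visibilityOfElementLocated(locator));
        }
    }

A test might then call Waits.visible(driver, By.cssSelector(".account-summary")) before asserting against the updated screen; the selector is invented for the example.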

I believe the key here is providing specific tasks where the developers can engage in the test code with a clear purpose and feel a sense of accomplishment at their conclusion. In some instances, the developer will complete a single task then withdraw from the testing realm. In others, it’s a first step towards a deeper involvement in the test code and subsequently testing.

Embedded In Development

In almost every instance, a developer who is making a change to one of our applications will need to raise a pull request to have their code merged back to our master branch for release. As part of the process enforced by our tools, the code to be merged must have passed all of our automated checks. Every change. All of the automation.

We’ve always run our automation regularly, but it’s only relatively recently that it has become mandated on every merge. This change has largely been driven by the developers themselves, who are keen to improve the quality of code prior to testing the integrated code base.

Now that our automation runs many times per day it is in the best interests of the developers to be engaged in improving the framework. If it is unreliable or the tests are slow to execute, it has an immediate negative impact on the developers as they are unable to deliver changes to our applications. They want our automation to be robust and speedy.

The new build schedule has helped to flush out pain points in the test code and engaged a wider audience in fixing the root causes of these issues by necessity. Now most of the developers have experienced a failing build and had to personally debug one or more of the tests to fix the problem. The developers are actively monitoring test results and analysing failures, which means that they are a lot more familiar with the test code.

Conclusion

I see automation as a gateway to getting developers engaged in testing more broadly. When collaborating on coverage in automation, there is the opportunity to discuss the testing that will occur outside of the coded checks. The conversation about what to automate vs. what to explore is a valuable one for both disciplines to engage in.

We’ve taken three steps down the path to having our developers excited about picking up tasks in our test automation. We’ve made the suites a pleasant place to spend time by applying coding standards and ensuring that changes are peer reviewed. We’ve provided opportunities for developers to contribute to the framework or helper methods instead of asking them to write the tests themselves. And we’ve embedded the automation in the development process to create a vested interest in rapid and reliable test execution.

Developers can become test-infected. I am seeing evidence of this in the collaborative environment that is continuing to evolve in my organisation.

Monday, 11 July 2016

A community discussion

A while back I put out a tweet request:


I spoke about the responses to this tweet during my talk titled "A Community Discussion" at Copenhagen Context. Somewhat ironically, I've been reluctant to share the feedback that I received in writing. There have been exchanges in the testing community recently that make me feel now is the time.

I had a lot of responses to my original request on Twitter. About half tried to explain context-driven testing rather than the community. Those who did speak about the people and environment gave responses like:
  • A bunch of supportive, challenging and engaged people full of questions, support and understanding.
  • Warm and welcoming, literally the best thing that I've come across in my career.
  • People who insist on a human perspective on testing
  • A community of people who constantly asks the question how can we be (test) better?
  • A group of people not restricted by a so called set of best practices and a one size fits all approach
  • A world-wide support network of people who share the same fundamental principles as me

I also had a lot of responses via private channels: direct messages, email and Skype. In many instances they were from people who no longer felt that they were part of the community. They gave responses like:
  • The Cult/Church of CDT due to the rhetoric used by CDT to describe their heroic and righteous fight against evil
  • The Test Police because they feel the need to correct the terminology and thinking of everyone else regardless of whether they share the same world-view.
  • They are an academic think-tank that is out of step with modern business needs
  • CDT is RST, it’s all just RST stuff, RST is the new best practice
  • If you don’t beat your drum to the CDT Rhythm they’ll beat you down hard
  • The Anti-ISO group, The Anti-ISTQB people, the Anti-anyone not CDT people etc. 
  • Not a safe place to share and explore

Are you surprised by this?

I was surprised by the stark polarity in what was shared openly and what was shared privately. I was surprised by who responded and who chose not to. I was surprised by specific individuals who held different opinions to what I had expected. However, I wasn't surprised to see these two views emerge.

What bothers me is that these two viewpoints seem to be a taboo topic to have a conversation about. 

On Twitter there has been activity that feels like warfare. Grenades are launched from both sides, loud voices shout at one another, misunderstandings create friendly fire, and when the smoke clears no one is sure what the outcome was. 

What I wanted to do in my talk at Copenhagen Context was start a dialog. I talked about an inclusive context-driven testing community by sharing the model I created almost two years ago. I suggested some ways in which we could alter our behaviour. I was part of an Open Season discussion where those present shared their views. 

Since then?

I continue to focus on making the New Zealand testing community as inclusive as possible. I believe that WeTest, Testing Trapeze and even this blog are making a difference in spreading the ideas from the context-driven school without the labels. I strive to be approachable, humble and open to questions.

I hope that I am setting an example as someone making a positive difference through action. My personal role model in this space is Rosie Sherry, who is the "Boss Boss" at Ministry of Testing. I observe that she has her own style of quiet leadership and a practical approach to change.

But the wider conversation is still adversarial or hidden. I'd like to see that change.

What are your thoughts?

T-Shirt print from Made in Production

Sunday, 3 July 2016

Why we're switching to Selenium Grid

The department that I am part of has gone through a big growth spurt recently. When I started in my role, just over a year ago, there were 20 testers. Now there are 30. That jump is indicative of what has happened in all disciplines of software delivery.

This growth is starting to create some interesting problems in the execution of our test automation, particularly for our web-based retail banking application: a relatively young product that has had test automation embedded in its development approach since the very beginning.

Alongside a comprehensive unit test suite, we've been using Selenium WebDriver to execute tests against Firefox. We call these tests our "automated acceptance suite" (AAS) or "node tests", which is a reference to the mock server technology that these tests execute against.

In the beginning the application was small and the node tests that ran alongside it were quick. As the product has grown we've added more tests, so they take longer to execute. When the fast feedback provided by our automation was no longer fast enough, we switched our tests from single thread to parallel execution.

In the beginning there was just a single development team and the node tests ran every time that a change was made. As the number of teams has grown, the number of changes being made has increased, so the tests are being executed more frequently. When our build queues started to exceed reasonable lengths, we switched from dedicated continuous integration hardware to Docker containers that increased the number of builds we could execute in parallel.

Our solution to problems introduced by growth has been to do more things at once.

To get the tests to run faster we switched the test implementation to parallel execution.

To get the build queues to be shorter we switched the infrastructure to parallel execution.

These were good solutions for us. But now we're coming to the point where we can't do any more things at once with what we have. To illustrate, compare what was running on our build server against what is running there now:


In the beginning we had dedicated hardware. It ran a node server to return mock responses, a web server for our product, and the tests that opened a single Firefox window to execute against.

In our current state we have four active Docker containers. Each runs a node server, a web server, and the tests that open four Firefox windows to execute against.

In our current state we're hitting the limits of what our infrastructure can do. This is manifesting in two types of problem that are causing a lot of frustration, as they fundamentally impact two key measures of useful automation: speed and stability.

Our current state can be slow, particularly when there are four builds executing at once and the hardware is fully loaded. Our overnight build time is approximately 30 minutes. By contrast, when a build executes during business hours it takes approximately 50 minutes.

I find it easiest to explain why this happens using an analogy. Imagine a horse towing a cart with four large pumpkins in it. The horse can trot down the street quite happily, relatively unencumbered by its load. Now imagine the same horse towing a cart with 28 large pumpkins in it. The horse can still move the cart, but it won't be able to travel at the same pace that it did with a lighter load. It may trudge rather than trot.

Our overnight build is carried by the lightly loaded horse as it may be the only build active on our hardware. Our build during business hours is carried by the heavily-laden horse as many builds run at once. The time taken to complete a build alters accordingly.

The instability we've seen comes partly from this variable speed. There's a particular case where we look for a success notification that is only displayed for a fixed duration. When the timing to complete the action that triggers this notification is variable, it becomes frustrating to verify.
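One way to make that kind of check less sensitive to variable timing (sketched below as my own suggestion, not the fix we applied, and assuming a current Selenium client) is a fluent wait that polls frequently for the notification, so a slow run doesn't miss the short window in which it is displayed. The selector and timings are invented.

    import java.time.Duration;
    import org.openqa.selenium.By;
    import org.openqa.selenium.NoSuchElementException;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.support.ui.FluentWait;

    // Illustrative only: poll aggressively for a short-lived success
    // notification instead of checking once at a fixed point in time.
    public final class NotificationCheck {

        private NotificationCheck() {
        }

        public static WebElement successNotification(WebDriver driver) {
            return new FluentWait<>(driver)
                    .withTimeout(Duration.ofSeconds(15))      // assumed upper bound
                    .pollingEvery(Duration.ofMillis(100))     // poll often, miss less
                    .ignoring(NoSuchElementException.class)
                    .until(d -> d.findElement(By.cssSelector(".success-notification"))); // invented selector
        }
    }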

But we've also had stability problems with the four Firefox browsers running on a single display. Some failures are caused by tests running in parallel that fight for focus e.g. attempting to confirm a payment via a modal dialog. Others are attributed to two different tests that simultaneously attempt to hover and click the mouse e.g. editing an account image. When these clashes occur, one of the tests involved will usually fail.

Our operations team ran some diagnostics on the existing hardware to determine what made it slow. They identified which processes were chewing up the most system resources, or in terms of the analogy, which were the largest pumpkins on the cart. It turned out that there was a single clear culprit: Firefox.

Enter Selenium Grid.

Selenium Grid enables a distributed test execution environment. What this means in our case is that we can move all of the Firefox instances out of our Docker containers. This will significantly lighten the load on our existing continuous integration infrastructure:



In the proposed future state, our tests will send their browser requests to a Selenium Grid hub on our cloud-based infrastructure. The hub will have connectivity to a pool of Selenium Grid nodes. Instead of having multiple Firefox windows open on a single display, we're provisioning each node in a dedicated container with a single browser.
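From the test code's perspective, the change is mostly confined to how a browser is obtained: instead of starting Firefox locally, the suite asks the hub for a session using RemoteWebDriver. A minimal sketch, with an invented hub address, might look like this:

    import java.net.URL;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxOptions;
    import org.openqa.selenium.remote.RemoteWebDriver;

    // Illustrative only: request a Firefox session from the Selenium Grid hub
    // rather than launching a browser on the build container itself.
    public final class GridDriverFactory {

        private GridDriverFactory() {
        }

        public static WebDriver firefoxOnGrid() throws Exception {
            URL hub = new URL("http://selenium-hub.internal:4444/wd/hub"); // invented address
            return new RemoteWebDriver(hub, new FirefoxOptions());
        }
    }

The hub then routes each session to whichever node has a free Firefox instance, which is what allows the browsers to live in their own containers.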

Each grid node will know where it was triggered from, as the browser will still open the web application that is running on the existing Docker architecture. This does mean that we are introducing network latency into each of our WebDriver interactions, so they'll be slower than on local hardware. But the distributed architecture should give us enough advantages that we still end up with a faster solution overall.

Our hope is that this proposed future will address our existing speed and stability issues. Increasing the system resources available, by introducing additional hardware, should help us to get consistent build times regardless of the time of day. And having each Firefox browser in its own dedicated container should avoid any display contention.

We have a working prototype for the proposed future state and early signs are promising. I'm looking forward to turning the vision into reality and hope that it will bring the benefits that we are searching for.

Thursday, 23 June 2016

Launch Wrangling

Imagine being one of five testers in an organisation with over 400 developers. Picture a pace of 1,251 production deploys in a week. Now throw in a distributed workforce that communicates almost exclusively online.

Last night I attended Sheri Bigelow's talk at WeTest. Sheri works as a tester at Automattic in what I believe to be quite a unique environment. She shared some fascinating insights into building a testing culture in continuous delivery, in a team where testers are vastly outnumbered and testing is an optional activity.

Of all the things that Sheri spoke about there was one in particular that resonated. It's something that I can imagine applying in my own organisation, despite our vastly different contexts in developer:tester ratio, rate of release, and risk profile.

Launch Wrangling

Many people who develop software consider a release to be the end of their process: once a feature is in production, in the hands of customers, the development team can move on to their next piece of work.

When Sheri talked about deploying to production she said that it's "not the end of the game, it's kind of the middle". At Automattic the developers have work to do beyond creating the code itself. They are expected to monitor their changes in production and help to provide support to users when required. A true DevOps culture.

In the middle of a sports game, the players will usually take a half-time break. They'll have some refreshments, reflect on the game so far, take inspiration from their coach, then return to the competition.

In teams where deploying to production is halfway through the process, there'll be a similar lull in activity. That little bit of time between something being finished and being released is an opportunity for refreshment, reflection, then a return to action. A half-time break.

In this relatively empty time, Sheri saw an opportunity. She started asking delivery teams to use the space around their releases to participate in, essentially, a bug bash. People put aside their day-to-day duties and those from every type of role worked together for a short period of time to test the product.

At Automattic this activity is called launch wrangling. Apparently when your company founder is from Texas there's a strong cowboy influence in naming things!

Sheri has used launch wrangling as an opportunity to introduce testing as an activity to developers who may not have tested before. She also talked about getting a lot of eyes across the application to improve the chances for important problems to be discovered prior to release. This means that launch wrangling is both a coaching tool and a way to improve test coverage.

No matter what type of delivery schedule you adopt, breathing space around deployments is likely to exist in some capacity. In my experience the amount of time available will correlate to the size of the changes being made to production. Big changes create a bigger pause. Utilising this gap seems like a sensible way to appropriately time-box a bug bash activity.

I like the idea of launch wrangling to foster testing across disciplines and improve the scope of testing where resources are particularly limited.