Sunday, 23 February 2014

Non-functional testing in continuous delivery

Last year I worked in an organisation that did twice-daily production releases with very little non-functional testing. Often the litmus test of whether a release met non-functional requirements was to push it out to users and closely monitor the results. There was an amazing operations team who had a rapid automated production rollback procedure, which could clean things up in a flash, if required.

On joining this organisation as a tester I was astounded by their approach. On my first day they had a bad release go to production and roll back off the live platform. I felt that I had entered the Wild West of software development; it appeared reckless.

As time went on I saw the process repeated; there were more good releases than bad, and things started to make a certain amount of sense. In a twice-daily release cycle there isn't time to spend on detailed performance testing, usability sessions, or security audits. Though I became more comfortable, I had a nagging doubt that this was not the way that others were solving this problem.

At CITCON this weekend I had the opportunity to find out, by facilitating a session titled "How do you incorporate non-functional testing in continuous delivery?".

The first thing that struck me, after sharing what I just described, was that no-one jumped to volunteer a better solution. People started to talk around the topic, but not to it. I had to repeat my question several times before one attendee said "We focus on functionality and kind of ignore non-functional testing". I felt this statement reflected the position of many in the group.

Someone proposed that the first problem in incorporating non-functional testing is a lack of written non-functional requirements. People can quickly determine whether something is not working by means of it being too slow, or difficult to use, or succumbing to malicious infiltration. Defining what is expected from the application for performance, usability, and security, is much more difficult. The rapid pace of continuous delivery, coupled with a relatively robust process for testing in production, creates a compelling excuse not to stop and think about non-functional requirements.

In the case that requirements are present, how do testers find time to test them? General consensus was that the requirements would form the basis for a suite of discrete automated checks designed to alert the tester; a prompt to hold the release while the tester investigated the problem. Pre-release non-functional testing would be driven by a failing check.

In the case of performance, the check may fire when a threshold is exceeded, or highlight a marked degradation that still falls within the threshold. For example, if the page load time jumps from 0.3s to 2s and the threshold is set at 4s, we would still want to know about this change. Some in the audience had already implemented lightweight, targeted, automated performance checks that were running in their continuous integration environment.
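
A minimal sketch of such a check, in Python, might look like the following. The function name, the `max_slowdown` ratio, and the idea of comparing against a stored baseline are my own assumptions for illustration; nobody in the session described their implementation in this detail.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    passed: bool
    reason: str


def check_load_time(current_s, baseline_s, threshold_s=4.0, max_slowdown=2.0):
    """Flag a page load time that breaks the hard threshold, or that has
    degraded sharply against the previous baseline while still under it.

    threshold_s and max_slowdown are illustrative defaults, not recommendations.
    """
    if current_s > threshold_s:
        return CheckResult(
            False, f"load time {current_s:.2f}s exceeds threshold {threshold_s:.2f}s"
        )
    if baseline_s > 0 and current_s / baseline_s > max_slowdown:
        return CheckResult(
            False, f"load time degraded from {baseline_s:.2f}s to {current_s:.2f}s"
        )
    return CheckResult(True, "ok")


# The example from the text: 0.3s to 2s is under the 4s threshold,
# but the jump is large enough that the check should still fire.
result = check_load_time(current_s=2.0, baseline_s=0.3)
print(result.passed, result.reason)
```

A failing result here would be the prompt to hold the release while a tester investigates, rather than a verdict in itself.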

As the conversation turned to security there was doubt that the same principle could be applied. However, one tester in the audience was doing just this by using the results of security audits to create scripted security checks. Though vigilance is required to keep up with evolving security threats, he felt that the maintenance overhead was no different to any other automated test suite.
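
To give a flavour of what a scripted security check could look like, here is a small Python sketch. It assumes a hypothetical audit finding that certain HTTP response headers must always be present; the list of headers and the function name are my own illustration, not what the tester in the session actually used.

```python
# Headers a hypothetical audit might require on every response.
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Content-Security-Policy",
)


def missing_security_headers(response_headers, required=REQUIRED_HEADERS):
    """Return the required headers absent from a response, sorted by name.

    response_headers is a mapping of header name to value. Header names
    are compared case-insensitively, as HTTP header names are.
    """
    present = {name.lower() for name in response_headers}
    return sorted(h for h in required if h.lower() not in present)


# Example response headers, as they might be captured from a test environment.
headers = {
    "content-security-policy": "default-src 'self'",
    "X-Content-Type-Options": "nosniff",
}
print(missing_security_headers(headers))
```

Each new audit finding would add another small check like this one, which is where the maintenance overhead he mentioned comes from.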

Finally we spoke about usability. The first thought from the audience was that perhaps A/B testing is how most companies achieve this in a continuous delivery environment. Those assembled were familiar with the concept as New Zealand is often used as the trial region for new Facebook features. Some used this approach, though others argued that if your focus is user loyalty or sales you may not want to risk alienating a proportion of your clientele by giving them a weaker design.

Interestingly, there were those who thought that the same principle of checks may even work for usability, in particular for the accessibility aspects that often require that the application can be used by a machine. Tools to check for tab order, alternate text in images, appropriate colour and contrast, and valid HTML were all mentioned.
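
As one example, a check for alternate text in images can be sketched with nothing more than the Python standard library. This is an illustration under my own assumptions rather than one of the tools mentioned in the session: it only flags `img` tags with no `alt` attribute at all, and it allows an empty `alt=""`, which is commonly used for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by default.
        if tag == "img":
            attr_map = dict(attrs)
            if "alt" not in attr_map:
                self.missing.append(attr_map.get("src") or "<no src>")


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


page = '<p><img src="a.png" alt="Logo"><img src="b.png"><img src="c.png" alt=""/></p>'
print(images_missing_alt(page))
```

A check like this says nothing about whether the alternate text is any good, so it complements a human review rather than replacing it.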

The session finished with a conversation about whether this would really work. The arguments against seem to be invalidated by the type of organisations that choose continuous delivery. Organisations that make frequent releases a priority and pride themselves on responsiveness must acknowledge that this comes at the expense of quality. It's fine if a user sees something that isn't quite right, so long as it's only briefly. I found it interesting that those with real-world experience in continuous delivery often worked in an iconic or monopolistic organisation where the user has strong brand loyalty and little choice.

Are you using continuous delivery? How do you incorporate non-functional testing?


  1. An interesting article, Katrina. A colleague of mine forwarded this on to me and I found myself agreeing with many of your sentiments. The people who can get away with a cavalier attitude to testing tend to be those whose customers have no simple alternative, for example users of Facebook, Gmail and so on (brand loyalty/lack of choice).

    Where customers do have a real choice, such as on retail websites, the unstructured approach to testing is more likely to cause problems.

    Thanks for sharing your thoughts.

  2. Hi Katrina

    I always encourage companies to think about operational testing, rather than non-functional testing. Names matter, and when you call something "non-functional" I fear product owners hear "not as important as the features I want". It's an education exercise - you have to teach, and even show product owners and other stakeholders how latency, throughput, and security concerns can cripple their product/service.

    For operational concerns such as latency, throughput, and security I create acceptance tests like any other acceptance tests - with production-quality stubs used to simulate third parties, and gradually building up a battery of automated tests until a few end-to-end tests are used as smoke tests on production release and periodically during production. Techniques such as Blue Green Releases can be especially useful, as they give you an opportunity to perform testing in production prior to Canary Releasing, Dark Launching etc.



  3. "Are you using continuous delivery?"
    Every commit triggers a build and deployment, and every deployment triggers subsequent test executions for a number of test types. Once everything passes, the code gets promoted to a downstream environment and tests get executed again, until the code reaches an environment as close to production as possible. Yes, I'd call it CD.

    "How do you incorporate non-functional testing?"
    Our test approach assumes that non-functional tests are by default triggered, executed and analysed automatically by the CI server, so that the CI server is able to automatically pass or fail every build. We have two types of NFT builds: those executed after deployment, and those executed periodically because they take too long to execute as part of the pipeline.

    Such an approach requires specific tools which can decide whether the NFT build in the CI server should be marked as passed or failed. This can be challenging at times, but such tools have started to appear. One of them is Lightning, another is Taurus. Equally important, people in the organisation need to start considering non-functional testing to be part of the regular, automatic regression pack. Mindsets are changing, which makes me feel optimistic.
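
The pass/fail decision described in the last comment can be illustrated with a short Python sketch. It is not how Lightning or Taurus work; the function names, the 90th percentile, and the 500ms limit are all assumptions chosen for the example. The idea is simply that a script reduces a set of response times to an exit code, and the CI server marks the build as passed or failed accordingly.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list, for an integer pct (1-100)."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def nft_exit_code(response_times_ms, pct=90, limit_ms=500):
    """Return 0 (pass) if the chosen percentile is within the limit, else 1 (fail).

    An empty result set is treated as a failure, since a test run that
    produced no samples proves nothing.
    """
    if not response_times_ms:
        return 1
    return 0 if percentile(response_times_ms, pct) <= limit_ms else 1


# Hypothetical samples: one slow outlier does not fail the build,
# because the 90th percentile is still well inside the limit.
samples = [100] * 9 + [900]
print(nft_exit_code(samples))
```

In a pipeline the script would end with something like `sys.exit(nft_exit_code(samples))`, since most CI servers treat a non-zero exit code as a failed build step.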