Sticking to a principle
How Access Worldpay tests infrastructure code
Principles matter. For any group, of any size, principles bring clarity, provide a sense of purpose and offer direction when the road ahead is difficult or complex. This is certainly true at Access Worldpay, where test-driven development (TDD) is a guiding principle. Does it still hold true when put under stress?
At Access Worldpay, everything is tested, all the time, and the phrase “If it’s not tested, it’s broken” has become a mantra for the team. Rigorously applying this principle takes effort, even in software development where TDD has become commonplace. Maintaining the same rigour becomes far more challenging in newer areas, particularly infrastructure code. Why?
Big change, new skills
Infrastructure itself has changed rapidly and significantly, from on-premises hardware and software to remotely located, cloud-hosted services such as AWS. This requires users to configure and code remote servers to their precise needs and then test that this code meets their required specifications.
Testing infrastructure code is, therefore, an entirely new discipline – one that is wholly embraced by Access Worldpay, albeit with open eyes. Dominic Byrne, a cloud engineer, explains: “Infrastructure code at the moment is in its infancy – for example, Terraform is releasing at a pretty rapid rate. It’s what we use, but it’s still in beta.”
New code, no context
As a result, Dominic explains, reliable testing frameworks do not yet exist. “People are reluctant to develop them because early ones don’t work anymore,” he says. “They were just built around a basic configuration plan so they become completely useless whenever the structure of that plan changes.”
What’s more, many cloud engineers are unused to TDD. “In the past, infrastructure engineers and application developers were two different teams,” says Daniel Beddoe, a senior software engineer at Access Worldpay. In addition, he explains, people used to question why infrastructure configuration files needed testing at all. “It turns out there’s every need to do so because now you could write something very bad in just one line of configuration that could mess up your whole infrastructure.”
Same team, single path
Instead of treating application and infrastructure engineering separately, Access Worldpay takes the same approach to every piece of work. “We’re making everything one team and applying practices from application development into infrastructure,” says Daniel. This means identifying requirements and defining success before work begins on each project, whatever its nature: “We’re saying that you need to write tests for your infrastructure just as much as you need them for your applications.”
Part of this shift is organic. “People are increasingly moving from application development into operations and infrastructure,” Dominic explains. “They like the tools they currently use and they want to bring the same principles into infrastructure code.”
Live tests, late feedback
That’s easier said than done, because infrastructure code is tested against a live service, which makes the feedback loop much longer. “Without building anything, loading my spec files and running my tests on my current project averages about 44 seconds,” says Dominic. “For an application developer, tests taking that long to finish aren’t good.”
A central goal, therefore, is to ensure that this feedback cycle remains manageable – particularly since it lengthens further whenever code is altered. “If I change, say, an Elasticsearch server, the loop goes from 44 seconds to 10 minutes at least – that’s just the reality of those services online,” Dominic explains. “So for infrastructure code, following basic TDD principles where you write your tests and watch them fail means allowing for the time it takes in your feedback loop.” That this can be difficult is hardly a surprise given the pioneering nature of the work, but the team has devised new ways to ease these challenges.
Testing spec, teaming up
One approach is to change how infrastructure code is written before testing begins. Instead of writing and testing segments of code incrementally and enduring the long feedback loop each time, “you try to write each instance as the final product,” Dominic explains. “For example, saying ‘I want it to have these forwarding rules; I want this origin to match this specification,’ and you write all that and then you start your feedback loop to test that spec.”
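The spec-first approach Dominic describes might be sketched roughly as follows. This is only an illustration, not the team's actual tooling: the names `DistributionSpec` and `find_mismatches`, the example hostname and the rule strings are all hypothetical, and the `deployed` dictionary stands in for whatever a slow query against the live service would return. The complete desired state is written down first, then checked in a single pass that reports every difference at once rather than stopping at the first.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DistributionSpec:
    """Hypothetical desired state for one piece of infrastructure."""
    origin: str
    forwarding_rules: Tuple[str, ...]


def find_mismatches(spec: DistributionSpec, deployed: Dict) -> List[str]:
    """Compare the full spec with a deployed config and list every difference."""
    mismatches = []
    actual_origin = deployed.get("origin")
    if actual_origin != spec.origin:
        mismatches.append(
            f"origin: expected {spec.origin!r}, got {actual_origin!r}"
        )
    deployed_rules = set(deployed.get("forwarding_rules", []))
    for rule in spec.forwarding_rules:
        if rule not in deployed_rules:
            mismatches.append(f"forwarding rule missing: {rule!r}")
    return mismatches


# The whole spec is written up front, as the final product.
spec = DistributionSpec(
    origin="origin.example.com",              # assumed example value
    forwarding_rules=("/api/*", "/static/*"),  # assumed example values
)

# Stand-in for the state read back from the live service (the slow step).
deployed = {"origin": "origin.example.com", "forwarding_rules": ["/api/*"]}

mismatches = find_mismatches(spec, deployed)
for message in mismatches:
    print(message)
```

Collecting all mismatches in one run is a deliberate choice here: when each trip to the live service costs minutes, a single report of everything that differs is worth more than a fast failure on the first problem.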
The issue with working this way, of course, is unpicking the code should a test fail. But perhaps such workarounds need only be temporary; as infrastructure code is more widely adopted and tested, more robust frameworks to support those tests may eventually be built.
In the meantime, Access Worldpay is supporting TDD for infrastructure code by physically pairing software developers and cloud engineers to shed old, siloed thinking and encourage the broader outlook that cloud-based infrastructure requires. “There’s a whole world of delivery to actually get your thing to work – you need to know how to get your application out into the world,” Dominic says. “Just knowing how it works locally isn’t useful for anyone.”
Towards better, even slowly
However imperfect today’s situation, Access Worldpay remains committed to testing infrastructure code. Partly this is to uphold the principle that TDD produces better, more reliable products for its customers, and partly it is because tests document systems more clearly. This makes it easier for product owners to prioritize future work and for new team-mates to make useful changes quickly and with confidence.
The end goal of being able to apply traditional application development practices to infrastructure code in a consistent, rigorous way is still some way off. In the meantime, Access Worldpay is trying to develop a methodology that makes infrastructure code tests not only rigorous but also practical – staying true to its core principle while always analyzing its approach and making further refinements wherever necessary or possible. Progress, however uncertain, is still progress.