Would you jump out of the airplane and then check your parachute rigging?
Would you start your scuba dive before you checked that your air was turned on?
Would you do your preflight check during the takeoff roll?
I don’t know anyone who would answer yes to any of these questions, yet most of us still engage in the software development equivalent of these risky practices: we check our code in and then do a production build on it. We’ve even given this practice a name: Continuous Integration. It should be called Continuous Build Breakage.
One of the primary goals of continuous integration is fast feedback for developers; they need the technology infrastructure that allows them to build and test as early and often as possible. This need has driven the market for continuous integration for years, the goal being, obviously, to build high-quality software for customers, whether it’s a small team implementing an internal application for financial reporting or a multi-million-dollar gaming company about to roll out its latest cross-platform blockbuster hit.
To build the best software, development teams must hit their important milestones to allow sufficient time for QA, and they struggle when their code breaks. Why? Because many still run their validation and tests on their own machines, not on production-class systems. The team follows the continuous integration and agile methodology of testing early and often, yet the build breaks when their code is integrated. They respond, “It worked on my machine!” Something was different in the production environment from the developer’s machine, and the build failed. Perhaps they had different tools, or different versions of tools. Multiply that by a team of tens or hundreds of developers and it quickly gets unruly.
The unintended consequence of continuous integration in this common scenario is threefold. First, developers are afraid to check in late in the day; nobody wants to be the one who breaks the build and has to stay all night to get things straightened out, so productivity is the first casualty. Second, delayed code submissions cause the schedule to slip. Third, good deltas get slapped with a bad-build label because they went in alongside changes that were busted. Not a motivating result. Morale goes into a nosedive.
And we’re not taking into consideration the added challenges of a global development team. With a U.S. and overseas development team working through time zone and language challenges, the impact of lost productivity can go from hours to days. As we all know, lost productivity is lost revenue.
And the loss in morale cannot be overstated. I have seen customers lose great developers over this very issue. Very good companies have been unable to hold onto their best developers because builds kept breaking due to infrastructure issues, not bad code.
So what’s the solution?
The development team shouldn’t be validating builds on anything but a production-class environment. Early, frequent testing on the right OS, database versions, and tool chains, combined with the ability to test cross-platform, is key to an effective agile strategy. Pre-flight build and test, that is, building and testing before check-in (including both unit tests and a subset of system tests) in a production-class environment, eliminates the big productivity losses and drops in morale described in the examples above. The developer gets fast feedback: is the code clean? If the answer is yes, the change is automatically checked in and ready for QA. If it fails, it’s kicked back to the developer, the problem is caught much earlier in the testing process, and the rest of the team stays productive, checking in code with no waiting. A mistake that might not have been detected until takeoff is now found before anyone boards the plane. What would have been hours of delay can now be isolated and fixed without impacting the passengers or the other planes waiting to take off.
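The gate described above can be sketched as a small script: build, run the tests, and only check the change in if both succeed. This is a minimal illustration, not ElectricCommander’s actual interface; `run_build` and `run_tests` are hypothetical placeholders for the real build and test commands.

```shell
#!/bin/sh
# Hypothetical sketch of a pre-flight gate: build and test on a
# production-class environment *before* the change is checked in.
# run_build and run_tests stand in for the real commands
# (e.g. make, a test runner); they are assumptions for illustration.

run_build() {
    echo "building on production-class environment..."
    return 0   # a non-zero return here would mean a build failure
}

run_tests() {
    echo "running unit tests and a subset of system tests..."
    return 0   # a non-zero return here would mean a test failure
}

if run_build && run_tests; then
    # code is clean: check the change in automatically, ready for QA
    echo "pre-flight passed: change checked in"
else
    # kick the change back to the developer; the trunk stays green
    echo "pre-flight failed: change returned to developer" >&2
    exit 1
fi
```

The key design point is that the developer’s delta never reaches the shared repository until the production-class build and tests have passed, so the rest of the team never sees the breakage.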
So how well does it really work? We introduced pre-flight builds as a feature in ElectricCommander 3.0 and have several customers that put them into practice every day. One customer has seen a 90 percent reduction in broken builds in just a couple of months, and it’s worth noting that the remaining 10 percent of broken builds came from developers who were not using the pre-flight build and continuous integration methodology.
The result of testing early and often is that you discover problems in the pre-flight stage rather than “mid-air,” you get faster feedback, and you end up with a cleaner code base. One plane goes back to the terminal, but 20 others take off in the meantime. In agile development, you get working software and can release at any time because you are hitting your milestones. Your demos are ready to roll; QA is testing the right version and can run a clean test. And as we all know, it costs a few dollars to catch a bug early, and hundreds or thousands once it reaches the customer.
Continuous integration is great, but it only tells you there’s a problem after it’s too late to prevent the downstream damage. High-performing development teams have incorporated pre-flight build and test to gain agility and focus on the code, not the infrastructure.