Failure is the Secret to Success in Modern Software Development

When assessing the quality of a software application, the most valuable measurement, in my opinion, is cycle time. Improvements in cycle time have done more to improve the quality, reliability and availability of software than any other single advance in software development.

There are many ways to measure cycle time in a product development life cycle. One important measurement is the overall time from when you decide to make a change to an application until that change is actually implemented, deployed and usable by your customers.
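As a minimal sketch of this measurement, the calculation is just the elapsed time between two events. The timestamps below are hypothetical, standing in for whatever your ticketing and deployment systems record:

```python
from datetime import datetime

def cycle_time(decided_at: datetime, usable_at: datetime) -> float:
    """Hours elapsed from deciding on a change to customers being able to use it."""
    return (usable_at - decided_at).total_seconds() / 3600

# Hypothetical timestamps for one change: ticket created, change live in production.
decided = datetime(2023, 5, 1, 9, 0)
deployed = datetime(2023, 5, 3, 15, 30)

print(f"Cycle time: {cycle_time(decided, deployed):.1f} hours")  # 54.5 hours
```

In practice you would pull these timestamps from your issue tracker and deployment pipeline rather than hard-coding them.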

The History of Constantly Reducing Cycle Time

In the early days of computing, software cycle time was extremely long. This is because software was written onto physical punch cards with one line of software code per card. A program was a large stack of such cards. You physically carried the stack of cards over to the mainframe computer where it was placed in a queue behind 50 other stacks of cards for 50 other programs that had to be run ahead of yours.

Simply running a program took a lot of time and energy, not to mention a lot of money. Given the amount of time between getting your program ready to run and when it actually was run, you had limited attempts to get your program to work correctly. You couldn’t just give it a try and see what happens. Instead, you painstakingly went through your program line by line, by hand, to ensure it would do what you wanted it to do. Only when you were confident it would work—absolutely certain—did you go through the effort of running your program on the computer.

Over the decades, changes in computation have dramatically shortened the time between writing a line of code and executing it. The write-compile-execute-test cycle has shrunk steadily over the years. In those early years, a single test run could take days or longer. Now, you can change a line of code, push a button and run it immediately.

But then the question became how to get your software to your customers. In the early microcomputer days, software was distributed on floppy disks and CD-ROMs—both physical media. If you wanted to change your program, you had to throw away all the unsold disks and produce a new set with the latest version of the program on them. You then had to find a way to get that change to all of your existing customers.

Your customers would not all receive the updated software at the same time, and not all of them would update right away. It might be months or even years between when a software bug was fixed in an application and when a customer began using a software version that contained that bug fix.

In this environment, it was critically important that software was thoroughly tested before it was released to customers since the time and energy it took to fix a defect was typically very long and extremely expensive.

The advent of modern, software-as-a-service web applications changed this dynamic drastically. Today, you can make a code change and deploy it to every one of your customers simultaneously and almost instantly. With modern CI/CD pipelines, a change you made can be deployed and made available to every customer with virtually no human involvement at all.
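A continuous deployment pipeline of this kind can be as small as a single workflow file. The sketch below assumes a GitHub Actions setup; the two script names are placeholders for whatever test and deployment commands a real project uses:

```yaml
# Hypothetical GitHub Actions workflow: every push to main runs the tests
# and, if they pass, deploys straight to production with no human approval.
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./run_tests.sh        # hypothetical test script
      - run: ./deploy_to_prod.sh   # hypothetical deployment script
```

The point is the shape, not the specifics: once a pipeline like this exists, "deploy to every customer" is a side effect of merging a change.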

This means the cost to deploy a change, repair a defect or improve the customer experience is negligible beyond the cost of making the change itself.

Modern, Cloud-Based Release Processes

This change has created a profound shift in how software is developed and released. Because the cost of repairing a defect after release was extremely high, long QA cycles were commonplace in early software development processes. Companies would often invest more in software QA than in software development to reduce the number of fixes required after release.

But with the cost of making a change virtually eliminated, there is no longer a need for long QA cycles. Long QA cycles are now a waste of resources and an unnecessary delay in getting a change out to customers.

In fact, many modern organizations don’t do QA testing on their software at all. Developers do their routine testing during the development cycle, but once they are satisfied that a change works, they push a button. The automated continuous deployment process releases it to production almost immediately. If there is a bug in the software—a bug that was not noticed until it was released to production—a fix can be quickly and easily deployed to resolve the bug with little cost or effort.

The mean-time-to-repair (MTTR) for a defect in a modern, cloud-based software application can be reduced dramatically—from months to minutes. At the same time, due to this significantly reduced cycle time, new features and capabilities can be rolled out to customers at much higher speeds.
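MTTR is simply the average of detection-to-fix durations across incidents. A minimal sketch, using hypothetical incident timestamps in place of real monitoring data:

```python
from datetime import datetime, timedelta

def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to repair: average of (repaired - detected) across incidents."""
    total = sum((repaired - detected for detected, repaired in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incidents: (defect detected, fix deployed to production).
incidents = [
    (datetime(2023, 6, 1, 10, 0), datetime(2023, 6, 1, 10, 25)),
    (datetime(2023, 6, 9, 14, 0), datetime(2023, 6, 9, 14, 5)),
]
print(f"MTTR: {mttr(incidents)}")  # 0:15:00
```

With a cloud deployment pipeline, each "fix deployed" timestamp can be minutes after detection; with shipped physical media, it was often months.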

Learning From Failure in Modern Applications

Failures, whether mistakes, bugs or problems, that were once feared in software development can now largely be tolerated. The lessons learned from a failure far outweigh the cost of the failure itself.

Think about that for a moment. This means that failure can actually be encouraged. Making a change that is a mistake is a great way to learn what does and does not work for a given application at very little cost. You can try experiments that may or may not be successful.

In other words, in modern applications, failure is now a viable option and even a welcome addition to the software development process. Continuous improvement, incremental changes and incremental fixes are great ways to improve the quality, availability and usability of a software application. Companies that embrace a “failure is acceptable” mindset will move faster, improve their products more quickly and be more competitive than companies that still try to avoid failure.

Failure is not only an option but perhaps the only option to maintain a competitive product offering in the modern digital world.

Lee Atchison

Lee Atchison is an author and recognized thought leader in cloud computing and application modernization with more than three decades of experience, working at modern application organizations such as Amazon, AWS, and New Relic. Lee is widely quoted in many publications and has been a featured speaker across the globe. Lee’s most recent book is Architecting for Scale (O’Reilly Media). https://leeatchison.com
