Building Stateful Apps on K8s Takes a Village

March 28, 2022; July 31, 2022 | Alex Chircop | cloud-native storage, cncf, kubernetes, stateful apps

Running stateful, often business-critical applications in Kubernetes is increasingly common. It’s simple, right? Applications in Kubernetes just need fast, simple access to persistent volumes and databases. With the right storage management, optimization, automation and developer self-service, this is possible.

Here is the hard part—someone needs to know what the “right” solution is and why it’s right.

In my role as chair of the CNCF Storage Technical Advisory Group (TAG), I am privileged to have an incredible view of what is happening around state and storage within the context of Kubernetes and cloud-native computing. We have learned that early-stage architectural decisions about how to implement stateful applications in Kubernetes, while fairly technical, can cost organizations dearly in very clear business terms when they go wrong. This manifests in a few ways:

Lock-in, whether that is to cloud providers, hardware solutions or managed data and storage services, translates to inefficient, uncompetitive operating costs. Poor data storage choices can inflate cloud costs by several orders of magnitude; let’s not forget that the amount of data businesses have to handle isn’t decreasing.

Poor, laggy application performance and the inability to scale rapidly lead to unsatisfactory user experience.

System downtime, service outages and poor security could result in loss of customer data and cause users to lose confidence.

There is a big difference between doing something and doing something well.

Increasingly, we see product owners, system architects, platform engineers, cloud developers and those overseeing digitization caring more about how state is handled in Kubernetes. At the same time, we observe that, in many cases, this is happening without the input of traditional enterprise storage engineers and experts.

I believe that, in a DevOps world of intermeshed, overlapping roles and responsibilities, it is vital that architects, platform engineers and key decision-makers understand the implications of storage decisions in more detail so we don’t repeat mistakes that have been made in the past.

In my view, to date, our most significant milestone in the CNCF Storage TAG was creating the Storage Landscape Whitepaper. This is intended to provide an overview of what can often seem like an impenetrable world of overlapping terms, metrics and considerations. Its objective is to provide greater clarity for the increasing number of non-storage-experts who, in the cloud/Kubernetes-native world of DevOps automation and infrastructure-as-code (IaC), are getting more and more involved and making important decisions—but typically with a limited understanding of the holistic environment.

The CNCF paper covers basic storage attributes such as availability, scale, performance, data protection and failovers, and provides an overview of the basic technology landscape, storage interfaces and the options for delivering state into Kubernetes. Digesting all 44 pages may seem overwhelming, but it is highly useful to drill down into specific areas to bridge gaps in understanding, and to look at the use case and comparison tables.

There is an argument that effective DevOps automation should mean developers don’t have to concern themselves with storage or any other aspect of infrastructure and operations. That’s true in principle, but it is still not the reality for everyone. Until it is, we need to bridge that gap by building a holistic understanding of the landscape for those who are building stateful apps today.