Ensuring Resilience for Stateful Kubernetes

Kubernetes has become the de facto standard for new application deployments in the public cloud. However, as companies migrate more workloads to K8s, they often run into problems with application uptime and resiliency.

In a business continuity scenario, it can be relatively easy to redeploy an application with the same configuration on a different cluster, whether in another region or with another cloud provider. But applications don't exist in a vacuum: they need data to function, and recovering an application's state is much more complex.

Organizations that try to build high-availability environments for stateful K8s apps, so they can meet their service level agreements (SLAs) and keep both applications and data available, face some unique challenges.

Challenges With Kubernetes Stateful Apps

Complexity: One of the major issues with Kubernetes is that it is difficult to set up storage for stateful apps while preserving resiliency and application mobility. The standard storage options in the public cloud leave much to be desired, and anything beyond them takes significant expertise to set up and maintain. As a result, the road to stateful, resilient operations is a long one, requiring knowledge of storage, networking and replication. Many teams lack the capital, manpower or expertise to travel it themselves.

The difficulty is that building storage infrastructure takes a skill set very different from what most DevOps professionals are trained for, or have time for. Most cloud-native teams lack storage specialists trained to configure and maintain the specialized storage networks and equipment that keep data available, resilient and backed up. That assumes such advanced storage solutions are even available in the public cloud in the first place.

Vendor lock-in: Because storage and infrastructure come from a specific vendor (Amazon EBS, Azure Disk, etc.), vendor lock-in and data gravity are inevitable. The more data accumulates in one location (data gravity), the harder it becomes to move anywhere else later. Applications are continually pulled toward where their data lives, so past storage choices end up dictating where that data, and the applications that depend on it, can go in the future.
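To make data gravity concrete, the following is a rough back-of-the-envelope sketch of how long it takes just to copy a dataset to another location. It assumes a dedicated link sustaining a fixed fraction of its nominal bandwidth. It ignores protocol overhead, egress fees and the new writes that keep arriving during the copy. The function name and the 10 Gbps figure are illustrative only.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to copy a dataset between locations.

    Assumptions (illustrative, not from any vendor tool):
    - dataset_tb is in decimal terabytes (10**12 bytes)
    - link_gbps is in gigabits per second (10**9 bits/s)
    - efficiency is the fraction of line rate actually sustained (0 < efficiency <= 1)
    """
    bits = dataset_tb * 10**12 * 8                      # bytes -> bits
    seconds = bits / (link_gbps * 10**9 * efficiency)   # bits / (bits per second)
    return seconds / 3600                               # seconds -> hours


# Copy time grows linearly with the amount of data at the source.
for tb in (1, 10, 100, 1000):
    print(f"{tb:>5} TB over 10 Gbps: {transfer_hours(tb, 10):8.1f} h")
```

Under these assumptions, 100 TB over a fully utilized 10 Gbps link takes about 22 hours. Halving the effective throughput doubles that. Every terabyte that accumulates in one place adds to the cost of ever leaving it.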

When data is brought to the public cloud, it’s inevitable that the service provider influences how well the application performs.

Resiliency challenges: Relying solely on a single cloud provider, or on a single location within one, has significant limitations. Yet most organizations have no choice but to do exactly that, because setting up cross-region or multi-cloud infrastructure for stateful apps is prohibitively complex.

Even if data is replicated across different availability zones, it’s still at risk from a regional failure. Therefore, to provide business continuity for stateful apps running in the cloud, you need to be able to recover at a secondary site or region immediately without losing any data along the way.
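"Without losing any data" corresponds to a recovery point objective (RPO) of zero. The following minimal sketch shows why this is hard across regions. With asynchronous replication, the secondary region trails the primary, so a regional failure loses roughly whatever was written during that lag. With synchronous replication, a write is acknowledged only once the secondary has it, which removes the loss at the cost of added write latency. The write rate and lag values are hypothetical.

```python
def data_at_risk_mb(write_rate_mb_s: float, replication_lag_s: float) -> float:
    """Approximate data lost if the primary region fails right now.

    Writes acknowledged locally but not yet copied to the secondary region
    are lost. Simplifying assumption: a steady write rate over the lag window.
    """
    return write_rate_mb_s * replication_lag_s


# Hypothetical workload writing 50 MB/s.
print(data_at_risk_mb(50, 30))  # asynchronous replication, 30 s behind -> 1500 (MB)
print(data_at_risk_mb(50, 0))   # synchronous replication, no lag -> 0 (MB)
```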

Risk: Some level of risk is inevitable. But when your resiliency plan amounts to running your business in whichever AWS or Google Cloud region has historically failed the least, you've got a serious problem.
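The following sketch of standard availability arithmetic shows why picking the "most reliable" region is a weak plan compared with being able to run in more than one region. It assumes every region has the same availability and that regions fail independently. Real outages are often correlated (shared control planes, software rollouts), and the sketch ignores failover time, so treat the multi-region figure as an optimistic upper bound. The 99.9% figure is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def combined_availability(region_availability: float, regions: int) -> float:
    """Probability that at least one region is up.

    Assumes identical per-region availability and fully independent failures.
    """
    return 1 - (1 - region_availability) ** regions


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes per year during which no region is available."""
    return (1 - availability) * MINUTES_PER_YEAR


for n in (1, 2):
    a = combined_availability(0.999, n)
    print(f"{n} region(s): availability {a:.6f}, "
          f"~{downtime_minutes_per_year(a):.1f} min/yr of downtime")
```

Under these assumptions, a single 99.9% region still averages about 526 minutes of downtime a year. The same arithmetic for two independent regions gives roughly half a minute, but only if the application and its data can actually fail over between them.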

Bloated infrastructures: Data is useless without its application. For stateful K8s apps to recover across different infrastructures and public cloud providers, the entire application environment, including its state, must therefore be replicated. It must also be completely agnostic to the underlying infrastructure it runs on.

Over time, these infrastructures grow increasingly difficult to control. The need for additional workarounds becomes unbearable as teams try desperately to maintain uptime.

Solving the Public Cloud Resiliency Puzzle

As complexity grows, so does the need for more sophisticated resilience, performance and operations techniques, but this sophistication soon snowballs into an uncontrollable mess. There must be a way to make the complex simple.

To address these issues, a new category has emerged: stateful application mobility platforms. These platforms let users provision stateful applications without worrying about how they are configured or deployed. Stateful apps keep running without interruption and can be recovered in another location with zero data loss. Users can rest assured that their clusters can move between cloud providers, regions and data centers.

The result is greater flexibility, better performance and improved resiliency without the unnecessary complexity of today's manual techniques. Ultimately, by letting stateful apps move freely between locations, organizations can take advantage of everything the cloud has to offer while avoiding its limitations.

With these platforms, data remains available regardless of where apps are deployed.

Resiliency for stateful Kubernetes applications is no longer a pipe dream but a reality, thanks to a simple yet scalable storage solution that can be deployed across multiple clouds and locations in a single click.

Michael Greenberg

Michael Greenberg is an entrepreneur and IT infrastructure expert with more than 20 years in storage and UNIX systems design. Currently, Michael heads the Product team at Statehub. Prior to Statehub, he served as chief architect at EverCompliant, and co-founded Leanscape as well as Leandigo. Michael is also a veteran of the IDF’s elite cyber intelligence corps, Unit 8200.
