Demystifying Persistent Storage Myths for Stateful Workloads in Kubernetes

Kubernetes is everywhere now. If you are moving or have moved your workloads to containers and still want to keep up with virtual machines, Kubernetes can do that for you. Due to this, K8S is slowly becoming a mainstream cloud operating system. However, when it comes to handling storage for applications, specifically stateful applications, you have to do some research.

Stateless and Stateful Applications

Applications deployed in containers and orchestrated by Kubernetes can be stateless or stateful. Stateless applications don’t persist state and make use of ephemeral storage; in other words, stateless applications use temporary storage provided within Kubernetes that are destroyed once the pod is deleted or accidentally shut down (it is ephemeral storage). This type of storage can be good for such stateless applications, which take minimal resources, do their tasks and vanish. Ephemeral storage is local to Kubernetes clusters and also has persistent storage. We will discuss persistent storage in the next section of this article.

However, in the case of stateful applications, storage has to follow workloads and workloads need to stay attached with storage in a persistent manner. You have to be aware of underlying storage for stateful applications. Stateful applications need to provision with the volume, which will always stay connected with pods that are hosting the stateful applications. These stateful applications mostly require storage outside the Kubernetes clusters, such as network-attached storage or remote storage.

Current Storage Problems With Kubernetes

Running stateful services is challenging in containers deployed in Kubernetes pods. Because containers are ephemeral in nature as well, data associated with containers can be lost once the container is terminated or crashes. There is no way to persist data with containers. Also, data cannot be shared between containers as it requires the containers to have the same application state. This can happen in cases where two containers working on a similar service requires communication between them.

There are so many storage options available for applications deployed in containers hosted within pods: local Kubernetes storage, which is ephemeral as well as persistent, and remote storage, which can be of many types: file, block, object storages, SQL, NoSQL databases, etc. However, block and file storage provides abstraction that enables workloads to be agnostic to the underlying implementation.

Volume Plugins for Kubernetes Storage

For abstracting away block and file storage, Kubernetes uses volume plugins, some of which are in-tree and some are out-of-tree. Local in-tree plugins used for ephemeral storage are supported within Kubernetes clusters, and these are a good fit for stateless applications. The other types of volume plugins, i.e., remote storage or out-of-tree plugins such as container storage interface (CSI), allow data to be persisted for containers.

As remote storage is out of Kubernetes clusters, the state can be maintained. Remote volume plugins can be referenced either in-line or through a persistent volume (PV) or persistent volume claim (PVC).

CSI was developed by Kubernetes to support many different types of storage beyond a Kubernetes system. With CSI, Kubernetes can handle the entire storage life cycle for pods and let developers take care of applications. As mentioned in a Kubernetes.io post, “Using CSI, third-party storage providers can write and deploy plugins exposing new storage systems in Kubernetes without ever having to touch the core Kubernetes code. This gives Kubernetes users more options for storage and makes the system more secure and reliable.”

Dynamic Provisioning and Storage Class

For any type of storage, Kubernetes has volume plugins pods can use to utilize storage in Kubernetes. It can be ephemeral, remote and local persistent storage plugins.

EmptyDir is ephemeral storage volume plugin. It works like a cache. A pod uses it to store temporary files and destroys it when there is no need. To make available remote storage data beyond the life cycle of pods, several options are available.

Persistent volumes can be provisioned manually by admins and Persistent Volume Claim to grab the required amount of storage pre-provisioned by admins. In this case, a pod in a Kubernetes cluster does not need to care about how storage is provisioned. Pods are not tightly coupled with storage. This is a good option for small and medium-size clusters, in which admins can have enough bandwidth to allocate storage for the cluster. Also, developers only focus on applications and claim storage for those.

However, it is manually involved. With the combination of dynamic volume provisioning and storage classes, it is possible to dynamically assign volumes of different storage options to containers within pods. For this, a storage class is declared with parameters for each of the volume type (remote storage, e.g., GCE, AWS, etc.) or for particular use cases that require HDD or SSDs for faster read/writes. In this way, a storage or cloud provider can offer storage for specific requirements dynamically. A persistent volume claim is created with a storage requirement, given a storage class name and is dynamically assigned to pod.  Parameters defined within storage class are non-transparent to the Kubernetes system. This adds the advantage of attaching as many as parameters for clusters to use.

At the May KubeCon event, Saad Ali, one of the founding members of Kubernetes and a member of the storage community, noted that storage for stateful applications is possible. However, it requires a problem for each of the cases has to be separated. It can be fragmented in four ways:

  1. Select – What storage should I use?
  2. Deploy – How do I deploy and manage storage?
  3. Integrate – How do I make my deployed storage available in clusters?
  4. Consume – How does my stateful app provision and use available storage?

You can get full access to slides and videos here. 

Case Study: Rook & Ceph for Stateful Application Workloads

Support for stateful application for using block and file storage is possible with Rook and Ceph. Rook is a front-end framework that sits on top of Kubernetes to orchestrate storage services in Kubernetes. Ceph is a high-performance, reliable and scalable software-defined unified storage system for file, block and object. Ceph can be configured to manage file and block storage using Kubernetes through Rook. This enables several use cases in Kubernetes that allows it to support stateful applications deployed within containers in Kubernetes clusters.

Summary

Kubernetes provides workload portability. That is, any workloads should spontaneously run on any type of infrastructure where Kubernetes clusters are deployed. In the case of handling stateful workloads, it may not easy to set up persistent storage but it is not impossible. The Kubernetes community has addressed the issue for stateful services and different storage options with CSI along with dynamic provisioning of persistent storage using storage classes. These allow integrating remote block/file storage easily into K8S clusters and can run on any different K8S-based clusters.

Sagar Nangare

Sagar Nangare

Sagar Nangare is technology blogger, focusing on data center technologies (Networking, Telecom, Cloud, Storage) and emerging domains like Edge Computing, IoT, Machine Learning, AI). He is currently serving Calsoft Inc. as Digital Strategist. He is based in Pune.

Sagar Nangare has 2 posts and counting. See all posts by Sagar Nangare