Before you read the below rant, it’s vital for me to explain myself: I love Kubernetes; live and breathe containers. Not knowing where my apps are running, while at the same time knowing they are safe and immutable, brings relaxation and tranquility to my life. It’s like leaving your kid in their grandparents’ care and going off to work. No need to trouble me with all the hows and what-ifs of the day; if everyone is safe and alive at the end of it, the service worked, and I’m content. (My parents don’t read IT publications.)
That said, when I started running stateful applications on Kubernetes, I noticed a few inconsistencies with my stateless experience. “A few inconsistencies” is the understatement of the century, and “experience” means the striking realization that simplicity and storage are opposites when it comes to Kubernetes. It’s still worth it; containerized stateful applications have many advantages, and are agile and portable at the application level. But before you take your first steps on the path to stateful Kubernetes workloads, here are a few roadblocks to be aware of (that you would think would have been fixed by now).
Kubernetes is well-known for its ease of use. Just describe the amount of compute resources and memory required, and they are available. From that point, the pods and containers can self-heal and replicate inside their cluster. It’s a self-healing, resource-optimizing wonder that we have all learned to love. The concept of ephemerality is its biggest strength.
Storage, on the other hand, does not play by the same rule book as containers do. Ephemerality is a bad word in the world of storage, actually, and some of us choose not to have our data destroyed and created dynamically. I know – let me hold your beer for a second; it’s a shocker!
To level the playing field and run stateful workloads, we need to deal with storage and all the questions that come with it, like, “How do you retain your data? Protect it? Make it available?” And, of course, “How can we make it portable across different infrastructures, service providers and regions?”
Kubernetes requires you to declare all of the above. Turns out, you went to Kubernetes to avoid the complexity of storage, but storage was waiting for you. Maybe it felt a bit left out with all the talk of “statelessness,” and now storage wants to spend some quality time together – a lot of time; endless, tedious hours – to reconfigure your relationship and communication channels.
Also known as, “The programming language you never wanted.” Kubernetes gave us tools to deal with our persistent volume claims, and of course, it wants you to learn a new language, one that is only applicable to Kubernetes storage. I’m serious. It’s as if you purchased a new car, but all its controls are in Hundait, a language invented by Hyundai for the sole purpose of driving a Hyundai. Or, Ikea instructions (in any language). All you wanted was to go from point A to point B, in comfort if possible, and now you have to learn, test and maintain everything in a completely different language. If you attempted to run a simple SQL database, and found yourself learning what a storageClassName is, and trying to figure out how the persistentVolumeReclaimPolicy field works, well, you feel my pain; you deserve a hug and more money.
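For anyone who has not met this vocabulary yet, here is a minimal sketch of where those two fields show up. Every name, size and path below is a hypothetical placeholder, and the hostPath volume is an assumption that only makes sense on a single-node test cluster; treat it as an illustration, not a recommended setup.

```yaml
# A statically provisioned volume. All names and values are illustrative.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sql-data-pv                       # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual                # must match the claim below
  persistentVolumeReclaimPolicy: Retain   # keep the data after the claim is deleted
  hostPath:
    path: /mnt/data/sql                   # assumption: single-node test cluster
---
# The claim your database pod actually references.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sql-data-claim                    # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 10Gi
```

With Retain, deleting the claim leaves the volume and its data in place for manual cleanup; with Delete, the backing storage goes away with the claim. That one word decides whether your database survives a tidy-up.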
F#$%#@! YAML Files
When you think of a deployment file, such as a YAML manifest, you think of a simple, declarative description of what you want. Can storage complicate things even further? Yes, it can, and it will. When storage is in the mix, the YAML file has to point to the vendor-specific storage solution and carry the configuration that solution needs to communicate with Kubernetes. To really enable data portability between clusters and regions, you will probably need to write your own Container Storage Interface (CSI) driver in Kubernetes’ own programming language and include disaster recovery (DR) policies (and all the other fun stuff you really don’t want to do).
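To make "vendor-specific" concrete, here is a rough sketch of a StorageClass. The AWS EBS CSI driver is used purely as an example (this article does not name a vendor), the class name is hypothetical, and the parameters block is exactly the part that changes from one storage provider to the next.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-block                        # hypothetical name
provisioner: ebs.csi.aws.com              # example vendor driver; swap in your own
parameters:
  type: gp3                               # driver-specific; other vendors use other keys
reclaimPolicy: Delete                     # volumes are removed along with their claims
volumeBindingMode: WaitForFirstConsumer   # provision only once a pod is scheduled
allowVolumeExpansion: true
```

Move to another cloud or an on-premises array and the provisioner and parameters need rewriting, which is why the manifest alone does not give you portable data.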
Stateful Worries in a Stateless Mind
Can’t storage behave like Docker images? Can’t I just ask for storage, choose a DB and forget about it? Why can’t I just focus on my applications? I don’t want/need/care to know storage.
What we really need is a way to make the storage as available as the application, like a content delivery network (CDN) that assures consistent, synchronized data is available to any node that needs it.
And to really make adoption simple, deploy it as a completely managed service, just like the application layer managed services, but with data. Let me store a state, point at it once and be done with it, knowing that I can run my application anywhere with synced data, without all the hassles.