The Brutal Learning Curve of a New Kubernetes Cluster

Kubernetes has truly democratized container orchestration. It has become an unrivaled platform, allowing organizations to rapidly deploy and scale their solutions with a simple, declarative API. It is also highly extensible, meaning new operators, CLIs and tools have sprung up around the platform. These tools individually solve a variety of problems; however, together, they create an entirely new issue.

The Open Source Kubernetes Phenomenon

Few platforms can boast the breadth and scope of open source tooling that Kubernetes can. Indeed, many of the tools around which Kubernetes users have coalesced are so refined and well-supported it’s easy to forget that they’re maintained by a collective of avid volunteers whose only ambition is to improve the Kubernetes ecosystem as a whole.

Tools like Istio Service Mesh, Helm and Flux are all available on an open source basis and bring a wealth of functionality that, on another platform, would only be available with expensive plugins or extensions. However, each of these tools arrives with its own syntax and style, and they are accompanied by a litany of smaller utilities that do the same. This explosion in syntax and style has steadily steepened the learning curve for new users of Kubernetes. A once simple and consistent platform is now bundled with a large, nebulous collection of commands and languages. This increase in learning material translates directly into cost, but it’s a cost that’s difficult to track.
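
To see how divergent this can get, compare how each tool expresses a routine deployment task. The commands below are an illustrative sketch only; the release, chart, repository and path names are placeholders, not a recommended setup.

    # Plain Kubernetes: declarative manifests applied with kubectl
    kubectl apply -f deployment.yaml

    # Helm: releases and charts, backed by its own templating language
    helm install my-release bitnami/nginx

    # Flux: GitOps objects reconciled from a source repository
    flux create kustomization my-app \
      --source=GitRepository/my-repo \
      --path="./deploy"

    # Istio: a dedicated CLI with its own configuration profiles
    istioctl install --set profile=demo

Four CLIs, four mental models, and that is before any of these tools’ configuration languages enter the picture.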

The Hidden Cost of This Learning Curve

A new engineer in the Kubernetes world must embark on an increasingly complex journey before they deliver consistent value. First, they must become comfortable with the Kubernetes CLI and API. This is important because, ultimately, anything they do from this point onwards will be a wrapper around these fundamental layers, and an understanding of the base mechanics will enable them to troubleshoot issues.
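
In practice, those fundamentals are exercised through a handful of kubectl verbs. Here is a minimal sketch of a typical troubleshooting flow (my-namespace and my-pod are placeholder names):

    # List workloads and inspect one that is misbehaving
    kubectl get pods -n my-namespace
    kubectl describe pod my-pod -n my-namespace

    # Read the logs of the current and the previously crashed container
    kubectl logs my-pod -n my-namespace
    kubectl logs my-pod -n my-namespace --previous

    # Drop down to the raw API object when the summary isn't enough
    kubectl get pod my-pod -n my-namespace -o yaml

Every higher-level tool ultimately manipulates the same objects these commands expose, which is why this layer is worth learning first.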

And then comes the hard part…

Even if they become experts in the Kubernetes platform itself, much of their work still sits ahead of them. There is a whole collection of tools for monitoring and observing Kubernetes clusters, K9s among them, and each has its own patterns and syntax that take days, if not weeks, of engineering time to learn.
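
K9s illustrates the point well: it is a terminal UI driven by its own command vocabulary rather than flags, so even a fluent kubectl user starts over. A rough sketch of a session (the namespace is a placeholder):

    # Launch the terminal UI scoped to one namespace
    k9s -n my-namespace

    # Inside the UI, navigation is a mini-language of its own, for example:
    #   :pod    jump to the pod view
    #   /api    filter the current view by name
    #   l       tail the logs of the selected pod
    #   d       describe the selected resource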

And once they can query their cluster, they must understand the open source tooling that runs within it. For example, the engineering team may use a service mesh to implement mutual TLS for intra-cluster traffic. Service meshes like Istio have a massive library of potential features. No single engineer is expected to understand them all, but if a team is using Istio heavily, mastering the relevant subset is yet another hurdle to overcome simply to become a valuable contributor to the cluster.
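
For instance, switching a mesh to strict mutual TLS in Istio is only a few lines of configuration, but understanding what those lines do to every connection in the cluster takes real study. A minimal sketch, assuming Istio is already installed:

    kubectl apply -f - <<EOF
    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: istio-system   # the mesh root namespace, so this applies mesh-wide
    spec:
      mtls:
        mode: STRICT            # reject plain-text traffic between sidecars
    EOF

The YAML is trivial; knowing why a legacy workload without a sidecar suddenly can’t be reached is the part that takes weeks.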

Why is this a more acute problem in the Kubernetes world?

Other communities are also driven primarily by open source engineering, yet they do not struggle with this. So what is it about the Kubernetes world that complicates matters? Kubernetes tooling is typically “cross-cutting.” You don’t build a logging solution for a single application; you build one for your entire cluster. You don’t implement metrics for a single namespace; you do so for the entire cluster.

This cross-cutting nature of deployments and features means that you must come to grips with the logging system as soon as you want to log anything, and likewise with the metrics. That is fine when a team coalesces around a small collection of tools, but it rapidly becomes challenging when a cluster runs 20 or 30 different tools to achieve its goals.
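
The deployment model reflects that scope. A log collector, for example, typically runs as a DaemonSet, so a single decision lands on every node and captures logs for every workload at once. A minimal sketch (it assumes a logging namespace exists, and the Fluent Bit configuration itself is omitted for brevity):

    kubectl apply -f - <<EOF
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: log-collector
      namespace: logging
    spec:
      selector:
        matchLabels: { app: log-collector }
      template:
        metadata:
          labels: { app: log-collector }
        spec:
          containers:
          - name: fluent-bit
            image: fluent/fluent-bit:2.2.0
            volumeMounts:
            - name: varlog
              mountPath: /var/log      # read every workload's logs on the node
              readOnly: true
          volumes:
          - name: varlog
            hostPath: { path: /var/log }
    EOF

One manifest, and every engineer who wants to log anything now has to understand how this agent collects, parses and ships their output.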

So what can be done?

Cluster complexity is a growing challenge in the Kubernetes space. There are several practical ways to mitigate it.

  1. Don’t leverage the cluster for everything. SaaS solutions can move a great deal of complexity out of your cluster, making it easier for engineers to engage with your infrastructure quickly. They also tend to come with comprehensive documentation and customer support to answer tricky questions.
  2. Choose your tools wisely. Select a few larger, strategic tools that will solve 80% of your challenges, and be very selective about everything else. This requires tight collaboration between all engineers and is very difficult to enforce.
  3. Intentionally leverage other deployment mechanisms for different solutions. This works, but it doesn’t solve the fundamental problem of complexity, and each time you reach for a new hosting mechanism, you lose some of the benefit of your Kubernetes cluster.

Is Cluster Complexity the Next Big Kubernetes Problem?

There are many challenges on the horizon for the Kubernetes community. Multi-cluster environments are becoming more and more common, but federated tooling for Kubernetes still isn’t as smooth as a single-cluster deployment. The need to scale, combined with the difficulty of replicating tooling across multiple clusters, will naturally lead to single-cluster deployments with an increasing level of complexity and, with it, a steeper, more expensive learning curve. Only time will tell how the community solves this challenge.

Chris Cooney

Chris is the Developer Advocate for Coralogix and is passionate about all things observability, organizational leadership and cutting-edge engineering.
