6 Common Pitfalls to Avoid When Deploying Kubernetes

Since its 1.0 release in 2015, Kubernetes has steadily grown into the de facto container orchestrator. Gartner estimates that by 2026, 90% of global organizations will be running containerized applications in production. To get there, many will turn to Kubernetes, as it offers a key abstraction layer across storage, compute and networking. Yet the learning curve around Kubernetes still poses a challenge for many organizations attempting to reap the benefits of containers and cloud-native technology.

I recently chatted with Spectro Cloud co-founder and CTO Saad Malik about common anti-patterns that organizations run into when using Kubernetes. According to Malik, Kubernetes can free developers from worrying about infrastructure, enabling them to move faster and reduce time to market. Yet developers are already overwhelmed with writing code, working in their IDEs and managing many libraries and dependencies. Asking them to also learn the ins and outs of a new technology can easily lead to hasty adoption.

Kubernetes alters how you construct and run applications, requiring new knowledge of concepts such as ingress, ReplicaSets and health checks. As such, Malik sees a handful of pitfalls that often occur when teams work with Kubernetes and scale its use. Below, we review these areas and offer solutions to help manage the new complexity while retaining agility.

1. Not Sizing Clusters Properly

Many organizations were eager to adopt container orchestration and rushed to introduce Kubernetes. As adoption grew, each development team often spun up its own clusters for its own applications. Having each team manage the life cycle of, and responsibility for, its own clusters becomes a significant effort, says Malik. And without proper safeguards to enforce isolation between different teams, large multi-tenant clusters can become challenging too.

Solution: Design with isolation in mind. Put more control in the hands of platform engineers and move toward smaller, single-tenant clusters. If you go down this road, though, prepare for the cluster-management overhead that a larger fleet brings. And where clusters remain shared, enforce isolation explicitly, as in the sketch below.
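For shared clusters, per-team namespaces with resource quotas are one baseline isolation safeguard. Here is a minimal sketch using the official Python kubernetes client; the team name and quota values are illustrative assumptions, not prescriptions from Malik.

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()  # uses your current kubeconfig context
core = client.CoreV1Api()

TEAM = "team-payments"  # illustrative team name

# A dedicated namespace per team keeps workloads logically separated.
core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=TEAM))
)

# A ResourceQuota caps how much of the shared cluster one team can consume.
core.create_namespaced_resource_quota(
    namespace=TEAM,
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name=f"{TEAM}-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={"requests.cpu": "8", "requests.memory": "16Gi", "pods": "50"}
        ),
    ),
)
```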

2. Decentralized Cloud-Native Monitoring

Engineering teams require visibility into the health of their applications and infrastructure to identify problems early on. Monitoring is achievable when dealing with a single cluster. But when working with a large number of clusters, observability becomes more complex and harder to manage. Cloud-native observability tools also output a sea of data, making it challenging to zero in on the alerts that really matter.

Solution: You can’t expect teams to log in and out of each and every cluster to gather important logs and metrics. Malik therefore recommends using a tool like Thanos to bring all relevant metrics into one centralized place. Amazon Managed Service for Prometheus (AMP) is another way to monitor metrics and health across thousands of applications and clusters.
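As a rough sketch of what centralization buys you, the snippet below asks a single Thanos Query endpoint (any Prometheus-compatible API works the same way) which kubelets are down across every cluster at once. The endpoint URL and the `cluster` external label are assumptions for illustration.

```python
# pip install requests
import requests

# Assumed: a Thanos Query endpoint aggregating metrics from all clusters
# behind one Prometheus-compatible HTTP API.
THANOS_URL = "http://thanos-query.example.com:9090"  # illustrative endpoint

# One PromQL query against the central endpoint instead of logging into
# N clusters. `cluster` is assumed to be a per-cluster external label.
resp = requests.get(
    f"{THANOS_URL}/api/v1/query",
    params={"query": 'count by (cluster) (up{job="kubelet"} == 0)'},
    timeout=10,
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    cluster = series["metric"].get("cluster", "unknown")
    _, down_count = series["value"]
    print(f"{cluster}: {down_count} kubelet(s) down")
```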

3. Too Many Node Pool Configurations

Having too many different node pool configurations can also strain your Kubernetes platform. Workloads often perform best on slightly different infrastructure and hardware: perhaps one runs best on an Arm-based processor, another on a specific NVIDIA GPU. But although tailoring node pools to each workload can squeeze out performance, you may run into capacity issues when scheduling containers, since each narrowly defined pool can host only its one workload.

Solution: Malik recommends generalizing a bit when deciding where workloads can be scheduled. Avoid a direct one-to-one mapping between workloads and node pools, and ensure each node pool can support multiple workload types, as in the sketch below.
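One way to express that flexibility with the official Python kubernetes client is a pod whose node affinity accepts several CPU architectures, so more than one node pool qualifies at scheduling time. The pod name and image are illustrative.

```python
from kubernetes import client

# Node affinity that matches amd64 OR arm64 nodes, so the scheduler
# can place this pod on more than one node pool.
affinity = client.V1Affinity(
    node_affinity=client.V1NodeAffinity(
        required_during_scheduling_ignored_during_execution=client.V1NodeSelector(
            node_selector_terms=[
                client.V1NodeSelectorTerm(
                    match_expressions=[
                        client.V1NodeSelectorRequirement(
                            key="kubernetes.io/arch",
                            operator="In",
                            values=["amd64", "arm64"],  # several pools qualify
                        )
                    ]
                )
            ]
        )
    )
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="my-workload"),  # illustrative name
    spec=client.V1PodSpec(
        affinity=affinity,
        containers=[client.V1Container(name="app", image="myapp:latest")],
    ),
)
```

Pin a workload to exact hardware only when it genuinely needs it, such as a particular GPU; everything else should tolerate a range of pools.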

4. Lack of K8s Tooling Governance

Another common pitfall is a lack of governance around the tooling and integrations developers bring into Kubernetes. These tools might cover common areas such as logging, authentication, CI/CD and database management. But as more layers are brought in, overhead mounts for platform engineering teams, who become responsible for maintaining the availability and security of every integration.

Solution: The fix here, Malik recommends, is to cut back on giving developers total control and to treat the platform as a product. Adopt a request-based model in which the platform engineering team accepts input from developers but ultimately decides which integrations to support as the next “as-a-service.”

5. No Granular Access Levels

Too often, developer teams are granted full cluster-admin access. Anyone with access to the entire Kubernetes cluster can install integrations that compromise it for everyone else working in it. When service accounts needlessly dole out too much access, they violate the principle of least privilege, which undermines a zero-trust security posture. This dilemma is not new: according to the CNCF, most practitioners already agree that cloud-native security defaults are too open.

Solution: Define roles and assign responsibilities on a need-to-access basis, as in the RBAC sketch below. Beyond access control, tools like Open Policy Agent (OPA) or Kyverno can help you implement fine-grained cloud-native policy management.
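Here is a minimal sketch of least-privilege access using Kubernetes RBAC via the official Python client: a namespace-scoped, read-only Role bound to a single team group instead of cluster-admin for everyone. The namespace, role and group names are illustrative assumptions.

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

NAMESPACE = "team-payments"  # illustrative namespace

# A namespace-scoped Role granting only day-to-day read access,
# instead of handing out cluster-admin.
role = client.V1Role(
    metadata=client.V1ObjectMeta(name="dev-readonly", namespace=NAMESPACE),
    rules=[
        client.V1PolicyRule(
            api_groups=["", "apps"],
            resources=["pods", "pods/log", "deployments"],
            verbs=["get", "list", "watch"],
        )
    ],
)
rbac.create_namespaced_role(NAMESPACE, role)

# Bind the Role to one team's group, not to everyone.
# (Older client releases name this class V1Subject rather than RbacV1Subject.)
binding = client.V1RoleBinding(
    metadata=client.V1ObjectMeta(
        name="dev-readonly-binding", namespace=NAMESPACE
    ),
    role_ref=client.V1RoleRef(
        api_group="rbac.authorization.k8s.io", kind="Role", name="dev-readonly"
    ),
    subjects=[
        client.RbacV1Subject(
            kind="Group",
            name="payments-devs",
            api_group="rbac.authorization.k8s.io",
        )
    ],
)
rbac.create_namespaced_role_binding(NAMESPACE, binding)
```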

6. Those Assigned to K8s Lack the Skills to Operate It

Finally, another common anti-pattern is assigning responsibility for Kubernetes to people who lack the skills to support it. Many teams are still more accustomed to VMs or lack expertise in container orchestration. Solving this issue means either reassigning roles or inserting more abstraction layers.

Solution: Balance the skills of those operating Kubernetes against the needs of those consuming it: give operators the training to run the platform, and give consumers abstractions that hide the internals they don’t need to master.

Final Thoughts

Avoiding these pitfalls will require rethinking, and perhaps new tooling, to cover the gaps in operating a new platform. Those tools should allow customization, says Malik, because although many Kubernetes-native abstractions are being developed, some teams still want lower-level capabilities to optimize behavior.

Bill Doerrfeld

Bill Doerrfeld is a tech journalist and analyst. His beat is cloud technologies, specifically the web API economy. He began researching APIs as an Associate Editor at ProgrammableWeb, and since 2015 has been the Editor at Nordic APIs, a high-impact blog on API strategy for providers. He loves discovering new trends, interviewing key contributors, and researching new technology. He also gets out into the world to speak occasionally.
