5 Kubernetes Pain Points and How to Solve Them

Kubernetes is an open source orchestration platform for managing Linux containers in private, public and hybrid cloud environments. It is also commonly used to manage a microservices architecture. Containers and Kubernetes can be deployed on all major public cloud providers.

Developers, DevOps engineers and IT operations teams use Kubernetes to automatically deploy, scale, schedule, operate and maintain containers on a cluster of machines (called nodes). Kubernetes runs containers within pods—units that each contain one or more containers and can easily be moved or replicated between nodes.
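To make the pod concept concrete, here is a minimal two-container pod. This is an illustrative sketch only; the pod name, container names and images are hypothetical choices, not from the article:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar       # hypothetical pod name
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
  - name: log-sidecar          # second container sharing the pod's network and lifecycle
    image: busybox:1.36
    command: ["sh", "-c", "tail -f /dev/null"]
EOF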

Easing Pain Points: Kubernetes Troubleshooting With Dedicated Tools

Kubernetes is a complex system, and troubleshooting anywhere in a Kubernetes cluster can be equally complex. Even on a small local Kubernetes cluster, diagnosing and resolving issues is often challenging. A problem can manifest in a single container, in one or more pods, in a controller, in a control plane component or in some combination of these.

The troubleshooting challenge is exacerbated in large production environments, which have low visibility and many moving parts. Teams may need to use multiple tools to collect the data they need to troubleshoot or they may need to use additional tools to diagnose and fix pain points or issues they find.

Some common troubleshooting errors are listed below, followed by a brief sketch of how to investigate them:

● ImagePullBackOff—This status means that a pod failed to retrieve a container image from the registry.
● CrashLoopBackOff—This status means that a container in the pod is repeatedly crashing: Kubernetes starts the container, it exits with an error and Kubernetes restarts it with a growing back-off delay. Common causes include application errors, misconfiguration and missing dependencies.
● CreateContainerConfigError—This error is usually caused by a missing Secret or ConfigMap.
● Kubernetes Node Not Ready—When a worker node shuts down or crashes, all stateful pods that reside on it become unavailable and the node status appears as NotReady.
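A first diagnostic pass with standard kubectl commands usually reveals which of these conditions you are facing. A minimal sketch (the pod, node and namespace names are hypothetical):

# Show the pod's status and recent events, including image pull and config errors
kubectl describe pod my-app-pod -n my-namespace

# For CrashLoopBackOff: read the logs of the previous, crashed container instance
kubectl logs my-app-pod -n my-namespace --previous

# For Node Not Ready: check node conditions and recent cluster events
kubectl describe node worker-node-1
kubectl get events -n my-namespace --sort-by=.lastTimestamp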

The Kubernetes troubleshooting process is complex and can be frustrating, inefficient and time-consuming. Diagnosing and resolving a single CrashLoopBackOff error, for example, can require digging through logs, events and configuration across several components. Today there are dedicated tools that can provide:

● Detailed visibility—A full activity timeline showing all code and configuration changes, deployments, alerts, code differences, pod logs and more.
● Service dependencies—Visualizing the ripple effect services have on each other and related resources.
● Notifications—Integrating directly with communication channels like Slack to give Kubernetes operators and administrators the data they need to identify and resolve problems.

Reducing Networking Complexity With a Service Mesh

In the past, teams managing containerized applications had to implement traffic routing strategies using complex custom code, set up mutual TLS authentication and find ways to collect metrics from Kubernetes and its infrastructure.

These activities had to be coordinated with other systems; for example, changing firewall rules to enable traffic flows and setting up storage to accommodate logs. In general, managing communication, networking and observability was expensive, time-consuming and error-prone. A service mesh is an out-of-the-box solution that provides all of the above.

Once a Kubernetes service mesh is implemented, teams have a central control plane that provides secure, efficient communication between containers and nodes.
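For example, with Istio (used here as an assumed mesh; other meshes expose equivalent settings), enforcing mutual TLS for a namespace becomes a single declarative resource instead of custom authentication code:

kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production      # hypothetical namespace
spec:
  mtls:
    mode: STRICT             # sidecars accept only mutually authenticated TLS traffic
EOF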

A service mesh can provide many capabilities for orchestrating network traffic in containerized applications. The most commonly used functions leverage rules to route and load balance traffic between application instances in a cluster, using algorithms such as round robin or least requests.
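Continuing the Istio example (again an assumption about which mesh is in use), a sketch of selecting the load balancing algorithm for one service:

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service           # hypothetical service name
spec:
  host: my-service.production.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST  # send each request to the instance with the fewest outstanding requests
EOF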

Managing the Infrastructure and Environment Surrounding Kubernetes Clusters

Developing, deploying and operating large-scale enterprise cloud-native applications requires more than container orchestration. For example, the IT operations team should set up network firewalls, load balancers, DNS services and possibly databases, which could run within or alongside Kubernetes.

This means that the team needs to manage infrastructure tasks like maintaining physical hosts and adding, removing or replacing storage disks. This includes planning capacity and monitoring compute, storage and network utilization, allocation and performance. Kubernetes provides some capabilities that can help, like the Cluster Autoscaler, but these are usually not enough.
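Kubernetes' built-in tooling does offer a starting point for capacity monitoring. A minimal sketch, assuming the metrics-server add-on is installed (kubectl top does not work without it) and a hypothetical node name:

# Current CPU and memory usage per node (requires metrics-server)
kubectl top nodes

# Requested vs. allocatable resources on a specific node
kubectl describe node worker-node-1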

To enable effective infrastructure management, the IT team should have full control over all the underlying infrastructure running Kubernetes. IT operations teams need a robust, flexible environment that allows them to scale up and down, perform predictive capacity planning and implement seamless fault management. Whether implemented in the public cloud or on-premises, this requires careful thought and the right tooling.

Addressing Policy-Driven Customization and Security Requirements

An enterprise should have policies for using approved and hardened operating system images. Most companies require supporting software, such as databases, management tools and security agents, to run on these approved operating systems. Many enterprise security policies prohibit running unapproved operating system images in a public cloud, or make building and maintaining compliant images there slow and cumbersome.

The solution is to establish an image store in the company’s on-premises data center, enabling the creation of customized golden images (VM templates). IT teams can use granular RBAC policies to share images with select developers and other teams distributed worldwide based on local regulatory, performance and security requirements.

The golden images allow the teams to carry out Kubernetes deployments locally, providing the infrastructure required to run containers.
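The same idea applies one layer down, to container images rather than VM templates. As a related, Kubernetes-native illustration (a sketch, not the image store described above, and assuming Kubernetes 1.30+ and a hypothetical registry address), a ValidatingAdmissionPolicy can reject pods whose images do not come from the approved internal registry:

kubectl apply -f - <<EOF
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: approved-registry-only
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
  validations:
  - expression: "object.spec.containers.all(c, c.image.startsWith('registry.example.com/'))"
    message: "Container images must come from the approved internal registry."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: approved-registry-only-binding
spec:
  policyName: approved-registry-only
  validationActions: ["Deny"]
EOF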

Dealing With Distributed Environments

A typical company does not create a single large Kubernetes cluster for all the teams working in different geographic locations. Building a monolithic cluster in a central location can result in latency and issues related to different data regulations in each country.

Most enterprises rely on local clusters for different locations, applications and data regulation requirements. They also use separate environments for development, testing, and production. These distributed environments can be challenging to manage without a central management platform to optimize operations, simplify deployment, and upgrade clusters. Other common security requirements include strict isolation between clusters and role-based access control (RBAC).

Administrators can better manage infrastructure across different sites by implementing a single pane of glass. It allows them to operate and manage multiple clusters, using strict project-level security controls like RBAC to manage access to each environment.
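At the level of a single cluster, the building block such platforms rely on is Kubernetes-native RBAC. A minimal sketch that grants one team read-only access to its own namespace (the namespace and group names are hypothetical, and the group is assumed to come from your identity provider):

kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-a-read-only
  namespace: team-a
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "services", "deployments"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-read-only-binding
  namespace: team-a
subjects:
- kind: Group
  name: team-a-developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-a-read-only
  apiGroup: rbac.authorization.k8s.io
EOF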

Conclusion

In this article, I covered five Kubernetes pain points and their solutions:

● Easing Kubernetes troubleshooting with dedicated tools
● Reducing networking complexity with a service mesh
● Managing the infrastructure and environment surrounding Kubernetes clusters
● Addressing policy-driven customization and security requirements
● Dealing with distributed environments

I hope this will be useful as you overcome the operational challenges and pain points of production-scale Kubernetes.

Gilad David Maayan

Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.
