Top 5 Frustrations With Debugging Kubernetes

Kubernetes has officially crossed the chasm! The hype has been real for years, with the CNCF noting in their recent report that 96% of organizations are using or evaluating the technology, which has been fully embraced by large enterprises. In fact, the subhead of the survey reads: “The year Kubernetes crossed the chasm.”

The overwhelming success of Kubernetes can be attributed to many factors. Many other posts have covered these reasons in depth and, if you are reading this, there is a good chance you’ve evaluated the technology yourself. But simplicity is not one of the main reasons for adoption, especially not in Day 2 operations. Sure, Kubernetes has made it easier to deploy large, scalable applications, but managing and troubleshooting Kubernetes can be quite difficult.

Here are the top five frustrations we’ve seen when debugging Kubernetes:

1. Recreating the environment on your local machine

A common way software developers debug applications is by recreating the scenario on their local machine so they can poke around without impacting production (aka customers). But when working with Kubernetes and microservices, you often have complex environments containing a large number of images, servers and configurations. In many cases, it is almost impossible to recreate the environment locally, so even if you think you are debugging locally, it may not accurately map back to production reality.

2. Working with Kubernetes is resource-heavy

Running Kubernetes on your computer is very resource-heavy because it comes with a few ‘must-haves’. For instance, for every service that you want to debug, you need to spin up the entire environment that supports it locally, including Docker Desktop or another management layer. You will often also rely on a tool such as Docker Compose to run your code locally, which means pulling or building every Docker image involved. This is a very cumbersome process. Even on a top-tier MacBook Pro, running Kubernetes can noticeably degrade the machine’s performance, which can be incredibly frustrating for developers.

3. Debugging using the new kubectl feature isn’t comfortable, to say the least

In 2021, Kubernetes introduced kubectl debug. The feature works in tandem with ephemeral containers (which moved to beta in Kubernetes 1.23, in late 2021), temporary containers that you spin up only to inspect running pods. It’s supposed to make troubleshooting easier and bug reproduction possible. Before this, you couldn’t add new containers to running pods at all.

Besides being cumbersome, it only allows you to see OS-level info such as environment variables. Software engineers often need to go deeper into the application level, see the data being processed there and understand how the code flows and what the underlying state is. If that is what you are after, you will have to add a new log line, rebuild the container image, push the new version to your registry (pro tip: don’t skip this step!) and then update the underlying deployment to use it. If you forget the push, the rollout will fail because the cluster cannot pull an image that was never published.
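To make the workflow concrete, here is a minimal Python sketch that assembles a kubectl debug invocation for attaching an ephemeral container to a running pod. The pod name, container name and image are hypothetical placeholders, and the helper itself is only an illustration, not part of any Kubernetes tooling. It only builds and prints the command; running it requires kubectl and access to a cluster with ephemeral containers enabled.

```python
import shlex


def build_debug_command(pod, target_container, image="busybox", namespace="default"):
    """Build the argv for attaching an ephemeral debug container to a running pod."""
    return [
        "kubectl", "debug",
        "-it", pod,                    # interactive terminal attached to the existing pod
        "--namespace", namespace,
        "--image", image,              # image used for the ephemeral container
        "--target", target_container,  # share the process namespace of this container
    ]


if __name__ == "__main__":
    # "checkout-7d9f" and "app" are hypothetical names used for illustration.
    cmd = build_debug_command("checkout-7d9f", "app")
    print(shlex.join(cmd))
    # To run it for real, pass `cmd` to subprocess.run(cmd, check=True).
```

Even when this works, what you get is a shell next to your process, not a view into your application’s variables, which is exactly the limitation described above.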

4. Debugging in Kubernetes is log-dependent

When you are debugging in Kubernetes, you are only able to look at the logs your application is already emitting at that very moment. That means you have to depend on what people did in the past and trust that logs were organized or parsed correctly. Otherwise, you’ll have to redeploy the entire application to add the logging you need, which takes a lot of time and requires elevated permissions in production.

5. It’s difficult to debug at scale

Abstraction is a gift and a curse; when things go wrong, especially at scale, it can be difficult to get under the hood of Kubernetes to understand the problem. Sometimes, even within a single cluster, an issue can be happening on one node and not another; pinning down the issue and the root cause is very complicated. Reproducing a bug is an art, and reproducing millions of requests isn’t easy. Often, several different tools are needed to reproduce the scenario, and it may be impossible to figure out which container, pod or resource would have been the first to break with so much activity.
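One small way to check whether a problem is concentrated on a single node is to group container restarts by node. The sketch below assumes you have the parsed output of `kubectl get pods -o json`; the function name and the sample data are hypothetical, and restart counts are only one rough signal among many.

```python
import json
from collections import Counter


def restarts_by_node(pod_list):
    """Sum container restart counts per node from a parsed pod list document."""
    totals = Counter()
    for pod in pod_list.get("items", []):
        # Pods that have not been scheduled yet have no nodeName.
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        statuses = pod.get("status", {}).get("containerStatuses", [])
        totals[node] += sum(s.get("restartCount", 0) for s in statuses)
    return totals


if __name__ == "__main__":
    # Hypothetical sample; in practice, json.load() the output of
    # `kubectl get pods -o json` instead.
    sample = json.loads("""
    {"items": [
      {"spec": {"nodeName": "node-a"},
       "status": {"containerStatuses": [{"restartCount": 7}]}},
      {"spec": {"nodeName": "node-a"},
       "status": {"containerStatuses": [{"restartCount": 0}]}},
      {"spec": {"nodeName": "node-b"},
       "status": {"containerStatuses": [{"restartCount": 0}]}}
    ]}
    """)
    for node, restarts in restarts_by_node(sample).most_common():
        print(f"{node}: {restarts} restarts")
```

A skewed result only tells you where to look first; it does not tell you which container, pod or resource broke first.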

What’s the solution?

The truth is that there is no one-size-fits-all solution. However, here are some quick tips that can benefit everyone interested in using Kubernetes (and debugging it):

  • Leverage open source: There’s a new open source website called ValidKube that every Kubernetes developer should check out. It pulls together other popular open source projects within the Kubernetes ecosystem like kubeval, kubectl-neat, Kubescout and trivy that can help you clean, validate and secure your Kubernetes YAML proactively, saving you plenty of debugging headaches down the line.
  • Adopt dynamic observability: We have seen the rise of observability alongside the rise of Kubernetes. If you’ve read this far, I think it is safe to assume you have at least some basic monitoring and observability in place. Given the highly dynamic and ephemeral nature of Kubernetes, it may be worth considering dynamic observability tooling that gives you the ability to extract code-level data on the fly without having to restart and redeploy the application. It will give you the (super)power to add a log line retroactively to an area you want to explore while troubleshooting, even if you never took the time to write the log line in your code beforehand.
  • Get control of your logging verbosity: In an ideal world, software developers wouldn’t need to spend their time writing logs at all. They could focus on building cool features, adding business value and addressing customer requests. But logging is still a necessary evil and all developers should look into technology, such as live logging, that makes it possible to adjust the verbosity of these logs on the fly, so devs aren’t drowning in noise that they don’t care about.
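As a rough illustration of the last tip, the Python sketch below changes a logger’s verbosity while the process keeps running, using only the standard library. It is not how any particular live logging product works; the logger name, the LOG_LEVEL environment variable and the choice of SIGUSR1 as the trigger are all assumptions made for this example.

```python
import logging
import os
import signal

logger = logging.getLogger("orders")  # hypothetical service name

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def set_verbosity(level_name):
    """Switch the logger's level at runtime and return the numeric level now active."""
    try:
        level = LEVELS[level_name.upper()]
    except KeyError:
        raise ValueError(f"unknown log level: {level_name!r}") from None
    logger.setLevel(level)
    return level


def toggle_debug(signum, frame):
    """Signal handler: flip between INFO and DEBUG without restarting the process."""
    set_verbosity("INFO" if logger.level == logging.DEBUG else "DEBUG")


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(message)s")
    # LOG_LEVEL is an assumed variable name for the startup default.
    set_verbosity(os.environ.get("LOG_LEVEL", "INFO"))
    if hasattr(signal, "SIGUSR1"):  # SIGUSR1 is not available on Windows
        signal.signal(signal.SIGUSR1, toggle_debug)
    logger.debug("hidden while the level is INFO")
    set_verbosity("debug")
    logger.debug("visible after the switch")
```

With something like this in place, sending SIGUSR1 to the process (for example through kubectl exec) would raise or lower the verbosity without a rebuild or redeploy. It only helps with log lines that already exist in the code, which is where dynamic observability tooling goes further.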

At the end of the day, Kubernetes is becoming the heart of the cloud-native ecosystem. Adopting the best tools and practices to debug your Kubernetes applications will make your life so much easier as a developer. So check out open source projects, look into dynamic observability tooling and get your log verbosity under control!


Liran Haimovitch

Liran is Co-Founder and CTO of Rookout. He’s an award-winning cyber security practitioner and writer. As an advocate of modern software methodologies like agile, lean and DevOps, Liran’s passion is to understand how software actually works.
