Hardening Kubernetes Multi-Cluster Environments

Kubernetes has quickly become a de facto tool within enterprise software development environments, enabling DevOps engineers to scale large numbers of containers. Recent cybersecurity hardening guidelines from the NSA and CISA indicate that Kubernetes adoption has reached critical mass. But this surge in adoption can also introduce new vulnerabilities and misconfigurations which, if left unchecked, could put many organizations at risk.

Most infrastructure teams have moved on from running just one or two clusters. It’s now common to operate multiple clusters across various divisions and, perhaps, even across multiple clouds. Within this multi-cluster reality, it becomes difficult to keep an up-to-date inventory of all existing Kubernetes clusters, let alone their unique frailties. This can easily result in over-permissive states that break the rule of least privilege.

I recently met with Jimmy Mesta, co-founder and CTO of KSOC Labs, to explore the current issues facing Kubernetes deployments. According to Mesta, increased visibility into all Kubernetes platforms and tighter role-based access control (RBAC) are necessary to keep cloud-native architecture safe and secure. Below, we’ll review these concerns and explore general methods for hardening the growing complexity of today’s Kubernetes deployments.

Concern: Lack of Visibility

So, what are the most common security holes plaguing Kubernetes? The first one Mesta identifies is the general lack of visibility when dealing with multiple Kubernetes clusters. “We’ve graduated as an industry from running one or two clusters to running many,” says Mesta. Organizations likely have a growing number of clusters spread out across company divisions. And they may be using a combination of self-managed instances and cloud-managed environments like GKE, EKS or AKS.

Managed services are compelling. It’s easy to jumpstart Kubernetes instances, resulting in a proliferation of new clusters. And enabling developers to build freely can help boost productivity. However, as an organization becomes more spread out, it begins to lose visibility into things like the number of active clusters, where workloads are running and how permissions are configured across clouds, says Mesta. This is bad news for high-security workloads, such as those that must meet PCI or health care standards, which require more scrutiny than others to avoid CVEs and misconfigurations.

How to Increase Visibility

To increase visibility into Kubernetes clusters and their statuses, Mesta advocates for an event-driven approach that leverages multiple cloud providers’ APIs to discover clusters. Using such a system, an operator subscribes to events and is notified when a cluster is created, deleted or modified. This metadata could even generate recommendations or alerts to help spot rogue clusters, such as those popping up in unfamiliar regions. Keeping an extensive, detailed log of cluster activity is also important for maintaining an end-to-end timeline of revisions, adds Mesta.
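As a rough illustration of the idea, the sketch below compares two inventory snapshots, emits created, deleted and modified events, and raises an alert for clusters in regions outside an allow-list. The `Cluster` record, the `diff_inventory` helper and the region names are hypothetical. A real system would fill the snapshots from each cloud provider’s API, or subscribe to the provider’s event feed instead of comparing snapshots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    # Hypothetical inventory record; real fields depend on the provider API.
    name: str
    provider: str
    region: str
    version: str


def diff_inventory(previous, current, known_regions):
    """Return (kind, cluster_name, detail) events between two snapshots."""
    old = {(c.provider, c.name): c for c in previous}
    new = {(c.provider, c.name): c for c in current}
    events = []
    # Clusters present now but not before are new.
    for key in sorted(new.keys() - old.keys()):
        cluster = new[key]
        events.append(("created", cluster.name, cluster.region))
        if cluster.region not in known_regions:
            events.append(
                ("alert", cluster.name, "unfamiliar region " + cluster.region)
            )
    # Clusters that disappeared since the last snapshot.
    for key in sorted(old.keys() - new.keys()):
        events.append(("deleted", old[key].name, old[key].region))
    # Clusters whose recorded metadata changed (for example, a version upgrade).
    for key in sorted(new.keys() & old.keys()):
        if new[key] != old[key]:
            events.append(("modified", new[key].name, new[key].version))
    return events


if __name__ == "__main__":
    before = [Cluster("payments", "gke", "us-east1", "1.27")]
    after = [
        Cluster("payments", "gke", "us-east1", "1.28"),
        Cluster("scratch", "eks", "ap-south-1", "1.28"),
    ]
    for event in diff_inventory(before, after, known_regions={"us-east1"}):
        print(event)
```

Appending each returned event to a durable log would give the kind of revision timeline Mesta describes.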

Instead of scanning clusters only once, an event-driven approach hooks into the K8s API itself to stream events in real-time. This strategy really becomes valuable for large conglomerates that inherit varying tech stacks due to mergers and acquisitions, according to Mesta.

Concern: Inadequate Role-Based Access Control

As of 2021, OWASP ranks broken access control as the top security risk facing web applications. Misconfigured access control is a frequent root cause of cyberattacks, and Kubernetes is not immune. RBAC issues often plague Kubernetes due to insecure defaults paired with the sheer complexity of the system.

Kubernetes has many different objects with many different verbs. Implementing RBAC in Kubernetes is very flexible, but it also presents challenges: it’s not uncommon for Kubernetes objects to allow traversal capabilities, enabling an attacker to grab a token and elevate their credentials. “Over-permissive actions are the default due to the complexity of the system,” says Mesta.
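To make this more concrete, here is a minimal sketch that scans a Role or ClusterRole manifest, loaded as a Python dict, for a few patterns commonly associated with over-permissive access. The `rules`, `resources` and `verbs` fields follow the Kubernetes RBAC manifest format. The `risky_rules` helper and its particular set of checks are assumptions chosen for illustration, not an exhaustive or authoritative list.

```python
# RBAC verbs that can let a subject gain more rights than it was given.
ESCALATION_VERBS = {"escalate", "bind", "impersonate"}


def risky_rules(role):
    """Return findings for a Role or ClusterRole manifest given as a dict."""
    findings = []
    for index, rule in enumerate(role.get("rules", [])):
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if "*" in verbs:
            findings.append(f"rule {index}: wildcard verbs")
        if "*" in resources:
            findings.append(f"rule {index}: wildcard resources")
        # Read access to secrets can expose service account tokens.
        if "secrets" in resources and verbs & {"get", "list", "watch", "*"}:
            findings.append(f"rule {index}: can read secrets")
        escalation = sorted(verbs & ESCALATION_VERBS)
        if escalation:
            findings.append(f"rule {index}: escalation verbs {escalation}")
    return findings


if __name__ == "__main__":
    example_role = {
        "kind": "ClusterRole",
        "metadata": {"name": "dev-helper"},  # hypothetical role name
        "rules": [
            {"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list"]},
            {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]},
        ],
    }
    for finding in risky_rules(example_role):
        print(finding)
```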

Locking Down RBAC

To further address the principle of least privilege in Kubernetes, Mesta recommends carefully scrutinizing production use to spot over-permissive areas. For example, if an engineer has performed only four types of actions over thirty days, they probably don’t need access to 55 different actions. By auditing Kubernetes API logs and comparing them against the policies in place, organizations can (and should) limit privileges wherever possible.
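The comparison can be sketched roughly as follows, assuming the API server writes audit events as JSON lines with the `user.username`, `verb` and `objectRef.resource` fields of the Kubernetes audit event format. The `granted` mapping is a hypothetical, pre-flattened view of what each user is allowed to do. In practice it would have to be derived from Roles and RoleBindings.

```python
import json
from collections import defaultdict


def used_permissions(audit_lines):
    """Collect the (verb, resource) pairs each user actually exercised."""
    used = defaultdict(set)
    for line in audit_lines:
        event = json.loads(line)
        ref = event.get("objectRef")
        if not ref:
            continue  # some events, such as non-resource requests, lack objectRef
        used[event["user"]["username"]].add((event["verb"], ref["resource"]))
    return used


def unused_permissions(granted, used):
    """Return, per user, the granted pairs that never appear in the audit log."""
    return {
        user: sorted(perms - used.get(user, set()))
        for user, perms in granted.items()
    }


if __name__ == "__main__":
    log = [
        json.dumps({"user": {"username": "alice"}, "verb": "get",
                    "objectRef": {"resource": "pods"}}),
        json.dumps({"user": {"username": "alice"}, "verb": "get",
                    "requestURI": "/healthz"}),
    ]
    granted = {"alice": {("get", "pods"), ("delete", "pods"), ("get", "secrets")}}
    print(unused_permissions(granted, used_permissions(log)))
```

Pairs reported as unused over a long enough window are candidates for removal, though rarely used but legitimate actions (such as incident response) deserve a manual look before anything is revoked.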

Another answer is to grant engineers more short-lived access. This could eliminate indefinite access to specific objects and reduce forgotten permissions. Ideally, teams could take this further and abstract things like logging, modeling and debugging away from the cluster. Although decreasing developer access to Kubernetes is a “North star,” says Mesta, it isn’t currently possible with the state of the tooling ecosystem and the need to perform live debugging.
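One way to picture short-lived access is a grant record that carries its own expiry, as in the sketch below. The `Grant` record and both helpers are hypothetical bookkeeping, not Kubernetes objects. A real setup would pair them with a job that removes the matching RoleBinding once a grant expires.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    # Hypothetical record of temporary access; not a Kubernetes object.
    user: str
    role: str
    expires_at: datetime


def issue_grant(user, role, ttl, now):
    """Create a grant that lapses `ttl` after `now`."""
    return Grant(user, role, now + ttl)


def expired_grants(grants, now):
    """Return the grants whose expiry time has passed."""
    return [grant for grant in grants if grant.expires_at <= now]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    grant = issue_grant("alice", "debugger", timedelta(hours=4), start)
    print(expired_grants([grant], start + timedelta(hours=5)))
```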

Open Policy Agent (OPA) is another option for hardening policies across cloud-native architecture, and its adoption continues to grow steadily. According to The State of Cloud-Native Policy Management report, 32% of organizations responding to the survey use OPA to enact policy management in Kubernetes. There’s a place for OPA in RBAC, and Mesta notes that OPA is a rule engine of choice to centralize on.

Of course, enforcing too many policies can have a downside. “If you go too tight with policies, you’ll cause a lot of headaches because you’ve stripped access away,” says Mesta. Thus, organizations need a way to tighten policies without cutting off the access engineers legitimately need.

Final Thoughts: Future Kubernetes Usage

Imagine a single organization navigating 50 clusters across three clouds, all running different versions of Kubernetes. In such a complex, heterogeneous multi-cluster environment, the threat matrix is vast. Not only does it become difficult to keep up with reliability standards, but it also becomes challenging to observe these various workloads and enforce RBAC consistently across them.

One answer could be a common layer to unify cluster observability across disparate sources. It will also be essential to enforce authorization that emphasizes identity and to follow guidelines from current cybersecurity frameworks. Just because your Kubernetes environment is cloud-managed doesn’t mean it’s immune to threats, either. Ironically, on-premises deployments often have a more accurate inventory due to the formalized nature of cluster creation, says Mesta.

Pundits say the industry is already experiencing Kubernetes sprawl, and the growth in the space shows no signs of slowing down. “The NSA hardening guidelines are an indicator of mass adoption,” says Mesta. “We are not at a late stage of Kubernetes adoption; we’re probably in the beginning of a major spike. The future will likely see organizations turning it on in ways we didn’t fathom before.”

Bill Doerrfeld

Bill Doerrfeld is a tech journalist and analyst. His beat is cloud technologies, specifically the web API economy. He began researching APIs as an Associate Editor at ProgrammableWeb, and since 2015 has been the Editor at Nordic APIs, a high-impact blog on API strategy for providers. He loves discovering new trends, interviewing key contributors, and researching new technology. He also gets out into the world to speak occasionally.
