The Next Kubernetes Frontier: Multicluster Management

As large enterprises gradually adopt Kubernetes, replacing their prior large-scale infrastructure, they are deploying dozens or even hundreds of clusters. Kubernetes’ own built-in management stops at the single-cluster level, which leaves infrastructure owners to work out their own solution for coherent management of these far-flung components. Applications need to be able to run on multiple clusters in multiple locations, admins need to monitor resource levels, and policies need to be applied across the whole network.

Ever innovative, the cloud-native ecosystem has been developing a variety of solutions to this problem. Many of these, like ClusterAPI and Submariner, are components that address specific multicluster management problems. Other software, discussed below, attempts to cover the full needs of an organization with many Kubernetes clusters to control.

A good example of how the thinking around multicluster management, and its implementation, has advanced during the last five years is the Chinese technology firm Ant Group. Ant manages dozens of Kubernetes clusters spread around the globe, with thousands of nodes (servers) in each cluster. They organize bundles of applications and their required components (like databases) into distributable stacks called logic data centers (LDCs) and deploy them across their distributed infrastructure. This atomization of work helps them fulfill two key goals of their operation: high availability and transactionality. First, an LDC with a particular application function must always be available somewhere. Second, units of work are executed so that they are verifiable and can be rolled back in the event of a failure.

Yong Feng, a senior staff engineer on Ant Group’s PaaS team, said, “Ant Group has an infrastructure with dozens of Kubernetes clusters, hundreds of thousands of nodes and thousands of critical applications. In such a cloud-native infrastructure, there will be tens of thousands of pods created and deleted every day. It is challenging to build a highly available, scalable and secure backplane to manage those clusters and applications.”

Starting With KubeFed

In the Kubernetes project, multicluster features are handled by the appropriately named SIG-Multicluster team. This team developed a cluster federation technology, nicknamed KubeFed, in 2017. Federation was initially conceived as a built-in feature of Kubernetes, but soon ran afoul of both the difficulty of implementation and the problem that “multicluster” doesn’t mean the same thing to all users.

Federation v1 could distribute services to multiple Kubernetes clusters, but it could not handle other kinds of objects, nor could it really “manage” the clusters in any other way. A few users with fairly specialized needs, particularly a couple of academic labs, still use it, but the project has been archived by Kubernetes and never became a core feature.

Federation v1 was then rapidly replaced with a refactored design called “KubeFed v2,” which is now used by operations staff around the world. It allows a single Kubernetes cluster to deploy multiple kinds of objects to multiple other Kubernetes clusters. KubeFed v2 also allows the “control plane” main cluster to manage the other clusters, including many of their resources and policies. This made it Ant Group’s natural first platform for their many Kubernetes clusters.
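To make the model concrete, here is a minimal sketch, not Ant’s actual configuration, of the kind of object KubeFed v2 works with: a “federated” wrapper that carries a template of an ordinary Kubernetes resource plus a placement list naming the member clusters to propagate it to. The group/version and field names follow the public KubeFed documentation and may differ between releases; the application and cluster names are invented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sketch of the shape of a KubeFed v2 "federated" resource: a template
// holding an (abridged) Deployment spec, plus a placement section naming
// the member clusters the object should be propagated to.
func main() {
	federatedDeployment := map[string]interface{}{
		"apiVersion": "types.kubefed.io/v1beta1", // per KubeFed docs; may vary by release
		"kind":       "FederatedDeployment",
		"metadata": map[string]interface{}{
			"name":      "payments-frontend", // hypothetical application name
			"namespace": "payments",
		},
		"spec": map[string]interface{}{
			// The template is the plain Kubernetes object to create in each cluster
			// (abridged here; a real template carries the full Deployment spec).
			"template": map[string]interface{}{
				"spec": map[string]interface{}{
					"replicas": 3,
				},
			},
			// Placement lists member clusters registered with the KubeFed control plane.
			"placement": map[string]interface{}{
				"clusters": []map[string]interface{}{
					{"name": "cluster-beijing"},
					{"name": "cluster-shanghai"},
				},
			},
		},
	}

	out, _ := json.MarshalIndent(federatedDeployment, "", "  ")
	fmt.Println(string(out))
}
```

The control plane cluster stores objects like this and pushes the templated resource out to each listed member cluster.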

One of their chief priorities is scalability: both efficiency at their current size and the ability to scale out by adding nodes and whole clusters when needed. That need arrives every year, because ahead of China’s annual Singles Day on November 11 the Ant team generally has to rapidly deploy a lot of additional capacity to support peak online shopping workloads. As they found out, however, KubeFed was both slow to add new clusters and inefficient at managing a large number of them.

In a KubeFed v2 deployment, one Kubernetes cluster acts as a single “control plane” for all of the other clusters. Ant found that the resource usage of this control plane cluster was extremely high for each managed cluster and each distributed application. In a test with just 3% of their normal application workload, the requirements of the control plane saturated the capacity of moderate-sized cloud instances, and response times were poor. As a result, they never ran their full workload on KubeFed.

A second limitation concerns Kubernetes’ extension mechanism, custom resource definitions, or CRDs. Advanced users like Ant make extensive use of CRDs to “reprogram” Kubernetes to suit their applications, whether to build operators or for other purposes. To distribute CRD-based resources, KubeFed requires each CRD to have a second, “federated” CRD. Not only does this double the number of type definitions in the cluster, it also introduces serious problems keeping CRD versions and API versions in sync across all clusters, and it becomes a serious obstacle to upgrading applications because version skew is not supported.
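As a rough illustration of that doubling, the sketch below shows the KubeFed FederatedTypeConfig mapping that ties a target type to its generated federated counterpart. For a hypothetical LogicDataCenter CRD, the hub ends up carrying both the original definition and a FederatedLogicDataCenter one, and both must stay in version lockstep across the fleet. Field names follow the KubeFed documentation; the resource kind and the infra.example.com group are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sketch of the KubeFed mapping object that links a target type to its
// generated "federated" counterpart. Every propagated type, including each
// custom resource, needs both CRDs plus this mapping on the control plane.
func main() {
	typeConfig := map[string]interface{}{
		"apiVersion": "core.kubefed.io/v1beta1", // per KubeFed docs; may vary by release
		"kind":       "FederatedTypeConfig",
		"metadata": map[string]interface{}{
			"name": "logicdatacenters.infra.example.com", // hypothetical CRD
		},
		"spec": map[string]interface{}{
			"propagation": "Enabled",
			// The original, cluster-local type...
			"targetType": map[string]interface{}{
				"group":      "infra.example.com",
				"version":    "v1",
				"kind":       "LogicDataCenter",
				"pluralName": "logicdatacenters",
				"scope":      "Namespaced",
			},
			// ...and the second, federated CRD that wraps it.
			"federatedType": map[string]interface{}{
				"group":      "types.kubefed.io",
				"version":    "v1beta1",
				"kind":       "FederatedLogicDataCenter",
				"pluralName": "federatedlogicdatacenters",
				"scope":      "Namespaced",
			},
		},
	}

	out, _ := json.MarshalIndent(typeConfig, "", "  ")
	fmt.Println(string(out))
}
```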

This proliferation of CRDs also led to serious troubleshooting problems. Wherever there is a custom resource in a local cluster, there is also one on the federated control plane representing an aggregated view of that local resource. But if something goes wrong in the local cluster, it is difficult to work out what the problem is from the federation control plane: the operator logs and resource events on the local cluster are invisible at the federation level.

Moving to Open Cluster Management

The Open Cluster Management project (OCM) was started by IBM and was open sourced by Red Hat last year. OCM evolves the approach to multicluster operations by building on the experience of Ant and others: it offloads computation from the management cluster to agents running on each member cluster, distributing the work across the entire infrastructure. This makes it possible to have at least an order of magnitude more clusters in one multicluster group than was possible with KubeFed. So far, users have tested groups of up to 1,000 clusters.

OCM was also able to take advantage of the evolution of Kubernetes itself to become more efficient. For example, its extensions, which are packaged as CRDs, use the WorkAPI, a proposed SIG-Multicluster subproject, to distribute Kubernetes objects between clusters. The WorkAPI embeds a subset of native Kubernetes resources as the definition of the objects to be deployed and leaves the actual deployment to the agents on each cluster. This model is more flexible and minimizes the need for any central control plane for deployments. The WorkAPI can also define multiple versions of a resource together, supporting upgrade paths for applications.
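As a hedged sketch of the idea, based on the public OCM documentation rather than Ant’s deployment: OCM’s implementation of the WorkAPI is the ManifestWork resource. The hub writes a ManifestWork into the namespace that represents a managed cluster, embedding plain Kubernetes manifests, and the work agent on that cluster pulls and applies them. The names below are invented placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sketch of a WorkAPI (ManifestWork) object. The hub writes this into the
// namespace representing one managed cluster; the agent on that cluster
// pulls it and applies the embedded manifests locally.
func main() {
	manifestWork := map[string]interface{}{
		"apiVersion": "work.open-cluster-management.io/v1", // per OCM docs
		"kind":       "ManifestWork",
		"metadata": map[string]interface{}{
			"name":      "payments-frontend",
			"namespace": "cluster-beijing", // namespace matches the managed cluster's name
		},
		"spec": map[string]interface{}{
			"workload": map[string]interface{}{
				// Each manifest is an ordinary, unmodified Kubernetes object (abridged here).
				"manifests": []map[string]interface{}{
					{
						"apiVersion": "apps/v1",
						"kind":       "Deployment",
						"metadata": map[string]interface{}{
							"name":      "payments-frontend",
							"namespace": "payments",
						},
						"spec": map[string]interface{}{"replicas": 3},
					},
				},
			},
		},
	}

	out, _ := json.MarshalIndent(manifestWork, "", "  ")
	fmt.Println(string(out))
}
```

Because the payload is the native object itself, there is no per-type “federated” wrapper to keep in sync, which sidesteps the CRD doubling problem described above.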

Most importantly, OCM enables a lot more automation in cluster deployment. In the KubeFed generation, adding a cluster to the cluster group was a very manual process, involving many steps to get the clusters communicating. The new platform was able to simplify this. For example, because it runs on a “pull” basis, it was no longer necessary to have a multi-stage manual certificate registration so that the primary cluster could send commands.

This means that adding new clusters requires very few actions: staff simply deploy the “Klusterlet” agent on the target Kubernetes cluster and it joins the group. Not only is this easier for admins, it also makes possible the kind of rapid deployment of many new clusters that Ant needs ahead of Singles Day.
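For illustration, this is roughly what the hub sees when a Klusterlet registers: a ManagedCluster object created by the pull-based join, which an admin (or automation) accepts by setting hubAcceptsClient. The API group and field name follow the OCM documentation; the cluster name is invented and details may vary by OCM version.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sketch of the hub-side ManagedCluster object that appears when a Klusterlet
// registers. In the pull model the agent initiates the join; the hub side
// accepts it by setting spec.hubAcceptsClient to true (and approving the
// agent's certificate request), rather than pushing credentials outward.
func main() {
	managedCluster := map[string]interface{}{
		"apiVersion": "cluster.open-cluster-management.io/v1", // per OCM docs
		"kind":       "ManagedCluster",
		"metadata": map[string]interface{}{
			"name": "cluster-beijing", // hypothetical cluster name
		},
		"spec": map[string]interface{}{
			"hubAcceptsClient": true, // flipping this accepts the join request
		},
	}

	out, _ := json.MarshalIndent(managedCluster, "", "  ")
	fmt.Println(string(out))
}
```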

Where Next With Kubernetes Multicluster?

The Kubernetes community has rapidly evolved the software’s globe-spanning multicluster capabilities in just four years, from Federation v1 to KubeFed v2 to Open Cluster Management. Thanks to the talented work of many engineers, both inside SIG-Multicluster and outside it in projects like OCM and Submariner, both the scale and the capabilities of multicluster groups are vastly greater than they were.

Will there be a new platform that evolves multicluster capabilities further, or is OCM the ultimate implementation? Yong Feng believes it could be the latter.

“Looking forward, with the joint efforts from Red Hat, Ant Group, AliCloud and other participants, the Open Cluster Management project could become the standard and backplane for building Kubernetes based multi-cluster solutions,” he said. Regardless, one thing is clear: You can now run a whole planet on Kubernetes.


Red Hat’s Qiu Jian co-authored this article.

To hear more about cloud-native topics, join the Cloud Native Computing Foundation and the cloud-native community at KubeCon+CloudNativeCon North America 2021, October 11-15, 2021.

Min Kim

Min Kim is a software engineer with Ant Group and an active Kubernetes maintainer and sub-project owner, mostly working in the field of SIG API-Machinery. He is good at feeding his orange cat and cleaning the litter box.
