Despite Google’s ‘Autopilot,’ Kubernetes is Still Hard

We are, obviously, a very long way from being able to put Kubernetes cluster management on “autopilot,” but Google’s new platform ostensibly moves us closer to achieving that goal. 

Google Kubernetes Engine (GKE) Autopilot’s “mode of operation” can, purportedly, reduce Kubernetes operational costs and management time. It does this by extending GKE’s cluster configuration and management capabilities: more of the underlying infrastructure is automated, removing much of the “manual assembly and tinkering to optimize [Kubernetes] clusters for your needs,” Drew Bradstock, group product manager for GKE at Google, wrote in a blog post.

In addition to adding security and compliance features that GKE lacks, Autopilot automates much of the management and provisioning of the control plane and nodes.

However, in many ways, Google is arguably only scratching the surface: Autopilot is not yet a fully automated platform that could manage service mesh, cluster deployments and other associated tasks, letting developers simply upload their code to Kubernetes while the platform does the rest. Still, Google says Autopilot does offload many of the cumbersome tasks associated with Kubernetes cluster provisioning and management.

“Autopilot is a solid step into the right direction, and as a ‘cherry on top,’ it addresses the often painful challenge of rightsizing Kubernetes environments,” Torsten Volk, an analyst for Enterprise Management Associates (EMA), told Container Journal.

According to Bradstock, Autopilot, thanks to “its optimized, ready-for-production cluster,” offers much-needed additional security layers for Kubernetes while reducing operational complexity by “reducing the need to learn the nitty-gritty details of cluster configuration.”

Bradstock adds that Autopilot’s cluster-infrastructure management capabilities can also help reduce “Day 2” operational and maintenance costs.

“Autopilot is a hands-off, fully managed Kubernetes experience that allows you to focus more on your workloads and less on managing cluster infrastructure,” Bradstock writes.

More specifically, Autopilot can automate workload management and apply policies and best practices to Kubernetes clusters. Shielded GKE Nodes and Workload Identity are among the security capabilities automatically applied to clusters. These policies and safeguards, Google says, are based on the in-house practices its own engineers and SREs use for internal operations.
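For readers who want to see what this looks like in practice, a minimal sketch of creating an Autopilot cluster with the `gcloud` CLI follows. The cluster name, region and project ID are placeholders; at the time of writing, Autopilot clusters enable security features such as Shielded Nodes and Workload Identity by default, so no extra flags are needed:

```shell
# Hypothetical example: create a GKE Autopilot cluster.
# In Autopilot mode, Shielded Nodes and Workload Identity are
# enabled by default -- no additional flags required.
gcloud container clusters create-auto my-autopilot-cluster \
    --region=us-central1 \
    --project=my-project-id

# Fetch credentials so kubectl can talk to the new cluster.
gcloud container clusters get-credentials my-autopilot-cluster \
    --region=us-central1 \
    --project=my-project-id
```

Note the `create-auto` subcommand: standard (non-Autopilot) clusters are created with `create` instead, and expose the node-level configuration that Autopilot deliberately hides.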

According to Volk, these capabilities can indeed alleviate many of the more cumbersome and time-consuming tasks operations team members usually perform. This is because, Volk says, “transitioning to a fully declarative and, therefore, policy-driven, approach toward Kubernetes operations management still is the critical pain point and source of overhead cost when it comes to the adoption of Kubernetes at scale.” 

“This challenge is not limited to the Kubernetes container scheduler and its runtime, but extends to the entire application stack,” including monitoring, logging, tracing, data streaming and messaging, service discovery, service mesh, cloud native storage management and container registry, Volk says. 

However, even once Autopilot is adopted, DevOps teams will continue to face often enormous operational challenges when making the shift to Kubernetes environments. Once the shift is completed, maintaining these highly distributed and containerized infrastructures can be equally challenging, at least. This is because, Volk says, most problems arise “at the seams between Kubernetes and its neighboring platforms, such as networking, storage, monitoring and DevOps pipeline automation. This is where managed Kubernetes clouds, such as GKE, typically work with customers to get issues straightened out, and it is also the greatest source of today’s customer anxiety around Kubernetes operations,” Volk says.

According to Google, additional Autopilot capabilities for Kubernetes cluster management include: 

  • Cluster-provisioning processes based on in-house knowledge and expertise from Google SREs and engineers.
  • Automated application of Google’s hardening guidelines and security best practices, including the use of Shielded GKE Nodes and Workload Identity.
  • Node management by Google SREs, including node provisioning, maintenance and lifecycle management.
  • Automated provisioning and scaling of Google Cloud infrastructure capacity and resources based on workload and computing requirements.
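The last point follows from how Autopilot sizes infrastructure around per-Pod resource requests rather than pre-provisioned node pools. A hypothetical Deployment manifest like the one below (names, image and values are illustrative) is all a team declares; Autopilot provisions and scales the underlying compute to fit the requested CPU and memory:

```yaml
# Illustrative manifest: under Autopilot, the per-Pod resource
# requests below drive how much underlying capacity is provisioned.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend            # hypothetical workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
      - name: web
        image: gcr.io/my-project-id/web:1.0   # placeholder image
        resources:
          requests:
            cpu: "500m"         # Autopilot sizes nodes around requests
            memory: "512Mi"
```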


B. Cameron Gain

B. Cameron Gain first began writing about technology when he hacked the family Commodore 64 computer in the early 1980s and documented his exploit. Since his misspent youth, he has put his obsession with software development to better use by writing thousands of papers, manuals and articles for both online and print. His byline has appeared in Wired, PCWorld, Technology Review, Popular Science, EE Times and numerous other media outlets.
