Extending Kubernetes With Service Mesh

May 8, 2019 | May 7, 2019 | Zach Jory | Tags: east-west traffic, kubernetes, microservices, service mesh, TLS

Andrew Jenkins, CTO at Aspen Mesh, recently sat down to discuss how service mesh can extend Kubernetes to manage microservice architectures even better. For more on service mesh, consider attending KubeCon + CloudNativeCon EU, May 20-23 in Barcelona, Spain.

Kubernetes has taken the enterprise by storm. Companies continue to evaluate and adopt Kubernetes to solve containerized application challenges. In your mind, what has been the catalyst behind this growth?

It’s a combination of a well-crafted API and a good controller architecture. Systems like Kubernetes face a trade-off between extensibility and opinionated-ness. Enterprises want their platforms to be extensible, but not so far to the extreme that they’re really just an extension API with no “core” functionality that you can rely on. Enterprises want their platforms to have some opinion, especially if it matches their own, but not so much that the enterprise can’t sprinkle in some of their own choices, maybe to incorporate legacy systems or maybe to address compliance.

So it feels like Kubernetes has opinion in all the right places. For example, pods are core, and you can’t really say you have Kubernetes if you don’t adhere to the pod concept. But if you want to layer on top of pods—how they’re scheduled or upgraded, who’s allowed to define them, how they’re replicated for scale—you have a lot of flexibility there.

How do you think Kubernetes and other cloud-native technologies are driving enterprise architecture modernization and the move to massively distributed systems?

Kubernetes is the lingua franca for how you run containerized apps. And containerization is the path for architecture modernization because it shrinks gaps between dev and prod: deployment gaps, design gaps, quality gaps, time-to-delivery gaps. Enterprises need to shrink those gaps to compete effectively. They need to focus engineer time on business value. To me, Kubernetes and related cloud-native tech are a great enabler for that refocusing.

Kubernetes solves many of the build and deploy challenges of containers. What does it leave unsolved?

Kubernetes itself does a great job of implementing a description of what containers you want running, how many, how you want to group them into services. Containers are supposed to change more than just deploy-time semantics, though. Hopefully, they reach further and further back on the development chain in your organization, eventually all the way to changing the way you design new features—favoring smaller microservices and rapid development.

So we see organizations (including ourselves) exploring different techniques to test and deliver software (progressive delivery, mirroring, experimentation in production or semi-prod). We see evolutions around quantifying system health as an input to budgeted disruption/SRE. The mindset around security is changing to account for the common pattern that Kubernetes-based apps depend on data from other services often residing outside of Kubernetes.

How does service mesh address those gaps?

I think the dataplane proxy part of service mesh is a good place to implement the functionality to support these new approaches. For instance, on the security aspect, the service mesh dataplane should implement the authentication and encryption using mutual TLS.
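To make the mutual TLS point concrete, here is a minimal sketch of the server side of such a handshake using Python's standard ssl module. It is only an illustration of what "mutual" means (the server also demands a certificate from the client), not how any particular mesh proxy implements it. The function name and its file-path parameters are hypothetical.

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that also requires a client certificate.

    The three file paths are hypothetical placeholders; in a mesh they would
    be provisioned and rotated by the control plane, not passed in by hand.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED on a server context makes the handshake fail unless the
    # connecting peer presents a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The identity this workload presents to its peers.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The trust bundle used to verify the peer's certificate.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

With plain (one-way) TLS the verify_mode line would be absent and only the server would be authenticated; requiring a certificate in both directions is what lets each side know which workload it is talking to.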

Above the dataplane, there’s space for a Kubernetes-style declarative control plane formed of controllers with different responsibilities. Extending the security example, a simple deployment may just want to get up and going quickly with a self-signed SPIFFE trust chain. They want to know their traffic is encrypted and authenticated to the workload. A more extensive deployment may already have certificate bundles for other internal or semi-internal services, and may want fine-grained control over egress traffic. But both deployments use the same service mesh dataplane.

It seems like there’s emerging consensus on the functionality expected from the dataplane component: common TLS implementation, HTTP/gRPC layer 7 protocol support for advanced routing, circuit breaking.
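Of the dataplane features listed above, circuit breaking is the easiest to show in a few lines. The following is a minimal sketch of the general pattern, assuming a simple consecutive-failure threshold and a fixed cool-down. The class name, parameters, and defaults are hypothetical and do not reflect any specific proxy's configuration.

```python
import time


class CircuitBreaker:
    """Stop sending requests to a backend after repeated failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so the logic can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent to the backend right now."""
        if self.opened_at is None:
            return True
        # Once the cool-down has elapsed, let trial requests through
        # ("half-open"); their outcome decides whether the circuit closes.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            # Open (or re-open) the circuit and restart the cool-down.
            self.opened_at = self.clock()
```

A caller would check allow() before each request and report the outcome with record_success() or record_failure(). Doing this in the proxy rather than in each application is what lets every service get the same behavior without code changes.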

There are many different tools and methods that can address things like load balancing, service discovery, canary testing and cluster security. Why pick service mesh instead of an API gateway or APM tool?

There can be some overlap with similar tech like API gateways and APM tools, but my perspective is that service mesh is especially great as a transparently injected measurement and enforcement point.

A service mesh is well-suited for service-to-service (“east-west”) traffic: services don’t have to opt in to using it, and also don’t get to opt out (subject to platform controls), which opens up security use cases around service-to-service authentication and authorization. Also, a service mesh can close the loop beyond just measurement: once you’ve identified a performance or health problem, you can take action to mitigate or correct it with the service mesh. This means you can see a problem, understand it, and make a modification without having to touch many different systems.

What are some of the top use cases you are seeing service mesh used for?

First, a consistent approach to encrypting service-to-service communication that spans clusters and organization domains: mutual TLS and workload-based security. I think of it as a network security “easy button”—you get a single TLS stack with all the features you need; the operations and life cycle get much simpler (one upgrade if there’s a CVE; one config option for cert rotation, etc.).

Next, not to be overlooked, is an at-a-glance view of which services are communicating with which other services, for what URLs, and how healthy that communication is. We find users who have had fits and starts at building or buying this kind of visibility, and when we show a service mesh that can grab this information everywhere without app modification, it’s like the fulfillment of a vision that they’ve always had [and] they’ve made incremental progress toward, but now they get it in one fell swoop.

Finally, all the advanced L7 routing and resiliency stuff used to build canaries and progressive delivery. Very powerful; we see a lot of users moving in this direction, but it is not totally clear yet exactly what they want out of this layer, and they are experimenting with different approaches, sometimes on a per-app basis. (Incidentally, that’s a good fit for service mesh dataplane as the enforcement point, with different controller layers on top.)
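The core of a canary rollout is a weighted traffic split between versions of a service. As a rough sketch of that idea (the function and the version names are hypothetical, and a real mesh would apply the weights in the proxy from declarative configuration):

```python
import random


def pick_backend(weights, rng=random):
    """Choose a backend version in proportion to its weight.

    weights maps a version name to a non-negative number, for example
    {"stable": 90, "canary": 10}. At least one weight must be positive.
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

Progressive delivery then amounts to shifting the weights over time, say from 90/10 toward 0/100, while watching the health of the canary, and setting its weight back to zero if something looks wrong.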

What do you see for the future of service mesh and Kubernetes?

First, I’ll borrow from Janet Kuo and say I hope the future of Kubernetes itself is boring. It does what it does and does it very well. Extensibility and the surrounding ecosystem are the future for Kubernetes: wrangling large clusters and large numbers of clusters, multi-tenancy, balancing hardware acceleration or isolation against universality.

For service mesh, there’s a lot to come. I think starting soon we’re going to see multiple personas interacting with one service mesh. Up until now, the focus has been on platform teams. But I think developers are going to use service mesh to help with debugging, quality engineers are going to use it for fault injection and failure reproductions, and API architects and release managers will rely on service mesh for novel delivery/lifecycle approaches.