Best of 2021 – Dual-Stack Networking in Kubernetes

As we close out 2021, we at Container Journal wanted to highlight the most popular articles of the year. Following is the nineteenth in our series of the Best of 2021.

Dual-stack networking has arrived in Kubernetes. IPv4/IPv6 dual-stack support is key to the future of Kubernetes, whether powering new 5G and edge workloads or scaling beyond today’s cluster limits to meet your future needs. Let’s dive into the cross-organizational collaboration that brought us this long-awaited goal and examine how the feature evolved over time.

Where We’ve Been

“Yes, and …” Much like an improv show, every update in Kubernetes builds on what’s come before. Yes, IPv6 has been available in Kubernetes since 1.9. Yet even with IPv6 available, dual-stack IPv4/IPv6 support remains necessary, as vast swathes of production enterprise deployments exist in a context of sedimentary layers: The new built on the old built on the positively prehistoric—but still producing value. The scarcity of IPv4 blocks is a fundamental driver for change, while the advent of 5G pushes telco boundaries to the edge where connected devices clamor for connectivity. Given these realities, it’s no surprise that 2021 is finally the year of dual-stack networking on Kubernetes. Let’s look at how we got here, what we learned along the way and what’s next.

Discussions began in 2017 and the Kubernetes enhancements issue to add IPv4/IPv6 dual-stack support was opened in April 2018 during the Kubernetes 1.11 release cycle. At KubeCon North America in late 2018, members of SIG Network determined it was time to follow through on getting dual-stack support into Kubernetes. In early 2019, coverage in the tech press was straightforward; the community was ready, and the code would follow. After vigorous discussion and months of development, the first alpha for IPv4/IPv6 dual-stack support landed in September 2019 for Kubernetes 1.16. At that point, progress on the feature depended on feedback; those who were ready to test out dual-stack services enabled the feature gate and started reporting on their experiences.

Amidst the hectic world events of early 2020, we were also discovering unforeseen issues with the dual-stack alpha. In the initial implementation, a Kubernetes service could only have a single IP family. This meant that to create a dual-stack service, you actually needed to create two services: One IPv4 and one IPv6. End users testing the functionality found this to be needlessly complex; they wanted to be able to create a Kubernetes service that was dual-stack. In hindsight, you might wonder, “Why didn’t they design it that way in the first place?”

The reasoning behind the initial approach was simple: This first implementation required minimal changes to the Service API, so it was easier to drop in place. The trade-off was that the remaining complexity landed on users as cognitive load. There wasn’t a clear “right” way to proceed at first glance; only after iteration did the friction for cluster operators show itself to be a problem worth solving. Solving that problem required a whole new approach to Kubernetes networking.

Back to the Drawing Board

By mid-2020, the momentum was there to take what we’d learned in the first alpha and apply it to a reimagined foundation for dual-stack services. “Back to the drawing board” isn’t starting over, exactly; the first version of dual-stack taught us a lot, even though most of the code was replaced in the process. The SIG Network team made the reimplementation decision together and prioritized keeping the user experience free from unwelcome surprises while supporting emerging use cases. With cloud providers and other ecosystem vendors involved, we were able to ensure that the upstream testing was conducted with sufficient resources and attention.

The new implementation changed the Service API to incorporate new fields to support dual-stack. A single ipFamily field instead became three distinct fields: ipFamilyPolicy (which could be set as SingleStack, PreferDualStack or RequireDualStack), ipFamilies (consisting of a list of families assigned) and clusterIPs (inclusive of clusterIP). This means that cluster operators can choose which IP family to use and, if using both, define the order of families used, and can do so without needing to create and run duplicate services. Under the unassuming title of “dual stack services”, Kal Henidak of Microsoft submitted an epic pull request in June 2020. After months of collaboration on the details, Tim Hockin of Google approved it in October 2020 with the line “and here we go”.
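To make the three fields concrete, here is a minimal Python sketch that builds a dual-stack Service manifest as plain data and prints it as JSON, which kubectl accepts alongside YAML. The service name, selector and port are hypothetical, and the validation is a simplified approximation of what the API server enforces, not a copy of its logic.

```python
import json

# Values named in the Service API for dual-stack support.
VALID_POLICIES = {"SingleStack", "PreferDualStack", "RequireDualStack"}
VALID_FAMILIES = {"IPv4", "IPv6"}


def dual_stack_service(name, selector, port, policy, families):
    """Build a Service manifest (as a dict) that uses the dual-stack fields.

    The checks below are simplified assumptions for illustration only.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown ipFamilyPolicy: {policy}")
    if not 1 <= len(families) <= 2 or len(set(families)) != len(families):
        raise ValueError("ipFamilies must list one or two distinct families")
    if not set(families) <= VALID_FAMILIES:
        raise ValueError(f"unknown IP family in: {families}")
    if policy == "SingleStack" and len(families) != 1:
        raise ValueError("SingleStack allows exactly one IP family")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "ipFamilyPolicy": policy,
            # Order matters: the first family listed is the primary one.
            "ipFamilies": list(families),
            "selector": selector,
            "ports": [{"port": port, "protocol": "TCP"}],
        },
    }


# Hypothetical service that prefers dual-stack with IPv6 listed first.
manifest = dual_stack_service(
    "demo-web", {"app": "demo-web"}, 80, "PreferDualStack", ["IPv6", "IPv4"]
)
print(json.dumps(manifest, indent=2))
```

One Service object now carries both families, so the duplicate IPv4 and IPv6 services from the first alpha are no longer needed.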

Final tally: 11,088 lines added and 3,432 lines removed from the underlying networking substrate of Kubernetes itself; no small change, commensurate with the effort thus far!

Tim and Kal onstage together at KubeCon NA 2019 

Dual-stack support was reimplemented as alpha (with updates) for Kubernetes 1.20 and released in December 2020 with accompanying documentation. A cluster operator can assign both IPv4 and IPv6 service cluster IP addresses to a single service and can transition a service between single and dual IP stacks. Because this codebase had been under keen scrutiny for months before the alpha release, we were ready to progress to beta for Kubernetes 1.21.
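As a rough illustration of what “both IPv4 and IPv6 service cluster IP addresses” looks like on a single Service, the following Python sketch inspects a Service spec and reports the family of each assigned address. The addresses are made-up examples, and the check that clusterIP matches the first entry of clusterIPs reflects our reading of “clusterIPs (inclusive of clusterIP)”; headless services (clusterIP set to None) are deliberately not handled.

```python
import ipaddress


def cluster_ip_families(spec):
    """Return the IP family ("IPv4" or "IPv6") of each entry in clusterIPs."""
    cluster_ips = spec.get("clusterIPs", [])
    # Assumption: the legacy clusterIP field mirrors the first clusterIPs entry.
    if cluster_ips and spec.get("clusterIP") != cluster_ips[0]:
        raise ValueError("clusterIP must equal clusterIPs[0]")
    return [f"IPv{ipaddress.ip_address(ip).version}" for ip in cluster_ips]


def is_dual_stack(spec):
    """True when the Service holds one IPv4 and one IPv6 cluster IP."""
    return set(cluster_ip_families(spec)) == {"IPv4", "IPv6"}


# Hypothetical specs, as they might appear once addresses are assigned.
single = {"clusterIP": "10.96.0.10", "clusterIPs": ["10.96.0.10"]}
dual = {
    "clusterIP": "10.96.0.10",
    "clusterIPs": ["10.96.0.10", "fd00:10:96::a"],
}

print(cluster_ip_families(single), is_dual_stack(single))
print(cluster_ip_families(dual), is_dual_stack(dual))
```

Moving a service between single and dual stack then amounts to the same object gaining or losing its second address, rather than a second Service appearing or disappearing.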

Significant changes in any mature project require careful documentation; in February 2021, the Production Readiness Review was a key component of the move to beta. “What happens when…?” is a question we all need to ask ourselves for various scenarios (hopefully before using a new feature). Troubleshooting any failures of cluster networking is a complex topic with much nuance and many error conditions to consider; host networking and CNI are important factors in the rollout of any production dual-stack networking configuration.

Beta features in Kubernetes are on by default, but what’s more, the underlying networking infrastructure is changed whether or not the feature gate is on. This means that after the 1.21 release in April 2021, the dual-stack code changes started being tested for any side effects. One potential show-stopper of a bug turned up in the PreferDualStack code, where an auto-upgrade could surprise users and lead to a disconnect between configuration and expectations. Subtle bugs like this don’t lend themselves to simple solutions; the resolution ended up requiring an explicit opt-in (and after discussion, the fix was backported to earlier versions of Kubernetes with that code).

It’s worth remembering why the Kubernetes community embarked on this journey in the first place: IP addresses are not infinite; public IPv4 addresses have been largely exhausted, and even private IPv4 networks run low on addresses in the flat address topology of a Kubernetes cluster. Dual-stack support removes such scaling limitations via native IPv6 routing to pods and services, whilst still allowing a cluster to use IPv4 as needed. IPv6 is the future and dual-stack is the bridge to get us there. 
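The scaling argument is easy to check with a little arithmetic. This Python sketch compares how many per-node pod ranges, and how many addresses per node, fit into an IPv4 block versus an IPv6 block. The CIDRs and per-node prefix lengths are illustrative assumptions, not recommendations or defaults from any particular distribution.

```python
import ipaddress


def node_capacity(cluster_cidr, node_prefix):
    """Return (number of per-node ranges, addresses in each range).

    cluster_cidr is carved into equal per-node blocks of length node_prefix.
    """
    network = ipaddress.ip_network(cluster_cidr)
    if node_prefix < network.prefixlen or node_prefix > network.max_prefixlen:
        raise ValueError("node_prefix must fit inside the cluster CIDR")
    nodes = 2 ** (node_prefix - network.prefixlen)
    addrs_per_node = 2 ** (network.max_prefixlen - node_prefix)
    return nodes, addrs_per_node


# Hypothetical IPv4 pod network: a /16 split into one /24 per node.
v4 = node_capacity("10.244.0.0/16", 24)
# Hypothetical IPv6 pod network: a /56 split into one /64 per node.
v6 = node_capacity("fd00:10:244::/56", 64)

print("IPv4:", v4)
print("IPv6:", v6)
```

Under these assumptions both layouts allow 256 per-node ranges, but each IPv4 range holds 256 addresses while each IPv6 range holds 2**64, which is why per-node pod address limits stop being the constraint once IPv6 is available.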

Dual-stack services are on track to graduate soon as a stable feature for upstream Kubernetes, in release 1.23 or thereabouts. What does this mean in practice? We’re opening up the ecosystem for networking providers who supported dual-stack well before it was available in Kubernetes. Managed providers will implement this feature as they roll out support for the underlying networking, and SIG Network will continue iterating on many related improvements. Software is never “done”, but “stable” is a great foundation for future enhancements. Soon enough, we’ll be saying, “Yes, and…” to the next round of Kubernetes improvements, and you can riff on our ideas and bring your own.


Bridget Kromhout

Bridget Kromhout is a Principal Program Manager at Microsoft Azure, focusing on the open source cloud-native ecosystem. Her CS degree emphasis was in theory, but she now deals with the concrete (if 'cloud' can be considered tangible). After years on call for production (from enterprise to research to startups) and a couple of customer-facing adventures, she now herds cats and wrangles docs on the product side of engineering. In the wider tech community, she has done much conference speaking and organizing, and advises the global devopsdays organization after leading it for over five years. Living in Minneapolis, she enjoys snowshoeing in the winter and bicycling in the summer (with winter cycling as a stretch goal).
