Cassandra Kubernetes SIG Picks Cass Operator for K8s

The Cassandra Kubernetes special interest group (SIG) has coalesced around the Cass Operator project as the community-based operator.

Moving toward a single operator for the Apache Cassandra community has been a technical challenge. There are several Kubernetes operator projects for Cassandra, representing at least five different approaches to the problem. (You can read about the five major Kubernetes operators for Cassandra in this article from last September.) Initially, it seemed likely that the community would create a standard and build a fresh operator from scratch, one that incorporated the best ideas into a single product. For more details on this discussion, check out this lengthy dev mailing list thread.

Instead, the SIG is focusing on increasing Cass Operator’s community adoption, with the ultimate goal of bringing the project into the Apache Software Foundation (ASF).

Why Cass Operator?

Several features of the Cass Operator project, open sourced by DataStax, made it the prime candidate for the other projects to rally around.


Figure: High-level architecture of the Cass Operator in Kubernetes.

Cass Operator has major features for data center provisioning and operations, and has Apache Cassandra’s best practices baked into its automation:

  • Bootstraps nodes appropriately – This matters because when a Cassandra cluster starts up, the initial seed nodes in each rack need to start first, and nodes need to be brought up uniformly across racks.
  • Scales clusters up and down gracefully – Nodes are intelligently scaled up and down, one at a time, across racks so that replicas of data are uniformly distributed.
  • Automated node recovery processes – Basic operations such as restarting a node, replacing a node or replacing an instance are all automated.
  • Basic topology – This feature makes multiple data center/multi-rack clusters easy to create.
  • Advanced topology – Advanced networking at the Kubernetes layer makes multi-region/multi-Kubernetes clusters possible with container network interfaces (CNIs) such as Cilium or Project Calico, or externally via traditional networking tools.
  • Customizable containers – Following containerization best practices, the operator lets users merge containers they have built with the ones it provides, so they don’t have to deal with secrets and volumes in Kubernetes themselves.
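The rack-aware scaling described above can be illustrated with a small Python sketch. This is not Cass Operator’s actual code: it simply assumes the goal is to keep racks balanced by adding each new node to the least-populated rack and removing each node from the most-populated rack, one node at a time. The rack names (`r1`, `r2`, `r3`) are hypothetical.

```python
from typing import Dict, List


def scale_up_order(current: Dict[str, int], target_size: int) -> List[str]:
    """Return the rack that receives each new node, adding nodes one at a time."""
    if not current:
        raise ValueError("at least one rack is required")
    counts = dict(current)
    order: List[str] = []
    while sum(counts.values()) < target_size:
        # Pick the least-populated rack; min() returns the first rack on ties.
        rack = min(counts, key=counts.get)
        counts[rack] += 1
        order.append(rack)
    return order


def scale_down_order(current: Dict[str, int], target_size: int) -> List[str]:
    """Return the rack that loses each node, removing nodes one at a time."""
    if not current:
        raise ValueError("at least one rack is required")
    counts = dict(current)
    order: List[str] = []
    while sum(counts.values()) > target_size:
        # Pick the most-populated rack; max() returns the first rack on ties.
        rack = max(counts, key=counts.get)
        counts[rack] -= 1
        order.append(rack)
    return order


# Example: grow a 3-node, 3-rack datacenter to 6 nodes, then shrink a 6-node one to 4.
print(scale_up_order({"r1": 1, "r2": 1, "r3": 1}, 6))  # ['r1', 'r2', 'r3']
print(scale_down_order({"r1": 2, "r2": 2, "r3": 2}, 4))  # ['r1', 'r2']
```

Ties are broken by rack order, an arbitrary choice made for this sketch; the point is that every step leaves the racks differing by at most one node, so replicas stay evenly distributed.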

Figure: An Apache Cassandra cluster managed by Cass Operator in Kubernetes across different workers. StatefulSets manage the pods running Cassandra.
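As a rough illustration of how a datacenter’s nodes might map onto StatefulSets, the Python sketch below assumes one StatefulSet per rack and spreads the requested node count evenly, with earlier racks absorbing any remainder. Both the even-split rule and the `<cluster>-<dc>-<rack>-sts` naming pattern are assumptions made for illustration, not a specification of Cass Operator’s behavior.

```python
from typing import Dict, List


def statefulsets_for_datacenter(
    cluster: str, dc: str, racks: List[str], size: int
) -> Dict[str, int]:
    """Map hypothetical per-rack StatefulSet names to their replica counts."""
    if not racks:
        raise ValueError("at least one rack is required")
    # Even split; the first `extra` racks each get one additional node.
    base, extra = divmod(size, len(racks))
    return {
        f"{cluster}-{dc}-{rack}-sts": base + (1 if i < extra else 0)
        for i, rack in enumerate(racks)
    }


print(statefulsets_for_datacenter("cluster1", "dc1", ["r1", "r2", "r3"], 7))
# {'cluster1-dc1-r1-sts': 3, 'cluster1-dc1-r2-sts': 2, 'cluster1-dc1-r3-sts': 2}
```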

Project Differentiators

Cass Operator has many general features that set it apart, even before the powerful features that CassKop will supply are merged in:

  • The operator builds on a number of existing open source projects, including components originally developed for commercial products, which helps avoid vendor lock-in:
    • Open source Cass Config Builder extracted from DataStax OpsCenter Life Cycle Manager.
    • Open source management API for Apache Cassandra (MAAC).
    • Open source metrics collector for Apache Cassandra (MCAC).
    • Open source SRE tools such as Prometheus and Grafana Operator.
  • Support for PodTemplateSpec lets users customize the Cassandra pods extensively.
  • Cass Operator implements advanced networking and manages the node ports and host networks.
  • mTLS support for the management API provides simple security.
  • Automated generation of keystores and truststores for internode and client-to-node TLS.
  • Automated superuser account configuration according to best practices.
  • NetworkTopologyStrategy is automatically applied to system keyspaces with an appropriate replication factor (RF).
  • Webhook validation ensures that invalid changes are rejected with a helpful message.
  • Rolling cluster updates handle binary changes (Cassandra upgrades) and configuration changes, and support canary deployments, in which a change is applied to a single rack for validation before broader deployment.
  • Operator certification and thorough testing on several platforms, including Azure AKS, Amazon EKS, Google GKE, Red Hat OpenShift and VMware Tanzu Kubernetes.
  • Well-documented cloud storage classes, ingress solutions and reference implementations with an example application using the Java driver.
  • Cluster-level stop/resume, which stops all running instances while keeping persistent storage. This allows compute to be scaled down to zero, and the cluster is brought back up following the expected Cassandra startup process.
  • Additional features from the CassKop operator, which are being merged in (described below).
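To make the system keyspace item above more concrete, here is a minimal Python sketch that generates the kind of CQL such automation might run. It assumes the affected keyspaces are `system_auth`, `system_distributed` and `system_traces`, and that each datacenter’s replication factor is the smaller of 3 and that datacenter’s node count; both the keyspace list and the cap of 3 are assumptions, not documented Cass Operator behavior.

```python
from typing import Dict, List

# Assumption: the keyspaces this sketch treats as system keyspaces.
SYSTEM_KEYSPACES = ["system_auth", "system_distributed", "system_traces"]


def system_keyspace_cql(dc_sizes: Dict[str, int], max_rf: int = 3) -> List[str]:
    """Build one ALTER KEYSPACE statement per system keyspace using NetworkTopologyStrategy."""
    # RF per datacenter is capped at max_rf and never exceeds that DC's node count.
    replication = ", ".join(
        f"'{dc}': {min(max_rf, size)}" for dc, size in dc_sizes.items()
    )
    return [
        f"ALTER KEYSPACE {ks} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {replication}}};"
        for ks in SYSTEM_KEYSPACES
    ]


for stmt in system_keyspace_cql({"dc1": 5, "dc2": 1}):
    print(stmt)
```

For a cluster with five nodes in `dc1` and one in `dc2`, each statement sets `'dc1': 3, 'dc2': 1`. When replication changes on a cluster that already holds data, a repair is generally needed afterward so the new replicas receive the existing data.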
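The cluster-level stop/resume feature is driven by a setting on the datacenter resource rather than by deleting pods by hand. The Python sketch below builds, but does not run, a `kubectl patch` command that toggles it. It assumes the custom resource is a `CassandraDatacenter` with a boolean `spec.stopped` field, and the namespace name `cass-operator` is a hypothetical default; check the CRD installed in your cluster before relying on these names.

```python
import json
import shlex
from typing import List


def stopped_patch(stopped: bool) -> str:
    """Return a JSON merge patch that sets spec.stopped on a CassandraDatacenter."""
    # Assumption: the CRD exposes a boolean `spec.stopped` field.
    return json.dumps({"spec": {"stopped": stopped}}, separators=(",", ":"))


def kubectl_stop_resume_args(
    dc_name: str, stopped: bool, namespace: str = "cass-operator"
) -> List[str]:
    """Build the kubectl argument list that stops (True) or resumes (False) a datacenter."""
    # `namespace` defaults to a hypothetical value; use the one your datacenter lives in.
    return [
        "kubectl", "-n", namespace,
        "patch", "cassandradatacenter", dc_name,
        "--type", "merge",
        "-p", stopped_patch(stopped),
    ]


# Print a shell-quoted version of the command for inspection.
print(shlex.join(kubectl_stop_resume_args("dc1", stopped=True)))
```

Returning an argument list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting issues; setting `stopped` back to `False` resumes the datacenter.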

The following features from the CassKop operator, open sourced by Orange Telecom, have already been merged into the Cass Operator project or are due to be:

  • Node labeling to map any internal architecture, including network-specific labels that help with multiple data center setups, was released in 1.6.0.
  • Volumes and sidecar management (which could be linked to PodTemplateSpec) were also included in 1.6.0.
  • Integration with a kubectl plugin, which is useful on the ops side when there is no admin UI.
  • Evolving MultiCassKop to drive multiple Cass Operator clusters instead of multiple CassKop clusters. (Note: this may remain internal to Orange if it proves too specific.)

There are currently no plans to integrate backup and restore functionality directly, but there will be provisions for backup and restore through K8ssandra, which layers additional functionality on top of the operator.

As you can see, many features and capabilities are in development to make Apache Cassandra work well with Kubernetes. We expect to have a publicly available roadmap soon. Anyone can join us at the next Cassandra Kubernetes SIG meeting, or in the Cassandra-Kubernetes channel on the Apache Software Foundation’s Slack workspace. Great things come from great minds that are part of a community!

Rahul Singh

Rahul Singh is a contributor to the Apache Cassandra project and works at Anant Corporation as a business platform architect for data and analytics. His work is focused on real-time business platforms that connect data and analytics with customer experience, and on information systems that use Cassandra, Spark, Kafka and Akka.
