Database systems have traditionally relied on fairly complex, OS-specific clustering technology to provide high availability. Depending on the vendor and configuration, these technologies included shared storage devices, transaction-based replication, storage replication or some combination thereof. In many cases, these setups required complex storage and network configuration, supported by multiple teams at significant cost.
As virtualization became more widespread, virtual machines offered a degree of high availability, and many hypervisors offered some form of disaster recovery based on storage replication. Because the database files lived inside a virtual disk file, this approach was easier and cheaper to implement. However, the hypervisor was not aware of the transactional state of the database, so a failover could bring the database back in an inconsistent state.
The vast majority of early container deployments were web and application servers. There were a couple of reasons for this, the biggest being the challenge of persisting data in an ephemeral container environment. When Google introduced Kubernetes, databases got both persistent storage and namespaces. This provided two major benefits: it introduced high availability into the solution, and it separated the storage tier from the compute plane. Together, these gave administrators a high degree of flexibility in changing the compute resources for the database tier.
Containers offer several benefits over virtual machine deployments, since they run directly on the host operating system and share its kernel rather than each running a full guest operating system. The first of these benefits is performance: compared to a virtual machine, a database container will typically have better I/O response time. Second, containers allow for simplified deployment through their infrastructure-as-code model, and for higher density than virtual machine solutions. Finally, since all of the containers on a given host share the same base operating system, the amount of patching is reduced.
Another major benefit is the ability to containerize a gold image or use a vendor-supplied image of an RDBMS. While an RDBMS such as MySQL has a very simple installation process, Oracle and SQL Server both require a fairly complex install, patching and post-installation configuration.
Most of the initial support for database containers came from the open source community, for relational databases such as Postgres and MySQL and for NoSQL solutions such as MongoDB and Redis. In recent years, commercial database vendors including Oracle, Microsoft and IBM have introduced support for both Docker and Kubernetes. Through its persistent volumes and service configuration, Kubernetes allows these databases to be highly available and to perform actions such as rolling upgrades for RDBMS patches. Separating data from compute also makes DevOps deployment workflows easier, because there are fewer moving pieces.
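To make the persistent volume and service configuration idea a little more concrete, here is a minimal sketch in Python that builds two Kubernetes manifests as plain dictionaries: a headless Service and a StatefulSet with a volume claim template and a rolling update strategy. The manifest field names are standard Kubernetes API fields, but everything else is an assumption made for illustration: the helper names `build_service` and `build_statefulset`, the resource name `pg`, the `postgres:16` image, the mount path and the sizes. It is not a production-ready deployment; credentials, storage classes and replication are deliberately left out.

```python
import json


def build_service(name: str, port: int) -> dict:
    """Headless Service that gives each database pod a stable network identity."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",  # headless: no virtual IP, DNS resolves to the pods
            "selector": {"app": name},
            "ports": [{"port": port}],
        },
    }


def build_statefulset(name: str, image: str, port: int, mount_path: str,
                      cpu: str, memory: str, storage: str) -> dict:
    """StatefulSet whose data lives on a claimed volume, apart from compute settings."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": 1,
            "selector": {"matchLabels": {"app": name}},
            # Pods are replaced one at a time when the image or template changes.
            "updateStrategy": {"type": "RollingUpdate"},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Compute tier: can be changed without touching the data.
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                    }],
                },
            },
            # Storage tier: one persistent volume claim is created per pod.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            build_service("pg", 5432),
            build_statefulset("pg", "postgres:16", 5432,
                              "/var/lib/postgresql/data", "500m", "1Gi", "10Gi"),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

The point of the sketch is the separation described above: calling `build_statefulset` again with a larger `cpu` or `memory` value, or a newer image tag, changes only the pod template, while the volume claim template that holds the database files stays the same. Since `kubectl` accepts JSON as well as YAML, the printed output could in principle be saved to a file and applied to a test cluster, though a real database would need at least credentials and a suitable storage class first.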
While implementing a new platform such as Kubernetes can be challenging for many enterprises, the “container as a service” offerings in the major public clouds (Amazon, Google and Microsoft) have made it easier to get started. There are still some challenges around maturity, security and integration with existing third-party applications.
Microsoft has made a big investment in SQL Server on the Kubernetes platform, recently introducing SQL Server 2019 at the Ignite conference. This release includes a new group of services called SQL Server Big Data Clusters, which combine the SQL Server database engine with Spark for machine learning, along with flexible compute and data pools that take advantage of the underlying Kubernetes infrastructure. This kind of commitment from a major vendor suggests both that the platform is already reasonably mature and that it has a significant future.
It will be several years before containers become the deployment platform of choice for database servers, and the biggest remaining challenge is disaster recovery. While the Kubernetes platform itself is highly available within a given data center, disaster recovery still requires the database platform to supply its own replication mechanism. Some platforms already support this, and I would expect more support from all the major vendors in the next year. The advantages of containers are very clear, both for DevOps deployment scenarios and for simplified administration, and they will ultimately drive more database workloads in this direction.