Container deployment of database management systems (DBMS) is starting to take off, leaving the hype phase and going mainstream. One indication that containerized databases are trending is that Postgres, a well-known open source relational database, has ranked in surveys as the third most popular technology run in Docker containers.
By abstracting applications completely from the operating system and infrastructure layers, containers provide application portability and unprecedented agility and flexibility to support a DevOps approach of continuous integration and continuous delivery or deployment (CI/CD). In addition, container images launch much faster than virtual machines (VMs), making them better suited to today’s dynamic runtime environments, in which apps are expected to scale up and down on demand.
Along with other containerized applications, containerized databases have emerged as part of the paradigm shift from large, monolithic applications to applications based on microservices and serverless architectures. Because containerized databases are so easy to deploy, they have become an on-demand utility for individual applications, replacing the large centralized database that serves many applications at once.
A containerized database is an encapsulation of its DBMS server software, with access to a physical database file residing somewhere within the network. Each DBMS is encased in its own container image. Containerizing a database, however, is not quite as straightforward as containerizing an application.
What You Need to Know
Here are 11 things to know about databases and Postgres in containers, starting with some of the advantages of containerized databases.
- With containers, you can approach the database as an on-demand utility, which means that each application can have its own dedicated database that can be spun up as needed. A microservices architecture supported by smaller, containerized databases thus overcomes the disadvantages of large, monolithic databases.
- Containerized databases separate storage from compute, meaning storage performance and capacity can be scaled independently of compute resources. This provides more flexibility in upfront database capacity planning and provisioning, since changes are much easier to make later.
- Software-defined containerized databases provide a crucial missing link in high-velocity DevOps cycles, allowing development and operations teams to collaborate seamlessly. At the same time, however, containerized databases have a unique set of challenges in terms of high data availability, backup and recovery, and other critical database performance and compliance requirements.
So, now let’s add to our list of things to know with some of the often-cited database containerization challenges, along with some ways to deal with those.
- Databases typically require high-throughput, low-latency networking. However, Docker containers do not natively provide the level of storage and network resource isolation that is necessary to achieve these requirements. The emergence of container orchestration platforms such as Kubernetes resolves this by managing networking and data storage, which can be local or in the cloud.
- Databases are inherently stateful and enduring, while containers are typically stateless and ephemeral. The workarounds put into place to handle persistent data storage and longer-than-usual container lifespans often detract from the key container benefit of reduced runtime resource usage.
To handle this, plan for persistent storage by separating the database engine from the database file storage. That way, if a container goes down or fails for some reason, there is no data loss. This is the same design used for a DBMS deployed in the data center.
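As a minimal sketch of that separation, a Docker Compose definition (service and volume names are illustrative; the official `postgres` image is assumed) keeps the data directory in a named volume that outlives the container:

```yaml
# docker-compose.yml — minimal sketch; service and volume names are illustrative
services:
  db:
    image: postgres:16            # official Postgres image
    environment:
      POSTGRES_PASSWORD: example  # required by the official image
    volumes:
      - pgdata:/var/lib/postgresql/data  # data lives in a named volume, not the container layer

volumes:
  pgdata: {}                      # survives container removal and recreation
```

If the `db` container is removed and recreated, the new container reattaches to the same `pgdata` volume, so the database files persist across the container's lifecycle.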
- The considerable disk space required to store large amounts of data in a containerized database makes it less agile and less relocatable. The solution is the same as mentioned above: separate the database engine from the database file storage by mapping external data volumes into the container at runtime. When using Kubernetes, persistent volumes can be created using different storage backends, such as NFS, GlusterFS, and Ceph, as well as cloud-backed storage (e.g., AWS EBS).
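The pattern above might look like the following in Kubernetes — an illustrative PersistentVolumeClaim mounted at the Postgres data directory (the names and size are assumptions, and the storage class that backs the claim depends on the cluster):

```yaml
# Illustrative PersistentVolumeClaim; the backing storage class is cluster-specific
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata-claim
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
# The claim is mounted into the Postgres container at its data directory
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16
      env:
        - name: POSTGRES_PASSWORD
          value: example
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: pgdata-claim
```

The pod can be rescheduled to another node and, as long as the storage backend allows it, the claim follows, keeping the database files intact.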
- Databases typically have numerous tuning parameters, many of which are dynamic. Building a new immutable container image for every possible database configuration can quickly result in image sprawl. It should be noted, however, that this issue is even more of a challenge in VM deployments, since containers are considerably more lightweight than VMs. To avoid this, custom database configurations are passed into the container at runtime to override the defaults. In Kubernetes, this can be achieved through ConfigMaps: Kubernetes objects that encapsulate the custom configurations and are provided to the container at deployment time.
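A hedged sketch of that pattern: an illustrative ConfigMap carrying a few tuning overrides, which could then be mounted into the container as a volume or referenced at deployment time (the name and parameter values are examples only):

```yaml
# Hypothetical ConfigMap holding Postgres tuning overrides
apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-tuning
data:
  custom.conf: |
    shared_buffers = 512MB
    max_connections = 200
```

Because the configuration travels separately from the image, the same immutable Postgres image can serve many differently tuned deployments.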
- Databases are critical to application workloads, so high availability and failover are needed for production. A development environment is simpler by comparison: there is no need to deploy a replica, and local storage is easier to manage than shared storage.
For databases supporting production workloads, it’s important to eliminate the single point of failure. Using the example of PostgreSQL, you will need to develop scripts to create the failover capability, so that a replica automatically takes over in the event the master database fails. When the automated, scripted deployment of containerized databases is used in conjunction with an orchestration framework such as Kubernetes, the result is built-in high availability for failover scenarios. This negates the need to maintain a dedicated failover cluster replica whose resources would sit idle much of the time.
The other option is to choose something like EDB Failover Manager (EFM) to do the work for you, providing high availability and failover. This component is integrated into the Postgres database container to provide automatic failover in the event of a database or a node (VM) failure.
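One building block for such failover, sketched under assumptions: Kubernetes health probes using `pg_isready`, a utility that ships with Postgres, so the orchestrator can detect an unhealthy database container and restart it or stop routing traffic to it (the timing values are illustrative):

```yaml
# Probe snippet from a hypothetical Postgres container spec
livenessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]  # pg_isready ships with Postgres
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3   # after repeated failures, Kubernetes restarts the container
readinessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  periodSeconds: 5      # an unready pod is removed from Service endpoints
```

Probes handle container-level restarts; promoting a replica to master still requires failover tooling such as the scripts or EFM described above.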
- A load balancer is needed to scale and deliver high performance for production workloads where many users are executing many queries simultaneously. In the case of PostgreSQL, pgPool is available, which helps applications scale by load balancing read transactions across the replicas while directing write requests to the master. Alternatively, EDB Postgres Containers include this function, so load balancing is built-in.
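As a rough illustration of that read/write split, an excerpt from a hypothetical pgpool configuration (the hostnames and weights are invented for the example):

```ini
; Excerpt from an illustrative pgpool.conf
listen_addresses = '*'
port = 9999
load_balance_mode = on           ; spread read-only queries across backends

backend_hostname0 = 'pg-master'  ; writes are directed to the master
backend_port0 = 5432
backend_weight0 = 1

backend_hostname1 = 'pg-replica'
backend_port1 = 5432
backend_weight1 = 1
```

Applications connect to pgpool (port 9999 here) instead of Postgres directly, and pgpool distributes the queries across the listed backends.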
- Backup and recovery are critical functions for any database, whether containerized or not. The EDB Backup and Recovery Tool (BART) container provides backup and recovery capabilities for Postgres database containers and can support databases in multiple different containers. It implements scheduling for automated backups, retention policies, and backup compression, as well as point-in-time recovery for large-scale Postgres deployments.
- Once databases are deployed, they need to be monitored for resource usage and analyzed to identify any performance bottlenecks. The EDB Postgres Enterprise Manager (PEM) provides monitoring and performance diagnostics in addition to other database administration functions.
In summary, containerization means being able to operate the same software in the same way across multiple public and private clouds, and even in on-premises virtualized environments. Containers provide this application portability, with unprecedented agility and flexibility to support a DevOps approach of continuous integration, delivery, and deployment, by abstracting apps completely from the operating system and infrastructure layers.
Today, containers and container orchestration have matured to the point that they are now positioned at the very core of cloud-native initiatives. Databases are becoming popular candidates for containerization, becoming an on-demand utility as part of the shift toward microservices and serverless architectures.