Uncontained Performance, Agility: Docker Containers Inside Primary Storage
We expect containers to be transient in nature, and that includes the storage assigned to a container. However, for a container to be useful, it needs to do some work and typically produce something of value that persists. That is to say, your container likely consumes and creates persistent data.
That is certainly the case for us. At Signature Tech Studio, we provide cloud-based workflows to the construction and architectural industry – a market that is inherently demanding of storage resources. We maintain an AWS EC2 cloud-based infrastructure for compute and leverage Zadara Storage’s Virtual Private Storage Array (VPSA), connected to our AWS resources, as our primary storage. At present we run several thousand Docker jobs daily – approximately 250 concurrent containers whose lifetimes range from 5 to 50 minutes. These containers consume a lot of data and produce even more.
Accessing all this storage is the most significant challenge in working with Docker, apart from designing the containers themselves. In particular, each of our containers must download an initial file that can range in size from a few megabytes to almost a gigabyte. As we process that source file, we generate hundreds or thousands of new files that must be uploaded back to persistent storage.
Latency-free storage for all this would be awesome. Storage that is genuinely easy to migrate across vendors and platforms would be preferred, especially in a containerized cloud. Unfortunately, most of us have never experienced either: the compound latency of the server and the storage resource, coupled with inherent network latency, degrades storage performance, especially at scale. In the weeks since DockerCon, the topic of storage for Docker has been pretty hot, with a flurry of headlines on product debuts and analyst commentaries on the challenges of providing persistent, shared, feature-rich storage for Docker containers.
When Zadara Storage announced in June that it would offer a Docker-based container solution, we saw the potential for exactly this kind of low-latency access. Zadara Container Services (ZCS), its name for this new capability, runs containers within the storage itself rather than on one of our instances – it is Docker inside the storage, not on separate hardware connected via a network.
Our Docker containers now have direct access to the RAID array, with all the performance benefit that comes from cutting out the network latency, while our EC2 instances continue to mount the same RAID array via NFS. That is exciting stuff! It means we can apply Docker even to stateful applications such as databases, which require a highly available (HA) Docker container environment and high-performance persistent storage – where shared access across multiple containers and global policies that allow deployment of many identical containers are equally important.
To us, the benefits of this approach are extremely high performance, lower latency, reduced traffic between server and storage, and greater overall efficiency compared to traditional approaches. It also enables guaranteed quality of service via dedicated, on-demand hardware resources for each IO engine (the storage features) and the ZCS engine.
It is worth pointing out that a container is probably not the place for a massive RDBMS – and not every application belongs in a container. While we are still early in our rollout, we have targeted three specific implementations where we think running Docker within the primary storage will provide tremendous performance gains:
1) CrashPlan, our offsite backup solution, polls the storage over NFS across hundreds of thousands – in some cases millions – of files, so even a latency of just 1 millisecond per file has a real performance impact; do the math, and a million files adds up to more than 16 minutes of pure waiting. Running CrashPlan inside a container in the storage means we could eliminate that latency and likely turn off an EC2 instance as well. Lower latency means CrashPlan will produce these fine-grained backups at a quicker pace, protecting our customers even better than before.
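To make that math concrete, here is a back-of-the-envelope sketch. The file counts and the 1 ms figure come from the scenario above; the function is purely illustrative, not anything from CrashPlan itself.

```python
# Back-of-the-envelope cost of per-file NFS latency during a backup scan.
# A 1 ms round trip sounds negligible until it is multiplied by every file.

def scan_latency_minutes(file_count: int, per_file_latency_ms: float) -> float:
    """Total time spent purely on per-file latency, in minutes."""
    return file_count * per_file_latency_ms / 1000 / 60

for files in (100_000, 1_000_000):
    mins = scan_latency_minutes(files, 1.0)
    print(f"{files:>9,} files @ 1 ms each -> {mins:5.1f} min of pure latency")
```

At a million files, over a quarter of an hour of each backup pass is nothing but waiting on the network – latency that vanishes when the container runs inside the storage.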
2) CRON jobs/scheduled tasks: We currently have a number of jobs that run at regular intervals to modify (or verify) data in the primary storage. These do everything from ensuring correct permissions on files to archiving (that is, moving) old data. Historically, we have used CRON on one of our EC2 instances to perform these functions. However, this typically involves a tremendous amount of polling over NFS, and ultimately the jobs run much longer than we would like. Scheduled Docker tasks running inside the storage seem to be a great fit for this work.
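As a sketch of the kind of archival job described above – whether launched by CRON on an instance or by a scheduled container inside the storage – something like the following moves aging files into an archive tree. The 30-day cutoff and the directory layout are illustrative assumptions, not our actual policy.

```python
import shutil
import time
from pathlib import Path

CUTOFF_DAYS = 30  # illustrative threshold, not our real retention policy

def archive_old_files(source: Path, archive: Path,
                      cutoff_days: int = CUTOFF_DAYS) -> int:
    """Move files older than the cutoff into the archive tree.

    Returns the number of files moved. Run inside the storage, the
    directory walk is local instead of a long polling pass over NFS.
    """
    cutoff = time.time() - cutoff_days * 86_400
    moved = 0
    # Materialize the listing first so moving files does not
    # perturb the directory iteration.
    for path in list(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            dest = archive / path.relative_to(source)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(dest))
            moved += 1
    return moved
```

The logic is unchanged either way; only where it runs – and therefore how much NFS round-tripping it incurs – differs.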
3) inotify, the Linux kernel facility that monitors file systems and raises events when interesting changes occur. Imagine a container running in ZCS with an inotify watch configured on your RAID array, pushing events into a queue to be consumed by a service. Once that is in place, there is no longer any need to poll the storage system at all. Simply sit back, wait for the storage system to tell you that something interesting has happened, and react to it.
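The polling-versus-events idea can be sketched with a simple producer/consumer queue. Here the "storage events" are simulated; in a real ZCS container the watcher thread would block on the kernel's inotify API (via a tool like inotifywait or a library binding – tooling choices we are assuming, not something prescribed above), and the queue might be an external message broker rather than an in-process one.

```python
import queue
import threading

events: "queue.Queue[str]" = queue.Queue()

def storage_watcher() -> None:
    # Stand-in for an inotify loop: in a real container this thread
    # would block on kernel file-system events instead of emitting
    # these canned, hypothetical ones.
    for name in ("drawing-0001.pdf", "site-plan.dwg"):
        events.put(f"CLOSE_WRITE {name}")
    events.put("STOP")

def consumer() -> list:
    # The service side: no polling at all – just block until storage
    # says something interesting happened, then react.
    handled = []
    while (event := events.get()) != "STOP":
        handled.append(event)
    return handled

threading.Thread(target=storage_watcher, daemon=True).start()
result = consumer()
print(result)
```

The consumer does zero work between events; compare that with a CRON job that must walk the whole tree over NFS just to discover that nothing changed.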
What’s more, beyond the storage performance benefit, containerization enables something you rarely find: the ability for an internal IT team to develop and implement its own feature enhancements inside one specific vendor’s service. For example, when a feature comes along that our chosen vendor doesn’t support, we could simply spin up the desired feature – for instance, Riak CS or Eucalyptus – inside Docker on the ZCS. And there we have it: a new feature that meets our business needs, running inside our vendor’s service. We want more vendors to provide this.
Containerization in general, and Docker in particular, are making it a tremendously exciting time to be in enterprise IT. Just a year ago, we wouldn’t have imagined that containers would let us extend a storage service in such fundamental ways. Yet, by leveraging Docker as Zadara has done, we can tap into previously unimagined capabilities.
About the Author / Rob Hines
Rob “Bubba” Hines is a vice president at Signature Tech Studio, a provider of simple, innovative cloud-deployed software for the construction, architecture and reprographics industries. He has led IT teams for over 20 years.