Is There a Data Layer Dilemma Between Containers & Big Data Applications?

According to Krishna Yeddanapudi, CTO & Founder, Robin Systems, when an enterprise uses containers to deploy existing applications, container technology fails to address how to manage data beyond the lifecycle of the container itself. Though the enterprise will dispense with containers and create them anew time and again, the data that they isolate and serve must endure beyond that lifecycle.

“The Robin Systems data layer provides a container-aware data access layer that provides high performance for the containers while solving the data persistence issue. Now the container can reattach to its data volumes when physical node failures occur,” says Yeddanapudi.

Distributed applications are stateful and need to scale out, and at that scale physical node failures become common. That makes data management a central concern. Robin Systems addresses it so that containers can host distributed stateful applications that scale while their data persists.
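To make the reattachment idea concrete, here is a minimal Python sketch. It is a toy model, not Robin Systems' implementation: the `Volume` and `Container` classes and the `recreate_on` helper are hypothetical names introduced only for illustration. The point it shows is that the data lives in an object separate from the container, so a replacement container on a healthy node can pick up the same data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Volume:
    """Data that outlives any single container (hypothetical model)."""
    name: str
    data: dict = field(default_factory=dict)
    attached_node: Optional[str] = None


@dataclass
class Container:
    """A disposable wrapper that reads and writes through a volume."""
    name: str
    node: str
    volume: Volume

    def __post_init__(self):
        # Attaching records which node currently serves the volume.
        self.volume.attached_node = self.node

    def write(self, key, value):
        self.volume.data[key] = value


def recreate_on(failed: Container, healthy_node: str) -> Container:
    """Build a fresh container on another node and reattach the same volume."""
    return Container(failed.name, healthy_node, failed.volume)


volume = Volume("cassandra-data")
old = Container("cassandra-0", "node-a", volume)
old.write("row-1", "value")

# Assume node-a has failed: the container is recreated, the data is not.
new = recreate_on(old, "node-b")
```

The replacement container is a new object, but `new.volume` is the same `Volume` instance the old container wrote to, which is the persistence property the article describes.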

“The data layer extends the utility of container technologies from single-node stateless applications to distributed stateful applications like Hadoop, Cassandra, MongoDB, and Elastic,” says Yeddanapudi.

Virtual Clusters, QoS

“We supply custom Big Data applications / application clusters for users as distributor applications in the form of a virtual cluster. Users can deploy virtual clusters with the appropriate data layer and quality of service,” says Yeddanapudi.

Robin Systems’ clusters are virtual clusters in that users can create virtualized clusters of a group of containers. “We create this group of containers in appropriate machines with appropriate fault domains. That is, if one machine goes down, we don’t want two containers sitting on the same machine. So we create them on different machines, and they are stitched together forming the virtual cluster. That is a distributed application,” says Yeddanapudi.
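The placement rule Yeddanapudi describes, no two containers of a cluster on the same machine, can be sketched in a few lines of Python. This is an illustrative assumption about how such a scheduler might work, not Robin Systems' actual algorithm; the function name `place_virtual_cluster` and the idea of labeling each machine with a fault domain such as a rack are introduced here for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def place_virtual_cluster(containers, machines):
    """Map each container to its own machine, spreading across fault domains.

    containers: list of container names.
    machines:   dict of machine name -> fault domain (for example, a rack).
    """
    if len(containers) > len(machines):
        raise ValueError("not enough machines to keep containers apart")

    # Group machines by fault domain.
    by_domain = defaultdict(list)
    for machine, domain in sorted(machines.items()):
        by_domain[domain].append(machine)

    # Interleave the groups so consecutive picks land in different domains.
    groups = [by_domain[d] for d in sorted(by_domain)]
    interleaved = [
        machine
        for picks in zip_longest(*groups)
        for machine in picks
        if machine is not None
    ]

    # Each container gets a distinct machine, so one host failure
    # takes down at most one member of the virtual cluster.
    return dict(zip(containers, interleaved))


placement = place_virtual_cluster(
    ["namenode", "datanode-1", "datanode-2"],
    {"m1": "rack-a", "m2": "rack-a", "m3": "rack-b", "m4": "rack-b"},
)
```

With the inputs above, the first two containers land on machines in different racks, and no machine hosts more than one container.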

Containers enable performance controls that cap compute, networking, and storage allotments per container, and therefore per application, on a given host. Those caps are what provide QoS.

“Now infrastructure operators can take remedial action in response to user performance complaints by curtailing resources that greedy applications use up at the expense of others,” says Yeddanapudi.
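As a rough illustration of per-container caps, the Python sketch below clamps what a container asks for to what the operator has allotted. The `Limits` type, its field names, and the `enforce` function are hypothetical and chosen for this example; in practice such limits are enforced by the container runtime and the platform, not by application code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Resource figures for one container on one host (hypothetical units)."""
    cpu_cores: float
    net_mbps: float
    storage_iops: int


def enforce(requested: Limits, cap: Limits) -> Limits:
    """Return what the container actually receives: never more than its cap."""
    return Limits(
        cpu_cores=min(requested.cpu_cores, cap.cpu_cores),
        net_mbps=min(requested.net_mbps, cap.net_mbps),
        storage_iops=min(requested.storage_iops, cap.storage_iops),
    )


# A greedy container asks for far more than the operator allows.
greedy = Limits(cpu_cores=8.0, net_mbps=900.0, storage_iops=20000)
cap = Limits(cpu_cores=2.0, net_mbps=500.0, storage_iops=5000)
granted = enforce(greedy, cap)
```

Lowering the values in `cap` is the code-level analogue of the remedial action described above: the greedy application is curtailed without touching its neighbors.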

Joining Containers for Big Data Applications

Containers that serve Big Data applications cannot practically run in isolation from one another. Such applications are built from many components: a Hadoop cluster, for example, will have name nodes, data nodes, a ZooKeeper server, an HBase region server, an Oozie server, a YARN ResourceManager, a Hive server, and so on, Yeddanapudi explains.

“Many of these services operate in conjunction with each other and need to know of each other’s existence. So it makes sense to stitch these cooperating entities together into a single abstraction. We call this abstraction a virtual cluster,” says Yeddanapudi.

Other container solutions do not typically join containers in a manner that enables them to create clusters for big data applications in this way. “Most container deployments generally use single-node applications. What I mean by single-node application is applications running on a single computer system,” says Yeddanapudi. Robin Systems’ distributor applications connect multiple containers to work together.

Comparison to Traditional Distributed Computing Benefits

Before container technology reached its current form, one benefit of distributed systems was that they drove individual computers toward the highest possible utilization. Robin Systems achieves similar benefits with containers.

“When it comes to utilization, most other technologies provide the realization of better utilization of computers. We, on the other hand, offer the better utilization of compute,” says Yeddanapudi. Robin Systems offers compute-storage separation so that compute and storage can scale independently of each other.

Robin Systems maintains a large catalog of applications that are available to work with its distributed container clusters including Hadoop (Cloudera, Hortonworks), MongoDB, Cassandra, Ceph, Elastic, Kafka, Spark, and MySQL.

David Geer

David Geer’s work has appeared in Scientific American, The Economist Technology Quarterly, CSO & CSOonline, FierceMarkets, TechTarget, InformationWeek, Computerworld, Byte.com, ITWorld.com, IEEE Computer Society’s Computer magazine, IEEE Distributed Systems Online, Government Security News, Laptop, Smart Computing, Technical Support, The Hosting Standard (Canada), TechWorld.com (UK), SIGnature, Processor, and the Engineering News-Record. David served as a technician at CoreComm in Cleveland, OH prior to venturing into writing.