Arguably the two biggest technology advances in software in recent memory have been the arrival of Hadoop and the rise of containers such as Docker. MapR Technologies today moved to bring those two technologies together in a way that promises to significantly ease the building and ongoing management of Big Data applications.
With the addition of support for the Apache Myriad project, MapR Technologies is moving to converge the way resources are managed on its distribution of Hadoop and the Apache Mesos data center operating system project. The goal is to make it possible to run stateful container applications on top of Hadoop that can be managed using a framework that is being widely adopted to manage container applications running on other platforms.
Specifically, Apache Myriad is an open source Hadoop project that lets YARN applications run alongside Apache Mesos frameworks. That means YARN can be used to request Mesos resources without any modification. The support for stateful container applications is provided via the MapR POSIX Client, which presents Docker containers with a read-write file system.
Jack Norris, senior vice president for data and applications, says providing this capability is a natural extension of the MapR Converged Data Platform, which is based on a Zeta Architecture that MapR developed around the core Hadoop project to globally manage all resources. Rather than having to set up separate instances of Hadoop clusters to run different types of workloads, Norris says IT organizations can make use of the MapR Converged Data Platform to enable files, database tables and streaming analytics to be accessed on the same Hadoop cluster.
Because there is no hard link between the data stored in Hadoop and the way it needs to be accessed, MapR doesn’t require IT organizations to set up separate clusters that not only have to be provisioned and deployed, but also continuously synchronized. The end result of using other distributions is not only the acquisition of more hardware, but also a lot more data management complexity.
In addition, standard Docker containers make use of data volumes tied to an individual server, which means that if a container fails, or is moved from one server to another, its connection with the data volume is lost. As such, containers are not designed to be inherently persistent, which creates a challenge for developers that MapR is now trying to address with this update to its Hadoop platform.
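The persistence problem described above can be illustrated with a toy model. The sketch below is plain Python rather than Docker or MapR code, and every name in it (Host, Container, shared_store) is hypothetical. It assumes only what the paragraph states: a default volume lives on one server, while a shared read-write file system is reachable from any server.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    """A server that keeps its own private, host-local volumes."""
    name: str
    local_volumes: dict = field(default_factory=dict)

    def volume(self, volume_name: str) -> dict:
        # A host-local volume is created on first use and exists only here.
        return self.local_volumes.setdefault(volume_name, {})


@dataclass
class Container:
    """A container that stores state locally or in a shared store."""
    name: str
    host: Host
    shared_store: Optional[dict] = None  # stands in for a shared file system

    def _storage(self) -> dict:
        if self.shared_store is not None:
            return self.shared_store
        return self.host.volume(self.name)

    def write(self, key: str, value) -> None:
        self._storage()[key] = value

    def read(self, key: str):
        return self._storage().get(key)

    def move_to(self, new_host: Host) -> None:
        # Rescheduling changes the host; host-local data does not follow.
        self.host = new_host


host_a, host_b = Host("server-a"), Host("server-b")

# Case 1: state kept in a volume tied to one server.
local = Container("orders", host_a)
local.write("count", 42)
local.move_to(host_b)
lost = local.read("count")      # data stayed behind on server-a

# Case 2: state kept in a store every server can reach.
shared_fs: dict = {}
shared = Container("orders", host_a, shared_store=shared_fs)
shared.write("count", 42)
shared.move_to(host_b)
kept = shared.read("count")     # data is still visible after the move
```

In this model the first container comes up empty after the move, while the second still sees its data, which is the gap a read-write file system shared across servers is meant to close.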
Norris acknowledges that not many IT organizations recognized that difference early on. But as more Hadoop applications head into production environments, Norris says IT operations teams in particular will appreciate how fundamentally more efficient the MapR distribution of Hadoop is. To drive that point home, Norris cites Novartis Institutes for Biomedical Research as an example of a customer that can expect a three-year return on investment (ROI) of 382 percent and generate $26.6 million annually on average over three years in incremental revenue.
It’s too early to say how the shift to containers and microservices might realign the dominant Hadoop platform players. But one thing is certain: the combination of the two will be greater than the sum of the proverbial parts.