MapR Extends Big Data Reach to Docker Containers

February 15, 2017 (February 14, 2017) | Mike Vizard | big data, containers, data migration, docker, hadoop, MapR, MapR Persistent Client Container

by Mike Vizard

For the better part of a year now, MapR Technologies has been making a case for deploying a converged data platform based on Hadoop that has been optimized for microservices. Now the company is extending that platform to add support for Docker containers.

Jack Norris, senior vice president for data and applications at MapR, says the Converged Data Platform provides a means to create stateful applications using Docker containers at a higher level of abstraction without having to be concerned about whether the underlying physical storage systems support Docker containers.

That approach, he says, makes it much easier for Docker applications to share access to a common pool of data with a variety of big data applications. It also goes a long way toward easing some of the challenges associated with deploying Docker applications that require access to large volumes of data, he adds.

At the core of the Converged Data Platform is the MapR Persistent Client Container, a pre-built Docker container that provides access to files, database tables and streams. The container is available via Docker Hub or as a Dockerfile, and the Converged Data Platform extends authentication securely all the way down to an individual container.

Developers of container applications increasingly are finding that Hadoop has emerged as a de facto standard for building data lakes inside organizations. While containers historically have tended to be ephemeral in nature, use cases for containers are expanding to include stateful applications, many of which require access to analytics in real time. That requires integration into a larger data fabric where much of the data has already been partially processed. In many cases, the developer of a container application is merely looking to reuse that data within another application.


As the use cases for containers continue to expand, so, too, will the ways to approach how they get deployed and managed. IT organizations must consider the degree to which they want to move data between various clusters versus bringing containers to a platform where much of their data already resides. Most IT organizations will wind up pursuing both approaches as their individual circumstances warrant. But it’s unlikely that most container applications will end up running on isolated clusters. Containers can run almost anywhere, and many IT organizations will be wary of increasing technical debt by moving data from where it already resides into another platform.

In addition, moving data from one platform to another increases the overall size of the attack surface that must be defended within the enterprise. In fact, because of those issues, many IT organizations perceive moving data to be the root of all evil within the enterprise.

Of course, politics and inertia being what they are, most organizations do move their data for one reason or another. But many IT organizations would prefer to bring containers to their data rather than having to go to the expense of moving data to the container.