BlueData Launches Open Source Kubernetes Storage Project

July 24, 2018 Mike Vizard

BlueData has launched an open source project that seeks to make it easier to deploy big data and artificial intelligence (AI) application workloads on top of Kubernetes.

Tom Phelan, chief architect of BlueData, says the BlueK8s project is based on container technologies the company originally developed to accelerate the deployment of big data applications based on Hadoop and Apache Spark software. Since then, BlueData has replaced the proprietary container orchestration engine it developed with the open source Kubernetes engine. The company’s first open source project in the BlueK8s initiative is Kubernetes Director (KubeDirector), which makes it easier to deploy and manage distributed stateful applications using Kubernetes, says Phelan.

BlueData, which has also become a member of the Cloud Native Computing Foundation (CNCF), wants to increase the size of the Kubernetes community specifically focused on stateful applications and persistent storage, Phelan says. There are already a number of persistent storage projects in the Kubernetes community that BlueData is hoping to help advance because big data and AI applications are stateful by definition, he says.

Rather than continuing to develop its own container orchestration platform, Phelan says BlueData is trying to take advantage of the massive amount of resources and expertise being poured into Kubernetes, an approach he says will enable BlueData to allocate more of its resources higher up the software stack.

Data scientists typically don’t have much expertise when it comes to deploying applications on IT infrastructure. BlueData shields them from that complexity by providing a layer of abstraction that makes it simpler to deploy large-scale applications on containers, Phelan says.

Data scientists are currently debating how much to keep investing in big data applications based on Apache Spark versus concentrating their efforts on applications based on machine learning and deep learning algorithms. By embracing Kubernetes, BlueData is making it easier for IT organizations to lift and shift existing big data applications into the cloud using container technologies, where they can run alongside AI models that can also be deployed using an instance of Kubernetes that is part of the BlueK8s project, Phelan says.

Support for Kubernetes also provides the added benefit of making it possible to deploy big data and AI applications within the context of an integrated DevOps process, Phelan notes.

It’s increasingly clear that container and AI technologies are being joined at the hip. Ultimately, IT organizations will want to be able to apply algorithms wherever a large quantity of data resides. Today that generally means relying on a public cloud to inexpensively store massive amounts of data. But in the future, there will also be a need to apply AI models at the edge of the network to eliminate the need to transport massive amounts of data across an extended network.

In the meantime, IT organizations are increasingly viewing Kubernetes as a means for distributing both algorithms and the applications that employ them almost anywhere. As that process occurs, there may even come a day when the whole notion of cloud computing dissipates because every application by default can be deployed across a hybrid IT environment.