Cloudera today announced that it plans to make an instance of its data management platform based on Hadoop generally available this summer on Red Hat OpenShift, which is based on Kubernetes.
Arun Murthy, chief product officer for Cloudera, says Cloudera Data Platform (CDP) Private Cloud complements the instances of the platform already available on Amazon Web Services (AWS) and Microsoft Azure.
The goal is to enable IT teams to deploy a data warehouse based on CDP in the cloud or on-premises IT environments and move data across a hybrid cloud computing environment, says Murthy.
Thanks to the rise of Kubernetes, it’s now easier to move workloads between cloud computing environments. However, moving data between cloud platforms has been more problematic. CDP simplifies the movement of data between cloud platforms, enabling IT teams to preserve metadata along with the relevant security and governance controls, notes Murthy.
That’s critical because in the wake of the economic downturn brought on by the COVID-19 pandemic, many IT organizations are looking to centralize the management of multiple clouds to reduce total costs, he adds.
Murthy says Cloudera will add support for other distributions of Kubernetes based on demand, noting that Red Hat OpenShift is currently the dominant distribution of Kubernetes being deployed in on-premises IT environments.
CDP combines two distributions of Hadoop that came together as a result of the Cloudera-Hortonworks merger at the beginning of last year. Since then, Hadoop and Kubernetes have played key roles in driving development of artificial intelligence applications incorporating machine learning and deep learning algorithms. Hadoop provides a means to manage massive amounts of data, while the containers orchestrated by Kubernetes make it possible to employ microservices to build and deploy what would otherwise be an unwieldy monolithic AI application.
Of course, as the amount of data being aggregated reaches into the petabytes, the term “big data” has become somewhat passé. The issue is not so much the amount of data being stored and processed as it is making sure the right data is made available to the right microservice at the right time. In effect, sets of data need to be managed as a logical entity that can be accessed by multiple microservices, notes Murthy.
Cloudera, in fact, already offers separate services on top of CDP for data warehousing, machine learning, and data management and analytics to simplify the management of data within the context of specific use cases.
It may be a while before IT teams master all the nuances of data management in a hybrid cloud computing era enabled by Kubernetes. However, as organizations seek to derive more value from the data they collect, they will need more flexible approaches to managing massive amounts of data. With the rise of agile development methodologies and DevOps, it’s never been easier to create and deploy an application. By comparison, the process of giving those applications access to the data they require remains positively glacial in far too many organizations.