Kubeflow Pipelines with Tekton and Watson

Create your pipeline using Kubeflow Pipelines DSL and compile it to Tekton YAML.

By Animesh Singh, Chief Architect, Data and AI Open Source, IBM

Trent Gray-Donald, Distinguished Engineer, IBM Data and AI  

More machine learning models need to be deployed in production in a faster, repeatable, and consistent manner — and with the right governance.

According to Forrester, “A top complaint of data science, application development and delivery (AD&D) teams, and, increasingly, line-of-business leaders is the challenge in deploying, monitoring, and governing machine learning models in production. Manual handoffs, frantic monitoring, and loose governance prevent organizations from deploying more AI use cases.”

In March 2018, IBM VP Dinesh Nirmal made a remark at the Strata Data conference that has been echoed many times in the machine learning community.

MLOps and Kubeflow Pipelines speed deployment

To solve this problem, data scientists, data engineers, and DevOps folks worked together to move this discipline forward with the rigor of engineering as opposed to science. The MLOps and DataOps domains rose based on this demand and need, and consequently data and machine learning pipelines became a primary vehicle to drive these domains.

Kubeflow became a leading solution to address MLOps needs. Kubeflow is an end-to-end machine learning platform that is focused on distributed training, hyperparameter optimization, production model serving and management, and machine learning pipelines with metadata and lineage tracking.

Needless to say, Kubeflow Pipelines became a primary vehicle to address the needs of both DevOps engineers and data scientists.

  • For DevOps engineers, Kubeflow taps into the Kubernetes ecosystem, leveraging its scalability and containerization principles.
  • For data scientists, Kubeflow Pipelines offers a Python interface to define and deploy pipelines, with metadata collection and lineage tracking.
  • For DataOps folks, Kubeflow Pipelines provides ETL bindings and support for multiple ETL components and use cases, enabling fuller collaboration with peers.

With IBM’s standardization on Kubernetes and leadership in the Kubeflow community, IBM has adopted Kubeflow Pipelines as the natural choice to define our strategy for end-to-end data and machine learning pipelines.

Announcing: Kubeflow Pipelines on Tekton

The decision to adopt Kubeflow Pipelines on our side came with an internal requirement to redesign Kubeflow Pipelines to run on top of Tekton (a Kubernetes-native CI/CD engine) instead of Argo. Tekton provides Kubernetes-style resources for declaring CI/CD-style pipelines, and introduces several new Custom Resource Definitions (CRDs), including Task, Pipeline, TaskRun, and PipelineRun. Within IBM, we have standardized on Tekton as a cloud CI/CD engine, and OpenShift Pipelines is based on Tekton. Additionally, Tasks in Tekton can be managed and executed independently of pipelines, which is valuable for us.
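To make those resources concrete, here is a minimal, hypothetical Tekton Task and a Pipeline that references it (the resource names and the `busybox` image are illustrative, not taken from the project); a PipelineRun would then execute the Pipeline:

```yaml
# A minimal Tekton Task: one step running a container.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: echo-hello
spec:
  steps:
    - name: echo
      image: busybox
      command: ["echo"]
      args: ["Hello, Tekton"]
---
# A Pipeline that references the Task by name.
# Creating a PipelineRun pointing at this Pipeline runs it.
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: hello-pipeline
spec:
  tasks:
    - name: say-hello
      taskRef:
        name: echo-hello
```

Because the Task is its own resource, it can also be run on its own via a TaskRun, independent of any Pipeline.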

Given the strategic and technical alignment with Tekton, it was a natural fit for our team to rewrite and run Kubeflow Pipelines on top of Tekton. As we set out to design and execute this work, we got support from Google, the CD Foundation MLOps SIG, and Red Hat. After an extensive effort, we have Kubeflow Pipelines running on Tekton end-to-end and available in open source. Read our design document to understand our process and requirements.

With this milestone, you can:

  • Create your pipeline using Kubeflow Pipelines DSL and compile it to Tekton YAML.
  • Upload the compiled Tekton YAML to the Kubeflow Pipeline engine (API and UI), and run end-to-end with logging and artifacts tracking enabled.

For more details about the project, please look at these slides and the deep-dive presentation.

If you are using Kubeflow Pipelines in the community with the open source codebase, your experience remains the same! You can use:

  • The Python SDK we publish to compile your pipeline Python DSL.
  • The Kubeflow Pipelines UI, where you get the same KFP DAG experience, backed by Tekton YAML.

Additionally, real-time log streaming, artifact tracking, and lineage tracking work as expected.

Kubeflow Pipelines with Tekton on Red Hat OpenShift

OpenShift Pipelines provides cloud-native CI/CD on OpenShift, automating the build, test, and deployment of applications across on-premises and public cloud platforms. Follow the instructions at Deploy Kubeflow Pipelines with Tekton backend on OpenShift Container Platform to see how to run Kubeflow Pipelines with Tekton on OpenShift. Depending on your situation, you can choose between two approaches:

1. Leverage OpenShift Pipelines (built on Tekton).
2. Install Tekton as part of a deployment.

The next step for us is to integrate Kubeflow Pipelines with Tekton into Red Hat's OpenDataHub project, given its charter as an open source AI/ML platform on OpenShift.

Launching soon: Watson AI Platform Pipelines

Taking this further, we are launching Watson AI Platform Pipelines, end-to-end machine learning pipelines built on this stack. We are adding features that make pipelines easier to build: a drag-and-drop canvas, a pipeline component registry, and production-grade logging and governance capabilities through integration with IBM Watson Studio, Watson AutoAI, Watson Knowledge Catalog, Notebooks, and others. Stay tuned!

Join us to build cloud-native machine learning pipelines with Kubeflow Pipelines and Tekton

Please join us on the Kubeflow Pipelines with Tekton GitHub repository, try it out, give feedback, and raise issues. Additionally, you can connect with us via the following:

  • To contribute and build an enterprise-grade, end-to-end machine learning platform on OpenShift and Kubernetes, please join the Kubeflow community and reach out with any questions, comments, and feedback!
  • If you want help deploying and managing Kubeflow on your on-premise Kubernetes platform, OpenShift, or on IBM Cloud, please connect with us.
  • To run Notebook-based pipelines using a drag-and-drop canvas, please check out the Elyra project in the community, which provides AI-centric extensions to JupyterLab.
  • Check out OpenDataHub if you are interested in open source projects in the Data and AI portfolio, namely Kubeflow, Kafka, Hive, Hue, and Spark, and in how to bring them together in a cloud-native way.

Thanks to the contributors to Kubeflow Pipelines with Tekton, namely Tommy Li, Christian Kadner, Rafal Bigaj, Andrea Frittoli, Priti Desai, Feng Li, Andrew Butler, and others for contributing to the various aspects of the project, both internally and externally. Additionally, thanks to Pavel Dournov, Jeremy Lewi, and the Kubeflow Pipelines team from Google for helping get this shaped up. Last but not least, thanks to the OpenShift Pipelines and Tekton teams from Red Hat for their support.