A new data-centric approach to building robust MLOps practices
Working with Databricks, we have helped thousands of customers put Machine Learning (ML) into production.
Before working with us, many customers struggled to put ML into production, and for good reason: Machine Learning Operations (MLOps) is challenging. MLOps involves jointly managing code (DevOps), data (DataOps), and models (ModelOps) in their journey towards production.
The most common and painful challenge we have seen is a gap between data and ML, which are often split across poorly connected tools and teams.
To solve this challenge, we build upon the Lakehouse architecture to extend its key benefits—simplicity and openness—to MLOps.
The Databricks platform simplifies ML by defining a data-centric workflow that unifies best practices from DevOps, DataOps, and ModelOps. Machine learning pipelines are ultimately data pipelines, where data flows through the hands of several personas.
Data engineers ingest and prepare data; data scientists build models from data; ML engineers monitor model metrics; and business analysts examine predictions. We simplify production machine learning by enabling these data teams to collaborate and manage this abundance of data on a single platform, instead of in silos.
For example, our Feature Store allows you to productionize your models and features jointly: data scientists create models that are “aware” of what features they need so that ML engineers can deploy models with simpler processes.
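To make the idea of a feature-"aware" model concrete, here is a minimal sketch in plain Python. It does not use the Databricks Feature Store API; the classes `InMemoryFeatureTable` and `FeatureAwareModel`, the feature names, and the scoring weights are all hypothetical and exist only for illustration. The point it shows is that the model carries the list of features it needs, so whoever deploys it supplies only an entity key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


class InMemoryFeatureTable:
    """Hypothetical stand-in for a feature table keyed by an entity ID."""

    def __init__(self, rows: Dict[str, Dict[str, float]]):
        self._rows = rows

    def lookup(self, key: str, feature_names: Sequence[str]) -> List[float]:
        # Return the requested features, in the order the model asked for them.
        row = self._rows[key]
        return [row[name] for name in feature_names]


@dataclass
class FeatureAwareModel:
    """Bundles a scoring function with the names of the features it needs."""

    feature_names: List[str]
    predict_fn: Callable[[List[float]], float]

    def predict(self, table: InMemoryFeatureTable, key: str) -> float:
        # The model resolves its own inputs; the caller passes only the key.
        features = table.lookup(key, self.feature_names)
        return self.predict_fn(features)


# Illustrative data: the table holds more columns than the model uses.
table = InMemoryFeatureTable({
    "customer_1": {"avg_order_value": 40.0, "orders_last_30d": 3.0, "unused": 9.0},
    "customer_2": {"avg_order_value": 10.0, "orders_last_30d": 1.0, "unused": 0.0},
})

# The "data scientist" declares the required features once, with the model.
model = FeatureAwareModel(
    feature_names=["avg_order_value", "orders_last_30d"],
    predict_fn=lambda x: 0.5 * x[0] + 2.0 * x[1],
)

# The "ML engineer" scores by key without knowing which features are involved.
score = model.predict(table, "customer_1")
print(score)
```

Because the feature list travels with the model, a change to the model's inputs does not require a matching change in the serving code, which is the kind of simplification the paragraph above describes.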
Our approach to MLOps is built on open industry-wide standards. For DevOps, we integrate with Git and CI/CD tools. For DataOps, we build upon Delta Lake and the lakehouse, the de facto architecture for open and performant data processing.
For ModelOps, we build upon MLflow, the most popular open-source tool for model management. This foundation in open formats and APIs allows our customers to adapt our platform to their diverse requirements.
For example, customers who centralize model management around our MLflow offering may use our built-in model serving or other solutions, depending on their needs.
We discuss the challenges of joint DevOps + DataOps + ModelOps, give an overview of our solution, and describe our reference architecture.

MLOps is a set of processes and automation to manage code, data, and models to meet the two goals of stable performance and long-term efficiency in ML systems. In short, MLOps = DevOps + DataOps + ModelOps.