Traditionally, machine learning has been a domain reserved for academics who have the mathematical skills to develop complex ML models but lack the software engineering skills required to put those models into production.
It may come as a surprise to young data scientists that only a tiny fraction of the code in many ML systems is actually dedicated to learning or prediction.
To build good machine learning products, it is often about doing machine learning like the great engineer you are, rather than like the great machine learning expert you are not.
So what are the essential components of a production machine learning system, depending on your organization's degree of maturity?
A machine learning system is a workflow that can be broken down into six steps:
This workflow is agnostic to implementation, i.e. to the type of learning algorithm and the deployment mode (online or offline prediction). It must be built with tools that are scalable, reliable, reproducible, easy to use, interoperable and automated.
Usually the maturity of an organization in machine learning is measured in terms of:
AI companies like Uber (Michelangelo), Google, Airbnb (Bighead) and Facebook (FBLearner Flow) all have platforms that solve the problems mentioned above: multiple teams developing and deploying, extremely fast, huge models that ingest structured and unstructured data, online and offline.
But we don’t all have the same type of resources as these big players. Having said that, let’s see how we could set up our own ML industrialization system.
It should be thought of as an assembly of state-of-the-art, scalable and interoperable bricks, offering standardized APIs to avoid glue code as much as possible, since glue code is costly in the long run: it tends to freeze a system around the peculiarities of a single package.
In the first phase of the life cycle of a machine learning system, the important part is to get the training data into the learning system, create a serving infrastructure to run the models and implement a measure of prediction quality.
We have found that building and managing data pipelines is usually one of the most expensive components of a complete machine learning solution. A platform needs to provide standard tools for building data pipelines to generate feature and label data sets for training (and re-training) and feature data sets for prediction only.
These tools need to be deeply integrated with the data lake or enterprise data warehouses and with the enterprise's online data serving systems. They must be scalable and high-performance, with integrated monitoring of data flow and data quality.
They must also provide strong safeguards and controls that encourage and accustom users to best practices (for example, making it easy to ensure that the same data generation and preparation process is used both at training and at prediction time).
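As a minimal sketch of this train/serve consistency principle (assuming scikit-learn and pandas; the columns, file paths and estimator are hypothetical), fitting the preprocessing and the model as a single pipeline and persisting them together guarantees that prediction applies exactly the transformations learned at training time:

```python
# Sketch: fit preprocessing and model as one scikit-learn pipeline and persist
# them as a single artifact, so prediction applies exactly the transformations
# learned at training time. Column names, file paths and the estimator are
# hypothetical placeholders.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

FEATURES = ["age", "income", "country"]

preprocess = ColumnTransformer([
    ("numeric", StandardScaler(), ["age", "income"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["country"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingClassifier()),
])

# Training time: fit everything together and save one artifact.
train_df = pd.read_parquet("training_set.parquet")       # hypothetical dataset
pipeline.fit(train_df[FEATURES], train_df["label"])
joblib.dump(pipeline, "model.joblib")

# Prediction time: reload the artifact; the same preprocessing runs implicitly.
pipeline = joblib.load("model.joblib")
batch_df = pd.read_parquet("inference_batch.parquet")    # hypothetical dataset
scores = pipeline.predict_proba(batch_df[FEATURES])[:, 1]
```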
To address these issues Kili was designed as a one-stop shop for storage, annotation, quality monitoring and consumption of unstructured data. The main features are the following:
The most common way to deploy a trained model is to save it in the binary format of the tool of your choice, wrap it in a microservice (e.g. a Python Flask application) and use it for inference; this approach is known as "Model as Code". However, it has several drawbacks: as the number of models increases, so do the number of microservices, the number of failure points, the latency, and so on, making the whole system hard to manage. Managed solutions have been introduced to mitigate this problem; they simplify the deployment process and provide tools for alpha releases and A/B testing. Examples of such model-serving tools are Seldon, Clipper and TensorFlow Serving.
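As a minimal sketch of the "Model as Code" approach (the model artifact, input schema and port are assumptions, not a prescription), a trained model can be wrapped in a small Flask service:

```python
# Minimal "Model as Code" sketch: a trained model wrapped in a Flask
# microservice. The model artifact, input schema and port are placeholders.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # artifact produced at training time

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                     # e.g. {"features": [[35, 52000, 2]]}
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

In practice each such service would be containerized and multiplied per model, which is exactly where the failure points and latency mentioned above start to accumulate.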
Another, more recent approach is to standardize the model format so that it can be used programmatically from any programming language, without being wrapped in a microservice. This is particularly useful for data-processing workflows where latency and error handling matter: since we call the model directly, we do not have to worry about flow control, error management, and so on. This approach is known as "Model as Data". TensorFlow has become the de facto standard here: the SavedModel format contains a complete TensorFlow program, including weights and computation, and it does not require the original model-building code to run, which makes it convenient to share.
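For illustration, here is a hedged sketch of the "Model as Data" flow with TensorFlow 2.x (toy model, illustrative paths; the exact save API differs in newer Keras versions):

```python
# "Model as Data" sketch: export a model to TensorFlow's SavedModel format and
# reload it without the original model-building code. Toy model, illustrative
# paths; assumes TensorFlow 2.x (the save API differs in Keras 3).
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Export: weights and the computation graph are stored together.
model.save("exported_model")

# Elsewhere (another process or service): load and predict directly,
# without wrapping the model in a microservice.
restored = tf.keras.models.load_model("exported_model")
print(restored.predict(np.random.rand(2, 4)))
```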
When a model is trained and evaluated, historical data is always used. To ensure that a model keeps working in the future, it is essential to monitor its predictions, both to check that the data pipelines continue to send accurate data and to check that the production environment has not changed in a way that makes the model inaccurate.
This can be addressed by automatically logging a percentage of the predictions made and later joining these predictions to the observed outcomes (or labels) generated by the data pipeline.
With this information, we can generate continuous, real-time measurements of the model’s accuracy.
Prometheus is free software for systems monitoring and alerting. It records real-time metrics in a time-series database by scraping, over HTTP, the metrics that services expose at a dedicated endpoint.
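As a sketch of how a prediction service could expose such metrics to Prometheus using the prometheus_client library (the metric names and the fake model call are illustrative):

```python
# Sketch: expose prediction metrics so Prometheus can scrape them over HTTP.
# Metric names and the fake model call are illustrative.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS_TOTAL = Counter("model_predictions_total", "Predictions served")
PREDICTION_LATENCY = Histogram("model_prediction_latency_seconds", "Prediction latency")

def predict(features):
    with PREDICTION_LATENCY.time():      # records latency per call
        PREDICTIONS_TOTAL.inc()
        return random.random()           # placeholder for a real model call

if __name__ == "__main__":
    start_http_server(8000)              # metrics served at http://localhost:8000/metrics
    while True:
        predict([1.0, 2.0])
        time.sleep(1)
```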
When it comes to unstructured data, Kili provides the additional components to monitor the prediction of a model directly on the unstructured asset.
In some use cases, the manual process of training, validation and deployment of ML models may be sufficient. This manual approach works if your team only manages a few ML models that are not re-trained or modified frequently. However, in practice, models often break down when deployed in production because they fail to adapt to changes in environmental dynamics or the data describing those dynamics.
To integrate an ML system into a production environment, you need to orchestrate the steps in your ML pipeline. In addition, you need to automate the execution of the pipeline for the continuous training of your models. To test new ideas and features, you need to adopt CI/CD practices in new pipeline implementations:
To quickly deploy new ML pipelines, you need to configure a CI/CD pipeline. This pipeline is responsible for the automatic deployment of new ML pipelines and components when new implementations are available and approved for different environments (development, testing, staging and production).
An ML platform must have end-to-end support to manage model deployment via the user interface or API. Ideally there will be two modes in which a model can be deployed:
For this we can use Kubeflow, which has been specifically designed to simplify the design and deployment of ML applications by grouping all the steps of an ML workflow into a self-contained pipeline that runs as Docker containers on top of the Kubernetes orchestration layer. By relying heavily on Kubernetes, Kubeflow can provide a higher level of abstraction for ML pipelines, freeing up data scientists' time for higher value-added activities.
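As a hedged illustration of what such a pipeline can look like with the Kubeflow Pipelines v1 SDK (the images, commands and pipeline name are placeholders, not a recommended setup):

```python
# Hedged sketch of a self-contained ML pipeline with the Kubeflow Pipelines
# v1 SDK: each step runs as a container on Kubernetes. Images and commands
# are placeholders.
import kfp
from kfp import dsl

def preprocess_op():
    return dsl.ContainerOp(
        name="preprocess",
        image="python:3.9",
        command=["python", "-c", "print('preparing features')"],
    )

def train_op():
    return dsl.ContainerOp(
        name="train",
        image="python:3.9",
        command=["python", "-c", "print('training model')"],
    )

@dsl.pipeline(name="training-pipeline", description="Toy preprocess-then-train pipeline")
def training_pipeline():
    preprocess = preprocess_op()
    train = train_op()
    train.after(preprocess)              # explicit step ordering

if __name__ == "__main__":
    # Compile to a package that can be uploaded to a Kubeflow Pipelines cluster.
    kfp.compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```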
Most machine learning models operate in a complex, ever-changing environment driven by things in motion in the physical world. To keep our models accurate as this environment changes, they must evolve with it: teams need to re-train their models on a regular basis. A complete platform solution for this use case involves easily updatable model types, faster training and evaluation architectures and pipelines, automated model validation and deployment, and sophisticated monitoring and alerting systems.
The availability of new data is one of the triggers for ML model re-training. The availability of a new implementation of the ML pipeline (including the availability of a new model architecture, feature extraction, and new hyperparameters) is another important trigger for the re-execution of the ML pipeline.
To connect the different components of the system and implement your CI/CD and CT, you need an orchestrator. The orchestrator executes the pipeline as a sequence of steps and automatically moves from one step to the next according to defined conditions; for example, a condition can be to run the model deployment step after the model evaluation step as soon as the evaluation metrics reach predefined thresholds. Orchestrating the ML pipeline is useful in both the development and production phases:
Examples of orchestrators are Apache Airflow, Kubeflow Pipelines and Apache Beam. These orchestration tools facilitate the configuration, operation, monitoring and maintenance of an ML pipeline.
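For example, here is a hedged sketch of a weekly retraining pipeline as an Apache Airflow 2.x DAG (the task bodies, schedule and evaluation gate are placeholders):

```python
# Hedged sketch: a weekly retraining pipeline as an Apache Airflow 2.x DAG.
# Task bodies, the schedule and the evaluation gate are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_features():
    print("building feature and label datasets")

def train_model():
    print("training the model on the fresh data")

def evaluate_and_deploy():
    print("deploying only if evaluation metrics pass the predefined threshold")

with DAG(
    dag_id="ml_retraining",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    deploy = PythonOperator(task_id="evaluate_and_deploy", python_callable=evaluate_and_deploy)

    extract >> train >> deploy   # the orchestrator moves from one step to the next
```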
As organizations grow, teams end up defining features differently, and feature documentation is not easy to access. There is also a growing need to train models that are increasingly complex to build and analyze.
It becomes important to add new bricks to our Machine Learning system: feature store, model visualization and distributed learning.
A feature store lets teams share, discover and reuse a curated set of features for their machine learning problems. On many projects we have found that different modeling problems use the same or similar features, and that it is very useful to let teams share features across their own projects, and to let teams from different organizations share features with each other.
This allows for improvement:
Examples of feature stores are Feast and HopsWorks.
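As a hedged sketch of how a feature store is consumed, here is what retrieving the same features for training (point-in-time correct joins) and for online serving could look like with a recent version of Feast; the feature names come from Feast's example repository, and the repo path and entity ids are assumptions:

```python
# Hedged sketch of consuming a feature store with a recent version of Feast:
# the same feature definitions serve training (point-in-time correct joins)
# and online prediction. Feature names come from Feast's example repository;
# the repo path and entity ids are assumptions.
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Training: join historical feature values onto labeled entities.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2021-04-12 10:59:42", "2021-04-12 08:12:10"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()

# Serving: low-latency lookup of the same features from the online store.
online_features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```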
Understanding and debugging models is becoming increasingly important, especially for deep learning. As machine learning moves towards widespread adoption, the challenge is to build models that users can understand. This is most visible in high-stakes domains such as healthcare, finance and the justice system. Interpretability also matters in general applied machine learning tasks such as model debugging, regulatory compliance and human-machine interaction.
Although tree-based model visualization tools are an important first step, much more needs to be done to enable data scientists to understand, debug and tune their models, and to give users confidence in the results.
InterpretML is an open-source Python package that exposes machine learning interpretability algorithms.
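A minimal sketch of how it can be used, on a toy scikit-learn dataset (the glassbox Explainable Boosting Machine shown here is just one of the algorithms it exposes):

```python
# Sketch: train a glassbox model (Explainable Boosting Machine) with InterpretML
# and inspect global and local explanations. Toy scikit-learn dataset.
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

# Global explanation: which features drive the model overall.
show(ebm.explain_global())
# Local explanation: why the model scored these particular examples.
show(ebm.explain_local(X_test[:5], y_test[:5]))
```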
A growing number of machine learning systems rely on deep learning. Deep learning use cases typically process larger amounts of data and require different hardware (GPUs), which motivates additional investment in tighter integration with a compute cluster.
Data-parallel distributed training of deep neural networks (DNNs), using multiple GPUs on multiple machines, is often the right answer to this problem.
The main challenge of distributed DNN training is that the gradients computed during back-propagation on the different GPUs must be all-reduced (averaged) in a synchronized step before they are applied to update the model weights across GPUs and nodes. This synchronized all-reduce must be highly efficient, otherwise any speed-up gained from data-parallel distributed training would be lost to the inefficiency of the all-reduce step.
Uber's open-source Horovod library was developed to meet exactly these challenges.
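As an illustration (not Uber's actual code), here is a hedged sketch of data-parallel training with Horovod and PyTorch; the model and data are toy placeholders and the script would be launched with horovodrun:

```python
# Hedged sketch of data-parallel training with Horovod and PyTorch: each
# process trains on its own shard of the data and gradients are averaged
# with ring all-reduce at every step. Toy model and random data.
import torch
import horovod.torch as hvd

hvd.init()
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())   # pin one GPU per process

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradient averaging (all-reduce) happens on step().
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Make sure every worker starts from the same weights.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

loss_fn = torch.nn.MSELoss()
for step in range(100):
    x = torch.randn(32, 10)                   # in practice: this worker's data shard
    y = torch.randn(32, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Launched with, for example, `horovodrun -np 4 python train.py`, four such processes train in parallel while Horovod keeps their weights synchronized.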
To summarize, here are the bricks that we believe are essential to build a state of the art ML platform. In order:
These are just a few of the things you need to worry about when building a production ML system. Others include logging and monitoring the status of the various services. Tools such as Istio can be used to secure and monitor your system.