MLOps Overview of Tools and Processes For Modern Deep Learning
May 19, 2020
TOOLS FOR MODERN DEEP LEARNING
- A high-level overview of the deep learning pipelines and tools
- Discussion on what a training loop is, tools and approaches for implementing
Typical ML pipeline
Collect and store raw data
Setup labeling process
- Done with labeling tools wrapped in a neu.ro job
Write scripts to store data on the storage in the correct format
- Use jupyter notebook wrapped in neu.ro job
Write and debug the code for training loop from scratch or import an existing solution
- Start a neu.ro a job, connect to it via IDE with remote interpreter to work on the code
Train the model
- Training can also be done with additional options like hyper parameters search and distributed training via neu.ro
Serve the demo
- Deploy the model as a neu.ro job with a simple Web UI
The following steps are very dependent on the project and may include model hosting and monitoring, triggers for retraining a model, data versioning etc.
You can use crowdsourcing platforms for labeling, some of them have special tools for labeling. For example:
Another option is to start the labeling tool as a neu.ro job and serve independently hiring people directly. It can be useful if you work with secure data or want to optimize costs and quality directly. Neu.ro will be happy to set you up with an instance and get your process going through our remote MLOps service. Example tools:
After labeling is done, data is processed and stored, it is time to start the development process.
The main language for developing deep learning models is python. Other languages usually can be used for deploying pipelines into production.
Initial data analysis can be done with useful python-based jupyter notebook.
The development of large code fragments is conveniently can be done in an IDE (PyCharm, Visual Studio Code and so on). Since the calculations are massive, usually the code is developed in the IDE locally, but it is runs remotely via a remote interpreter (remote debugging).
Training the model (Training Loop)
A key element of model development is the model training. At the core of it is the training loop.
It is a process where the model receives labeled samples, backpropagation algorithm calculates the error and the gradients of the loss function are calculated, then the optimizer changes the model’s weight. This loop runs epoch after epoch, batch after batch, as a result we get the best state of the model in terms of best metric value on validation data.
Training Loop Providers
Since the training loop has several repeating parts (such as feeding model data, calculating the loss function, its gradients and metrics, doing an optimizer step) we can write this loop once in an abstract form and allow the user to insert their own logic where necessary via callbacks mechanic.
A callback is a procedure that starts at a certain point. For example:
- a callback that saves the state of a model is executed at the end of an epoch
- a callback that logs metrics values is executed after each data batch
We name the library that provides such an abstract loop — the loop provider.
Lets look at several examples of Training Loop Providers:
Loop Provider: Catalyst
Loop Provider: Keras
Can there be a “Universal” training loop?
Why don’t we write such a loop once and for all, and not think about it anymore? Unfortunately, different deep learning tasks may have different loop structures. For example:
- There are GAN’s including with two (sometimes more) models with different loops for each of them, additionally, we may want to change discriminators weights less often then generated ones
- There are tasks when we start training process with one optimizer, but then we want to change it
- Model may include several parts (2-stage detectors) or several “heads” (outputs)
- There is unsupervised learning component without ground truth labels
Because of variety of approaches, the loop that is abstract enough to address all of them will also be impossibly cumbersome. It is more practical to have a set of out of the box functions for working in a specific application area of deep learning area (object detection, text processing, classical tables tasks)
We will call such tools a domain loop providers.
Domain loop providers
So, besides the general tools that we mentioned (keras, catalyst, fastai), there are some specialist loop providers:
- mmdetection — for object detection and segmentation
- transformers — for texts processing
- open-reid — for solving person reidentification task
- keras-GAN — for training GANs
Thus, the boundary between such tools can be very arbitrary.