MLOps Overview of Tools and Processes For Modern Deep Learning

May 19, 2020


by Aleksei Shabanov


  • A high-level overview of deep learning pipelines and tools
  • Discussion of what a training loop is, and of tools and approaches for implementing one

Typical ML pipeline

  • Collect and store raw data

  • Set up the labeling process

    - Done with labeling tools wrapped in a job

  • Write scripts to store the data in storage in the correct format

  • Analyze data

    - Use a Jupyter notebook wrapped in a job

  • Write and debug the training-loop code from scratch, or import an existing solution

    - Start a job and connect to it from an IDE with a remote interpreter to work on the code

  • Train the model

    - Training can also include additional options such as hyperparameter search and distributed training

  • Serve the demo

    - Deploy the model as a job with a simple Web UI

  • Next steps

    The following steps depend heavily on the project and may include model hosting and monitoring, triggers for retraining the model, data versioning, etc.
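The training step above usually boils down to a loop over batches of data: forward pass, loss computation, backward pass, and a weight update. A minimal sketch in PyTorch, using a toy dataset and an illustrative two-layer model (all names and hyperparameters here are assumptions, not part of any specific project):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 100 samples, 10 features, binary labels (illustrative only).
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

# A small illustrative classifier.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for xb, yb in loader:
        optimizer.zero_grad()          # reset accumulated gradients
        loss = loss_fn(model(xb), yb)  # forward pass + loss
        loss.backward()                # backward pass
        optimizer.step()               # update weights
```

In practice this skeleton is rarely written from scratch; frameworks such as PyTorch Lightning or Hugging Face wrap it and add checkpointing, logging, and distributed training on top.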

Some Notes on Labeling

You can use crowdsourcing platforms for labeling; some of them provide special labeling tools. For example:

  • Yandex Toloka

  • Amazon Mechanical Turk

Another option is to run the labeling tool as a job and operate it independently, hiring people directly. This can be useful if you work with sensitive data or want direct control over cost and quality. We will be happy to set you up with an instance and get your process going through our remote MLOps service. Example tools:

  • Scalabel

  • LabelMe

  • CVAT

Development Tools

After labeling is done and the data is processed and stored, it is time to start development.

The main language for developing deep learning models is Python. Other languages are usually reserved for deploying pipelines to production.

Initial data analysis is typically done in a Python-based Jupyter notebook.
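A typical first notebook cell inspects class balance and basic statistics with pandas. A small sketch, assuming a hypothetical labeled dataset with `label` and `width` columns (the column names and values are illustrative):

```python
import pandas as pd

# Hypothetical labeled-data table; in practice this would be read
# from storage, e.g. pd.read_csv(...).
df = pd.DataFrame({
    "label": ["cat", "dog", "cat", "cat"],
    "width": [640, 1280, 640, 800],
})

print(df["label"].value_counts())  # class balance across labels
print(df["width"].describe())      # basic statistics of a numeric column
```

Results like these guide early decisions, e.g. whether classes need rebalancing or images need resizing before training.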

Large code fragments are more conveniently developed in an IDE (PyCharm, Visual Studio Code, and so on). Since the computations are heavy, the code is usually written locally in the IDE but runs remotely via a remote interpreter (remote debugging).