
How to organize an MLOps flow using OptScale

MLOps, or Machine Learning Operations, is a set of practices that aims to streamline and optimize the lifecycle of machine learning models. It integrates elements from machine learning, DevOps, and data engineering to enhance the efficiency and effectiveness of deploying, monitoring, and maintaining ML models in production environments.

MLOps enables developers to streamline the machine learning development process from experimentation to production. This includes automating the machine learning pipeline, from data collection and model training to deployment and monitoring. This automation helps reduce manual errors and improve efficiency.

Our team added MLOps capabilities to OptScale to foster better collaboration between data scientists, machine learning engineers, and software developers.

There are two essential concepts in OptScale MLOps: tasks and runs.
  • A run is a single iteration of a task: you set the parameters of your training code and launch it, and a new entry for that launch appears in OptScale.
  • A task groups several runs into one entity so they can be viewed conveniently.
[Image: MLOps conceptual framework]
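
To make the run concept concrete, below is a minimal sketch of instrumenting training code with the open-source optscale-arcee collector (pip install optscale-arcee). The profiling token and the flower_classification task key are placeholders, and exact calls may vary between collector versions; the Profiling Integration action described later gives the authoritative instructions.

```python
import optscale_arcee as arcee

def train_model():
    # Placeholder for your real training loop.
    return 0.92, 0.31

# init() starts a new run under the task with the given key;
# leaving the "with" block finishes the run.
with arcee.init("YOUR_PROFILING_TOKEN", "flower_classification"):
    accuracy, loss = train_model()
    arcee.send({"accuracy": accuracy, "loss": loss})  # report metric values
```

Every launch of this script then appears in OptScale as a new run of the flower_classification task.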

How to view training results in OptScale

A task is the primary unit of organization and access control for runs; all runs belong to a task. Tasks let you visualize, search for, and compare runs and access run metadata for analysis.

Find previously created tasks on the Tasks page in the MLOps section of the menu.

[Image: OptScale Tasks page - created tasks]

👉🏻 How to create and add ML tasks in the OptScale Tasks section? Learn more here → https://optscale.ai/how-to-add-tasks-in-mlops-in-optscale/

The Tasks page contains a list of profiled tasks, including their key, status, goals, expenses, and output metrics from the most recent run.

[Image: OptScale Tasks page - data and tabs]

You can also filter the list and take actions. The Profiling Integration action gives you complete instructions for launching your instrumented code on your instance so that results reach OptScale (the sketch above shows the general shape).

Use Manage Metrics to add or edit metrics. Here, you can define the set of metrics against which every model should be evaluated, and set a target value and tendency for each captured metric.
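
For example, suppose you define a metric with the key accuracy, a target value of 0.9, and the tendency "more is better" (a hypothetical setup for illustration). Your training code then only reports the raw value; OptScale compares it against the target and tendency configured here:

```python
import optscale_arcee as arcee

# Only the raw value is sent from the training code; the target (0.9)
# and tendency ("more is better") live in the Manage Metrics setup.
with arcee.init("YOUR_PROFILING_TOKEN", "flower_classification"):
    arcee.send({"accuracy": 0.93})
```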

The Executors button opens the list of executors: the compute resources used to run your ML activities.

To see detailed information about a task, click its name.

Task information

[Image: OptScale task information for the Flower classification task]

Summary cards show the last run's status and duration, the lifetime cost (the cost of all runs in the task), and recommendations for the instances used as executors.

[Image: OptScale summary cards]

Let’s take a closer look at the Tracked Metrics section.

[Image: OptScale Tracked Metrics section]

Tracked metrics show a graphical representation of each metric across all runs, together with its exact value for the last run.

A green circle indicates that the metric has reached the target value according to its tendency, while a brown circle shows that it has not. The circle is followed by the metric's last run value / target value pair and a line of its values across all task runs.
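
Put differently, the color reflects whether the last value satisfies the target under the metric's tendency. A small illustrative sketch of that rule (our reading of the UI, not OptScale's actual code):

```python
def goal_reached(value: float, target: float, tendency: str) -> bool:
    """Return True (green circle) if the value satisfies the target.

    tendency is "more" when higher values are better, "less" otherwise.
    """
    return value >= target if tendency == "more" else value <= target

# Example: a loss metric with target 0.2 and tendency "less is better".
assert goal_reached(0.15, 0.2, "less")      # green circle
assert not goal_reached(0.31, 0.2, "less")  # brown circle
```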

The Last run executor card shows the instance on which the last run was executed.

[Image: OptScale last run executor]

If the executor is known to OptScale, you get complete information about it and its expenses. If the executor is unknown to OptScale, some information about it will be unavailable.

Please note that supported clouds are AWS and Azure only.

The section looks as shown below:

[Image: Last run executor section in OptScale]

What the Runs tab displays

On the Runs tab, find all the information about the task’s runs.

[Image: OptScale Runs tab]

The first thing you’ll see on this page is a graph.

You can filter the data by time period, status, and goals (goals met or not met).

You can choose which data will be displayed by clicking on the corresponding selectors. Duration, Expenses, and Metrics attached to the task are available for display.

[Image: Duration, Expenses, and Metrics selectors on the OptScale Runs tab]

The X-axis displays data points for each task run in the format #number_of_run.

The Y-axis displays the actual values of the indicators for the corresponding run. If two or fewer indicators are selected for display, their units of measurement are shown as well. Hover your mouse over a trend line to see a tooltip with the data point's run, metric name, actual value, and unit of measure.

Below the graph, you can see entries with all task runs.

A vertical green line on the right marks a run in which all metrics reached their target values according to their tendencies: this run achieved its goals.

[Image: A run that achieved its goals]

Each run in the table is clickable.

For each run, you can see its status, metrics (only the metrics assigned to the task), brief information about the run's dataset, its hyperparameters, tags, and duration. These attributes are reported from the training code itself, as sketched below.

[Image: Table with runs in OptScale]
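
A minimal sketch of labeling these attributes, again assuming the optscale-arcee collector; the dataset path, hyperparameter, and tag values below are made up:

```python
import optscale_arcee as arcee

with arcee.init("YOUR_PROFILING_TOKEN", "flower_classification"):
    arcee.dataset("s3://my-bucket/flowers/train.csv")  # label the dataset used
    arcee.hyperparam("learning_rate", 0.001)           # record a hyperparameter
    arcee.tag("experiment", "baseline")                # attach an arbitrary tag
```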

Clicking a run opens its page, which shows summary cards plus an overview, charts, and executor information sorted into tabs.

[Image: OptScale run summary cards]

The run's summary cards give information about its status, duration, and expenses. The Expenses card shows a non-zero value only if the training code was executed on a cloud executor known to OptScale (that is, the corresponding data source is connected to OptScale); the cost appears on the card once expenses for this resource have been received.

The information is sorted into tabs. The Overview tab contains data blocks with the last metric values, dataset info, hyperparameters, tags, Git status, command, and logs.

For each metric, a green circle indicates that it has reached the target value according to its tendency; if not, a brown circle is displayed.

The Model Versions and Executors tabs

The Model Versions tab of the Tasks page shows all versions of the models created in the task's runs; these versions are registered from the training code, as sketched below.

[Image: OptScale Model Versions tab]
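
A minimal sketch, assuming the collector's model-logging calls; the model key, artifact path, and version string are placeholders:

```python
import optscale_arcee as arcee

with arcee.init("YOUR_PROFILING_TOKEN", "flower_classification"):
    # Register the model produced by this run and assign it a version,
    # so it appears on the Model Versions tab (placeholder values).
    arcee.model("flower_classifier", "/artifacts/model.pkl")
    arcee.model_version("1.0.3")
```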

All the executors on which the training code was executed are available on the Executors tab.

[Image: OptScale Executors tab]

If an executor is known to OptScale, you get complete information about it and its expenses. Supported clouds are AWS and Azure.

For an unknown executor, some information will be unavailable.

OptScale, an open source platform with MLOps and FinOps capabilities, offers complete transparency and optimization of cloud expenses across various organizations and provides MLOps tools such as ML experiment tracking, ML Leaderboards, model versioning, and hyperparameter tuning. Try it out in the OptScale demo.
