Evaluation protocol
The OptScale evaluation protocol is a set of rules by which candidates are compared, filtered, and discarded. It ensures that trained models are tested in a consistent, transparent, and repeatable way.
Users can define a priority metric for ranking candidates on the leaderboard, set conditions on the values of other metrics to filter out unsuitable candidates, and select the datasets on which the candidates will be evaluated.
For example, ML specialists may consider only models with an Accuracy above 0.95, or set similar thresholds on Runtime, Loss, Precision, Sensitivity, F1 Score, Cost, and so on.
The evaluation protocol also ensures that an evaluation can be repeated and produces the same results. For example, if a particular model shows excellent accuracy on a specific dataset, ML specialists can re-evaluate it on another dataset under the same conditions.
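The following is a minimal sketch of what such a protocol boils down to: a priority metric, filter conditions, and a fixed set of evaluation datasets. All names here (EvaluationProtocol, min_thresholds, and the example metrics) are hypothetical illustrations, not the OptScale API.

```python
# Illustrative sketch of an evaluation protocol; not the OptScale API.
from dataclasses import dataclass, field


@dataclass
class EvaluationProtocol:
    priority_metric: str                                 # metric used to rank candidates
    min_thresholds: dict = field(default_factory=dict)   # e.g. {"accuracy": 0.95}
    max_thresholds: dict = field(default_factory=dict)   # e.g. {"runtime_s": 120}
    datasets: list = field(default_factory=list)         # datasets every candidate is evaluated on

    def qualifies(self, metrics: dict) -> bool:
        """Check whether a candidate's metrics satisfy all filter conditions."""
        ok_min = all(metrics.get(m, float("-inf")) >= v for m, v in self.min_thresholds.items())
        ok_max = all(metrics.get(m, float("inf")) <= v for m, v in self.max_thresholds.items())
        return ok_min and ok_max

    def rank(self, candidates: dict) -> list:
        """Discard unsuitable candidates and sort the rest by the priority metric."""
        kept = {name: m for name, m in candidates.items() if self.qualifies(m)}
        return sorted(kept, key=lambda name: kept[name][self.priority_metric], reverse=True)


# Example: keep only models with accuracy above 0.95, then rank by F1 score.
protocol = EvaluationProtocol(
    priority_metric="f1",
    min_thresholds={"accuracy": 0.95},
    max_thresholds={"runtime_s": 120},
    datasets=["validation_v2"],
)
runs = {
    "model_a": {"accuracy": 0.97, "f1": 0.94, "runtime_s": 45},
    "model_b": {"accuracy": 0.93, "f1": 0.96, "runtime_s": 30},  # filtered out: accuracy too low
}
print(protocol.rank(runs))  # ['model_a']
```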
Apples-to-apples comparison
With OptScale, ML specialists can make a fair comparison between models, ensuring that the differences in performance are due to the models themselves and not to external factors.
Users can compare models using the same datasets, data preprocessing steps (such as normalization, scaling, or feature engineering), hyperparameters, evaluation metrics, and training conditions.
OptScale Leaderboards guarantee a consistent evaluation dataset and consistent metrics across all model runs to enforce an apples-to-apples comparison.
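As a rough illustration of this idea, the sketch below scores two candidate models with the same data split, the same preprocessing step, and the same metrics, so any difference in the scores comes from the models themselves. The dataset and models are placeholders chosen for the example, not part of OptScale.

```python
# Apples-to-apples comparison sketch: every candidate is trained and scored
# with the same split, preprocessing, and metrics.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One fixed split and one fixed preprocessing step shared by every candidate.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42)),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(name, {
        "accuracy": round(accuracy_score(y_test, preds), 4),
        "f1": round(f1_score(y_test, preds), 4),
    })
```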