
Evaluation Metrics in Machine Learning: Top 10 Metrics You Should Know

Last updated:
11th Jun, 2023

Choosing the right metric is a crucial step in any Machine Learning project. Every Machine Learning model needs to be evaluated against some metric to check how well it has learnt from the training data and how well it performs on test data. These are called performance metrics, and they differ for regression and classification models.

By the end of this tutorial, you will know:

  • Metrics for regression
  • Metrics for different types of classification
  • When to prefer which type of metric

Metrics for Regression

Regression problems involve predicting a target with continuous values from a set of independent features. This is a type of supervised learning where we compare the prediction with the actual value and then calculate the difference, or error term. The smaller the error, the better the performance of the model. Several regression metrics are in wide use today. Let's go over them one by one.


1. Mean Squared Error

Mean Squared Error (MSE) is one of the most widely used regression metrics. It averages the squared errors, where each error is (Y_Pred – Y_Actual). Squaring changes the usual error calculation in two important ways. First, an error can be negative, and squaring turns every error into a non-negative term, so the errors can be added without cancelling each other out.

Second, squaring magnifies errors that are already large and shrinks errors with an absolute value less than 1. This magnifying effect penalises the instances where the error is large. MSE is also popular as a loss function because it is differentiable at every point, which makes its gradient easy to compute.
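As a rough illustration of the calculation described above, here is a minimal pure-Python sketch. The function name and the sample values are made up for this example and do not come from any particular library.

```python
def mean_squared_error(y_actual, y_pred):
    """Average of the squared differences between predictions and actual values."""
    if len(y_actual) != len(y_pred):
        raise ValueError("inputs must have the same length")
    squared_errors = [(p - a) ** 2 for a, p in zip(y_actual, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only.
y_actual = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Errors are -0.5, 0, 2 and 1; squared they are 0.25, 0, 4 and 1.
print(mean_squared_error(y_actual, y_pred))  # 1.3125
```

Note how the error of 2 contributes 4 to the sum while the error of 0.5 contributes only 0.25.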

2. Root Mean Squared Error

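A shortcoming of MSE is that squaring puts the error in squared units of the target, which makes it hard to interpret and exaggerates large errors. Root Mean Squared Error (RMSE) takes the square root of MSE to bring the error back to the same units as the target. Like MSE, it still penalises large errors more heavily than small ones, so it is useful when large errors are especially undesirable.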
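A small sketch of RMSE, assuming the usual definition as the square root of MSE. The function name and data are illustrative.

```python
import math


def root_mean_squared_error(y_actual, y_pred):
    """Square root of the mean squared error, in the same units as the target."""
    if len(y_actual) != len(y_pred):
        raise ValueError("inputs must have the same length")
    mse = sum((p - a) ** 2 for a, p in zip(y_actual, y_pred)) / len(y_actual)
    return math.sqrt(mse)


# Illustrative values: errors are 3, -4 and 0, so MSE = 25 / 3.
y_actual = [10.0, 20.0, 30.0]
y_pred = [13.0, 16.0, 30.0]
print(round(root_mean_squared_error(y_actual, y_pred), 4))
```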

3. Mean Absolute Error

Mean Absolute Error (MAE) calculates the error by taking the absolute value of each error (Y_Pred – Y_Actual) and averaging them. Unlike MSE, it does not magnify the larger errors, so it is more robust to outliers. For the same reason, it is less suitable for applications where large errors need to be penalised heavily. MAE is a linear score, which means all the individual differences are weighted equally.
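The following sketch compares MAE with MSE on made-up data that contains a single outlier. Both helper functions are written for this example only.

```python
def mean_absolute_error(y_actual, y_pred):
    """Average of the absolute differences between predictions and actual values."""
    return sum(abs(p - a) for a, p in zip(y_actual, y_pred)) / len(y_actual)


def mean_squared_error(y_actual, y_pred):
    """Average of the squared differences, shown here only for comparison."""
    return sum((p - a) ** 2 for a, p in zip(y_actual, y_pred)) / len(y_actual)


y_pred = [11.0, 11.0, 12.0, 12.0]
clean = [10.0, 12.0, 11.0, 13.0]         # every error has magnitude 1
with_outlier = [10.0, 12.0, 11.0, 53.0]  # the last error has magnitude 41

print(mean_absolute_error(clean, y_pred))         # 1.0
print(mean_absolute_error(with_outlier, y_pred))  # 11.0
print(mean_squared_error(clean, y_pred))          # 1.0
print(mean_squared_error(with_outlier, y_pred))   # 421.0
```

The single outlier raises MAE from 1 to 11 but raises MSE from 1 to 421, which is the magnifying effect of squaring.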

4. R Squared

R Squared is a goodness-of-fit measure for regression models. It measures how closely the data points are scattered around the fitted regression line. It is also called the Coefficient of Determination. A higher R Squared value means that there is less difference between the predicted values and the actual values.

On the training data, the R Squared value of a linear regression model does not decrease as more features are added. This means that R Squared alone is not a reliable measure of performance, because it can be large even if the added features contribute no real value.

In Regression Analysis, R Squared is used to determine how strongly the features and the target are related. In simple terms, it measures the strength of the relationship between your model and the dependent variable, usually expressed on a 0 – 100% scale. R Squared is computed from the ratio of the Residual Sum of Squares (SSR) to the Total Sum of Squares (SST). R Sqr is defined as:

R Sqr = 1 – SSR/SST, where

SSR is the sum of the squares of the differences between the actual observed value Y and the predicted value Y_Pred. SST is the sum of the squares of the differences between the actual observed value Y and the average of the observed values Y_Avg.
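The definition above can be turned into a few lines of Python. This is a minimal sketch with an illustrative function name and made-up data.

```python
def r_squared(y_actual, y_pred):
    """R Sqr = 1 - SSR / SST, following the definition in the text."""
    y_avg = sum(y_actual) / len(y_actual)
    ssr = sum((a - p) ** 2 for a, p in zip(y_actual, y_pred))  # residual sum of squares
    sst = sum((a - y_avg) ** 2 for a in y_actual)              # total sum of squares
    if sst == 0:
        raise ValueError("R squared is undefined when all actual values are equal")
    return 1 - ssr / sst


y_actual = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

# SSR is about 0.10 and SST is 10, so R Sqr is about 0.99.
print(round(r_squared(y_actual, y_pred), 4))

# Always predicting the mean gives SSR equal to SST, so R Sqr is 0.
print(r_squared(y_actual, [3.0] * 5))
```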

Generally, the higher the R Sqr, the better the model. But is that always so? No.

5. Adjusted R Squared

Adjusted R Squared overcomes the main shortcoming of R Squared, which cannot correctly reflect whether model performance actually improves when more features are added. The R Squared value alone shows an incomplete picture and can be very misleading.

In essence, the R Sqr value on the training data does not decrease when new features are added, even if a feature hurts the model's ability to generalise. You might not notice when your model has started to overfit.

Adjusted R Sqr corrects for the number of variables, and its value decreases when a feature doesn't improve the model enough to justify its inclusion. We use Adjusted R Sqr to compare the goodness-of-fit of regression models that contain different numbers of independent variables.
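The article does not spell out a formula, so the sketch below assumes the commonly used one: Adjusted R Sqr = 1 – (1 – R Sqr) × (n – 1) / (n – p – 1), where n is the number of samples and p is the number of features. The function name and the numbers are illustrative.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R squared using the common formula (an assumption, see the lead-in)."""
    denominator = n_samples - n_features - 1
    if denominator <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / denominator


# Hypothetical scenario: adding 10 features nudges R Sqr from 0.900 to 0.905.
small_model = adjusted_r_squared(0.900, n_samples=50, n_features=5)
large_model = adjusted_r_squared(0.905, n_samples=50, n_features=15)

print(round(small_model, 4))
print(round(large_model, 4))
print(large_model < small_model)  # True: the extra features did not pay off
```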


Metrics for Classification

Just like regression metrics, there are different types of metrics for classification as well. Different types of metrics are used for different types of classification and data. Let’s go over them one by one.

1. Accuracy

Accuracy is the simplest and most straightforward metric for classification. It calculates what percentage of all predictions are correct. For example, if 90 out of 100 instances are predicted correctly, then the accuracy is 90%. Accuracy, however, can be misleading for many classification tasks because it doesn't take class imbalance into account.
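To see why class imbalance matters, consider this small sketch. The data set is invented: 95 negatives and 5 positives, scored against a "model" that always predicts the negative class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up imbalanced data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100

print(accuracy(y_true, always_negative))  # 0.95
```

The score looks high even though not a single positive instance was found.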

2. Precision, Recall

For a better picture of model performance, we need to see how many false positives and how many false negatives the model produced. Precision tells us how many of the instances predicted as positive are actually positive. In other words, it is the proportion of true positives out of all positive predictions: TP / (TP + FP). Recall tells us how many of the actual positives the model managed to find. In other words, it is the proportion of true positives out of all actual positives: TP / (TP + FN).
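A minimal sketch of both quantities for binary labels, where 1 is taken to be the positive class. The function name and data are illustrative.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Returning 0.0 for an empty denominator is a convention chosen for this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# TP = 2, FP = 1, FN = 2, so precision is 2/3 and recall is 2/4.
precision, recall = precision_recall(y_true, y_pred)
print(round(precision, 3), recall)
```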

3. Confusion Matrix

A Confusion Matrix is a table of True Positives, True Negatives, False Positives and False Negatives. For each actual class, it tells us how many instances were predicted as each class. It is an NxN matrix where N is the number of classes. Confusion Matrix is not so confusing after all!
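Here is a short sketch of building an NxN confusion matrix. It assumes the convention that rows are actual classes and columns are predicted classes; some tools use the opposite layout. The labels and data are invented.

```python
def confusion_matrix(y_true, y_pred, labels):
    """NxN matrix where entry [i][j] counts actual class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat", "cat"]

for label, row in zip(labels, confusion_matrix(y_true, y_pred, labels)):
    print(label, row)
# cat [2, 1, 0]
# dog [0, 2, 0]
# bird [1, 0, 1]
```

The diagonal holds the correct predictions; everything off the diagonal is a misclassification.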

4. F1 Score

F1 Score combines Precision and Recall into a single averaged value. Specifically, it is the harmonic mean of Precision and Recall. This matters because if, in some case, the Recall is 1 (100%) and the Precision is 0, the arithmetic mean of the two would be 0.5, whereas the harmonic mean gives an F1 Score of 0. In other words, the harmonic mean penalises extreme values more heavily.
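The contrast between the two means can be checked directly. This is a small sketch; the function names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision, recall):
    return (precision + recall) / 2


# The extreme case from the text: recall of 1 and precision of 0.
print(arithmetic_mean(0.0, 1.0))  # 0.5
print(f1_score(0.0, 1.0))         # 0.0

# When precision and recall are equal, both means agree.
print(f1_score(0.5, 0.5))         # 0.5
```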



5. AUC ROC

Accuracy and F1 score are not always good metrics when it comes to imbalanced data. The AUC (Area Under Curve) ROC (Receiver Operating Characteristic) score tells us the degree of separability of the classes predicted by the model. The higher the score, the better the model is at predicting 0s as 0s and 1s as 1s. The ROC curve is plotted with the True Positive Rate (TPR) on the Y-axis and the False Positive Rate (FPR) on the X-axis.



If AUC ROC comes out to be 1, it means that the model is correctly predicting all the classes and there is complete separability.

If it is 0.5, it means that there is no separability and the model's predictions are no better than random guessing.

If it is 0, it means that the model is predicting the inverted classes. That is, 0s as 1s and 1s as 0s.
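The three cases above can be reproduced with a small sketch. Instead of plotting the curve, it uses an equivalent pairwise view of AUC: the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as half. The function name and data are illustrative.

```python
def auc_roc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
print(auc_roc(y_true, [0.1, 0.2, 0.8, 0.9]))  # 1.0, complete separability
print(auc_roc(y_true, [0.5, 0.5, 0.5, 0.5]))  # 0.5, no separability
print(auc_roc(y_true, [0.9, 0.8, 0.2, 0.1]))  # 0.0, inverted classes
```

The nested loop is quadratic in the number of examples, which is fine for a demonstration but slow on large data sets.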

What Are Evaluation Metrics?

Evaluation metrics are numerical measurements that are used to rate the effectiveness of AI models. They enable us to gauge a model’s effectiveness by contrasting its predictions with the actual results. These measurements shed light on the model’s advantages, disadvantages, and general performance.

Predictive Model Types

AI predictive models are intended to classify or forecast based on input data. They can be broadly divided into two categories: regression models and classification models. Regression models are used for continuous output variables, while classification models are used when the output is categorical.

Gain and Lift Charts

Gain and lift charts are evaluation tools that are frequently used in marketing and customer relationship management (CRM) applications. These charts show how much better the model does than random selection, which helps in evaluating the effectiveness of prediction models. They shed light on the model's capacity to find the positive instances among its highest-scored cases.
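A single point on a gain chart and a lift chart can be computed as follows. This sketch assumes the usual definitions: gain is the share of all positives captured in the top-scored instances, and lift is the positive rate in that group divided by the overall positive rate. The function name and data are illustrative.

```python
def gain_and_lift(y_true, scores, top_k):
    """Gain and lift for the top_k highest-scored instances."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    positives_in_top = sum(label for _, label in ranked[:top_k])
    total_positives = sum(y_true)
    gain = positives_in_top / total_positives
    lift = (positives_in_top / top_k) / (total_positives / len(y_true))
    return gain, lift


# Made-up data: 10 customers, 4 of whom responded (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
y_true = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

# Targeting the top 2 customers captures 2 of the 4 responders.
gain, lift = gain_and_lift(y_true, scores, top_k=2)
print(gain, round(lift, 2))
```

A lift of about 2.5 here means the top-scored group contains responders at roughly 2.5 times the rate of a random selection.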

Kolmogorov Smirnov Chart

The effectiveness of binary classification models can be assessed using the Kolmogorov-Smirnov (KS) chart. It measures the largest gap between the cumulative score distributions of the positive and negative examples. A higher KS value denotes a more effective model.
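The KS value described above can be sketched as the maximum gap between two empirical cumulative distributions. The function name and data are illustrative.

```python
def ks_statistic(y_true, scores):
    """Largest gap between the cumulative score distributions of the two classes."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    largest_gap = 0.0
    for threshold in sorted(set(scores)):
        # Share of each class with a score at or below the threshold.
        cdf_pos = sum(1 for s in pos if s <= threshold) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= threshold) / len(neg)
        largest_gap = max(largest_gap, abs(cdf_neg - cdf_pos))
    return largest_gap


y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

# At a threshold of 0.3, three of four negatives but no positives fall below it.
print(ks_statistic(y_true, scores))  # 0.75
```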

Log Loss

Log Loss is a common evaluation metric for classification problems. It measures the discrepancy between the predicted probabilities and the actual outcomes. A model with a lower Log Loss value is a better one.

Gini Coefficient

The Gini Coefficient is another evaluation metric applied to classification problems. It measures how well the model separates positive ("good") events from negative ("bad") events, and it is commonly derived from the AUC ROC score as Gini = 2 × AUC – 1. The higher the Gini Coefficient, the better the model discriminates between the two classes.
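A brief sketch of the Gini Coefficient for a binary classifier, assuming the common relationship Gini = 2 × AUC – 1. The AUC helper uses a pairwise count and is written for this example only.

```python
def auc_roc(y_true, scores):
    """Share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_coefficient(y_true, scores):
    """Gini = 2 * AUC - 1 (assumed definition, see the lead-in)."""
    return 2 * auc_roc(y_true, scores) - 1


y_true = [0, 0, 1, 1]
print(gini_coefficient(y_true, [0.1, 0.2, 0.8, 0.9]))  # 1.0, perfect separation
print(gini_coefficient(y_true, [0.5, 0.5, 0.5, 0.5]))  # 0.0, no better than random
```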

Concordant – Discordant Ratio

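In ranking and survival analysis tasks, the Concordant – Discordant Ratio (CDR) is employed. It gauges how well the predicted rankings agree with the actual rankings: a pair of instances is concordant when the model orders them the same way as the actual outcomes, and discordant when it orders them the opposite way. A higher CDR value indicates that the model is better at ordering instances correctly.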
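Counting concordant and discordant pairs can be sketched as below. Pairs with equal actual values are skipped, and pairs with equal predictions are counted as ties; both are conventions chosen for this example, and the function name is illustrative.

```python
from itertools import combinations


def concordant_discordant(actual, predicted):
    """Count concordant, discordant and tied pairs between two orderings."""
    concordant = discordant = tied = 0
    for (a1, p1), (a2, p2) in combinations(zip(actual, predicted), 2):
        if a1 == a2:
            continue  # the actual outcomes do not rank this pair
        direction = (a1 - a2) * (p1 - p2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        else:
            tied += 1
    return concordant, discordant, tied


actual = [1, 2, 3, 4]
predicted = [1.0, 3.0, 2.0, 4.0]  # the model swaps the two middle items

print(concordant_discordant(actual, predicted))  # (5, 1, 0)
```

With 5 concordant pairs and 1 discordant pair, the ratio of concordant to discordant pairs is 5.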

Cross Validation

Cross-validation evaluates a predictive model by repeatedly dividing the data into training and testing sets, so that every part of the data is used for testing at some point. It helps estimate how well a model will generalise to new data and makes overfitting easier to detect.
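One common form of cross-validation is k-fold, which is an assumption here since the text does not name a specific scheme. The sketch below only generates the train and test index splits; fitting and scoring a model on each split is left out. The function name is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train_indices, test_indices) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first n_samples % k folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


for train_idx, test_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in exactly one test fold, so averaging the metric over the folds uses all of the data for evaluation. In practice the data is usually shuffled before splitting.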

Performance Metrics in Machine Learning

Alongside the charts and ratios described above, a few core performance measures are used in almost every machine learning project. These include the R2-Score for regression models, logarithmic loss, and classification accuracy. Each metric offers a different viewpoint on the model's performance and can be applied to assess particular criteria.

Logarithmic Loss

Logarithmic loss, often known as log loss, measures a classification model's performance by penalising inaccurate predictions. It takes into account the predicted probabilities rather than the predicted labels, so a confident wrong prediction is penalised more heavily than a hesitant one.
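For binary classification, log loss is the negative average of y × log(p) + (1 – y) × log(1 – p), where p is the predicted probability of the positive class. A minimal sketch follows; the function name, the clipping constant and the data are illustrative.

```python
import math


def log_loss(y_true, probabilities, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probabilities):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


y_true = [1, 0]
confident_right = log_loss(y_true, [0.9, 0.1])
hesitant_right = log_loss(y_true, [0.6, 0.4])
confident_wrong = log_loss(y_true, [0.1, 0.9])

# The loss grows as the predicted probabilities move away from the true labels.
print(confident_right < hesitant_right < confident_wrong)  # True
```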


R2-Score

The R2-Score, or coefficient of determination, is an evaluation metric for regression models. It calculates the percentage of the dependent variable's variance that the independent variables can account for. The higher the R2-Score, the better the model fits the data.


Before you go

In this article, we discussed the various performance metrics for classification and regression. These are the most commonly used metrics, so it is important to know them. For classification, there are further metrics used in multi-class, multi-label and ranking settings, such as Kappa Score, Precision at K and Average Precision at K.



Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.