
Evaluation Metrics in Machine Learning: Top 10 Metrics You Should Know

Last updated: 11th Jun, 2023

Choosing the right metric is a crucial step in any Machine Learning project. Every Machine Learning model needs to be evaluated against some metrics to check how well it has learnt the data and how well it performs on test data. These are called Performance Metrics, and they differ for regression and classification models.

By the end of this tutorial, you will know:

  • Metrics for regression
  • Metrics for different types of classification
  • When to prefer which type of metric

Metrics for Regression

Regression problems involve predicting a target with continuous values from a set of independent features. This is a type of supervised learning where we compare each prediction with the actual value and calculate the difference, or error term. The lower the error, the better the model performs. Several regression metrics are widely used today. Let's go over them one by one.


1. Mean Squared Error

Mean Squared Error (MSE) is the most widely used regression metric. It averages the squared errors, (Y_Pred – Y_Actual)², over all instances. Squaring has two important effects on the usual error calculation. First, individual errors can be negative; squaring turns every error into a non-negative term, so the errors can be added up without cancelling each other out.

Second, squaring magnifies errors that are already large (greater than 1) and shrinks errors smaller than 1. This magnifying effect penalises instances with large errors more heavily. MSE is also popular as a loss function because it is differentiable at every point, which makes its gradient easy to compute.
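A minimal NumPy sketch of MSE follows. The function name and the toy numbers are illustrative assumptions (a hand-written helper, not a specific library's API). Note how the single error of 1.0 contributes 1.0 to the sum, while each error of 0.5 contributes only 0.25.

```python
import numpy as np

def mean_squared_error(y_actual, y_pred):
    """Average of the squared differences between predictions and targets."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_actual          # individual errors may be negative
    return float(np.mean(errors ** 2))  # squaring makes every term non-negative

y_actual = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_actual, y_pred))  # 0.375
```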

2. Root Mean Squared Error

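A drawback of MSE is that, because it squares the error terms, its value is expressed in squared units of the target and can look inflated compared with the actual size of the errors. Root Mean Squared Error (RMSE) takes the square root of MSE, which brings the metric back to the same units as the target. Because it is still built on squared errors, RMSE continues to penalise large errors more than small ones, which makes it useful when large errors are particularly undesirable.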
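RMSE is just the square root of MSE. The sketch below reuses the same toy data as an assumed example; the helper name is hypothetical.

```python
import numpy as np

def root_mean_squared_error(y_actual, y_pred):
    """Square root of MSE, expressed in the same units as the target."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_pred - y_actual) ** 2)
    return float(np.sqrt(mse))

print(root_mean_squared_error([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # about 0.612
```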

3. Mean Absolute Error

Mean Absolute Error (MAE) averages the absolute values of the errors, |Y_Pred – Y_Actual|. Unlike MSE, it does not magnify larger errors, which makes it more robust to outliers. For the same reason, it is not suitable for applications in which outliers need special treatment, that is, where large errors should be penalised more heavily. MAE is a linear score, which means all the individual differences are weighted equally.
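To see the difference in outlier sensitivity, the sketch below (a hypothetical helper on made-up data) changes one prediction so that it is off by 10 and compares how MAE and MSE react.

```python
import numpy as np

def mean_absolute_error(y_actual, y_pred):
    """Average of the absolute differences; each error counts linearly."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_actual)))

y_actual = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
y_pred_outlier = np.array([2.5, 0.0, 2.0, 17.0])  # last prediction is off by 10

print(mean_absolute_error(y_actual, y_pred))          # 0.5
print(mean_absolute_error(y_actual, y_pred_outlier))  # 2.75
# For comparison, the squared-error average (MSE) jumps from 0.375 to 25.125
print(float(np.mean((y_pred_outlier - y_actual) ** 2)))
```

MAE grows roughly in proportion to the outlier, while MSE is dominated by it.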

4. R Squared

R Squared is a goodness-of-fit measure for regression models. It reflects how closely the data points are scattered around the fitted regression line. It is also called the Coefficient of Determination. A higher R Squared value means that there is less difference between the observed values and the values predicted by the model.

The R Squared value of a least-squares linear regression never decreases on the training data as more features are added to the model. This means that R Squared alone is not a reliable measure of performance: it can report a high value even when the added features contribute nothing.

In regression analysis, R Squared is used to gauge the strength of the relationship between the features and the target. In simple terms, it measures how much of the variation in the dependent variable your model explains, on a 0 – 100% scale. R Squared is computed from the ratio of the Residual Sum of Squares (SSR) to the Total Sum of Squares (SST):

R Squared = 1 – SSR / SST, where

SSR is the sum of the squared differences between each actual observed value Y and the corresponding predicted value Y_Pred, and SST is the sum of the squared differences between each actual observed value Y and the average of the observed values, Y_Avg.
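The definition translates directly into a few lines of NumPy. This is a sketch with a hypothetical helper name and made-up data; SSR and SST follow the definitions given above.

```python
import numpy as np

def r_squared(y_actual, y_pred):
    """R^2 = 1 - SSR / SST."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ssr = np.sum((y_actual - y_pred) ** 2)           # residual sum of squares
    sst = np.sum((y_actual - y_actual.mean()) ** 2)  # total sum of squares
    return float(1.0 - ssr / sst)

print(r_squared([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # about 0.949
```

A model that always predicts Y_Avg makes SSR equal to SST and scores exactly 0, which is the baseline R Squared measures against.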

Generally, the higher the R Squared, the better the model. But is that always the case? No.

5. Adjusted R Squared

Adjusted R Squared overcomes R Squared's inability to correctly reflect whether adding more features actually improves the model. On its own, the R Squared value shows an incomplete picture and can be very misleading.

In essence, R Squared never decreases when new features are added, even if a feature adds no real predictive value and makes the model worse on unseen data. As a result, you might not notice when your model has started to overfit.

Adjusted R Squared accounts for the number of features in the model, and its value decreases when an added feature doesn't improve the model enough to justify its inclusion. We use Adjusted R Squared to compare the goodness-of-fit of regression models that contain different numbers of independent variables.
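The article does not give a formula, so the sketch below assumes the commonly used definition, Adjusted R Squared = 1 – (1 – R Squared)(n – 1) / (n – p – 1), with n samples and p features. The helper name and numbers are hypothetical.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Same R^2, but the model that needed more features gets a lower adjusted score
print(adjusted_r_squared(0.90, n_samples=100, n_features=5))   # about 0.8947
print(adjusted_r_squared(0.90, n_samples=100, n_features=30))  # about 0.8565
```

The (n – 1) / (n – p – 1) factor grows with p, so extra features must raise R Squared by enough to offset the penalty.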


Metrics for Classification

Just as with regression, there are several metrics for classification, and which one to use depends on the type of classification problem and the data. Let's go over them one by one.

1. Accuracy

Accuracy is the most straightforward metric for classification. It simply calculates the percentage of predictions that are correct out of the total number of instances. For example, if 90 out of 100 instances are predicted correctly, the accuracy is 90%. Accuracy, however, is not the right metric for many classification tasks because it doesn't take class imbalance into account: on a dataset that is 95% negative, a model that always predicts the negative class reaches 95% accuracy without ever detecting a positive instance.
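The imbalance problem is easy to reproduce. The sketch below uses a hypothetical `accuracy` helper and a made-up dataset of 95 negatives and 5 positives, scored against a "model" that never predicts the positive class.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Imbalanced toy data: 95 negatives (0) and 5 positives (1)
y_true = np.array([0] * 95 + [1] * 5)
always_negative = np.zeros_like(y_true)   # a "model" that never predicts 1

print(accuracy(y_true, always_negative))  # 0.95, yet no positive is ever detected
```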

2. Precision and Recall

For a better picture of model performance, we need to see how many false positives and how many false negatives the model predicted. Precision tells us how many of the instances predicted as positive are actually positive; in other words, it is the proportion of correct positive predictions out of all positive predictions, Precision = TP / (TP + FP). Recall tells us how many of the actual positive instances the model found; in other words, it is the proportion of true positives out of all actual positives, Recall = TP / (TP + FN).
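Both metrics can be computed from the same counts. In this sketch the helper name and labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import numpy as np

def precision_recall(y_true, y_pred, positive=1):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    # Assumed convention: report 0.0 when a metric is undefined
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```

Here 3 of the 5 positive predictions are correct (precision 0.6), and 3 of the 4 actual positives are found (recall 0.75).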

3. Confusion Matrix

A Confusion Matrix tabulates the True Positives, True Negatives, False Positives and False Negatives. For each actual class, it shows how many instances were predicted as each class, so correct predictions lie on the diagonal. It is an NxN matrix, where N is the number of classes. A Confusion Matrix is not so confusing after all!
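A minimal sketch for N classes, assuming labels are integers 0 to N – 1 and using the rows-as-actual, columns-as-predicted layout (a common convention, assumed here). The helper name and data are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """N x N matrix: rows are actual classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]
print(confusion_matrix(y_true, y_pred, n_classes=3))
# [[1 1 0]
#  [0 1 1]
#  [0 0 2]]
```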

4. F1 Score

F1 Score combines Precision and Recall into a single metric. Specifically, it is the harmonic mean of Precision and Recall: F1 = 2 × Precision × Recall / (Precision + Recall). This matters because of how the two kinds of mean treat extreme values. If, for example, recall is 1 (100%) and precision is close to 0, the arithmetic mean of the two would still be about 0.5, whereas the harmonic mean, and therefore the F1 Score, would be close to 0. In other words, the harmonic mean penalises extreme values more.
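The sketch below (hypothetical helper names, made-up values) contrasts the harmonic and arithmetic means for a balanced case and for the extreme case described above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def arithmetic_mean(precision, recall):
    return (precision + recall) / 2

print(f1_score(0.6, 0.75))          # about 0.667
# Extreme case: perfect recall, almost no precision
print(arithmetic_mean(0.01, 1.0))   # about 0.505, looks deceptively decent
print(f1_score(0.01, 1.0))          # about 0.0198
```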


5. AUC-ROC

Accuracy is not a good metric when it comes to imbalanced data, and both accuracy and F1 Score are computed at a single decision threshold. AUC-ROC (the Area Under the Curve of the Receiver Operating Characteristic) tells us how well the model separates the classes across all thresholds. The higher the score, the better the model is at predicting 0s as 0s and 1s as 1s. The ROC curve plots the True Positive Rate (TPR) on the y-axis against the False Positive Rate (FPR) on the x-axis as the classification threshold varies, and the AUC is the area under this curve.

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)
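Each threshold produces one (FPR, TPR) point on the ROC curve. In this sketch the rule "score >= threshold means positive", the helper name and the scores are assumptions; it also assumes both classes are present.

```python
import numpy as np

def tpr_fpr(y_true, scores, threshold):
    """TPR and FPR when instances with score >= threshold are predicted positive."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return float(tp / (tp + fn)), float(fp / (fp + tn))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Lowering the threshold moves along the ROC curve from (0, 0) towards (1, 1)
for t in [0.8, 0.4, 0.35, 0.1]:
    tpr, fpr = tpr_fpr(y_true, scores, t)
    print(f"threshold={t}: TPR={tpr}, FPR={fpr}")
```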

If the AUC-ROC is 1, the model ranks every positive instance above every negative one: the classes are completely separable.

If it is 0.5, there is no separability, and the model does no better than random guessing.

If it is 0, the model is predicting the inverted classes, that is, 0s as 1s and 1s as 0s.
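The three cases can be checked with a small sketch. Instead of integrating the curve, it uses the well-known equivalence between ROC AUC and the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting as one half. The helper name and scores are hypothetical, and the pairwise approach is meant for small illustrations only.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))

y_true = [0, 0, 1, 1]
print(roc_auc(y_true, [0.1, 0.2, 0.8, 0.9]))   # 1.0: complete separability
print(roc_auc(y_true, [0.5, 0.5, 0.5, 0.5]))   # 0.5: no separability
print(roc_auc(y_true, [0.9, 0.8, 0.2, 0.1]))   # 0.0: classes inverted
print(roc_auc(y_true, [0.1, 0.4, 0.35, 0.8]))  # 0.75
```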

What Are Evaluation Metrics?

Evaluation metrics are numerical measurements that are used to rate the effectiveness of AI models. They enable us to gauge a model’s effectiveness by contrasting its predictions with the actual results. These measurements shed light on the model’s advantages, disadvantages, and general performance.

Predictive Model Types

AI predictive models are intended to classify or forecast based on input data. They can be broadly divided into two categories: regression models and classification models. Regression models are used when the output variable is continuous, while classification models are used when the output is categorical.

Gain and Lift Charts

Gain and lift charts are evaluation tools frequently used in marketing and customer relationship management (CRM) software. They show how much better a prediction model does than random selection: instances are sorted by predicted score, and the charts show what share of all positive outcomes (for example, customers who respond to a campaign) the model captures in the top-ranked portion of the population, compared with picking the same number of instances at random.
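A minimal sketch of the numbers behind these charts, assuming the population is split into equal-sized bins after sorting by score (deciles are typical; 5 bins are used here for a 10-row toy example). The helper name, labels and scores are hypothetical.

```python
import numpy as np

def gain_and_lift(y_true, scores, n_bins=10):
    """Cumulative gain and lift for each bin of the score-sorted population.

    Assumes y_true contains at least one positive (1).
    """
    y_true = np.asarray(y_true, dtype=float)
    # Rank instances from the highest predicted score to the lowest
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    sorted_labels = y_true[order]
    n = len(sorted_labels)
    total_positives = sorted_labels.sum()

    gains, lifts = [], []
    captured, seen = 0.0, 0
    for chunk in np.array_split(sorted_labels, n_bins):
        captured += chunk.sum()
        seen += len(chunk)
        # Gain: share of all positives found among the top `seen` instances
        gains.append(float(captured / total_positives))
        # Lift: gain divided by the share of the population examined so far
        lifts.append(float(captured * n / (total_positives * seen)))
    return gains, lifts

y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.75, 0.6, 0.5, 0.45, 0.3, 0.2, 0.1]
gains, lifts = gain_and_lift(y_true, scores, n_bins=5)
print(gains)  # [0.5, 0.75, 0.75, 1.0, 1.0]
print(lifts)  # [2.5, 1.875, 1.25, 1.25, 1.0]
```

A lift of 2.5 in the first bin means the top 20% of instances contain 2.5 times as many positives as a random 20% would; lift always ends at 1.0 once the whole population is included.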

Kolmogorov-Smirnov Chart

The Kolmogorov-Smirnov (KS) chart is used to assess the effectiveness of binary classification models. It measures the largest gap between the cumulative distributions of predicted scores for positive and negative examples. A higher KS value indicates that the model separates the two classes better.
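The KS statistic can be sketched with plain empirical CDFs. The helper name and scores are hypothetical; since the empirical CDFs only change at observed scores, checking those points is enough to find the maximum gap.

```python
import numpy as np

def ks_statistic(y_true, scores):
    """Maximum gap between the empirical CDFs of scores for positives and negatives."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    ks = 0.0
    for t in np.unique(scores):
        cdf_pos = np.mean(pos <= t)  # share of positives scored at or below t
        cdf_neg = np.mean(neg <= t)  # share of negatives scored at or below t
        ks = max(ks, abs(cdf_pos - cdf_neg))
    return float(ks)

y_true = [0, 0, 1, 1]
print(ks_statistic(y_true, [0.1, 0.2, 0.8, 0.9]))   # 1.0: fully separated
print(ks_statistic(y_true, [0.1, 0.4, 0.35, 0.8]))  # 0.5
```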

Log Loss

Log loss is a common evaluation metric for classification problems. It measures the discrepancy between the predicted probabilities and the actual outcomes. A lower log loss indicates a more accurate model.
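For the binary case, log loss is the average negative log-likelihood of the true labels under the predicted probabilities. In this sketch the helper name, the data and the clipping constant `eps` (a common safeguard against log(0)) are assumptions.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss; p_pred is the predicted probability of class 1."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that probabilities of exactly 0 or 1 never produce log(0)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = [1, 0, 1]
print(log_loss(y_true, [0.9, 0.1, 0.8]))  # about 0.145: confident and correct
print(log_loss(y_true, [0.1, 0.9, 0.2]))  # about 2.07: confident and wrong
```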

Gini Coefficient

The Gini coefficient is another evaluation statistic used for classification problems. In model evaluation it is directly related to AUC-ROC, Gini = 2 × AUC – 1, and it measures how well the model separates good events from bad events. The higher the Gini coefficient, the better the model discriminates between the classes.
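This sketch computes the Gini coefficient through the AUC relationship, integrating the ROC curve with the trapezoidal rule. The helper names and scores are hypothetical, and it assumes both classes are present.

```python
import numpy as np

def roc_curve_points(y_true, scores):
    """(FPR, TPR) arrays, sweeping the threshold from the highest score downwards."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]                   # threshold above every score
    for t in np.unique(scores)[::-1]:
        predicted_pos = scores >= t
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

def gini_coefficient(y_true, scores):
    fpr, tpr = roc_curve_points(y_true, scores)
    # Trapezoidal area under the ROC curve
    auc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2)
    return float(2 * auc - 1)

print(gini_coefficient([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5 (AUC = 0.75)
```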

Concordant – Discordant Ratio

In ranking and survival analysis tasks, the Concordant – Discordant Ratio (CDR) is used. It gauges the degree of agreement between predicted and actual rankings. A higher CDR value indicates that the model orders instances more correctly.
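The sketch below shows the pair counting behind concordance for the binary-classification case, where each pair is one positive and one negative instance. That pairing rule is an assumption; survival-analysis variants such as the concordance index form pairs differently. The helper name and scores are hypothetical.

```python
from itertools import product

def concordance(y_true, scores):
    """Count concordant, discordant and tied (positive, negative) pairs."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    concordant = discordant = tied = 0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1   # positive ranked above negative: correct order
        elif p < n:
            discordant += 1   # positive ranked below negative: wrong order
        else:
            tied += 1
    total = concordant + discordant + tied
    return {
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "pct_concordant": concordant / total,
        "pct_discordant": discordant / total,
    }

print(concordance([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# {'concordant': 3, 'discordant': 1, 'tied': 0, 'pct_concordant': 0.75, 'pct_discordant': 0.25}
```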

Cross Validation

Cross-validation evaluates a predictive model by repeatedly dividing the data into training and testing sets, for example into k folds where each fold serves once as the test set while the model is trained on the remaining folds. It helps estimate how well a model generalises to new data and detect overfitting.
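A minimal k-fold sketch follows. The helper names are hypothetical, the "model" is a trivial baseline that predicts the mean of the training targets, and the folds are taken in order without shuffling (real pipelines usually shuffle first).

```python
import numpy as np

def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, k):
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx

def cross_validated_mse(y, k=5):
    """Score a mean-predicting baseline with k-fold CV (illustration only)."""
    y = np.asarray(y, dtype=float)
    scores = []
    for train_idx, test_idx in k_fold_splits(len(y), k):
        prediction = y[train_idx].mean()  # "fit" on the training folds only
        scores.append(float(np.mean((y[test_idx] - prediction) ** 2)))
    return scores

fold_scores = cross_validated_mse(np.arange(10), k=5)
print(len(fold_scores), np.mean(fold_scores))
```

The spread of the per-fold scores is informative on its own: with this sorted toy target, the outer folds score far worse than the middle ones, which is one reason shuffling before splitting is common.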

Performance Metrics in Machine Learning

Several of the measures described above are among the most frequently used performance metrics in machine learning, including the R2-Score for regression models, logarithmic loss and classification accuracy. Each metric offers a different view of the model's performance and suits particular evaluation criteria.

Logarithmic Loss

Logarithmic loss, often called log loss, measures a classification model's performance by penalising inaccurate predictions, with confident wrong predictions penalised most heavily. It is computed from the predicted probabilities rather than the predicted labels.

R2-Score

Regression model evaluation metrics include the R2-Score, or coefficient of determination. It calculates the percentage of the dependent variable’s variance that the independent variables can account for. The model fits the data better when the R2-Score is higher.


Before you go

In this article, we discussed various performance metrics for classification and regression. These are among the most widely used metrics, so it is important to know them. For classification, there are even more metrics, some designed specifically for multi-class and multi-label classification, such as the Kappa Score, Precision at K and Average Precision at K.



Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.