The application of Machine Learning in various fields has increased by leaps and bounds in the past few years, and it is continuing to do so. One of the most popular tasks for a Machine Learning model is to recognise objects and separate them into their designated classes.
This method is known as Classification, and it is one of the most popular applications of Machine Learning. Classification is used to sort a large amount of data into a set of discrete values, which may be binary, such as 0/1 or Yes/No, or multi-class, such as animals, cars, birds, etc.
In the following article, we shall understand the concept of Classification in Machine Learning, the types of Data involved, and see some of the most popular Classification algorithms used in Machine Learning to classify data.
What is Supervised Learning?
As we are getting ready to dive into the concept of Classification and its types, let us quickly refresh ourselves with what is meant by Supervised Learning and how it differs from the other method of Unsupervised Learning in Machine Learning.
Let us understand this by taking a simple example from our Physics class in High School. Suppose we are given a problem that involves a new method. Wouldn’t we all refer to a worked example that uses the same method and try to solve the problem by following it? Once we are confident with the method, we no longer need to refer to the example and can solve such problems on our own.
This is the same way in which Supervised Learning works in Machine Learning. It learns by example. To keep it even simpler, in Supervised Learning, all the data is fed to the model along with its corresponding labels. During the training process, the Machine Learning model compares its output for a particular data point with the true label of that data point and tries to minimise the error between the predicted and the real label values.
The Classification Algorithms that we will go through in this article follow this method of Supervised Learning. Spam Detection and Object Recognition are two typical examples.
Unsupervised Learning differs in that the data is fed without its labels. It is up to the Machine Learning model to derive patterns from the data and give the output. Clustering algorithms follow this Unsupervised method of Learning.
What is Classification?
Classification is defined as recognising, understanding, and grouping the objects or data into pre-set classes. By categorising the data before the Machine Learning model’s training process, we can use various classification algorithms to classify the data into several classes. Unlike Regression, a classification problem is when the output variable is a category, such as “Yes” or “No” or “Disease” or “No Disease”.
In most Machine Learning problems, once the dataset is loaded into the program, it is split into a training set and a test set with a fixed ratio (usually 70% training set and 30% test set) before training. The model learns from the training set by repeatedly correcting the error between its predicted values and the true values (in neural networks, this correction is done through backpropagation), while the test set is held back so that the model can be evaluated on data it has not seen.
Similarly, before we begin Classification, the training dataset is created. The Classification algorithm is trained on it, and its performance is then checked on the test dataset. For algorithms that train iteratively, each full pass over the training dataset is known as an epoch.
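The 70/30 split described above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `train_test_split`, the toy dataset and the fixed seed are assumptions made for this example, not part of any particular library.

```python
import random

def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = rows[:]                 # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy dataset of (feature, label) pairs.
data = [(x, int(x >= 5)) for x in range(10)]
train, test = train_test_split(data)
print(len(train), len(test))  # 7 3
```

Shuffling before splitting matters when the data is stored in some order (for example, all of one class first); otherwise the test set might contain only one class.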
One of the most common applications of Classification Algorithms is filtering emails according to whether they are “spam” or “non-spam.” In short, we can define Classification in Machine Learning as a form of “Pattern Recognition” in which the algorithms applied to the training data extract patterns from it (such as similar words or number sequences, sentiments, etc.) and use them to classify new data.
Classification is a process of categorising a given set of data into classes; it can be performed on both structured and unstructured data. It begins by predicting the class of the given data points. These classes are also referred to as output variables, target labels, etc. Classification algorithms approximate the mapping function from the input variables to the output target class. Classification’s primary goal is to identify which class/category the new data will fall into.
Types of Classification Algorithms in Machine Learning
Depending upon the type of data on which the Classification Algorithms are applied, there are two broad categories of algorithms: the Linear and the Non-linear models.
- Logistic Regression
- Support Vector Machines (SVM)
- K-Nearest Neighbours (KNN) Classification
- Kernel SVM
- Naïve Bayes Classification
- Decision Tree Classification
- Random Forest Classification
In this article, we shall briefly go through the concept behind each of the algorithms that are mentioned above.
Evaluation of a Classification Model in Machine Learning
Before we jump into the concepts behind the algorithms mentioned above, we must understand how we can evaluate a Machine Learning model built on top of these algorithms. It is essential to evaluate our model for accuracy on both the training set and the test set.
Cross-Entropy Loss or Log Loss
This is the first type of loss function that we will use in evaluating the performance of a classifier whose output is between 0 and 1. This is mostly used for Binary Classification models. The Log Loss formula is given by,
Log Loss = -((1 – y) * log(1 – yhat) + y * log(yhat))
Where yhat is the predicted probability, and y is the real value (0 or 1).
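As a rough sketch, the Log Loss formula above can be computed as follows. Averaging over all samples and clipping the predictions with a small `eps` (to avoid taking the log of 0) are assumptions added for this example; the sample values are made up.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Average log loss over all samples for binary labels (0 or 1)."""
    total = 0.0
    for y, yhat in zip(y_true, y_pred):
        yhat = min(max(yhat, eps), 1 - eps)   # keep log() away from 0
        total += -((1 - y) * math.log(1 - yhat) + y * math.log(yhat))
    return total / len(y_true)

# Same true labels, two hypothetical sets of predicted probabilities.
confident = log_loss([1, 0], [0.9, 0.1])
unsure = log_loss([1, 0], [0.6, 0.4])
print(confident < unsure)  # True
```

The more confident (and correct) predictions give the lower loss, which is the behaviour we want from this metric.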
Confusion Matrix
A confusion matrix is an N x N matrix, where N is the number of classes being predicted. It presents the model’s predictions against the actual classes as a table, from which we can derive several performance metrics to evaluate the Classification model. For a binary problem, it is of the form,
| | Actual Positive | Actual Negative |
| Predicted Positive | True Positive | False Positive |
| Predicted Negative | False Negative | True Negative |
A few of the performance metrics that can be derived from the above table are given below.
1. Accuracy – the proportion of all predictions that are correct.
2. Positive Predictive Value or Precision – the proportion of predicted positive cases that are actually positive.
3. Negative Predictive Value – the proportion of predicted negative cases that are actually negative.
4. Sensitivity or Recall – the proportion of actual positive cases which are correctly identified.
5. Specificity – the proportion of actual negative cases which are correctly identified.
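The five metrics above can be derived from the four cells of the confusion matrix. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); the function names and the sample predictions are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN and TN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Derive the five metrics listed above from the confusion matrix cells."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),      # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "recall": tp / (tp + fn),         # sensitivity
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
results = metrics(*confusion_counts(y_true, y_pred))
print(results["accuracy"], results["precision"], results["recall"])  # 0.8 0.75 0.75
```

Note that the divisions assume each denominator is non-zero, which holds for this sample but would need a guard in real code.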
AUC-ROC Curve
This is another important metric for evaluating a Classification model. ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve. The ROC curve is plotted with TPR (True Positive Rate) on the Y-axis and FPR (False Positive Rate) on the X-axis. It shows the performance of the classification model at different thresholds, and the AUC summarises the curve as a single number: 1.0 for a perfect classifier and about 0.5 for random guessing.
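To make the idea of "different thresholds" concrete, here is a minimal sketch that sweeps a threshold over the predicted scores, records one (FPR, TPR) point per threshold, and measures the area with the trapezoid rule. The function names and the four sample scores are assumptions for this example, and the code assumes both classes are present.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, from strict to lenient."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]              # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical true labels and predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
area = auc(roc_points(y_true, scores))
print(area)  # 0.75
```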
1. Logistic Regression
Logistic Regression is a machine learning algorithm for Classification. In this algorithm, the probabilities describing the possible outcomes of a single trial are modelled using a logistic function. It works on numeric inputs (categorical inputs are encoded as numbers first) and assumes a linear relationship between the inputs and the log-odds of the outcome; the inputs do not need to follow a Gaussian (bell curve) distribution.
The logistic function, also called a sigmoid function, was initially used by statisticians to describe population growth in ecology. The sigmoid function is a mathematical function used to map the predicted values to probabilities. It has an S-shaped curve and can take values between 0 and 1 but never exactly at those limits.
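Logistic Regression is primarily used to predict a binary outcome such as Yes/No or Pass/Fail. The independent variables can be categorical or numeric, but the dependent variable is always categorical. The formula for Logistic Regression is given by,
p = 1 / (1 + e^-(b0 + b1 * x))
Where p is the predicted probability, x is the input variable, b0 and b1 are the learned coefficients, and e is the base of the natural logarithm. The result traces the S-shaped curve, which has values between 0 and 1.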
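As a rough illustration, the sketch below fits a one-variable Logistic Regression, assuming the standard form p = 1 / (1 + e^-(b0 + b1 * x)). Training by plain gradient descent, the learning rate, the number of epochs and the "hours studied" data are all assumptions chosen for this example; real libraries use more refined solvers.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p = sigmoid(b0 + b1 * x) by gradient descent on the log loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y   # predicted probability minus true label
            grad0 += error
            grad1 += error * x
        b0 -= lr * grad0 / n
        b1 -= lr * grad1 / n
    return b0, b1

# Hypothetical data: hours studied and whether the student passed (1) or failed (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(hours, passed)
p_low = sigmoid(b0 + b1 * 2)    # probability of passing after 2 hours
p_high = sigmoid(b0 + b1 * 7)   # probability of passing after 7 hours
print(p_low < 0.5 < p_high)
```

To turn the probability into a Yes/No answer, we compare it with a threshold, commonly 0.5.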
2. Support Vector Machines
A Support Vector Machine (SVM) classifies data by finding the boundary that best separates the classes. In SVM, the line that is used to separate the classes is referred to as the Hyperplane. The data points on either side of the Hyperplane that are closest to it are called Support Vectors, and they determine where the boundary line is drawn.
The Support Vector Machine represents the training data as points in a space, with the categories separated by the Hyperplane. When a new point enters, it is classified according to which side of the Hyperplane it falls on.
The main aim of the Support Vector Machine is to maximise the margin, that is, the distance between the Hyperplane and the Support Vectors on either side of it.
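The following is a minimal sketch of a linear SVM in two dimensions. It is trained with sub-gradient descent on the regularised hinge loss, which is a simplification chosen for this example (libraries normally use dedicated solvers). The labels are assumed to be -1 and +1, and the points, learning rate `lr` and regularisation strength `lam` are made up.

```python
def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=1000):
    """Learn a Hyperplane w[0]*x1 + w[1]*x2 + b = 0 for labels in {-1, +1}."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # Regularisation step: a smaller w corresponds to a wider margin.
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]
            # Hinge loss step: only points inside the margin move the Hyperplane.
            if margin < 1:
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

def predict(w, b, point):
    """Classify a point by the side of the Hyperplane it falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

# Hypothetical, linearly separable data.
points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
print(predict(w, b, (0, 0)), predict(w, b, (6, 6)))
```

Once training settles, only the points closest to the boundary still trigger the hinge loss step, which mirrors the role of the Support Vectors.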
3. K-Nearest Neighbours (KNN) Classification
KNN Classification is one of the simplest Classification algorithms, yet it is widely used because it is easy to use and often works well. In this method, the entire dataset is stored in the machine initially. Then, a value k is chosen, which represents the number of neighbours. When a new data point needs to be classified, the algorithm finds the k training points nearest to it and takes a majority vote of their class labels. The new data point is then assigned to the class with the most votes.
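The steps above can be sketched in a few lines. Euclidean distance is assumed as the measure of "nearest" (other distance measures are possible), and the training points and labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical stored dataset with two classes.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_points, train_labels, (2, 2)))  # A
print(knn_predict(train_points, train_labels, (6, 5)))  # B
```

There is no training step as such: all the work happens at prediction time, when the distance to every stored point is computed.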
4. Kernel SVM
As mentioned above, the Linear Support Vector Machine can only be applied to data that is linearly separable. However, not all the data in the world is linearly separable. Hence, we need a Support Vector Machine that can also handle data that is not linearly separable. Here comes the Kernel trick, which gives us the Kernel Support Vector Machine or Kernel SVM.
In Kernel SVM, we select a kernel such as the RBF (Gaussian) Kernel. The kernel implicitly maps the data points to a higher dimension, where they become linearly separable. In this way, we can create a decision boundary between the different classes of the dataset.
Hence, using the basic concepts of Support Vector Machines, we can design a Kernel SVM for non-linear data.
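The idea of mapping to a higher dimension can be illustrated without training a full Kernel SVM. In the sketch below, one-dimensional points that no single threshold can separate become separable after the explicit map x -> (x, x^2). The explicit map, the data and the `gamma` value are assumptions for illustration; a real Kernel SVM never builds the mapped points and instead works through a kernel function such as the RBF kernel shown first.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """RBF (Gaussian) kernel: a similarity that decays with squared distance."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

# 1-D data: class 1 sits between the class 0 points,
# so no single threshold on x separates the two classes.
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [0, 0, 1, 1, 1, 0, 0]

# Explicit map to 2-D: x -> (x, x^2).
mapped = [(x, x * x) for x in xs]

# In the mapped space the straight line x2 = 2.5 separates the classes.
predicted = [1 if x2 < 2.5 else 0 for _, x2 in mapped]
print(predicted == labels)      # True
print(rbf_kernel((0,), (0,)))   # 1.0
```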
5. Naïve Bayes Classification
The Naïve Bayes Classification has its roots in the Bayes Theorem, and it assumes that all the features of the dataset are independent of one another and have equal importance in predicting the outcome. This assumption gives the algorithm the name ‘Naïve’. It is used for various tasks, such as spam filtering and other areas of text classification. Naïve Bayes calculates the probability that a data point belongs to a certain category.
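The formula of the Naïve Bayes Classification is given by,
P(A|B) = P(B|A) * P(A) / P(B)
Where A is the class and B is the observed features of the data point.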
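A minimal sketch of how the Bayes Theorem becomes a text classifier is given below. It scores each class by its prior probability multiplied by the probability of each word given that class, working in logs to avoid very small numbers. Laplace (add-one) smoothing, the function names and the tiny spam/ham dataset are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Count how often each class occurs and how often each word occurs per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict(model, words):
    """Return the class with the highest log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)               # prior P(class)
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing so an unseen word does not zero out the product.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training emails, already split into words.
documents = [
    ["win", "money", "now"],
    ["free", "money", "offer"],
    ["meeting", "at", "noon"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(documents, labels)
print(predict(model, ["free", "money"]))     # spam
print(predict(model, ["meeting", "notes"]))  # ham
```

The denominator P(B) is the same for every class, so it can be left out when we only need to know which class scores highest.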
6. Decision Tree Classification
A decision tree is a supervised learning algorithm that is well suited to classification problems, as it splits the data step by step until each group belongs to a single class as far as possible. It operates in the form of a flowchart where it separates the data points at each level. The final structure looks like a tree with nodes and leaves.
A decision node will have two or more branches, and a leaf represents a classification or decision. For example, a Decision Tree can ask several questions in turn to build a flowchart that helps us to solve the simple problem of predicting whether to go to the market or not.
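To show how a single decision node might choose its question, the sketch below tries every threshold on one numeric feature and keeps the one that leaves the two branches as pure as possible. Gini impurity is assumed as the measure of purity (other criteria exist), and the temperature feature and the go-to-market labels are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 when all labels agree, higher when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Find the threshold whose two branches have the lowest weighted impurity."""
    best_threshold, best_impurity = None, float("inf")
    ordered = sorted(set(values))
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2          # midpoint between neighbouring values
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical data: temperature in degrees and whether we went to the market.
temperature = [10, 12, 15, 24, 26, 30]
go_to_market = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(temperature, go_to_market)
print(threshold, impurity)  # 19.5 0.0
```

A full tree repeats this search inside each branch until the leaves are pure or a stopping rule is reached.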
7. Random Forest Classification
Coming to the last Classification Algorithm of this list, the Random Forest is an extension of the Decision Tree Algorithm. A Random Forest is an ensemble learning method with multiple Decision Trees. Each tree is trained on a random sample of the data, and the forest classifies a new data point by taking the majority vote of its trees.
The Random Forest Algorithm is an advancement over the Decision Tree Algorithm, which suffers from a major problem of “overfitting”. It is generally more accurate than a single Decision Tree, although training many trees takes more computation.
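The ensemble idea can be sketched with very small trees. In the example below, each "tree" is only a one-level stump trained on a bootstrap sample (rows drawn with replacement) and on one randomly chosen feature, and the forest takes a majority vote. Real Random Forests grow much deeper trees; the stump simplification, the function names, the seed and the data are assumptions for this illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def fit_stump(rows, labels, feature):
    """One-level tree: best threshold on one feature, majority label on each side."""
    best = None
    values = sorted({row[feature] for row in rows})
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or impurity < best[0]:
            best = (impurity, threshold,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    if best is None:  # the sample had a single distinct value: always predict the majority
        majority = Counter(labels).most_common(1)[0][0]
        return feature, float("inf"), majority, majority
    return feature, best[1], best[2], best[3]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample rows with replacement
        feature = rng.randrange(len(rows[0]))            # pick one feature at random
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote over all the stumps in the forest."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical data with two features and two classes.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (6, 6), (7, 6), (6, 7), (7, 7)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict(forest, (1.5, 1.5)), predict(forest, (6.5, 6.5)))
```

Because every tree sees a slightly different sample, their individual mistakes tend to differ, and the vote smooths them out. This is the intuition behind the reduced overfitting.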
Thus, in this article on Machine Learning Methods for Classification, we have understood the basics of Classification and Supervised Learning, the types and evaluation metrics of Classification models and, finally, a summary of the most commonly used Classification models in Machine Learning.