
Support Vector Machines: Types of SVM [Algorithm Explained]

Introduction

Just like other machine learning algorithms that perform classification (decision trees, random forest, K-NN) and regression, the Support Vector Machine, or SVM, is one such algorithm in the pool. It is a supervised machine learning algorithm (it requires labeled data sets) that is used for problems related to either classification or regression.


However, it is most frequently applied to classification problems. The SVM algorithm plots each data item as a point in an n-dimensional space, where n is the number of features of the data. Classification is then carried out by finding the most suitable hyperplane that separates the two (or more) classes effectively.

Support vectors are simply the coordinates of individual observations, specifically the observations that lie closest to the separating hyperplane. Why treat data points as vectors, you may ask? Real-world problems involve data sets of higher dimensions. In higher dimensions (n dimensions), it makes more sense to perform vector arithmetic and matrix manipulations than to regard the observations as points.


Types of SVM

Linear SVM: Linear SVM is used for linearly separable data, i.e. a dataset that can be divided into two categories by a single straight line. Such data points are termed linearly separable data, and the classifier used is described as a Linear SVM classifier.

Non-linear SVM: Non-linear SVM is used for data that are not linearly separable, i.e. a straight line cannot be used to classify the dataset. For this, we use something known as the kernel trick, which places the data points in a higher dimension where they can be separated using planes or other mathematical functions. Such data points are termed non-linear data, and the classifier used is termed a Non-linear SVM classifier.

Algorithm for Linear SVM

Let’s talk about a binary classification problem. The task is to assign a test point to one of the two classes as accurately as possible. The following steps are involved in the SVM process.

First, the set of points belonging to the two classes is plotted and visualized as shown below. In a 2-d space, a single straight line is enough to divide these two classes. But there can be many lines that separate the classes: there is a set of lines or hyperplanes (green lines) to choose from. The obvious question is, out of all these lines, which one is most suitable for classification?

[Figure: set of hyper-planes]

Basically, select the hyper-plane that separates the two classes best. We do this by maximizing the distance between the hyper-plane and the closest data point. The greater the distance, the better the hyperplane and the better the classification results. It can be seen in the figure below that the selected hyperplane has the maximum distance from the nearest point of each class.

A reminder: the two dotted lines that run parallel to the hyperplane pass through the nearest points of each class. Strictly speaking, those nearest points are the support vectors and the dotted lines are the margin boundaries, although this article also refers to the lines themselves as the support vectors of the hyperplane. The distance of separation between these lines and the hyperplane is called the margin, and the purpose of the SVM algorithm is to maximize this margin. The optimal hyperplane is the hyperplane with the maximum margin.
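To make the idea of "pick the hyperplane with the largest margin" concrete, here is a minimal sketch in plain Python. The four data points and the three candidate lines are made up for illustration, and the code only compares the given candidates; it does not search for the true optimum the way an SVM solver would.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def margin(w, b, points):
    """Distance from the hyperplane to its closest data point."""
    return min(distance_to_hyperplane(w, b, p) for p in points)

# Hypothetical toy data: two points per class in a 2-d space.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]

# Hypothetical candidate lines, each written as (w, b) for w.x + b = 0.
# All three separate the two classes.
candidates = {
    "x = 3": ((1.0, 0.0), -3.0),
    "y = 2.5": ((0.0, 1.0), -2.5),
    "x + y = 5.5": ((1.0, 1.0), -5.5),
}

margins = {name: margin(w, b, points) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)  # candidate with the largest margin
```

Among these three candidates, the line with the largest minimum distance to the data is the one an SVM would prefer.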

[Figure: the selected hyperplane with its margin]

Take, for example, classifying cells as good and bad. Each cell xᵢ is defined as an n-dimensional feature vector that can be plotted in n-dimensional space. Each of these feature vectors is labeled with a class yᵢ. The class yᵢ can be either positive or negative (e.g. good = 1, not good = -1). The equation of the hyperplane is w.x + b = 0, where w and b are the line parameters. The expression w.x + b returns a value ≥ 1 for examples of the positive class and ≤ -1 for examples of the negative class.
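As a rough sketch of how this rule is applied, the snippet below evaluates w.x + b for a few labeled points and classifies by its sign. The values of w, b and the data are hypothetical and were chosen by hand, not learned.

```python
def decision_value(w, b, x):
    """Value of w.x + b for a feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 (e.g. good) or -1 (e.g. not good) from the sign of w.x + b."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical hyperplane parameters and labeled examples.
w, b = (1.0, 1.0), -5.5
data = [((1.0, 1.0), -1), ((2.0, 1.0), -1), ((4.0, 4.0), 1), ((5.0, 4.0), 1)]

# Every training example should satisfy y * (w.x + b) >= 1.
constraints_hold = all(y * decision_value(w, b, x) >= 1 for x, y in data)
labels = [classify(w, b, x) for x, _ in data]
```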


But how does it find this hyperplane? The hyperplane is defined by the optimal values of w (the weights) and b (the intercept). These optimal values are found by minimizing the cost function. Once the algorithm has these optimal values, the SVM model, or the line function f(x), efficiently separates the two classes.

In a nutshell, the optimal hyperplane has equation w.x+b = 0. The left support vector has equation w.x+b=-1 and the right support vector has w.x+b=1. 

The distance d between two parallel lines Ay = Bx + c1 and Ay = Bx + c2 is given by d = |c1 - c2| / √(A² + B²). Applying this formula to the lines w.x+b=-1 and w.x+b=1, the distance between the two support vectors is 2/||w||.
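The two formulas can be checked against each other numerically. In the sketch below, the 2-d weight vector w = (3, 4) and intercept b = 2 are arbitrary illustrative values; the lines w.x + b = ±1 are rewritten in the form Ay = Bx + c before the parallel-line formula is applied.

```python
import math

def parallel_line_distance(A, B, c1, c2):
    """Distance between the parallel lines A*y = B*x + c1 and A*y = B*x + c2."""
    return abs(c1 - c2) / math.sqrt(A ** 2 + B ** 2)

def margin_width(w):
    """Distance between w.x + b = 1 and w.x + b = -1, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-d parameters.
w, b = (3.0, 4.0), 2.0

# w1*x + w2*y + b = 1   ->  w2*y = -w1*x + (1 - b)
# w1*x + w2*y + b = -1  ->  w2*y = -w1*x + (-1 - b)
A, B = w[1], -w[0]
c1, c2 = 1.0 - b, -1.0 - b

d_lines = parallel_line_distance(A, B, c1, c2)
d_margin = margin_width(w)
```

Both expressions give 0.4 for this choice of w, since ||w|| = 5.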

The cost function for SVM looks like the equation below:

[Figure: SVM loss function]

In the cost function above, the λ parameter controls the width of the margin: a larger λ gives a broader margin, and a smaller λ yields a narrower one. To minimize the cost, the gradient of the cost function is calculated and the weights are updated in the direction that lowers the loss function.
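The equation image is not reproduced here, so the following sketch assumes a common form of the regularized hinge-loss cost, J(w, b) = λ||w||² + (1/n) Σ max(0, 1 - yᵢ(w.xᵢ + b)), which may differ in detail from the one in the original figure. It shows the gradient-based update described above on a tiny made-up dataset; the learning rate, the number of epochs and the data are illustrative choices, not recommendations.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def svm_cost(w, b, X, y, lam):
    """Assumed cost: lam * ||w||^2 + mean hinge loss over the data."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return lam * dot(w, w) + hinge / len(X)

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Batch (sub)gradient descent on the assumed cost function."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Gradient of the regularization term.
        grad_w = [2.0 * lam * wj for wj in w]
        grad_b = 0.0
        # Only points inside the margin contribute to the hinge gradient.
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        # Move the parameters in the direction that lowers the cost.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical, linearly separable toy data.
X = [(-2.0, -2.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 2.0)]
y = [-1, -1, 1, 1]

w, b = train_linear_svm(X, y)
predictions = [1 if dot(w, xi) + b >= 0 else -1 for xi in X]
initial_cost = svm_cost([0.0, 0.0], 0.0, X, y, lam=0.01)
final_cost = svm_cost(w, b, X, y, lam=0.01)
```

Production libraries typically solve the same problem with more specialized optimizers, so this loop is only meant to illustrate the update rule.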


Algorithm for Non-linear SVM

In the linear SVM classifier, it is straightforward to place a linear hyper-plane between the two classes. But an interesting question arises: what should be done if the data is not linearly separable? For this, the SVM algorithm has a method called the kernel trick.

The SVM kernel function takes a low-dimensional input space and converts it to a higher-dimensional space. In simple words, it converts a non-separable problem into a separable one. It performs complex data transformations and then finds a way to separate the data based on the labels or outputs that define them.

Look at the diagram below to better understand data transformation. The set of data points on the left is clearly not linearly separable. But when we apply a function Φ to the set of data points, we get transformed data points in a higher dimension that are separable via a plane.

[Figure: data points before and after applying Φ]

To separate data points that are not linearly separable, we have to add an extra dimension. For linear data, two dimensions have been used, that is, x and y. For these data points, we add a third dimension, say z. For the example below, let z=x² +y².

[Figure: data points in the x-y plane]

This z function, or the added dimension, transforms the sample space, and the above image becomes the following:

[Figure: data points after adding the z dimension]

On close analysis, it is evident that the above data points can be separated using a straight line that is either parallel to the x axis or inclined at an angle. Several kernel functions are available: linear, polynomial, radial basis function (RBF), and sigmoid.
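The z = x² + y² mapping can be tried on a small synthetic example. In this sketch the two classes are points on two concentric circles (radii 0.5 and 2, chosen arbitrarily), which no straight line in the x-y plane can separate; after adding the z dimension, a single threshold on z is enough. The threshold value 1.0 is a hand-picked assumption.

```python
import math

def lift(point):
    """Map a 2-d point (x, y) to 3-d by adding z = x^2 + y^2."""
    x, y = point
    return (x, y, x ** 2 + y ** 2)

def circle_points(radius, count):
    """Evenly spaced points on a circle centered at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / count),
         radius * math.sin(2 * math.pi * k / count))
        for k in range(count)
    ]

# Hypothetical data: an inner class surrounded by an outer class.
inner = circle_points(0.5, 8)
outer = circle_points(2.0, 8)

inner_z = [lift(p)[2] for p in inner]  # all close to 0.25
outer_z = [lift(p)[2] for p in outer]  # all close to 4.0

# In the lifted space, the flat plane z = threshold separates the classes.
threshold = 1.0
separable = max(inner_z) < threshold < min(outer_z)
```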

What an RBF does, in simple words, is this: if we pick some point, the value of the RBF depends only on the distance (the norm of the difference) between that point and some fixed point. In other words, we can design a z dimension from the output of this RBF, which gives a ‘height’ that depends on how far the point is from the fixed point.

The similarity between two points in the transformed feature space is an exponentially decaying function of the distance between the vectors in the original input space. RBF is the default kernel in many SVM implementations, for example scikit-learn's SVC.
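A minimal sketch of the Gaussian RBF kernel, K(x, x') = exp(-γ||x - x'||²), is shown below. The parameter name gamma follows common usage and its default of 1.0 here is an arbitrary assumption.

```python
import math

def rbf_kernel(x, x_prime, gamma=1.0):
    """Gaussian RBF similarity: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * squared_distance)

center = (0.0, 0.0)
same_point = rbf_kernel(center, (0.0, 0.0))   # identical points: similarity 1
near_point = rbf_kernel(center, (1.0, 0.0))   # distance 1
far_point = rbf_kernel(center, (3.0, 0.0))    # distance 3
```

The similarity is 1 for identical points and falls towards 0 as the points move apart, which is the exponential decay described above.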

Another prevalent kernel is the polynomial kernel, which takes an extra parameter called “degree”. This parameter controls the computational cost of the transformation and the complexity of the model. SVM need not actually perform this transformation on the data points to move them into a new high-dimensional feature space. This is the kernel trick: the kernelized SVM algorithm computes the effect of such complex transformations only in terms of similarity calculations between pairs of points in the higher-dimensional feature space, where the new feature representation remains implicit.
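The kernel trick can be verified by hand for a simple case. The sketch below uses a degree-2 polynomial kernel without a constant term, K(x, z) = (x.z)², on 2-d inputs; this specific kernel and the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²) are chosen for illustration. Computing the kernel in the original space gives the same number as mapping both points to 3-d and taking the dot product there.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel computed in the original 2-d space."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit 3-d feature map whose dot product equals poly2_kernel."""
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2.0) * x1 * x2, x2 ** 2)

x, z = (1.0, 2.0), (3.0, 4.0)

via_kernel = poly2_kernel(x, z)        # no transformation performed
via_mapping = dot(phi(x), phi(z))      # explicit transformation, then dot product
```

For higher degrees or for the RBF kernel the explicit map becomes very large or infinite-dimensional, which is why evaluating the kernel directly matters.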


Which Kernel to choose?

A good way to determine which kernel is most suitable is to build several models with different kernels, estimate the performance of each, and compare the outcomes. Then you pick the kernel with the best results. Be sure to estimate each model’s performance on unseen observations by using K-Fold Cross-Validation, and consider different metrics such as Accuracy, F1 Score, etc.
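One possible way to carry out this comparison is sketched below with scikit-learn, which is an assumption since the article does not name a library. The synthetic concentric-circles dataset, the 5 folds and the accuracy metric are illustrative choices; on real data you would plug in your own X and y and possibly other metrics such as "f1".

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Hypothetical dataset: two concentric rings, not linearly separable.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

# Fixed seed so that the folds are the same for every kernel.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVC(kernel=kernel)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[kernel] = scores.mean()  # average accuracy over the 5 folds

best_kernel = max(results, key=results.get)
```

Using the same folds for every kernel keeps the comparison fair, since each model is scored on exactly the same held-out observations.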

SVM in Python and R

The fit method in Python simply trains the SVM model on the Xtrain and ytrain data that have been set aside for training. More specifically, the fit method takes the data in Xtrain and ytrain and, from that, determines the support vectors.

Once these support vectors are estimated, the classifier model is ready to produce new predictions with the predict function, because it only needs the support vectors to classify new data. You may get different results in Python and in R, so be sure to check the value of the seed parameter.
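As an illustration of fit and predict, here is a small sketch that assumes scikit-learn's SVC (the article does not say which Python library it has in mind). The training points and the test points are made up; support_vectors_ is the fitted attribute that exposes the support vectors found during training.

```python
from sklearn.svm import SVC

# Hypothetical training data (the article's Xtrain and ytrain).
X_train = [[-2.0, -2.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 2.0]]
y_train = [-1, -1, 1, 1]

clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)             # training determines the support vectors

support_vectors = clf.support_vectors_  # the points that define the boundary
predictions = clf.predict([[-3.0, -1.0], [3.0, 1.0]])
```

In this sketch nothing is random, so repeated runs give the same model. Seeds usually start to matter in steps such as shuffling or splitting the data, which is where results can diverge between runs or between languages.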

Working of a Support Vector Machine

You can better understand how it works with an example. Suppose we have black and red labels, with features denoted by x and y. We want a classifier that assigns data to either the black or the red category. The first step is to plot the labeled data on the x-y plane.

A classic SVM divides these data points into black and red tags using a hyperplane. In two dimensions, the hyperplane is a line. It is the decision boundary: data points on one side belong to the black category and those on the other side to the red category. More precisely, the hyperplane in the SVM algorithm is the line that maximizes the margin to the closest tags of each class (black and red). Classification is easier because the distance from the hyperplane to the nearest label is as large as possible. This scenario applies to linearly separable data. For non-linear data, however, a straight line can’t separate the individual data points.

Let’s understand the working of SVM with an example of a non-linear dataset. The two dimensions x and y are enough for linear data, but you can add a “z” dimension to better classify the data points. The third dimension becomes essential when a single hyperplane in two dimensions is insufficient to separate the tags or labels involved. We can use the equation for a circle, i.e. z = x² + y², to understand SVM in machine learning. In three dimensions, the hyperplane runs parallel to the x-y plane at a specific value of z (suppose z=1). The data points are then mapped back to two dimensions.

You can better understand it when this case is plotted in a 3D space. Mapped back to the x-y plane, the boundary is a circle with a radius of 1 unit centered on the origin. It separates the two tags through the SVM.

Working through an example like the one above also makes it clearer which types of problems a support vector machine is used for.

Applications of Support Vector Machines

The SVM algorithm depends on supervised learning methods to categorize unknown data into known categories. These algorithms are used in different fields and some of them are discussed below.

1. Solving the geo-sounding problem:

One of the prevalent use cases for SVMs is the geo-sounding problem, which is used to track the planet’s layered structure. This involves solving inversion problems, in which the observed results of a problem are used to infer the parameters or variables that generated them. SVM algorithmic models and linear functions separate the electromagnetic data. Furthermore, linear programming practices are applied when developing the supervised models.

2. Data classification:

SVM algorithms can solve complex mathematical problems, but smooth SVMs are favored for data classification purposes. Smooth SVMs apply smoothing techniques that reduce the influence of outliers and make the pattern identifiable. They use algorithms like the Newton-Armijo algorithm to deal with bigger datasets that traditional SVMs can’t handle. They solve the underlying optimization problem and typically rely on mathematical properties like strong convexity for more direct data classification.

3. Protein remote homology detection:

Protein remote homology detection is an area of computational biology that categorizes proteins by function and structure. The classification is based on the sequence of amino acids, in cases where sequence similarity is hard to recognize. SVMs use kernel functions to identify the similarities between protein sequences. Hence, SVMs play a key role in computational biology.

4. Facial detection & expression classification:

SVMs classify facial and non-facial structures. The training data uses two classes, i.e. face entity (represented by +1) and non-face entity (represented by -1), and n*n pixel regions are used to differentiate between the two structures.

Every pixel is analyzed and features are extracted. These features represent the face and non-face characteristics. Lastly, the process draws a square decision boundary around the facial structures (according to pixel intensity) and categorizes the resulting images.

5. Text categorization & handwriting recognition:

Text categorization classifies data into predefined categories. For instance, news articles contain categories like business, politics, sports, stock market, etc. Another example is classifying emails into junk, spam, non-spam, and others.

SVM assigns every document or article a score. This score is then compared to a threshold value. Subsequently, SVM in machine learning classifies the article into its corresponding category based on the evaluated score.
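The score-and-threshold step can be sketched as follows for a single category. Everything here is hypothetical: the word weights, the bias and the threshold stand in for values that a trained linear SVM might produce for a "sports" category, and the whitespace tokenizer is deliberately naive.

```python
# Hypothetical per-word weights for the "sports" category.
weights = {"match": 1.2, "goal": 0.9, "team": 0.7, "election": -1.1, "market": -0.8}
bias = -0.5
threshold = 0.0

def document_score(text):
    """Linear score: sum of the weights of the words present, plus the bias."""
    tokens = text.lower().split()
    return sum(weights.get(token, 0.0) for token in tokens) + bias

def is_sports(text):
    """Assign the article to the category if its score exceeds the threshold."""
    return document_score(text) > threshold

sports_article = "The team scored a late goal in the match"
politics_article = "Election results moved the market"

sports_flag = is_sports(sports_article)
politics_flag = is_sports(politics_article)
```

With several categories, one such scorer per category is a common arrangement, and the article is assigned to the category with the highest score.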

6. Surface texture classification:

SVMs can classify images of surfaces. Images captured of the surfaces are fed into SVMs, which determine the texture of the surfaces in those images and categorize them as gritty or smooth.

7. Speech recognition:

Speech recognition is another type of problem that support vector machines are used for. In speech recognition use cases, SVMs separate individual words from speech. Certain characteristics and features are extracted for each word. Common feature extraction techniques are Linear Prediction Coefficients (LPC), Mel Frequency Cepstral Coefficients (MFCC), and Linear Prediction Cepstral Coefficients (LPCC). The extracted features are fed into SVMs, which are then trained to recognize speech.


Conclusion

In this article, we looked at the Support Vector Machine algorithm in detail. Thanks for your time. Tune in for more such articles.


What kinds of problems are Support Vector Machine models good for?

Linear Support Vector Machines (SVM) work best on linearly separable data, i.e. data that can be separated into two distinct classes using a straight line or hyperplane; with kernels, they also handle data that is not linearly separable. One of the most common uses of SVM is in face recognition. The eigenfaces technique, which performs dimensionality reduction on facial images, is often used together with an SVM classifier for this purpose. It is based on the premise that faces can be thought of as vectors in a high-dimensional vector space, whose dimensionality is reduced by projecting them onto a small number of principal components. The reduced representation makes it practical to compare and match faces. SVM is also used in many other classification tasks.

What are the applications of SVMs in real-life?

The potential use of SVMs in machine learning is huge. Support vector machines are used in a number of applications such as computer vision, bioinformatics, text mining and a lot more. Their power lies in their ability to solve non-linear classification problems. Support Vector Machine models are good at binary classification problems, that is, problems where you have a set of input data and you want to assign each input to one of two predefined classes. For example, let's say your input data consisted of a set of images, and you wanted to classify them as either cat or not cat. The Support Vector Machine model would be a good fit for this problem.

Can SVM be used for continuous data?

SVM is most often used to create a classification model, and a basic SVM classifier works with two classes. Continuous input features such as age, height, weight or grade can be used directly. If the value you want to predict is continuous, you can either use the regression form of SVM or turn that value into classes, a process called discretization (or binning). For example, you can take the mean of the data and assign each value to one class or the other depending on which side of the mean it falls, which then turns the task into a classification problem.
