Top 10 Dimensionality Reduction Techniques For Machine Learning

Last updated: 7th Aug, 2020

Isn’t it crazy how much data we’re bombarded with every second? It’s like the lifeblood of decision-making for businesses and organizations everywhere. But here’s the kicker: drowning in data doesn’t always lead to productivity or accuracy. Sometimes, it just overwhelms us, making it hard to see the forest for the trees. 

That’s where dimensionality reduction techniques swoop in like superheroes for us data scientists. They’re our trusty sidekicks, helping us wrangle those massive datasets. By trimming down the number of features or variables while keeping most of the useful information, dimensionality reduction simplifies things. Analysis and visualization become easier, and we can finally spot the patterns buried deep within the data avalanche. Ready to dive into this topic? Let’s explore more in this article.

What is Dimensionality Reduction?

In simple words, dimensionality reduction refers to the technique of reducing the number of dimensions (features) in a dataset. Machine learning datasets often contain hundreds of columns (i.e., features), and each column adds one dimension to the space in which the data points live. To picture the idea, think of points spread over a sphere in three-dimensional space: by applying dimensionality reduction, you bring the number of columns down to a manageable count, much like projecting that three-dimensional sphere onto a two-dimensional circle.

Now comes the question, why must you reduce the columns in a dataset when you can directly feed it into an ML algorithm and let it work out everything by itself?

The answer is the curse of dimensionality, which makes dimensionality reduction necessary in many projects.

Benefits of Applying Dimensionality Reduction

Applying dimensionality reduction in data analysis and machine learning has several benefits. It mitigates the curse of dimensionality and reduces the computational cost of working with datasets that have many dimensions. It can also improve the accuracy of the algorithms you run on the data.

Dimensionality reduction also helps identify and preserve the most informative features while discarding irrelevant or less informative ones. This simplifies the dataset and, in the case of feature selection, makes the model easier to understand and analyze.

In addition, lower dimensionality shortens training and inference time for machine learning models. It also reduces the risk of overfitting, particularly with small datasets, because the model has fewer opportunities to fit noise or spurious correlations in a high-dimensional space.

As a rule, this technique improves the computational efficiency and understandability of the model (by reducing the input feature count) and often improves how well it generalizes.

The Curse of Dimensionality

The curse of dimensionality refers to the problems that arise when you work with (analyze and visualize) data in high-dimensional spaces and that do not occur in low-dimensional spaces.

The higher the number of features or factors (a.k.a. variables) in a feature set, the more difficult it becomes to visualize the training set and work on it. Another vital point to consider is that many of the variables are often correlated. So, if you include every variable in the feature set, you will carry many redundant factors into the training set.

Furthermore, the more variables you have at hand, the more samples you need to represent all the possible combinations of feature values. As the number of variables increases, the model also becomes more complex, which increases the likelihood of overfitting. When you train an ML model on a dataset with many features and comparatively few samples, it tends to depend heavily on the training data. The result is an overfitted model that fails to perform well on real data.
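
To make the growth in required samples concrete, here is a minimal sketch. It assumes (purely for illustration) that every feature is split into 10 bins and that you want at least one sample for each combination of bins; the function name `cells_needed` is hypothetical.

```python
def cells_needed(num_features, bins_per_feature=10):
    """Number of distinct bin combinations for the given feature count."""
    return bins_per_feature ** num_features


# The count of combinations (and so the minimum number of samples
# needed to see each one once) grows exponentially with the features.
for d in (1, 2, 3, 10):
    print(f"{d} feature(s): {cells_needed(d)} combinations")
```

Going from 3 features to 10 features raises the count from a thousand combinations to ten billion, which is why adding columns without adding data quickly leaves most of the feature space empty.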

A primary aim of dimensionality reduction is to avoid overfitting. Training data with considerably fewer features keeps your model simpler: it makes fewer assumptions.

Apart from this, dimensionality reduction has many other benefits, such as:

  • It eliminates noise and redundant features.
  • It helps improve the model’s accuracy and performance.
  • It allows you to use algorithms that are unsuited to high-dimensional data.
  • It reduces the amount of storage space required (less data needs less storage space).
  • It compresses the data, which reduces the computation time and speeds up model training.

Dimensionality Reduction Techniques

Dimensionality reduction techniques fall into two broad categories:

1. Feature selection

The feature selection method aims to find a subset of the input variables (that are most relevant) from the original dataset. Feature selection includes three strategies, namely:

  • Filter strategy
  • Wrapper strategy 
  • Embedded strategy 

2. Feature extraction

Feature extraction, a.k.a. feature projection, converts the data from the high-dimensional space to one with fewer dimensions. This data transformation may be linear or nonlinear. The technique finds a smaller set of new variables, each of which is a combination of the input variables and which together retain most of the information in the input variables.

Without further ado, let’s dive into a detailed discussion of a few commonly used dimensionality reduction techniques!

1. Principal Component Analysis (PCA)

Principal Component Analysis is one of the leading linear techniques of dimensionality reduction. This method performs a linear mapping of the data to a lower-dimensional space in a way that maximizes the variance of the data in the low-dimensional representation.

Essentially, it is a statistical procedure that uses an orthogonal transformation to convert the n coordinates of a dataset into a new set of n coordinates, known as the principal components. The first principal component has the maximum possible variance. Each succeeding principal component has the highest possible variance under the condition that it is orthogonal to (and therefore uncorrelated with) the preceding components. To reduce the dimensionality, you keep only the first few components.

The PCA transformation is sensitive to the relative scaling of the original variables, so the data columns should be normalized (for example, standardized to zero mean and unit variance) before applying PCA. Another thing to remember is that the principal components are combinations of the original variables, which makes the transformed dataset harder to interpret. So, if interpretability is crucial to your analysis, PCA may not be the right dimensionality reduction method for your project.
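
The steps above (normalize, find the direction of maximum variance, project) can be sketched in plain Python. This is a minimal illustration, not a production implementation: it only extracts the first principal component, it uses power iteration as one simple way to find the leading eigenvector of the covariance matrix, and the toy height/weight data and all function names are assumptions made up for the example.

```python
import math


def standardize(rows):
    """Scale every column to zero mean and unit variance (no constant columns)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Covariance matrix of rows whose columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical toy data: height (cm) and weight (kg), strongly related.
data = [[150, 50], [160, 58], [170, 69], [180, 80], [190, 88]]
z = standardize(data)
pc1 = first_component(covariance(z))

# Project each 2-D row onto the first component: 2 columns become 1.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]

# Share of the total variance kept by the single new column.
explained = (sum(s * s for s in scores) / len(scores)) / len(pc1)
print(pc1, explained)
```

Because the two columns are almost perfectly correlated, a single component keeps nearly all of the variance. In practice you would typically reach for a library routine that returns all components at once.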

2. Non-negative matrix factorization (NMF)

NMF breaks down a non-negative matrix into the product of two non-negative ones. This is what makes the NMF method a valuable tool in areas that are primarily concerned with non-negative signals (for instance, astronomy). The multiplicative update rule by Lee & Seung made the technique popular, and NMF has since been extended to include uncertainties, handle missing data, support parallel computation, and allow sequential construction.

These extensions contributed to making the NMF approach stable and linear. Unlike PCA, NMF does not remove the mean of the matrices, so it yields physical, non-negative fluxes rather than unphysical negative ones. Thus, NMF can preserve more information than the PCA method in such applications.

Sequential NMF is characterized by a stable component base during construction and a linear modeling process, which makes it well suited to astronomy. It can preserve the flux in the direct imaging of circumstellar structures, for example when detecting exoplanets or imaging circumstellar disks.
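
As a rough illustration of the multiplicative update rule mentioned above, the sketch below factorizes a small non-negative matrix V into W and H by minimizing the squared reconstruction error. It covers only the basic updates (none of the extensions such as uncertainties, missing data, or sequential construction); the helper names, the small `eps` guard against division by zero, and the toy matrix are assumptions for the example.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy non-negative matrix whose rows are mixes of two non-negative patterns.
V = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]
w, h = nmf(V, rank=2)
print(frobenius_error(V, w, h))
```

Because every update multiplies a non-negative entry by a non-negative ratio, W and H stay non-negative throughout, which is the property that distinguishes NMF from PCA.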

3. Linear discriminant analysis (LDA)

Linear discriminant analysis is a generalization of Fisher’s linear discriminant, a method widely applied in statistics, pattern recognition, and machine learning. The LDA technique aims to find a linear combination of features that characterizes or separates two or more classes of objects. LDA represents data in a way that maximizes class separability: after projection, objects belonging to the same class lie close together, while objects from different classes lie far apart.
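
For the simplest case, two classes described by two features, Fisher’s linear discriminant has a closed form: the projection direction is the inverse of the within-class scatter matrix applied to the difference of the class means. The sketch below works through that case only; the toy data and function names are assumptions for illustration.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_2d(rows, mean):
    """2x2 scatter matrix of one class around its own mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """w = Sw^-1 (mean_b - mean_a) for two classes with two features."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_2d(class_a, ma), scatter_2d(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# Hypothetical toy data: two well-separated classes in two dimensions.
class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
class_b = [[6.0, 5.0], [7.0, 8.0], [8.0, 6.0], [7.0, 7.0]]

w = fisher_direction(class_a, class_b)

# Project every point onto w: two features become one discriminant score.
proj_a = [w[0] * x + w[1] * y for x, y in class_a]
proj_b = [w[0] * x + w[1] * y for x, y in class_b]
print(proj_a, proj_b)
```

On this data the one-dimensional scores of the two classes do not overlap, so a single threshold on the projected value separates them.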

4. Generalized discriminant analysis (GDA)

Generalized discriminant analysis is a nonlinear discriminant analysis that uses a kernel function operator. Its underlying theory is close to that of support vector machines (SVM) in that GDA maps the input vectors into a high-dimensional feature space. Just like LDA, GDA then seeks a projection of the variables into a lower-dimensional space that maximizes the ratio of between-class scatter to within-class scatter.

5. Missing Values Ratio

When you explore a given dataset, you might find that some values are missing. The first step in dealing with missing values is to identify the reason behind them. Accordingly, you can then impute the missing values or drop them altogether using appropriate methods. This approach works well when there are only a few missing values.

However, what should you do when there are too many missing values, say, over 50%? In such situations, you can set a threshold value and use the missing values ratio method. If the percentage of missing values in a variable exceeds the threshold, you can drop the variable. The lower the threshold value, the more aggressive the dimensionality reduction.

Generally, data columns with numerous missing values are unlikely to contain much useful information. So, you can remove all the data columns whose missing values ratio is higher than the set threshold.
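
A minimal sketch of the missing values ratio method follows. It assumes the dataset is held as a dictionary of columns with `None` marking a missing value; the column names, values, and function names are hypothetical.

```python
def missing_ratio(values):
    """Fraction of entries in a column that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(columns, threshold=0.5):
    """Keep only the columns whose missing ratio does not exceed the threshold."""
    return {name: vals for name, vals in columns.items()
            if missing_ratio(vals) <= threshold}


columns = {
    "age": [34, 51, None, 29],                    # 25% missing
    "income": [None, None, None, 72000],          # 75% missing
    "city": ["Pune", "Delhi", "Mumbai", "Goa"],   # 0% missing
}

kept = drop_sparse_columns(columns, threshold=0.5)
print(list(kept))
```

With a threshold of 0.5 only `income` is dropped; lowering the threshold to 0.2 would also drop `age`.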

6. Low Variance Filter

Just as you use the missing values ratio method for columns with missing values, you can use the low variance filter technique for constant variables. A constant variable cannot improve the model’s performance. Why? Because it has zero variance and therefore carries no information that distinguishes one sample from another.

In this method also, you set a threshold value to weed out the constant and near-constant variables: all the data columns with variance lower than the threshold are eliminated. However, one thing you must remember about the low variance filter method is that variance is range dependent. Thus, normalization is a must before implementing this dimensionality reduction technique.
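
The filter can be sketched as below. Min-max scaling to the range [0, 1] is used as the normalization step, which is one possible choice rather than the only one; the column names, the threshold of 0.01, and the function names are assumptions for illustration.

```python
from statistics import pvariance


def min_max(values):
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def low_variance_filter(columns, threshold=0.01):
    """Keep the columns whose normalized variance exceeds the threshold."""
    return {name: vals for name, vals in columns.items()
            if pvariance(min_max(vals)) > threshold}


columns = {
    "sensor_a": [1.0, 5.0, 3.0, 9.0],
    "constant": [7.0, 7.0, 7.0, 7.0],           # zero variance
    "sensor_b": [100.0, 300.0, 200.0, 400.0],
}

kept = low_variance_filter(columns)
print(list(kept))
```

Normalizing first means `sensor_a` and `sensor_b` are compared on the same footing even though their raw ranges differ by two orders of magnitude.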

7. High Correlation Filter

If a dataset contains data columns with very similar patterns or trends, these columns are highly likely to carry nearly identical information. Highly correlated dimensions can also adversely impact the model’s performance. In such an instance, one of those variables is enough to feed the ML model.

For such situations, you can use the Pearson correlation matrix to identify the variables showing a high correlation. In this approach, you calculate the correlation coefficient between numerical columns (Pearson’s product-moment coefficient) and between nominal columns (Pearson’s chi-square value). Each pair of columns with a correlation coefficient higher than the set threshold is then reduced to a single column. To decide which variables to drop, you can also use the VIF (Variance Inflation Factor) and remove the variables with a high value (a common rule of thumb is VIF > 5).

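Column normalization is often applied before this step. Note, however, that Pearson’s correlation coefficient itself is unchanged by linear rescaling of a column, so normalization matters mainly for other, scale-sensitive measures you compute alongside it.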
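
Here is a minimal sketch of the filter for numerical columns. It uses a simple greedy rule (keep a column only if it is not highly correlated with any column already kept), which is one possible way to reduce each correlated pair to a single column; it does not compute VIF or the chi-square value for nominal columns. The column names, threshold, and function names are assumptions for the example.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def high_correlation_filter(columns, threshold=0.9):
    """Greedily keep columns that are not highly correlated with kept ones."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


columns = {
    "temp_c": [10.0, 15.0, 20.0, 25.0, 30.0],
    "temp_f": [50.0, 59.0, 68.0, 77.0, 86.0],    # same readings in Fahrenheit
    "humidity": [80.0, 60.0, 75.0, 55.0, 70.0],
}

kept = high_correlation_filter(columns)
print(kept)
```

`temp_f` is a linear function of `temp_c`, so the pair is reduced to one column, while `humidity` is only weakly correlated with temperature and is kept.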

8. Backward Feature Elimination

In the backward feature elimination technique, you begin with all n dimensions. At a given iteration, the selected classification algorithm is trained on n input features. You then remove one input feature at a time and train the same model on n-1 input features, n times. The input feature whose removal produces the smallest increase in the error rate is dropped, which leaves n-1 input features. You then repeat the procedure with n-2 features, and so on, until no further feature can be removed.

Each iteration k creates a model trained on n-k features with an error rate of e(k). You then select the maximum tolerable error rate to determine the smallest number of features needed to reach that classification performance with the given ML algorithm.
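
The loop described above can be sketched as follows. To stay self-contained, the example uses a nearest-centroid classifier scored by its training error as a stand-in for "the given ML algorithm", and it stops as soon as the best removal would push the error above the tolerable rate. Both choices, along with the toy data and function names, are assumptions for illustration.

```python
def error_rate(rows, labels, features):
    """Training error of a nearest-centroid classifier using only `features`."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(r[f] for r in members) / len(members)
                        for f in features]
    errors = 0
    for r, y in zip(rows, labels):
        dist = {c: sum((r[f] - m) ** 2 for f, m in zip(features, centroids[c]))
                for c in classes}
        if min(classes, key=lambda c: dist[c]) != y:
            errors += 1
    return errors / len(rows)


def backward_elimination(rows, labels, max_error):
    features = list(range(len(rows[0])))
    while len(features) > 1:
        # Try removing each feature in turn; pick the removal that hurts least.
        candidates = [
            (error_rate(rows, labels, [f for f in features if f != drop]), drop)
            for drop in features
        ]
        best_error, drop = min(candidates)
        if best_error > max_error:
            break  # any further removal exceeds the tolerable error rate
        features.remove(drop)
    return features


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
rows = [[1.0, 5.0, 3.0], [1.2, 3.0, 4.0], [0.8, 4.0, 2.0], [1.1, 2.0, 5.0],
        [3.0, 4.0, 3.5], [3.2, 2.0, 2.5], [2.9, 5.0, 4.5], [3.1, 3.0, 3.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected = backward_elimination(rows, labels, max_error=0.1)
print(selected)
```

In a real project you would score each candidate on held-out data rather than on the training set, since training error rewards overfitting.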

9. Forward Feature Construction

Forward feature construction is the opposite of the backward feature elimination method. You begin with one feature and progress by adding one feature at a time, each time choosing the variable that gives the greatest boost in performance.

Both forward feature construction and backward feature elimination are time- and computation-intensive. These methods are best suited to datasets that already have a low number of input columns.
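
Forward feature construction can be sketched in the same spirit. This example scores each candidate feature set with the leave-one-out accuracy of a 1-nearest-neighbour classifier and stops when the best addition improves the score by less than a small margin. The scoring method, the `min_gain` stopping rule, the toy data, and the function names are all assumptions for illustration.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, r in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((r[f] - rows[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, min_gain=0.01):
    remaining = list(range(len(rows[0])))
    selected, best_score = [], 0.0
    while remaining:
        # Score every candidate addition and take the most helpful one.
        score, feature = max(
            (loo_accuracy(rows, labels, selected + [f]), f) for f in remaining
        )
        if score - best_score < min_gain:
            break  # no remaining feature gives a worthwhile improvement
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
rows = [[1.0, 5.0, 3.0], [1.2, 3.0, 4.0], [0.8, 4.0, 2.0], [1.1, 2.0, 5.0],
        [3.0, 4.0, 3.5], [3.2, 2.0, 2.5], [2.9, 5.0, 4.5], [3.1, 3.0, 3.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected = forward_selection(rows, labels)
print(selected)
```

Each round retrains and rescores the model once per remaining feature, which is where the computational cost of these wrapper methods comes from.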

10. Random Forests

Random forests are not only effective classifiers but also very useful for feature selection. In this dimensionality reduction approach, you build a large ensemble of trees against a target attribute. For instance, you can create a large set (say, 2000) of shallow trees (say, with two levels), where each tree is trained on a small fraction (say, three) of the total number of attributes.

The aim is to use each attribute’s usage statistics to identify the most informative subset of features. If an attribute is often selected as the best split, it is likely to be an informative feature that is worth keeping. Scoring each attribute’s usage statistics in the random forest relative to the other attributes tells you which attributes are the most predictive.
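
The usage-statistics idea can be illustrated with a heavily simplified sketch. Instead of two-level trees it grows one-level trees (decision stumps), each on a bootstrap sample and three randomly chosen attributes, and counts how often every attribute wins the best split by Gini impurity. The number of trees (200 rather than 2000, to keep the run short), the synthetic data, and the function names are assumptions for the example.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split_feature(rows, labels, candidates):
    """Candidate feature whose best threshold gives the lowest weighted Gini."""
    best = (float("inf"), None)
    for f in candidates:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(rows)
            best = min(best, (score, f))
    return best[1]


def usage_counts(rows, labels, n_trees=200, features_per_tree=3, seed=0):
    """Count how often each attribute is chosen as the best split."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    counts = [0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # bootstrap
        subset = rng.sample(range(n_features), features_per_tree)
        chosen = best_split_feature([rows[i] for i in idx],
                                    [labels[i] for i in idx], subset)
        counts[chosen] += 1
    return counts


# Synthetic data: attribute 0 tracks the label, attributes 1-5 are pure noise.
data_rng = random.Random(1)
labels = [i % 2 for i in range(40)]
rows = [[y + 0.1 * data_rng.random()] + [data_rng.random() for _ in range(5)]
        for y in labels]

counts = usage_counts(rows, labels)
print(counts)
```

Attribute 0 ends up with by far the highest count, because it wins the split whenever it is among the three attributes offered to a tree, while the noise attributes share the remaining trees among themselves.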

Disadvantages of Dimensionality Reduction Techniques  

Although dimensionality reduction techniques have several advantages in machine learning, their drawbacks deserve a mention too.

  • Loss of information is a major drawback, particularly if the reduction removes attributes that are weak on their own yet still useful. This might reduce the model’s predictive ability.
  • Overfitting is another risk, which arises when a dimensionality reduction technique is applied carelessly. In addition, transformed features are sometimes hard to interpret, which makes the model itself quite opaque.
  • Furthermore, choosing an effective dimensionality reduction technique and tuning its parameters can be tricky, and sometimes requires a certain amount of domain-specific knowledge.

Despite its benefits, dimensionality reduction calls for carefully weighing its advantages against its disadvantages before you adopt it.

Conclusion

To conclude, when it comes to dimensionality reduction techniques for machine learning, no technique is the absolute best. Each has its quirks and advantages. Thus, the best way to apply dimensionality reduction is to run systematic, controlled experiments to figure out which technique(s) work with your model and which deliver the best performance on a given dataset.

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. What is Dimensionality Reduction?

Dimensionality reduction is a technique used in data mining to map high-dimensional data into a low-dimensional representation in order to visualize data and find patterns that are otherwise not apparent using traditional methods. It is often used in conjunction with clustering techniques or classification techniques to project the data into a lower dimensional space to facilitate visualizing the data and finding patterns.

2. What are ways of reducing dimensionality?

Three dimensionality reduction techniques are popular and widely used:
1. Principal Component Analysis (PCA): It is a method of reducing the dimensionality of a data set by transforming it into a new coordinate system such that the greatest variance in the data is explained by the first coordinate, the second greatest variance is explained by the second coordinate, and so on.
2. Factor Analysis: It is a statistical technique for extracting a small number of unobserved variables (called factors) that explain the correlations among the observed variables in a data set. The purpose is to simplify or reduce the number of variables in a data set.
3. Correspondence Analysis: It is a method for summarizing the associations between categorical variables in a low-dimensional space.

3. What are the disadvantages of dimensionality reduction?

The main disadvantage of dimensionality reduction is that it does not guarantee that the original data can be reconstructed. For example, in PCA, two data points that are far apart in the input space may end up very close together in the output, because the components that separated them were discarded. This makes it difficult to recover the input point from the output data. In addition, the data might be more difficult to interpret after dimensionality reduction. For example, in PCA, each component is a weighted combination of all the original features, so it is not easy to assign a meaning to the first component, let alone the second component or higher. From a practical standpoint, dimensionality reduction is often followed by k-means clustering or another dimensionality reduction technique on the reduced dataset.
