Boosting in Machine Learning is an important topic. Many analysts get confused about the meaning of this term. That’s why, in this article, we’ll find out what is meant by Machine Learning boosting and how it works. Boosting helps ML models in improving their prediction accuracy. Let’s discuss this algorithm in detail:
What is Boosting in Machine Learning?
Before we discuss ‘Machine Learning boosting,’ we should first consider the definition of this term. Boosting means ‘to encourage or help something to improve.’ Machine learning boosting does precisely the same thing as it empowers the machine learning models and enhances their accuracy. Due to this reason, it’s a popular algorithm in data science.
Boosting in ML refers to the algorithms which convert weak learning models into strong ones. Suppose we have to classify emails in ‘Spam’ and ‘Not Spam’ categories. We can take the following approach to make these distinctions:
- If the email only has a single image file, it’s spam (because the image is usually promotional)
- If the email contains a phrase similar to ‘You have won a lottery,’ it’s spam.
- If the email only contains a bunch of links, it’s spam.
- If the email is from a source that’s present in our contact list, it is not a spam.
Now, even though we have rules for classification, do you think they are strong enough individually to identify whether an email is a spam or not? They are not. On an individual basis, these rules are weak and aren’t sufficient to classify an email in ‘Not Spam’ or ‘Spam.’ We’ll need to make them stronger, and we can do that by using a weighted average or considering the prediction of the higher vote.
So, in this case, we have five classifiers, out of which three classifiers mark the email as ‘Spam,’ therefore, we’ll consider an email ‘Spam’ by default, as this class has a higher vote than ‘Not Spam’ category.
This example was to give you an idea of what boosting algorithms are. They are more complex than this.
Have a look at: 25 Machine Learning Interview Questions & Answers
How do they work?
The above example has shown us that boosting combines weak learners to form strict rules. So, how would you identify these weak rules? To find an uncertain rule, you’ll have to use instance-based learning algorithms. Whenever you apply a base learning algorithm, it would produce a weak prediction rule. You’ll repeat this process for multiple iterations, and with each iteration, the boosting algorithm would combine the weak rules to form a strong rule.
The boosting algorithm chooses the right distribution for every iteration through several steps. First, it’ll take all the various allocations and assign them equal weight. If the first base learning algorithm makes an error, it’ll add more weight to those observations. After assigning weight, we move onto the next step.
In this step, we’ll keep repeating the process until we increase the accuracy of our algorithm. We’ll then combine the output of the weak learners and create a strong one that would empower our model and help it in making better predictions. A boosting algorithm focuses more on the assumptions that cause high errors due to their weak rules.
Learn more: 5 Breakthrough Applications of Machine Learning
Different Kinds of Boosting Algorithms
Boosting algorithms can use many sorts of underlying engines, including margin-maximizers, decision stamps, and others. Primarily, there are three types of Machine Learning boosting algorithms:
- Adaptive Boosting (also known as AdaBoosta)
- Gradient Boosting
We’ll discuss the first two, AdaBoost and Gradient Boosting, briefly in this article. XGBoost is a much more complicated topic, which we’ll discuss in another article.
1. Adaptive Boosting
Suppose you have a box that has five pluses and five minuses. Your task is to classify them and put them in different tables.
In the first iteration, you assign equal weights to every data point and apply a decision stump in the box. However, the line only segregates two pluses from the group, and all others remain together. Your decision stump (which is a line that goes through our supposed box), fails to predict all the data points correctly and has placed three pluses with the minuses.
In the next iteration, we assign more weight to the three pluses we had missed previously; but this time, the decision stump only separates two minutes from the group. We’ll assign more weight to the minuses we missed in this iteration and repeat the process. After one or two repetitions, we can combine a few of these results to produce one strict prediction rule.
AdaBoost works just like this. It first predicts by using the original data and assigns equal weight to every point. Then it attaches higher importance to the observations the first learner fails to predict correctly. It repeats the process until it reaches a limit in the accuracy of the model.
You can use decision stamps as well as other Machine Learning algorithms with Adaboost.
Here’s an example of AdaBoost in Python:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
X,Y = make_classification(n_samples=100, n_features=2, n_informative=2,
n_redundant=0, n_repeated=0, random_state=102)
clf = AdaBoostClassifier(n_estimators=4, random_state=0, algorithm=’SAMME’)
2. Gradient Boosting
Gradient Boosting uses the gradient descent method to reduce the loss function of the entire operation. Gradient descent is a first-order optimization algorithm that finds the local minimum of a function (differentiable function). Gradient boosting sequentially trains multiple models, and it can fit novel models to get a better estimate of the response.
It builds new base learners that can correlate with the loss function’s negative gradient and that are connected to the entire system. In Python, you’ll have to use Gradient Tree Boosting (also known as GBRT). You can use it for classification as well as regression problems.
Here’s an example of Gradient Tree Boosting in Python:
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor(n_estimators=3,learning_rate=1)
# for classification
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier()
Features of Boosting in Machine Learning
Boosting offers many advantages, and like any other algorithm, it has its limitations as well:
- Interpreting the predictions of boosting is quite natural because it’s an ensemble model.
- It selects features implicitly, which is another advantage of this algorithm.
- The prediction power of boosting algorithms is more reliable than decision trees and bagging.
- Scaling it up is somewhat tricky because every estimator in boosting is based on the preceding estimators.
Also read: Machine Learning Project Ideas for Beginners
Where to go from here?
We hope you found this article on boosting useful. First, we discussed what this algorithm is and how it solves Machine Learning problems. Then we took a look at its operation and how it operates.
We also discussed its various types. We found out about AdaBoost and Gradient Boosting while sharing their examples as well. If you’re interested to learn more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.
How can I define boosting in machine learning in simple terms?
Boosting in machines consists of referring to algorithms which help convert weak models of learning to strong models. If we take the example of classifying emails as spam and not spam, there are certain distinctions which can be used to make it easier to understand. These distinctions can be approached when an email has one single file, contains a similar phrase like You have won the lottery, contains a bunch of links, and is sourced from a contact list.
How does a boosting algorithm work?
Weak rules are identified by using instance-based learning algorithms. Once a base learning algorithm is applied in multiple iterations, it finally combines the weak rules into one strong rule. The boosting algorithm makes the right choices for distributing every iteration through multiple steps. After taking allocations, it assigns equal weight until an error is made, after which more weight is assigned. This process is repeated until better accuracy is achieved. Thereafter, all weak outputs are combined to make a strong one.
What are the different kinds of boosting algorithms and their features?
The different types are adaptive boosting, gradient boosting, and XGBoost. Boosting has characteristics like it selects features implicitly. Decision trees are less reliable than prediction powers. Also, scaling is tougher because estimators are based on preceding ones. And interpreting predictions of boost is natural as it is an ensemble model.