
Apriori Algorithm: How Does it Work? How Brands Can Utilize Apriori Algorithm?

Last updated:
26th Mar, 2020

Imagine you’re at the supermarket with a mental list of the items you want to buy, but you end up buying a lot more than you planned. This is called impulse buying, and brands use the Apriori algorithm to take advantage of it.

What is this algorithm? And how does it work? You’ll find the answers to these questions in this article. We’ll first take a look at what this algorithm is and then at how it works.

Let’s begin. 

What is the Apriori Algorithm?

The Apriori algorithm finds frequent itemsets. It is based on the Apriori property, which we can explain in the following way:

Suppose an itemset has a support value lower than the required minimum support. Then every superset of this itemset (every larger itemset that contains it) will also have a support value lower than required. So, you can leave all of those supersets out of your calculation and, as a result, save a lot of time and space.

Support value refers to the number of transactions in which a particular itemset appears. The Apriori algorithm is popular because of its use in recommendation systems. It is generally applied to transactional databases, that is, databases of transactions, and it has many real-world applications. To understand the Apriori algorithm properly, you should also be familiar with association rule mining.
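To make the definition concrete, here is a minimal Python sketch that counts support as a raw number of transactions, as this article does (many tools express support as a fraction of all transactions instead). It assumes transactions are stored as Python sets and reuses the five transactions from the worked example below; `support` is an illustrative helper name, not a library function. The output shows the Apriori property at work: adding items to an itemset never raises its support, so every superset of an infrequent itemset is infrequent too.

```python
from itertools import combinations

# The five transactions from the worked example, each stored as a set of item IDs.
TRANSACTIONS = [
    {1, 3, 4},
    {2, 3, 5},
    {1, 2, 3, 5},
    {2, 5},
    {1, 3, 5},
]


def support(itemset, transactions):
    """Return how many transactions contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


# Prints supports 4, 3, 2 and 1: growing the itemset never raises its support.
for itemset in ({3}, {1, 3}, {1, 3, 5}, {1, 2, 3, 5}):
    print(sorted(itemset), support(itemset, TRANSACTIONS))

# {4} has support 1, so no superset of {4} can reach a minimum support of 2.
others = {1, 2, 3, 5}
supersets_of_4 = [set(c) | {4} for r in range(1, 5) for c in combinations(others, r)]
print(max(support(s, TRANSACTIONS) for s in supersets_of_4))  # 1
```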


How does the Apriori Algorithm Work?

The Apriori algorithm generates association rules from frequent itemsets. Its principle is simple: every subset of a frequent itemset is also a frequent itemset. An itemset whose support value is greater than or equal to a threshold (the minimum support) is a frequent itemset. Consider the following transactions:

TID | Items
T1 | 1, 3, 4
T2 | 2, 3, 5
T3 | 1, 2, 3, 5
T4 | 2, 5
T5 | 1, 3, 5

In the first iteration, we set the minimum support value to 2, create all itemsets of size 1, and calculate their support values. We discard any itemset whose support value is lower than the minimum; in this example, that is item 4.

C1 (Result of the first iteration)

Itemset | Support
{1} | 3
{2} | 3
{3} | 4
{4} | 1
{5} | 4

F1 (After we discard {4})

Itemset | Support
{1} | 3
{2} | 3
{3} | 4
{5} | 4
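A minimal Python sketch of this first iteration, assuming each transaction is stored as a set (so an item is counted at most once per transaction). `collections.Counter` does the counting; the names `c1`, `f1` and `MIN_SUPPORT` are illustrative and simply mirror the tables.

```python
from collections import Counter

TRANSACTIONS = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_SUPPORT = 2

# C1: count how many transactions contain each single item.
c1 = Counter(item for t in TRANSACTIONS for item in t)

# F1: keep only the items that meet the minimum support.
f1 = {item: count for item, count in c1.items() if count >= MIN_SUPPORT}

print(sorted(c1.items()))  # [(1, 3), (2, 3), (3, 4), (4, 1), (5, 4)]
print(sorted(f1.items()))  # [(1, 3), (2, 3), (3, 4), (5, 4)]
```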

In the second iteration, we create itemsets of size two from all combinations of the items in F1 and calculate their support values. We then remove any itemset with a support value lower than two.

C2 (Only has items present in F1)

Itemset | Support
{1,2} | 1
{1,3} | 3
{1,5} | 2
{2,3} | 2
{2,5} | 3
{3,5} | 3

F2 (After we remove items that have support values lower than 2)

Itemset | Support
{1,3} | 3
{1,5} | 2
{2,3} | 2
{2,5} | 3
{3,5} | 3
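The second iteration as a Python sketch: it forms every pair from the F1 items with `itertools.combinations`, counts each pair's support, and keeps the pairs that reach the minimum. Itemsets are represented as sorted tuples here purely for readable output; `support` is an illustrative helper, not a library function.

```python
from itertools import combinations

TRANSACTIONS = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_SUPPORT = 2
F1_ITEMS = [1, 2, 3, 5]  # items that survived the first iteration


def support(itemset, transactions):
    """Return how many transactions contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


# C2: every pair that can be formed from the items in F1.
c2 = {pair: support(pair, TRANSACTIONS) for pair in combinations(F1_ITEMS, 2)}

# F2: pairs that meet the minimum support.
f2 = {pair: s for pair, s in c2.items() if s >= MIN_SUPPORT}

print(c2)  # {(1, 2): 1, (1, 3): 3, (1, 5): 2, (2, 3): 2, (2, 5): 3, (3, 5): 3}
print(f2)  # {(1, 3): 3, (1, 5): 2, (2, 3): 2, (2, 5): 3, (3, 5): 3}
```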

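Next, we build the candidate itemsets of size three (C3) and prune them. To prune, we split each candidate into its subsets of size two and discard the candidate if any of those subsets is missing from F2, because such a subset has a support value lower than two.

C3 (Checking each candidate’s subsets against F2)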

Candidate | Subsets of size 2 | All subsets in F2?
{1,2,3} | {1,2}, {1,3}, {2,3} | NO
{1,2,5} | {1,2}, {1,5}, {2,5} | NO
{1,3,5} | {1,3}, {1,5}, {3,5} | YES
{2,3,5} | {2,3}, {2,5}, {3,5} | YES

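In the third iteration, we discard {1,2,3} and {1,2,5} because they both contain {1,2}, which is not in F2. This is the main benefit of the Apriori algorithm: these candidates are eliminated without counting their support in the database. We then calculate the support values of the two remaining candidates, and both meet the minimum support of two.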

F3 (After we discard {1,2,5} and {1,2,3})

Itemset | Support
{1,3,5} | 2
{2,3,5} | 2
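A Python sketch of the third iteration, assuming itemsets are kept as sorted tuples so that `itertools.combinations` yields subsets in the same form as the entries of F2. As in the C3 table above, candidates are built from all 3-item combinations of the F1 items; the classic formulation of Apriori instead joins F2 itemsets that share their first item, which would not generate {1,2,3} or {1,2,5} in the first place. `has_infrequent_subset` is a hypothetical helper name.

```python
from itertools import combinations

TRANSACTIONS = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_SUPPORT = 2
F1_ITEMS = [1, 2, 3, 5]
# Frequent 2-itemsets from the second iteration, stored as sorted tuples.
F2 = {(1, 3), (1, 5), (2, 3), (2, 5), (3, 5)}


def support(itemset, transactions):
    """Return how many transactions contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def has_infrequent_subset(candidate, frequent_prev):
    """True if any size-(k-1) subset of the candidate is missing from the previous level."""
    return any(sub not in frequent_prev for sub in combinations(candidate, len(candidate) - 1))


# C3 candidates: all 3-item combinations of the F1 items.
c3 = list(combinations(F1_ITEMS, 3))

# Prune without touching the transactions, then count support only for the survivors.
survivors = [c for c in c3 if not has_infrequent_subset(c, F2)]
f3 = {c: support(c, TRANSACTIONS) for c in survivors}
f3 = {c: s for c, s in f3.items() if s >= MIN_SUPPORT}

print(c3)         # [(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 5)]
print(survivors)  # [(1, 3, 5), (2, 3, 5)]
print(f3)         # {(1, 3, 5): 2, (2, 3, 5): 2}
```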


In the fourth iteration, we use the itemsets in F3 to create C4. However, the only candidate, {1,2,3,5}, has a support value of 1, which is lower than 2, so we don’t proceed any further and the final set of frequent itemsets is F3.

C4

Itemset | Support
{1,2,3,5} | 1
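A sketch of the fourth iteration, assuming itemsets are stored as `frozenset`s so they can be unioned and compared directly. It reproduces the stopping condition described above, and also shows that the pruning step alone would already rule out {1,2,3,5}, since its subset {1,2,3} is not in F3.

```python
from itertools import combinations

TRANSACTIONS = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_SUPPORT = 2
F3 = [frozenset({1, 3, 5}), frozenset({2, 3, 5})]


def support(itemset, transactions):
    """Return how many transactions contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# C4: unions of two F3 itemsets that contain exactly four items.
c4 = {a | b for a, b in combinations(F3, 2) if len(a | b) == 4}
print([sorted(c) for c in c4])                 # [[1, 2, 3, 5]]
print([support(c, TRANSACTIONS) for c in c4])  # [1]

# No candidate reaches the minimum support, so the search stops with F3.
f4 = [c for c in c4 if support(c, TRANSACTIONS) >= MIN_SUPPORT]
print(f4)  # []

# Pruning would also reject {1,2,3,5} before counting: {1, 2, 3} is not in F3.
candidate = frozenset({1, 2, 3, 5})
print(all(frozenset(s) in F3 for s in combinations(sorted(candidate), 3)))  # False
```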

F3 gives us the following itemsets, each listed with its non-empty proper subsets:

For I = {1,3,5}, the subsets we have are {5}, {3}, {1}, {3,5}, {1,5}, {1,3}

For I = {2,3,5}, the subsets we have are {5}, {3}, {2}, {3,5}, {2,5}, {2,3}
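The subsets listed above are exactly the non-empty proper subsets of each itemset, so a 3-item set has 2^3 - 2 = 6 of them. A small Python sketch that enumerates them with `itertools.combinations`; `proper_subsets` is an illustrative name.

```python
from itertools import combinations


def proper_subsets(itemset):
    """Return all non-empty proper subsets of `itemset`, smallest first."""
    items = sorted(itemset)
    return [set(c) for r in range(1, len(items)) for c in combinations(items, r)]


for itemset in ({1, 3, 5}, {2, 3, 5}):
    print(sorted(itemset), [sorted(s) for s in proper_subsets(itemset)])
# [1, 3, 5] [[1], [3], [5], [1, 3], [1, 5], [3, 5]]
# [2, 3, 5] [[2], [3], [5], [2, 3], [2, 5], [3, 5]]
```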


Now, we’ll generate rules from each itemset I in F3 and keep only the strong ones. For that purpose, we’ll assume that the minimum confidence value is 60%. For each non-empty proper subset S of I, here’s the rule we output:

  • S -> (I – S) (this means S recommends I – S)
  • only if support(I) / support(S) >= min_conf
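Written as a formula, with S ranging over the non-empty proper subsets of I, the test for keeping a rule is shown on the first line below; the second line applies it to Rule no.1.

```latex
\begin{aligned}
\mathrm{conf}(S \Rightarrow I \setminus S) &= \frac{\mathrm{support}(I)}{\mathrm{support}(S)} \geq \mathrm{min\_conf} \\
\mathrm{conf}(\{1,3\} \Rightarrow \{5\}) &= \frac{\mathrm{support}(\{1,3,5\})}{\mathrm{support}(\{1,3\})} = \frac{2}{3} \approx 66.67\% \geq 60\%
\end{aligned}
```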

Let’s do this for the first itemset we have, i.e., {1,3,5}.

Rule no.1: {1,3} -> ({1,3,5} – {1,3}), which means 1 & 3 -> 5

Confidence value = support value of {1,3,5} / support value of {1,3} = 2/3 = 66.67%

As the result is higher than 60%, we select Rule no.1.

Rule no.2: {1,5} -> ({1,3,5} – {1,5}), which means 1 & 5 -> 3

Confidence value = support value of {1,3,5} / support value of {1,5} = 2/2 = 100%

As the result is higher than 60%, we select Rule no.2.

Rule no.3: {3} -> ({1,3,5} – {3}), which means 3 -> 1 & 5

Confidence value = support value of {1,3,5} / support value of {3} = 2/4 = 50%

As the result is lower than 60%, we reject Rule no.3.
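A Python sketch that reproduces the three confidence calculations above. It uses `fractions.Fraction` so that 2/3 and the 60% threshold are compared exactly; `support` and `confidence` are illustrative helper names, and support is a raw count as in the example.

```python
from fractions import Fraction

TRANSACTIONS = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_CONF = Fraction(60, 100)  # 60%, kept exact to avoid floating-point surprises


def support(itemset, transactions):
    """Return how many transactions contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(antecedent, itemset, transactions):
    """Confidence of the rule antecedent -> (itemset - antecedent)."""
    return Fraction(support(itemset, transactions), support(antecedent, transactions))


full = {1, 3, 5}
for antecedent in ({1, 3}, {1, 5}, {3}):  # Rules no.1 to no.3
    conf = confidence(antecedent, full, TRANSACTIONS)
    verdict = "select" if conf >= MIN_CONF else "reject"
    print(sorted(antecedent), "->", sorted(full - antecedent), f"{float(conf):.2%}", verdict)
# [1, 3] -> [5] 66.67% select
# [1, 5] -> [3] 100.00% select
# [3] -> [1, 5] 50.00% reject
```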


With the example above, you can see how the Apriori algorithm creates and applies rules. You can follow the same steps for the remaining subsets of {1,3,5} and for the second itemset, {2,3,5}. Working through them yourself is a good way to understand which rules the algorithm accepts and which it rejects. The algorithm stays the same wherever you implement it, for example in Python.
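Putting the steps together, here is a compact Python sketch of the whole procedure: a level-wise join, prune and count, followed by rule generation. It is our own illustration rather than code from any particular library; `apriori` and `rules` are hypothetical names, supports are raw counts as in this article, and the join step simply unions pairs of frequent itemsets, which produces the same candidates as the worked example before pruning.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise search for frequent itemsets (a teaching sketch, not optimised).

    Returns a dict mapping each frequent itemset (a frozenset) to its support count.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # For each candidate, count the transactions that contain it.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    singles = {frozenset([item]) for t in transactions for item in t}
    level = {c: s for c, s in count(singles).items() if s >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = {c: s for c, s in count(candidates).items() if s >= min_support}
        frequent.update(level)
        k += 1
    return frequent


def rules(frequent, min_conf):
    """Yield (antecedent, consequent, confidence) for each rule meeting min_conf."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                conf = sup / frequent[antecedent]
                if conf >= min_conf:
                    yield antecedent, itemset - antecedent, conf


TRANSACTIONS = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
freq = apriori(TRANSACTIONS, min_support=2)
for antecedent, consequent, conf in rules(freq, min_conf=0.6):
    if (antecedent | consequent) == {2, 3, 5}:
        print(sorted(antecedent), "->", sorted(consequent), f"{conf:.2%}")
```

For {2,3,5} this accepts {2} -> {3,5}, {2,3} -> {5}, {2,5} -> {3} and {3,5} -> {2}, and rejects {3} -> {2,5} and {5} -> {2,3}, whose confidence is 2/4 = 50%. The print order may vary because Python sets are unordered.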


Conclusion

After reading this article, you should be familiar with the Apriori algorithm and how it is applied. Its use in recommendation systems is a big reason for its popularity.


Rohit Sharma

Blog Author
Rohit Sharma is the Program Director for the UpGrad-IIIT Bangalore, PG Diploma Data Analytics Program.

Frequently Asked Questions (FAQs)

1. Is there a more efficient algorithm than the Apriori algorithm?

The ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a popular alternative for association rule mining. It is also generally considered faster and more efficient than the Apriori algorithm.

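The Apriori algorithm works horizontally: it stores each transaction as a list of items and explores itemsets breadth-first, one itemset size at a time, like a Breadth-First Search of a graph. The ECLAT algorithm works vertically: it stores, for each item, the list of transactions that contain it, and explores itemsets depth-first, like a Depth-First Search of a graph. This vertical approach is the main reason ECLAT is faster and more efficient than Apriori, because it can compute support by intersecting these transaction lists instead of scanning the whole database again.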
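The horizontal/vertical difference is easiest to see in code. Below is a minimal Python sketch of ECLAT's vertical idea on the same five transactions from the worked example. It is our own illustration, not a reference implementation; `eclat` and `tidlists` are hypothetical names. Each item is mapped to its tid-list (the IDs of transactions containing it), and the support of a larger itemset is the size of the intersection of tid-lists, found depth-first.

```python
# Same five transactions as the worked example, keyed by transaction ID.
TRANSACTIONS = {
    "T1": {1, 3, 4},
    "T2": {2, 3, 5},
    "T3": {1, 2, 3, 5},
    "T4": {2, 5},
    "T5": {1, 3, 5},
}
MIN_SUPPORT = 2

# Vertical layout: each item maps to the set of IDs of transactions containing it.
tidlists = {}
for tid, items in TRANSACTIONS.items():
    for item in items:
        tidlists.setdefault(item, set()).add(tid)


def eclat(prefix, candidates, min_support, out):
    """Depth-first search: extend `prefix` one item at a time by intersecting tid-lists."""
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_support:
            continue  # infrequent, so none of its extensions can be frequent
        itemset = prefix | {item}
        out[itemset] = len(tids)  # support = size of the tid-list, no database rescan
        # Only items later in the list are combined, so each itemset is built once.
        extensions = [(other, tids & other_tids) for other, other_tids in candidates[i + 1:]]
        eclat(itemset, extensions, min_support, out)


frequent = {}
eclat(frozenset(), sorted(tidlists.items()), MIN_SUPPORT, frequent)
print(sorted(tidlists[1]))             # ['T1', 'T3', 'T5']
print(frequent[frozenset({1, 3, 5})])  # 2
```

On this data it finds the same 11 frequent itemsets as the level-wise Apriori search, including {1,3,5} and {2,3,5} with support 2.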

2. What is the Apriori algorithm used for?

The Apriori algorithm is a classic data mining algorithm. It is used to mine frequent itemsets and relevant association rules from a database, and it is typically used by organizations that handle databases with a large number of transactions. For instance, the Apriori algorithm makes it easy to determine which items customers frequently buy together in your store, which can help improve sales.

The algorithm is also used in the healthcare sector to detect adverse drug reactions. It produces association rules that identify combinations of patient characteristics and medications that could lead to adverse drug reactions.

3. What are the pros and cons of the Apriori algorithm?

The Apriori algorithm is easy to understand and implement, and it can be applied to large itemsets. On the downside, it may need to generate a large number of candidate itemsets, which can be computationally expensive. Calculating support is also expensive, because the algorithm has to scan the entire database.
