Data mining is widely applied to big data to characterize existing data and to predict unknown values. Its purpose is to find patterns and trends in data. Generally, data mining is categorized as:
- Descriptive data mining: It summarizes what the data already contains, for instance through counts and averages. It shows what is happening inside the data without any prior hypothesis and highlights the features the records have in common. In simple words, you get to know the general properties of the data present in the database.
- Predictive data mining: It helps developers infer characteristics that are not explicitly available. For instance, it can predict the next quarter's business performance from the performance of previous quarters. In general, predictive analysis infers unknown characteristics from previously available data.
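The difference between the two categories can be illustrated with a minimal sketch. The revenue figures are hypothetical, and the forecast rule (last value plus the average quarter-over-quarter change) is a deliberately naive assumption, not a recommended forecasting method.

```python
from statistics import mean

# Hypothetical revenue for the last four quarters (illustrative values only).
quarterly_revenue = [100.0, 110.0, 120.0, 130.0]

# Descriptive: summarize what is already in the data.
summary = {
    "count": len(quarterly_revenue),
    "average": mean(quarterly_revenue),
    "minimum": min(quarterly_revenue),
    "maximum": max(quarterly_revenue),
}

# Predictive: infer a value that is not in the data.
# Assumed rule: last value plus the average quarter-over-quarter change.
changes = [b - a for a, b in zip(quarterly_revenue, quarterly_revenue[1:])]
next_quarter_forecast = quarterly_revenue[-1] + mean(changes)

print(summary)
print(next_quarter_forecast)
```

The summary only restates properties of the stored values, while the forecast produces a value that appears nowhere in the data.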
The functionalities of data mining are listed below:
- Class/Concept Description: Characterization and Discrimination
- Classification
- Prediction
- Association Analysis
- Cluster Analysis
- Outlier Analysis
- Evolution & Deviation Analysis
1. Class/Concept Description: Characterization and Discrimination
Data can be associated with classes or concepts so that they can be correlated with results. For example, a new iPhone model is released in three variants, Pro, Pro Max, and Plus, to serve targeted customers based on their requirements.
When you summarize the general features of the data, it is called data characterization. It produces the characteristic rules for the target class, like our iPhone buyers. We can collect the data using simple SQL queries and perform OLAP functions to generalize the data.
Attribute-oriented induction is another technique used to generalize or characterize the data with minimal user interaction. The generalized data can be presented in various forms such as tables, pie charts, line charts, bar charts, and graphs. The multi-dimensional relationships in the data are expressed as rules called the characteristic rules of the target class.
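As a rough sketch of the generalization idea, the snippet below drops an identifying attribute, replaces exact ages with a coarser concept, and merges identical tuples while counting them. The buyer records, the `age_group` hierarchy, and the 30-year cut-off are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical buyer records for the target class (illustrative values only).
buyers = [
    {"name": "A", "age": 23, "variant": "Pro"},
    {"name": "B", "age": 27, "variant": "Pro"},
    {"name": "C", "age": 41, "variant": "Plus"},
    {"name": "D", "age": 45, "variant": "Pro Max"},
    {"name": "E", "age": 29, "variant": "Pro"},
]


def age_group(age):
    """Generalize an exact age to a coarser concept (assumed hierarchy)."""
    return "under 30" if age < 30 else "30 and over"


# Drop the identifying attribute (name), generalize age,
# and merge identical generalized tuples while counting them.
generalized = Counter((age_group(b["age"]), b["variant"]) for b in buyers)

total = sum(generalized.values())
for (group, variant), count in generalized.most_common():
    print(f"{group:12} {variant:8} count={count} share={count / total:.0%}")
```

Each printed row is a generalized tuple with its count, which is the kind of summary that can then be shown as a table or chart.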
Data discrimination compares the data of two classes. Generally, it maps the target class against a predefined contrasting group or class. It compares and contrasts the characteristics of the target class with those of the predefined class using a set of rules called discriminant rules. The methods used in data discrimination are similar to those used in data characterization.
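One simple way to picture discrimination is to compare how often each generalized value occurs in the target class and in the contrasting class. The two lists below are hypothetical, and comparing value shares is only an assumed stand-in for a full set of discriminant rules.

```python
from collections import Counter

# Hypothetical generalized age groups for two classes (illustrative values only).
target_class = ["under 30", "under 30", "under 30", "30 and over"]
contrasting_class = ["30 and over", "30 and over", "under 30", "30 and over"]


def shares(values):
    """Return the fraction of records that carry each value."""
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


target_shares = shares(target_class)
contrast_shares = shares(contrasting_class)

# A large positive difference marks a value that distinguishes the target class.
differences = {
    value: target_shares.get(value, 0.0) - contrast_shares.get(value, 0.0)
    for value in sorted(set(target_class) | set(contrasting_class))
}
print(differences)
```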
2. Classification
Classification uses data models to assign data to classes and so predict trends in the data. For example, the spending chart that an internet banking or mobile application shows is based on our spending patterns. Such categories are sometimes used to assess our risk when we apply for a new loan.
The model can be represented with methods such as IF-THEN rules, decision trees, mathematical formulae, or neural networks. It is derived from training data whose class labels are known and is then used to label new instances.
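A minimal sketch of learning a classifier from labeled training data follows. It fits a one-level decision tree (a single IF-THEN rule) on one attribute. The spend values, the labels, and the assumption that higher spend means higher risk are hypothetical.

```python
# Hypothetical training data: (monthly_spend, risk_label). Values are illustrative.
training = [
    (200, "low"), (350, "low"), (500, "low"),
    (900, "high"), (1200, "high"), (1500, "high"),
]


def fit_stump(rows):
    """Pick the spend threshold that misclassifies the fewest training rows."""
    values = sorted(value for value, _ in rows)
    # Candidate thresholds are midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # Assumed rule direction: spend above the threshold means "high" risk.
        return sum(
            ("high" if value > threshold else "low") != label
            for value, label in rows
        )

    return min(candidates, key=errors)


threshold = fit_stump(training)


def classify(monthly_spend):
    """Apply the learned model, which is a single IF-THEN rule."""
    return "high" if monthly_spend > threshold else "low"


print(threshold, classify(300), classify(1000))
```

A real decision tree repeats this split search over many attributes and levels; the single split here only shows the principle.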
3. Prediction
Prediction finds missing numeric values in the data, typically by using regression analysis. If the class label is what is missing, the prediction is done using classification. Prediction is popular because of its importance in business intelligence. There are two ways one can predict data:
- Predicting the unavailable or missing data using prediction analysis
- Predicting the class label using the previously built class model.
Prediction is also a forecasting technique that lets us estimate values in the future. It needs a large data set of past values to predict future trends.
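The regression case can be sketched with an ordinary least-squares line fitted to the known records and then used to fill the missing value. The records and attribute meanings are hypothetical, and a straight line is an assumed model, chosen only because it is the simplest form of regression.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records: (years_of_experience, salary); one salary is missing.
records = [(1, 30.0), (2, 35.0), (3, None), (4, 45.0), (5, 50.0)]

# Fit the line on the records whose numeric value is known.
known = [(x, y) for x, y in records if y is not None]
slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])

# Fill each missing value with the value predicted by the fitted line.
filled = [(x, y if y is not None else slope * x + intercept) for x, y in records]
print(filled)
```

The same fitted line could be evaluated at a future `x` to produce a forecast, with the usual caveat that extrapolation is less reliable than interpolation.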
4. Association Analysis
Association analysis relates two or more attributes of the data. It discovers the relationships between data items and the rules that bind them. It is widely applied in retail sales. The suggestion that Amazon shows, "Customers who bought this also bought...", is a real-world example of association analysis.
It associates attributes that frequently occur together in transactions. The rules it finds are called association rules and are widely used in market basket analysis. Two measures describe the strength of a rule. Support is the fraction of all transactions that contain the associated items together. Confidence is the probability that a transaction containing the first item also contains the second.
For example, suppose the rule that mobile phones are bought with headphones has a support of 2% and a confidence of 40%. This means that 2% of all transactions contain both a mobile phone and headphones, and that 40% of the customers who bought a mobile phone also bought headphones.
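The two measures can be computed directly from a list of transactions. The transaction data below is hypothetical and was constructed so that the rule "mobile phone implies headphones" reproduces the 2% support and 40% confidence figures from the example.

```python
# Hypothetical transactions: 100 baskets, 5 with a mobile phone,
# 2 of which also contain headphones (illustrative data only).
transactions = (
    [{"mobile phone", "headphones"}] * 2
    + [{"mobile phone", "case"}] * 3
    + [{"charger"}] * 95
)


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions)


def support(itemset):
    """Fraction of all transactions that contain the whole itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    return count(antecedent | consequent) / count(antecedent)


rule_support = support({"mobile phone", "headphones"})
rule_confidence = confidence({"mobile phone"}, {"headphones"})
print(f"support={rule_support:.0%} confidence={rule_confidence:.0%}")
```

Market basket algorithms such as Apriori apply these same two measures, usually with minimum thresholds, to prune the space of candidate rules.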
5. Cluster Analysis
Cluster analysis is a form of unsupervised classification. It is similar to classification in that the data are grouped. Unlike classification, in cluster analysis the class labels are unknown, so the data are grouped by clustering algorithms.
Objects that are similar are grouped under one cluster, and objects in different clusters differ strongly from one another. Grouping is done by maximizing the intraclass similarity and minimizing the interclass similarity. Clustering is applied in many fields like machine learning, image processing, pattern recognition, and bioinformatics.
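The text does not name a specific clustering algorithm, so the sketch below assumes k-means, one common choice, on one-dimensional data. The points and the fixed starting centers are hypothetical; fixed centers keep the run deterministic.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny k-means for 1-D data with caller-supplied starting centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data with two visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[1.0, 12.0])
print(centers, clusters)
```

No class labels are supplied at any point: the groups emerge only from the distances between points, which is what separates clustering from classification.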
6. Outlier Analysis
When data appears that cannot be grouped into any of the classes, we use outlier analysis. Some data objects have attributes that differ from all of the other classes or from the general model of the data. These outlying data objects are called outliers. They are usually considered noise or exceptions, and the analysis of these outliers is called outlier mining.
Although outliers are usually discarded as noise, they may carry valuable information in many applications, so identifying them is important. They are also called exceptions or surprises. Outliers are commonly identified using statistical tests that estimate how probable a value is under an assumed distribution of the data.
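A z-score check is one simple statistical test of this kind. The sketch below flags values that lie more than two standard deviations from the mean. The data values and the cut-off of 2 are assumptions for illustration, and the test implicitly assumes roughly normally distributed data.

```python
from statistics import mean, pstdev

# Hypothetical measurements with one obviously unusual value.
values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]

center = mean(values)
spread = pstdev(values)  # population standard deviation

# Assumed cut-off: more than 2 standard deviations from the mean.
Z_THRESHOLD = 2.0
outliers = [v for v in values if abs(v - center) / spread > Z_THRESHOLD]
print(outliers)
```

In small samples a single extreme value inflates the standard deviation, so stricter cut-offs such as 3 can fail to flag it. Median-based measures are a common, more robust alternative.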
7. Evolution & Deviation Analysis
With evolution analysis, we describe how data changes over time, for example through time-related clustering. We can find trends and changes in behavior over a period. Typical tasks include time-series analysis, periodicity detection, and similarity matching of trends.
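Trend detection in a time series can be sketched with a moving average, which smooths short-term fluctuations so that the longer-term direction is easier to see. The sales figures and the window size of 3 are hypothetical choices.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly sales that fluctuate but rise overall.
monthly_sales = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]

smoothed = moving_average(monthly_sales, window=3)

# Compare the ends of the smoothed series to describe the overall direction.
trend = "upward" if smoothed[-1] > smoothed[0] else "downward or flat"
print(smoothed, trend)
```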
Overall, data mining and its functionalities find applications in many fields, from space science to retail marketing.