Data science is all about experimenting with raw or structured data. Data is the fuel that can steer a business in the right direction, or at least provide actionable insights that help it strategize current campaigns, organize the launch of new products, or try out different experiments.
All of these activities share one driving component: data. We live in a digital era in which we produce a lot of data. For instance, a company like Flipkart produces more than 2TB of data on a daily basis.
Because data matters this much, it is important to store and process it properly and without errors. When dealing with datasets, the category of the data largely determines which preprocessing strategy will work for a particular set and which type of statistical analysis should be applied. Let’s dive into some of the commonly used categories of data.
Qualitative Data Type
Qualitative or categorical data describes the object under consideration using a finite set of discrete classes. This type of data can’t easily be counted or measured using numbers, so it is divided into categories instead. The gender of a person (male, female, or others) is a good example of this data type.
Such data is usually extracted from audio, images, or text. Another example is a smartphone listing that gives the brand, the current rating, the color of the phone, the category of the phone, and so on. All this information can be categorized as qualitative data. There are two subcategories under this:
Nominal Data Type
Nominal data is a set of values that don’t possess a natural ordering. Let’s understand this with some examples. The color of a smartphone can be considered a nominal data type because we can’t rank one color against another.
It is not possible to state that ‘Red’ is greater than ‘Blue’. The gender of a person is another example: male, female, and others are different categories, but none of them ranks above the rest. Phone segments such as budget, midrange, and premium are often treated as nominal labels as well, although they imply an order by price and can reasonably be handled as ordinal.
Ordinal Data Type
Ordinal values have a natural ordering while still belonging to distinct classes. If we consider the sizes offered by a clothing brand, we can easily sort them by their name tag in the order small < medium < large. The grading system used to mark candidates in a test is also an ordinal data type, where an A+ is definitely better than a B grade.
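A minimal Python sketch of the difference, using made-up values: nominal labels only support equality checks and counting, while ordinal labels can also be sorted once their order is stated explicitly. The `SIZE_ORDER` list and the sample data are assumptions for illustration.

```python
from collections import Counter

# Nominal: colors have no order, so counting and equality are the meaningful operations.
colors = ["red", "blue", "red", "black"]
color_counts = Counter(colors)
most_common_color = color_counts.most_common(1)[0][0]

# Ordinal: sizes have a natural order, but Python only knows it if we state it.
SIZE_ORDER = ["small", "medium", "large"]  # assumed order: small < medium < large
size_rank = {label: rank for rank, label in enumerate(SIZE_ORDER)}

sizes = ["large", "small", "medium", "small"]
sorted_sizes = sorted(sizes, key=size_rank.__getitem__)

print(most_common_color)  # red
print(sorted_sizes)       # ['small', 'small', 'medium', 'large']
```

Sorting the color list alphabetically would also run, but the resulting order would carry no meaning, which is exactly what makes the feature nominal.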
These categories help us decide which encoding strategy to apply to which type of data. Encoding qualitative data is important because most machine learning models are mathematical in nature and can’t handle these values directly, so they need to be converted to numerical types.
For nominal data, where the categories can’t be compared, one-hot encoding can be applied. It resembles binary coding, with one 0/1 column per category, and works best when the number of categories is small. For ordinal data, label encoding can be applied, which is a form of integer encoding that preserves the order of the classes.
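The two encodings can be sketched in plain Python as follows. The helper names `one_hot` and `ordinal_encode` and the sample values are hypothetical, chosen only to illustrate the idea.

```python
def one_hot(values):
    """One-hot encode nominal labels: one 0/1 column per distinct category."""
    categories = sorted(set(values))  # column order is arbitrary; sorted for reproducibility
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


def ordinal_encode(values, order):
    """Map ordinal labels to integers that preserve the stated order."""
    rank = {label: index for index, label in enumerate(order)}
    return [rank[value] for value in values]


colors = ["red", "blue", "red"]       # nominal feature
sizes = ["medium", "small", "large"]  # ordinal feature

columns, encoded_colors = one_hot(colors)
encoded_sizes = ordinal_encode(sizes, order=["small", "medium", "large"])

print(columns)         # ['blue', 'red']
print(encoded_colors)  # [[0, 1], [1, 0], [0, 1]]
print(encoded_sizes)   # [1, 0, 2]
```

In practice, library helpers such as pandas `get_dummies` or scikit-learn's `OneHotEncoder` and `OrdinalEncoder` perform the same transformations on whole tables.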
Quantitative Data Type
This data type quantifies things using numerical values that can be counted or measured. The price of a smartphone, the discount offered, the number of ratings on a product, the processor frequency of a smartphone, and the RAM of that phone all fall under the category of quantitative data.
The key point is that a quantitative feature can take a very large, often unbounded, number of values. For instance, the price of a smartphone can range from some amount x to any higher value, and it can be broken down further into fractional values. The two subcategories that describe quantitative data are:
Discrete Data Type
Numerical values that are integers or whole numbers are placed under this category. The number of speakers in the phone, the number of cameras, the number of cores in the processor, and the number of SIMs supported are all examples of the discrete data type.
Continuous Data Type
Values that can take any fractional number within a range are considered continuous. Examples include the operating frequency of the processor, the Wi-Fi frequency, the temperature of the cores, and so on. Version numbers such as the Android version may look fractional, but they are ordered labels rather than measurements, so they are better treated as ordinal.
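As a rough illustration, a numeric feature can be checked for whole-number values in a few lines of Python. The function name `looks_discrete` and the sample values are assumptions, and the check is only a heuristic.

```python
def looks_discrete(values):
    """Heuristic: treat a numeric feature as discrete if every value is a whole number."""
    return all(float(value).is_integer() for value in values)


camera_counts = [1, 2, 3, 4]              # number of cameras: counted
clock_speeds_ghz = [1.8, 2.0, 2.84, 3.2]  # processor frequency: measured

print(looks_discrete(camera_counts))     # True
print(looks_discrete(clock_speeds_ghz))  # False
```

A heuristic like this can mislead: prices that happen to be rounded to whole amounts would be flagged as discrete, so knowledge of what the feature measures should have the final say.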
Can Ordinal and Discrete Types Overlap?
If you look closely, you can assign numbers to the ordinal classes. Should the result then be called discrete or ordinal? The truth is that it is still ordinal. Even after numbering, the numbers don’t convey the actual distances between the classes.
For instance, consider the grading system of a test. The grades can be A, B, C, D, and E, and numbering them from the start gives 1, 2, 3, 4, 5. According to these numbers, the distance between an E and a D is the same as the distance between a D and a C. That is not accurate: a C may still be an acceptable result while an E is not, yet the numbering treats every step as equal in size.
The same reasoning applies to a survey form where user experience is recorded on a scale from very poor to very good. The differences between the classes are not well defined, so they can’t be quantified directly.
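The effect of this arbitrary spacing can be sketched in Python. Both codings below keep the order A < B < C < D < E (a lower code means a better grade); the second coding and the two sample classes are assumptions made up for illustration.

```python
from statistics import mean, median

equal_steps = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}     # numbering used in the text
unequal_steps = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 10}  # assumed: same order, E penalized more

class_x = ["A", "A", "E"]
class_y = ["C", "C", "C"]


def summarize(grades, coding):
    """Return the mean and median of the numeric codes for a list of grades."""
    codes = [coding[grade] for grade in grades]
    return mean(codes), median(codes)


results = {}
for name, coding in [("equal", equal_steps), ("unequal", unequal_steps)]:
    mean_x, median_x = summarize(class_x, coding)
    mean_y, median_y = summarize(class_y, coding)
    # Record whether each statistic ranks class X ahead of class Y.
    results[name] = (mean_x < mean_y, median_x < median_y)

print(results)  # {'equal': (True, True), 'unequal': (False, True)}
```

The mean flips its verdict when only the spacing changes, while the median, which depends only on order, stays the same. This is why order-based statistics are usually the safer choice for ordinal data.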
We have discussed all the major classifications of data. This matters because we can now choose the plots and tests that suit each category. It makes sense to plot a histogram or frequency plot for quantitative data, and a pie chart or bar plot for qualitative data.
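The numbers behind both kinds of plot can be computed with the standard library alone. In this sketch the brand labels, prices, and bin width are hypothetical.

```python
from collections import Counter

# Qualitative feature: count how often each category occurs (input for a bar or pie chart).
brands = ["A", "B", "A", "C", "A", "B"]  # hypothetical brand labels
category_counts = Counter(brands)

# Quantitative feature: group values into equal-width bins (input for a histogram).
prices = [120, 150, 180, 210, 260, 290]  # hypothetical prices
bin_width = 100                          # assumed bin width
bin_counts = Counter((price // bin_width) * bin_width for price in prices)

print(dict(category_counts))             # {'A': 3, 'B': 2, 'C': 1}
print(dict(sorted(bin_counts.items())))  # {100: 3, 200: 3}
```

Categories are counted as they are, whereas numeric values first have to be grouped into ranges, and the choice of bin width changes the shape of the histogram.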
Regression analysis, which models the relationship between one dependent variable and one or more independent variables, requires a quantitative dependent variable in its standard linear form. Analysis of variance (ANOVA) compares the mean of a quantitative measurement across the groups defined by a qualitative variable: one-way ANOVA uses one measurement variable and one nominal variable, while two-way ANOVA uses one measurement variable and two nominal variables.
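A minimal sketch of the one-way ANOVA F statistic in plain Python, assuming three hypothetical groups of measurements (for example, battery life in hours for three phone segments). The function name `one_way_anova_f` and the data are illustrative only.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(group)) ** 2 for group in groups for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# One quantitative measurement, grouped by one nominal variable with three levels.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(groups)
print(round(f_statistic, 6))  # 13.0
```

A large F value means the group means differ by more than the spread within the groups would suggest. Turning it into a p-value needs the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.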
Similarly, you can apply the Chi-square test to qualitative data to discover relationships between categorical variables.
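The Chi-square statistic for two categorical variables can be computed from a table of counts. The sketch below assumes a hypothetical 2x2 table of phone segment against preferred color; `chi_square_statistic` is an illustrative helper, not a library function.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Rows: phone segment (budget, premium); columns: preferred color (black, blue).
observed = [[30, 10],
            [20, 40]]
statistic, dof = chi_square_statistic(observed)
print(round(statistic, 4), dof)  # 16.6667 1
```

The statistic is then compared with the Chi-square distribution for the given degrees of freedom. SciPy's `scipy.stats.chi2_contingency` does this and returns a p-value, though by default it applies a continuity correction to 2x2 tables, so its statistic can differ from the uncorrected value computed here.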
In this article, we discussed why the data we produce matters and how data is divided into categories. We also looked at why ordinal data stays ordinal even when its classes are numbered like discrete values.
We also covered which type of plot suits which category of data, along with the tests that apply to a specific data type and the tests that combine several types of data.
Why is data science important?
Data science is significant because it brings together expertise in programming, mathematics, and statistics to generate new insights and make sense of large amounts of data. For companies, it is an important resource for making data-driven decisions, since it covers collecting, storing, sorting, and evaluating data. It is widely used by experienced practitioners. Data science is essential because the value of data continues to increase, and it is in great demand because it shows how digital data changes organizations and enables them to make better-informed choices.
What is the scope of data science?
Data science can be found just about anywhere these days. That includes online transactions such as Amazon purchases, social media feeds such as Facebook and Instagram, Netflix recommendations, and even the fingerprint and facial recognition features of smartphones. Data science covers numerous cutting-edge technological ideas, such as Artificial Intelligence, the Internet of Things (IoT), and Deep Learning, to mention a few. Its impact has grown dramatically as these technologies have advanced, which has expanded its scope. By learning data science, you can choose from many job profiles, and most of these jobs are well paying. A few of these profiles are Data Analyst, Data Scientist, Data Engineer, Machine Learning Scientist and Engineer, Business Intelligence Developer, Data Architect, and Statistician.
How is nominal data different from ordinal data?
Nominal data consists of names or characteristics that fall into two or more categories with no inherent ordering. In other words, this type of data has no natural ranking. Ordinal data is similar to nominal data, but its categories have a clear order. In short, ordinal data has an order and nominal data does not. Ranking data, such as Likert scales, the Bristol stool scale, and other scales rated from 0 to 10, is ordinal.