Did You Know? The arithmetic mean has been in use for centuries, but the names of the other two measures are more recent. Francis Galton popularized the term "median" in the 1880s, and Karl Pearson coined the term "mode" in 1895. The median later gained popularity as a more reliable measure for skewed data.
The mean, median, and mode are key measures of central tendency that summarize data with a single value. The mean is the average, calculated by dividing the sum of all values by the total count. The median is the middle value when data is ordered, while the mode represents the most frequent value.
Understanding the mean, median, and mode is vital for seeing how data is distributed and what its main characteristics are. In this article, we explore what the mean, median, and mode are as measures of central tendency, along with their applications, their differences, and the relationship between them.
A measure of central tendency is a statistical measure that represents the center or typical value of a dataset. It provides an indication of where most data points lie within a distribution. The three most commonly used measures of central tendency are the mean, the median, and the mode.
Each of these measures offers a unique way to summarize data, and selecting the right one depends on the nature of your dataset and the analysis you're conducting.
The mean is one of the most commonly used measures of central tendency in statistics and machine learning. It represents the average value of a dataset and is calculated by summing all the values in the dataset and then dividing by the number of values.
The mean provides a single value that summarizes the overall trend of the data, making it useful for understanding general patterns.
Formula for the Mean
The formula to calculate the mean of a dataset is:
Mean = ∑X / n
Where:
∑X is the sum of all values in the dataset
n is the number of values in the dataset
Simple Numeric Example
Let’s consider a simple example with the following dataset:
3, 7, 8, 5, 12
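To calculate the mean:
Step 1: Add the values: 3 + 7 + 8 + 5 + 12 = 35
Step 2: Divide the sum by the number of values: 35 / 5 = 7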
So, the mean of this dataset is 7.
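The same arithmetic can be checked in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original example.

```python
from statistics import mean

values = [3, 7, 8, 5, 12]

total = sum(values)          # add the values
count = len(values)          # count the values
manual_mean = total / count  # divide the sum by the count

print(total, count, manual_mean)    # 35 5 7.0
print(mean(values) == manual_mean)  # True
```

Computing the mean by hand and with `statistics.mean` gives the same result, which is a quick way to confirm the formula.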
Sensitivity to Outliers
One important thing to note about the mean is that it is sensitive to outliers, extremely high or low values that differ significantly from the other data points. An outlier can heavily distort the mean, making it unrepresentative of the data.
Example with an Outlier
Consider this new dataset:
3, 7, 8, 5, 100
In this case, the mean is (3 + 7 + 8 + 5 + 100) / 5 = 24.6, which is far higher than four of the five values, all of which lie between 3 and 8. The outlier 100 has pulled the mean upward, making it less reflective of the overall trend of the data.
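To see how much a single value can move the mean, the sketch below computes it with and without the outlier. The split into `typical` and `with_outlier` is an illustrative choice, not part of the original example.

```python
from statistics import mean

typical = [3, 7, 8, 5]          # the four values that sit close together
with_outlier = typical + [100]  # the full dataset from the example

mean_without = mean(typical)    # 23 / 4
mean_with = mean(with_outlier)  # 123 / 5
shift = mean_with - mean_without

print(f"without outlier: {mean_without}")
print(f"with outlier:    {mean_with}")
print(f"shift caused by one value: {shift:.2f}")
```

One extreme value out of five is enough to move the mean well outside the range where the other values lie.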
The median is a measure of central tendency that represents the middle value in a dataset when the values are arranged in ascending or descending order. Unlike the mean, which is sensitive to extreme values, the median is a more robust measure of central tendency.
It is especially useful when the data is skewed or contains outliers, as extreme values have less influence on it. If the dataset contains an odd number of values, the median is the middle number. If the dataset contains an even number of values, the median is the average of the two middle numbers.
Formula for Median
1. Odd number of values (n): Median = the ((n + 1) / 2)-th value of the sorted data
2. Even number of values (n): Median = (the (n / 2)-th value + the (n / 2 + 1)-th value of the sorted data) / 2
When to Use the Median?
1. Skewed Data
The median is particularly useful when dealing with skewed data. In cases where the data is heavily skewed, the mean may be pulled in the direction of the skew, due to the influence of outliers, while the median remains relatively unaffected.
For example, income data, which can have a few extremely high values, is better summarized by the median because it more accurately reflects the typical income.
Example: Consider the following income data (in thousands):
30, 32, 34, 40, 1000
The mean is 227.2, heavily inflated by the outlier (1000), while the median remains at 34, a more accurate representation of the typical income.
2. Presence of Outliers
When a dataset contains extreme values or outliers, the median provides a better measure of central tendency than the mean. Since the median focuses on the middle value, outliers do not impact its calculation as much.
Example: Let’s consider the dataset of house prices (in thousands):
150, 200, 250, 300, 10000
Here, the mean is 2180, far higher than four of the five prices because of the extreme outlier (10000), but the median remains at 250, providing a better representation of the typical house price.
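Both examples above can be reproduced with the standard library. The helper `summarize` is a hypothetical name introduced here only to keep the comparison compact.

```python
from statistics import mean, median

def summarize(label, values):
    """Return the label together with the mean and median of the values."""
    return label, mean(values), median(values)

incomes = [30, 32, 34, 40, 1000]            # in thousands
house_prices = [150, 200, 250, 300, 10000]  # in thousands

income_summary = summarize("income", incomes)
price_summary = summarize("house price", house_prices)

for label, avg, mid in (income_summary, price_summary):
    print(f"{label}: mean={avg}, median={mid}")
```

In both datasets the median stays among the typical values while the mean is dragged toward the single extreme value.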
3. Ordinal Data
When working with ordinal data, where the categories have a meaningful order but the distance between them is not defined, the median is often used to identify the middle value or central tendency. For example, in surveys with ratings like "poor," "fair," "good," and "excellent," the median identifies the central rating: the one at the midpoint of all responses once they are sorted.
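One way to compute a median rating is to map each category to its position in the ordering and take the median of those positions. The sketch below uses made-up survey responses, and it assumes `statistics.median_low` is acceptable because it always returns an actual data point, so the result maps back to a real category even when the number of responses is even.

```python
from statistics import median_low

# Ordered categories from lowest to highest
levels = ["poor", "fair", "good", "excellent"]
rank = {name: position for position, name in enumerate(levels)}

# Hypothetical survey responses, invented for illustration
responses = ["good", "poor", "excellent", "fair", "good", "good", "fair"]

ranks = [rank[response] for response in responses]
median_rating = levels[median_low(ranks)]

print(median_rating)  # good
```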
4. Symmetric Data
While the median is most often applied in skewed distributions, it can also be used for symmetric datasets. However, in this case, the mean and median will be equal or very close.
Numeric Example
Let’s look at two more detailed examples, one for each case:
Odd number of values
Dataset: 4, 1, 7, 8, 3
Step 1: Sort the dataset: 1, 3, 4, 7, 8
Step 2: Find the middle value: The median is 4 (the third value).
Even number of values
Dataset: 5, 2, 9, 4
Step 1: Sort the dataset: 2, 4, 5, 9
Step 2: Average the two middle values: (4 + 5) / 2 = 4.5
The median is 4.5.
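The two cases translate directly into code. The function below is a minimal sketch written for this article (the name `median_of` is illustrative); in practice `statistics.median` does the same job.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median_of() needs at least one value")
    ordered = sorted(values)  # Step 1: sort the dataset
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([4, 1, 7, 8, 3]))  # 4
print(median_of([5, 2, 9, 4]))     # 4.5
```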
The mode is a statistical term that refers to the most frequently occurring value in a dataset. Unlike the mean, which is computed arithmetically, and the median, which depends on the ordering of the values, the mode is purely based on frequency.
In simple terms, in numerical datasets with repeated values, mode helps highlight the most typical observation, especially when data isn’t symmetrically distributed.
Definition of Mode
Mathematically, the mode can be defined as: Mode = Value that appears most frequently in a dataset.
For example, in a dataset of test scores: [70, 85, 70, 90, 85, 70, 100], the mode is 70 because it appears three times, more than any other value.
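Finding the mode amounts to counting how often each value occurs. A minimal sketch with `collections.Counter` from the standard library:

```python
from collections import Counter

scores = [70, 85, 70, 90, 85, 70, 100]

counts = Counter(scores)  # maps each score to its frequency
mode_value, frequency = counts.most_common(1)[0]

print(mode_value, frequency)  # 70 3
```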
Use Case of Mode in Categorical Data
The mode is particularly useful in categorical data where we want to know the most common category or class. Since nominal categories have no inherent numerical value or order, calculating the mean or median isn’t meaningful. Instead, we look for the mode to find the most frequent category.
Example in Categorical Data
Imagine a survey asking people about their favorite color. If Blue is chosen four times and every other color is chosen less often, the mode is Blue.
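The same idea works on text labels. The responses below are hypothetical, invented only so that Blue appears four times as in the example; `statistics.mode` accepts non-numeric data.

```python
from statistics import mode

# Hypothetical survey responses, invented for illustration
responses = ["Blue", "Red", "Blue", "Green", "Blue", "Red", "Blue"]

favorite = mode(responses)
times_chosen = responses.count(favorite)

print(favorite, times_chosen)  # Blue 4
```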
Use Case of Mode in Numeric Data
Mode can also be applied to numerical data. Although mean and median are more commonly used for continuous data, mode can still provide valuable insights, especially when particular values repeat often or when the data is multimodal (having more than one mode).
Example in Numeric Data
Consider the following set of numbers representing the number of books read by individuals in a year: [2, 5, 2, 4, 5, 5, 7, 2, 4].
Here, both 2 and 5 appear three times, more often than any other number, so the dataset is bimodal: its modes are 2 and 5.
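Special Cases of Mode
No mode: if every value occurs exactly once, no value is more frequent than the others, so the dataset has no meaningful mode.
Bimodal: two values tie for the highest frequency.
Multimodal: more than two values tie for the highest frequency.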
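Ties and all-unique datasets can be detected with `statistics.multimode`, which returns every value that shares the highest frequency. The `classify` helper is a hypothetical convenience written for this sketch, and treating "every distinct value is tied" as "no mode" is a convention assumed here, not a universal rule.

```python
from statistics import multimode

def classify(values):
    """Label a dataset by how many values share the top frequency."""
    modes = multimode(values)
    # Assumption: if all distinct values are tied, report "no mode"
    if len(modes) > 1 and len(modes) == len(set(values)):
        return "no mode"
    return {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")

books_read = [2, 5, 2, 4, 5, 5, 7, 2, 4]

print(multimode(books_read))         # [2, 5]
print(classify(books_read))          # bimodal
print(classify([1, 2, 3, 4, 100]))   # no mode
print(classify([70, 85, 70, 90, 85, 70, 100]))  # unimodal
```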
In this section of the article, we’ll explore the meaning and application of the mean, median, and mode in real-world analysis, their relationship with data skewness and variability, and their common uses across various fields.
A. Economics
B. Healthcare
C. Marketing
D. Sports Analytics
E. Education
Understanding the differences between the mean, median, and mode is essential for accurate data analysis and making well-informed decisions. Let's take a look at the key differences between them:
Measure | Definition | How It's Calculated | Best Used When |
---|---|---|---|
Mean | The arithmetic average of a dataset. It is the sum of all values divided by the number of values. | Mean = ∑X / n (sum of all data points divided by the number of data points) | Best used for roughly symmetric data without outliers, such as normally distributed data. Sensitive to extreme values (outliers). |
Median | The middle value of a dataset when the values are arranged in ascending or descending order. | If 𝑛 is odd, median = middle value. If 𝑛 is even, median = average of the two middle values. | Best used for skewed data or when there are outliers. |
Mode | The most frequently occurring value in a dataset. | Identify the value that appears most often in the dataset. | Best used for categorical data or data with multiple frequent values. |
For a moderately skewed distribution, the relationship between these measures of central tendency is approximately:
Mode ≈ 3 × Median − 2 × Mean
This is Karl Pearson’s empirical relationship. It is an approximation rather than an exact identity, and it is particularly useful when two of the three values (mean, median, mode) are known, allowing the third to be estimated.
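The relationship Mode = 3 × Median − 2 × Mean can be rearranged to estimate whichever value is missing. The sketch below uses hypothetical function names and made-up inputs (mean 52, median 50), and the results are estimates that only make sense for moderately skewed data.

```python
def estimate_mode(mean_value, median_value):
    """Mode is roughly 3 * median - 2 * mean."""
    return 3 * median_value - 2 * mean_value

def estimate_mean(median_value, mode_value):
    """Rearranged: mean is roughly (3 * median - mode) / 2."""
    return (3 * median_value - mode_value) / 2

def estimate_median(mean_value, mode_value):
    """Rearranged: median is roughly (mode + 2 * mean) / 3."""
    return (mode_value + 2 * mean_value) / 3

# Hypothetical summary values, invented for illustration
print(estimate_mode(52, 50))    # 46
print(estimate_mean(50, 46))    # 52.0
print(estimate_median(52, 46))  # 50.0
```

Feeding an estimate back into the other two functions returns the original inputs, which confirms the three forms are the same equation.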
Code Example: Comparing Mean, Median, and Mode
Let's implement a Python example to demonstrate how to calculate the mean, median, and mode of a dataset using Pandas.
```python
import pandas as pd

# Sample dataset with one extreme value (100)
data = [1, 2, 3, 4, 100]

# Create a DataFrame
df = pd.DataFrame(data, columns=["Values"])

# Calculate mean, median, and mode
mean_value = df["Values"].mean()
median_value = df["Values"].median()
# mode() returns every value tied for the highest frequency, sorted ascending;
# [0] keeps only the first (smallest) of them
mode_value = df["Values"].mode()[0]

# Display the results
print(f"Mean: {mean_value}")
print(f"Median: {median_value}")
print(f"Mode: {mode_value}")
```
Output:
Mean: 22.0
Median: 3.0
Mode: 1
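Explanation:
The mean (22.0) is pulled far above most of the values by the outlier 100, while the median (3.0) stays at the middle of the sorted data. Every value in this dataset occurs exactly once, so there is no meaningful mode: pandas returns all tied values in ascending order, and taking the first element reports 1.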
Now that you've explored the meaning of the mean, median, and mode, it’s time to test your understanding! This pop quiz will help you assess your knowledge of when and why to use each measure of central tendency in different scenarios.
Answer the following multiple-choice questions to see how well you grasp these fundamental concepts:
1. Which measure of central tendency is most affected by outliers?
A) Mean
B) Median
C) Mode
D) None of the above
2. Which of the following is the best measure of central tendency for skewed data?
A) Mean
B) Median
C) Mode
D) Standard deviation
3. Which measure of central tendency is used to calculate the typical value in a data set when all values are equally likely?
A) Mean
B) Median
C) Mode
D) Variance
4. In a perfectly symmetrical distribution, which measures of central tendency will be the same?
A) Mean and Median
B) Mean and Mode
C) Median and Mode
D) All three
5. Which measure of central tendency is most useful for categorical data?
A) Mean
B) Median
C) Mode
D) Range
6. What happens to the mean if an extremely high value is added to the dataset?
A) The mean decreases
B) The mean increases
C) The mean stays the same
D) It becomes the median
7. Which of the following is not a measure of central tendency?
A) Mean
B) Median
C) Mode
D) Variance
8. When is the mode a better measure of central tendency than the mean or median?
A) When the data is normally distributed
B) When the data is skewed
C) When the data has multiple peaks (bimodal or multimodal)
D) When there are outliers
9. What is the relationship between the mean, median, and mode in a perfectly symmetrical distribution?
A) Mean > Median > Mode
B) Mean < Median < Mode
C) Mean = Median = Mode
D) None of the above
10. Which measure of central tendency is best for determining the "typical" score in a set of exam results with extreme outliers?
A) Mean
B) Median
C) Mode
D) Standard deviation
Understanding the meaning of the mean, median, and mode is essential for effectively summarizing and analyzing data. These measures of central tendency offer key insights into datasets, whether simple or complex. Mastering their applications is crucial for statistics, machine learning, and data science.
The mean is the average value of a dataset, calculated by summing all values and dividing by the count. The median is the middle value when data is sorted, offering a better measure when the dataset has outliers. The mode is the most frequently occurring value. While the mean is sensitive to extreme values, the median and mode provide more robust insights in skewed distributions.
Yes, in symmetric, unimodal distributions such as the normal distribution, the mean, median, and mode are equal. This occurs in ideal bell-shaped curves where data is evenly distributed around a single central peak. However, in skewed or irregular datasets, these measures tend to differ, highlighting different aspects of the data's distribution.
Outliers heavily influence the mean since it considers every value in the dataset. The median, being the middle value, is less affected by extreme values, making it more reliable for skewed data. The mode, representing the most common value, is generally unaffected by outliers unless they occur frequently.
Mode is especially useful for categorical or nominal data where numerical averages don't make sense. It helps identify the most common category or value, such as the most popular product color or preferred customer choice. For numerical data, mode can highlight common repeated values but may not always be unique.
Income data is often skewed with some extremely high earners, which inflate the mean. The median provides a better measure of the 'typical' income because it represents the middle point, unaffected by extreme high or low values. This gives a more realistic picture of the overall population’s income level.
In R, the mean can be calculated using the mean() function and the median using median(). The statistical mode requires a custom function, because R's built-in mode() returns an object's storage mode rather than its most frequent value. These functions allow easy computation of central tendency measures for vectors or datasets, facilitating statistical analysis.
The mean works best when data is roughly symmetric and can be misleading in the presence of skewness or outliers. It doesn't reflect the spread or the shape of the distribution and can be distorted by extreme values. Therefore, relying solely on the mean may lead to incorrect interpretations in many real-world scenarios.
These measures help understand the distribution of features and target variables, guiding data preprocessing and feature engineering. For example, median imputation can handle missing values in skewed datasets, while mode can be used to fill missing categorical data. Understanding these concepts supports better model accuracy and robustness.
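Median and mode imputation can be sketched with pandas as follows. The column names and values are hypothetical, made up for illustration; the sketch relies on `Series.median()` and `Series.mode()` skipping missing values by default.

```python
import pandas as pd

# Hypothetical dataset with one missing value per column
df = pd.DataFrame({
    "income": [30, 32, None, 40, 1000],      # skewed numeric feature
    "color": ["Blue", "Red", None, "Blue", "Blue"],  # categorical feature
})

filled = df.copy()

# Numeric column: fill the gap with the median, which the outlier barely affects
income_median = filled["income"].median()
filled["income"] = filled["income"].fillna(income_median)

# Categorical column: fill the gap with the most frequent category
color_mode = filled["color"].mode()[0]
filled["color"] = filled["color"].fillna(color_mode)

print(filled)
```

Filling the income gap with the mean instead would insert a value of about 275, far from any typical income in the column, which is why the median is the usual choice for skewed features.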
Yes, analyzing all three provides a comprehensive understanding of the data distribution. Mean offers the average tendency, median shows the central location resistant to outliers, and mode identifies the most frequent occurrences. Together, they reveal insights into data symmetry, skewness, and modality.
Mean is best for interval or ratio data with symmetrical distributions. Median is preferred when data is skewed or contains outliers. Mode is ideal for nominal or categorical data where identifying the most common category is important. Selecting the right measure depends on the nature of the data and analysis goals.
Including these measures in visualizations like histograms or box plots helps highlight central tendencies and data distribution characteristics. For example, plotting median lines on boxplots emphasizes data skewness, while mode indicators on bar charts show popular categories. This enriches the interpretation and communication of data insights.