
Measure of Central Tendency: Mean, Median, and Mode

Updated on 26/05/2025 | 458 Views

Did You Know? The arithmetic mean has been used for centuries. The term "median" was popularized by Francis Galton in the late 19th century, and Karl Pearson introduced the term "mode" in 1895. The median later became a standard summary for skewed data because extreme values barely affect it.

The mean, median, and mode are key measures of central tendency that summarize data with a single value. The mean is the average, calculated by dividing the sum of all values by the total count. The median is the middle value when data is ordered, while the mode represents the most frequent value. 

Knowing what the mean, median, and mode represent is key to interpreting how data is distributed. In this article, you will learn what each of these measures of central tendency is, where they are applied, how they differ, and how they relate to one another.

Take your data visualization skills to the next level with upGrad’s online data science courses. Build expertise in Python, Machine Learning, AI, Tableau, and SQL, while gaining hands-on experience to solve real-world challenges with confidence.

What Is a Measure of Central Tendency?

A measure of central tendency is a single statistic that represents the center or typical value of a dataset. It indicates where the middle of a distribution lies. The three most commonly used measures of central tendency are:

  1. Mean: The arithmetic average of all values in a dataset. It is calculated by summing all the data points and dividing by the number of data points.
  2. Median: The middle value in a dataset when the data points are arranged in ascending or descending order. If there is an even number of data points, the median is the average of the two middle values.
  3. Mode: The value that appears most frequently in a dataset. A dataset can have zero, one, or multiple modes.

Each of these measures offers a unique way to summarize data, and selecting the right one depends on the nature of your dataset and the analysis you're conducting.

Data analysts skilled in statistical concepts like mean, median, and mode are in great demand for turning raw data into actionable insights. To advance your career and bridge data analysis with machine learning and artificial intelligence, explore these top courses designed to sharpen your skills in AI and ML-driven data science.

What Is the Mean? Definition and Examples

The mean is one of the most commonly used measures of central tendency in statistics and machine learning. It represents the average value of a dataset and is calculated by summing all the values in the dataset and then dividing by the number of values. 

The mean provides a single value that summarizes the overall trend of the data, making it useful for understanding general patterns.

Formula for the Mean

The formula to calculate the mean $\bar{X}$ of a dataset is:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Where:

  • $X_i$ represents each individual data point (the i-th value),
  • $n$ is the total number of data points in the dataset.

Simple Numeric Example

Let’s consider a simple example with the following dataset: 

3, 7, 8, 5, 12

To calculate the mean:

  • Sum of the values: 3+7+8+5+12=35
  • Divide by the number of values: 35/5=7

So, the mean of this dataset is 7.
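The same calculation can be reproduced in Python. This is a minimal sketch using only the standard library (`statistics.fmean` assumes Python 3.8 or later); the variable names are illustrative.

```python
from statistics import fmean

# Dataset from the example above
values = [3, 7, 8, 5, 12]

# Manual calculation: sum of the values divided by their count
manual_mean = sum(values) / len(values)

# statistics.fmean computes the same arithmetic mean as a float
library_mean = fmean(values)

print(manual_mean)   # 7.0
print(library_mean)  # 7.0
```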

Sensitivity to Outliers

One important thing to note about the mean is that it is sensitive to outliers: extremely high or low values that differ significantly from the rest of the data. A single outlier can heavily distort the mean, making it unrepresentative of the typical value.

Example with an Outlier

Consider this new dataset:

3, 7, 8, 5, 100

  • Sum of the values: 3+7+8+5+100=123
  • Divide by the number of values: 123/5=24.6

In this case, the mean is 24.6, far above the values 3, 5, 7, and 8 that make up most of the data. The single outlier 100 has pulled the mean upward, so it no longer reflects the typical value.
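To see how much one value moves the mean, the sketch below (standard library only, illustrative names) compares the two datasets. Because every value carries weight 1/n, changing a single value by d shifts the mean by d/n; here (100 - 12) / 5 = 17.6.

```python
from statistics import fmean

original = [3, 7, 8, 5, 12]
with_outlier = [3, 7, 8, 5, 100]  # 12 replaced by the outlier 100

mean_original = fmean(original)
mean_with_outlier = fmean(with_outlier)

print(f"Mean without outlier: {mean_original}")      # 7.0
print(f"Mean with outlier:    {mean_with_outlier}")  # 24.6
# One value changed by 88, so the mean shifts by 88 / 5
print(f"Shift in the mean:    {mean_with_outlier - mean_original:.1f}")  # 17.6
```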

Also Read: Machine Learning Datasets Project Ideas for Beginners: Real-World Projects to Build Your Portfolio

What Is the Median? When Can You Apply It?

The median is a measure of central tendency that represents the middle value of a dataset when the values are arranged in ascending or descending order. Unlike the mean, which is sensitive to extreme values, the median stays stable when a few values are unusually large or small.

It is especially useful when the data is skewed or contains outliers, because extreme values have little influence on it. If the dataset contains an odd number of values, the median is the middle number. If it contains an even number of values, the median is the average of the two middle numbers.

Formula for Median

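1. Odd number of values: Median = $x_{((n+1)/2)}$, the middle value of the sorted data

2. Even number of values: Median = $\frac{x_{(n/2)} + x_{(n/2+1)}}{2}$, the average of the two middle values

Here $n$ is the number of values and $x_{(k)}$ denotes the k-th smallest value.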
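The two cases of the formula can be turned into a small function. This is an illustrative sketch: `median` here is a hypothetical helper written for this example (Python's `statistics.median` already provides the same behavior), and it converts the 1-based positions in the formula to Python's 0-based indices.

```python
def median(values):
    """Hypothetical helper: median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median() requires at least one value")
    ordered = sorted(values)  # the formula assumes sorted data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: x_((n+1)/2) in 1-based notation is index n // 2 in Python
        return ordered[mid]
    # Even n: average of x_(n/2) and x_(n/2 + 1), i.e. indices mid - 1 and mid
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([4, 1, 7, 8, 3]))  # 4
print(median([5, 2, 9, 4]))     # 4.5
```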

When to Use the Median?

1. Skewed Data

The median is particularly useful when dealing with skewed data. In cases where the data is heavily skewed, the mean may be pulled in the direction of the skew, due to the influence of outliers, while the median remains relatively unaffected. 

For example, income data, which can have a few extremely high values, is better summarized by the median because it more accurately reflects the typical income.

Example: Consider the following income data (in thousands):

30, 32, 34, 40, 1000

  • Mean: (30 + 32 + 34 + 40 + 1000) / 5 = 1136 / 5 = 227.2
  • Median: 34 (Middle value)

The mean is heavily influenced by the outlier (1000), while the median remains at 34, representing a more accurate central tendency.
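A quick check of the income example with the standard library (Python 3.8+ for `fmean`). Counting how many incomes fall below the mean makes the distortion concrete; the variable names are illustrative.

```python
from statistics import fmean, median

incomes = [30, 32, 34, 40, 1000]  # in thousands

mean_income = fmean(incomes)
median_income = median(incomes)

# How many incomes lie below the mean?
below_mean = sum(1 for x in incomes if x < mean_income)

print(f"Mean:   {mean_income}")    # 227.2
print(f"Median: {median_income}")  # 34
print(f"{below_mean} of {len(incomes)} incomes are below the mean")  # 4 of 5
```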

2. Presence of Outliers

When a dataset contains extreme values or outliers, the median provides a better measure of central tendency than the mean. Since the median focuses on the middle value, outliers do not impact its calculation as much.

Example: Let’s consider the dataset of house prices (in thousands):

150, 200, 250, 300, 10000

  • Mean: (150 + 200 + 250 + 300 + 10000) / 5 = 10900 / 5 = 2180
  • Median: 250

Here, the mean is significantly higher due to the extreme outlier (10000), but the median remains at 250, providing a better representation of the typical house price.
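The median's robustness can be shown by letting the most expensive house grow even further. In this sketch, the larger prices (100,000 and 1,000,000) are hypothetical values added for illustration; only the first case comes from the example above.

```python
from statistics import fmean, median

prices = [150, 200, 250, 300, 10000]  # in thousands, from the example above

# Replace the largest price with ever more extreme (hypothetical) values
results = []
for top_price in (10_000, 100_000, 1_000_000):
    data = prices[:-1] + [top_price]
    results.append((top_price, fmean(data), median(data)))

for top_price, mean_price, median_price in results:
    print(f"top price {top_price:>9,}: mean = {mean_price:>9.1f}, median = {median_price}")
```

The mean rises from 2180 to 20180 and then 200180, while the median stays at 250 in every case, because only the middle value of the sorted data matters.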

3. Ordinal Data

When working with ordinal data, where the categories have a meaningful order but the distances between them are not fixed, the median identifies the central category. For example, in surveys with ratings like "poor," "fair," "good," and "excellent," the median identifies the middle rating once all responses are ranked, which is not necessarily the most common one.
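One common way to compute the median of ordinal data is to map each category to its rank, take the median rank, and map it back to a label. This sketch assumes the four-level scale from the example and uses made-up responses. It relies on `statistics.median_low`, which always returns one of the data points, so with an even number of responses it picks the lower of the two middle categories instead of averaging ranks.

```python
from statistics import median_low

# Assumed ordered rating scale, lowest to highest
SCALE = ["poor", "fair", "good", "excellent"]
RANK = {label: i for i, label in enumerate(SCALE)}

# Hypothetical survey responses
responses = ["excellent", "excellent", "excellent", "poor", "fair", "good", "fair"]

# Map labels to ranks, take the median rank, then map back to a label
ranks = [RANK[r] for r in responses]
median_rating = SCALE[median_low(ranks)]

print(median_rating)  # good
```

Here the median rating is "good" even though "excellent" is the most frequent response.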

4. Symmetric Data

While the median is most often applied to skewed distributions, it can also be used for symmetric datasets. In that case, the mean and median are equal or very close to each other.
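A tiny illustration with a hypothetical, evenly spaced dataset: because the values are symmetric around 6, the mean and median coincide.

```python
from statistics import fmean, median

symmetric = [2, 4, 6, 8, 10]  # evenly spaced around 6

print(fmean(symmetric))   # 6.0
print(median(symmetric))  # 6
```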

Numeric Example

Let’s look at a more detailed example to calculate the median:

  • Example 1 (Odd number of data points):

Dataset: 4, 1, 7, 8, 3

Step 1: Sort the dataset: 1, 3, 4, 7, 8

Step 2: Find the middle value: The median is 4 (the third value).

  • Example 2 (Even number of data points):

Dataset: 5, 2, 9, 4

Step 1: Sort the dataset: 2, 4, 5, 9

Step 2: Average the two middle values: (4 + 5) / 2 = 4.5

The median is 4.5.
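Both worked examples can be checked with Python's `statistics` module. For an even number of values, the module also offers `median_low` and `median_high`, which return one of the two middle values instead of their average; this is useful when the median must be an actual observation.

```python
from statistics import median, median_high, median_low

odd_data = [4, 1, 7, 8, 3]  # Example 1
even_data = [5, 2, 9, 4]    # Example 2, sorted: 2, 4, 5, 9

print(median(odd_data))        # 4
print(median(even_data))       # 4.5, average of the two middle values
print(median_low(even_data))   # 4, the lower middle value
print(median_high(even_data))  # 5, the higher middle value
```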

What Is the Mode? Explanation with Use Case

The mode is a statistical term for the most frequently occurring value in a dataset. Unlike the mean, which is computed arithmetically, and the median, which depends on the order of the values, the mode is based purely on frequency.

In simple terms, in numerical datasets with repeated values, mode helps highlight the most typical observation, especially when data isn’t symmetrically distributed.

Definition of Mode

Mathematically, the mode can be defined as: Mode = Value that appears most frequently in a dataset.

For example, in a dataset of test scores: [70, 85, 70, 90, 85, 70, 100], the mode is 70 because it appears three times, more than any other value.
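The mode of the test scores can be found with the standard library. `statistics.mode` returns a single most common value, while `collections.Counter` also reports how often it occurs.

```python
from collections import Counter
from statistics import mode

scores = [70, 85, 70, 90, 85, 70, 100]

print(mode(scores))                    # 70
print(Counter(scores).most_common(1))  # [(70, 3)]
```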

Use Case of Mode in Categorical Data

The mode is particularly useful for categorical data, where we want to know the most common category or class. Because nominal categories (such as colors or product names) have no numerical value or natural order, the mean and median are not meaningful for them. Instead, we look for the mode to find the most frequent category.

Example in Categorical Data

Imagine a survey asking people about their favorite color, with the following responses:

  • Red, Blue, Green, Blue, Red, Blue, Blue, Green, Red, Red

In this case, Red and Blue each appear four times, more often than Green (twice), so the responses are bimodal: both Red and Blue are modes.
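Ties like this are easy to miss. A minimal sketch that counts the responses with `collections.Counter` and keeps every value tied for the highest count (variable names are illustrative):

```python
from collections import Counter

responses = ["Red", "Blue", "Green", "Blue", "Red",
             "Blue", "Blue", "Green", "Red", "Red"]

counts = Counter(responses)
top = max(counts.values())
# Keep every category whose count equals the highest count
modes = [color for color, c in counts.items() if c == top]

print(dict(counts))  # {'Red': 4, 'Blue': 4, 'Green': 2}
print(modes)         # ['Red', 'Blue']
```

By contrast, `statistics.mode(responses)` would return only `'Red'`: since Python 3.8 it returns the first mode encountered when there is a tie, which hides the fact that Blue is equally common.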

Use Case of Mode in Numeric Data

Mode can also be applied to numerical data. Although the mean and median are more commonly used for continuous data, the mode can still provide valuable insights, especially when particular values repeat often or when the data is multimodal (has more than one mode).

Example in Numeric Data

Consider the following set of numbers representing the number of books read by individuals in a year: [2, 5, 2, 4, 5, 5, 7, 2, 4].

Here, 2 and 5 each appear three times, more often than any other number, so the dataset is bimodal with modes 2 and 5.
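Python's `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, in the order they first appear, which makes the tie explicit:

```python
from statistics import multimode

books_read = [2, 5, 2, 4, 5, 5, 7, 2, 4]

modes = multimode(books_read)
print(modes)  # [2, 5]
```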

Special Cases of Mode

  • Unimodal: A dataset is said to be unimodal if it has only one value that occurs most frequently.
  • Bimodal: A dataset with two values occurring with the same highest frequency is bimodal.
    • Example: [1, 2, 2, 3, 3, 4] has two modes: 2 and 3.
  • Multimodal: If more than two values are tied for the highest frequency, the dataset is multimodal.
    • Example: [1, 1, 2, 2, 3, 3, 4] has three modes: 1, 2, and 3.
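The three cases can be distinguished by counting how many values share the top frequency. This is an illustrative sketch; `modality` is a hypothetical helper, not a library function. It does not treat "no mode" specially: if every value occurs equally often, `multimode` returns all of them, so such data is labeled by how many distinct values it has.

```python
from statistics import multimode

def modality(values):
    """Hypothetical helper: label data by how many values share the top frequency."""
    if not values:
        raise ValueError("modality() requires at least one value")
    k = len(multimode(values))  # number of values tied for the highest count
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"

print(modality([1, 2, 2, 3, 4]))        # unimodal
print(modality([1, 2, 2, 3, 3, 4]))     # bimodal
print(modality([1, 1, 2, 2, 3, 3, 4]))  # multimodal
```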

Building strong data analysis skills is crucial for success, but taking your expertise to the next level can set you apart from the competition. upGrad’s Master’s Degree in Artificial Intelligence and Data Science course will arm you with the advanced knowledge and skills to drive AI innovation and transformation within your organization.

Also Read: Types of Probability Distribution [Explained with Examples]

Applications of Mean, Median, and Mode in Real-World Analysis

In this section, we'll look at how the mean, median, and mode are applied across several real-world fields.

A. Economics

  • Mean: Used to calculate the average income, GDP per capita, or market averages to understand general economic performance.
  • Median: Commonly used to measure income distribution or household wealth because extreme wealth values have less influence on it.
  • Mode: Often used in market research to determine the most commonly purchased products or services.

B. Healthcare

  • Mean: Used in clinical trials to calculate average treatment effectiveness or average recovery times.
  • Median: Used to report median survival time, because survival data are often skewed by a few patients who live much longer than most.
  • Mode: Used to track the most common symptom or diagnosis in medical research.

C. Marketing

  • Mean: Used to calculate average customer spending or average order value in an e-commerce setting.
  • Median: Often used in customer satisfaction surveys to determine the typical response (i.e., median score) to avoid skewed results caused by outliers.
  • Mode: Used to determine the most popular product or the most frequent customer feedback.

D. Sports Analytics

  • Mean: Used to calculate average player performance, like scoring averages or distance covered.
  • Median: Used to analyze player consistency over a season, where the median shows the middle performance value without being influenced by extreme outliers (e.g., one exceptional game).
  • Mode: Can be used to determine the most common injury or the most frequent tactic in a team.

E. Education

  • Mean: Commonly used to calculate average grades or test scores for students in a class.
  • Median: Used to understand class performance without the influence of exceptionally high or low scores.
  • Mode: Used to determine the most common answer to a multiple-choice question, which can highlight common misconceptions or popular topics in the curriculum.

Also Read: Introduction to Statistics and Data Analysis: A Comprehensive Guide for Beginners

What is the Difference Between Mean, Median, and Mode?

Understanding how the mean, median, and mode differ is essential for accurate data analysis and well-informed decisions. The table below summarizes the key differences:

| Measure | Definition | How It's Calculated | Best Used When |
|---|---|---|---|
| Mean | The arithmetic average: the sum of all values divided by the number of values. | Mean = ΣX / n (sum of all data points divided by the number of data points) | The data is roughly symmetric (for example, approximately normal) without extreme values; the mean is sensitive to outliers. |
| Median | The middle value when the data is arranged in ascending or descending order. | If n is odd, the middle value; if n is even, the average of the two middle values. | The data is skewed or contains outliers. |
| Mode | The most frequently occurring value in a dataset. | Identify the value(s) that appear most often. | The data is categorical or has several frequently repeated values. |

Also Read: Getting Started with Data Exploration: A Beginner's Guide

What is the Relation Between Mean, Median, and Mode?

In a Normal Distribution:

  • In a perfectly symmetrical distribution, the mean, median, and mode are all equal (i.e., Mean = Median = Mode). This is often observed in a normal (bell-shaped) distribution.
  • Example: For a perfectly normal distribution, if the mean is 50, then the median and mode will also be 50.

In a Skewed Distribution:

  • Positively Skewed (Right-Skewed) Distribution:
    • The mean is typically greater than the median, and the median greater than the mode (Mode < Median < Mean).
    • The long tail on the right side pulls the mean toward the higher end.
  • Negatively Skewed (Left-Skewed) Distribution:
    • The mean is typically less than the median, and the median less than the mode (Mean < Median < Mode).
    • The long tail on the left side pulls the mean toward the lower end.
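A small numeric check of these orderings, using hypothetical datasets made up for illustration: `right_skewed` has a tail of high values, and `left_skewed` mirrors it (each value x becomes 11 - x), so its tail points the other way.

```python
from statistics import fmean, median, mode

# Hypothetical right-skewed data: most values are low, a few are much higher
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]
# Mirror image (x -> 11 - x): most values are high, a few are much lower
left_skewed = [11 - x for x in right_skewed]

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(f"{name}: mode={mode(data)}, median={median(data)}, mean={fmean(data)}")
# prints:
# right-skewed: mode=2, median=3.0, mean=4.0
# left-skewed: mode=9, median=8.0, mean=7.0
```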

In Bimodal or Multimodal Distributions:

  • If a dataset has two or more modes (bimodal or multimodal), the mode will indicate the most frequent values, while the mean and median might not accurately represent the "center" of the distribution.
  • Example: In a bimodal distribution, there could be two peaks, and the mode would represent the most frequent values in each peak.

Empirical Relationship Between Mean, Median, and Mode

For a moderately skewed, unimodal distribution, the three measures are approximately related by:

$$\text{Mode} \approx 3 \times \text{Median} - 2 \times \text{Mean}$$

This approximation is known as Karl Pearson's empirical relationship. It is not exact, but when two of the three values are known it can be used to estimate the third.
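The relationship can be rearranged to estimate whichever measure is missing. This sketch uses hypothetical helper names and made-up inputs (mean 50, median 48); the results are only as good as the approximation itself, which assumes a moderately skewed, unimodal distribution.

```python
def estimate_mode(mean, median):
    """Pearson's empirical rule: Mode ≈ 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

def estimate_mean(median, mode):
    """The same rule solved for the mean: Mean ≈ (3 * Median - Mode) / 2."""
    return (3 * median - mode) / 2

print(estimate_mode(mean=50, median=48))  # 44
print(estimate_mean(median=48, mode=44))  # 50.0
```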

Common Misunderstandings About Mean, Median, and Mode

  1. Mean vs. Median vs. Mode in Skewed Distributions:
    • In positively skewed (right-skewed) distributions, the mean is usually greater than the median, and the median greater than the mode. High-value outliers pull the mean to the right.
    • In negatively skewed (left-skewed) distributions, the mean is usually less than the median, and the median less than the mode.
    • In symmetrical, unimodal distributions, the mean, median, and mode are equal or very close to each other.
  2. Impact of Outliers:
    • Outliers strongly affect the mean, while the median is largely unaffected. This is why the median is often preferred when a dataset contains extreme values.
    • Outliers rarely affect the mode. They change it only if the extreme value itself occurs often enough to become the most frequent value.
  3. Mode for Continuous Data:
    • The mode is most commonly used with categorical data. In continuous data, exact values rarely repeat, so the raw mode is often uninformative. In such cases, use the mean or median, or group the values into intervals and report the most frequent interval (the modal class).

Code Example: Comparing Mean, Median, and Mode

Let's implement a Python example to demonstrate how to calculate the mean, median, and mode of a dataset using Pandas.

import pandas as pd

# Sample dataset
data = [1, 2, 3, 4, 100]

# Create a DataFrame
df = pd.DataFrame(data, columns=["Values"])

# Calculate mean, median, and mode
mean_value = df["Values"].mean()
median_value = df["Values"].median()
# mode() returns every tied value in sorted order; [0] takes the smallest
mode_value = df["Values"].mode()[0]

# Display the results
print(f"Mean: {mean_value}")
print(f"Median: {median_value}")
print(f"Mode: {mode_value}")

Output:

Mean: 22.0

Median: 3.0

Mode: 1

Explanation:

  • The mean is 22, which is influenced by the outlier 100.
  • The median is 3, which represents the central value better than the mean in this case.
  • Every value appears exactly once, so this dataset has no meaningful mode. Pandas' mode() returns all tied values in sorted order, and [0] simply picks the smallest one, 1.

If you're looking to clear up any confusion around statistical measures like mean, median, and mode, upGrad’s free Excel for Data Analysis Course is the ideal place to start. This course will help you master concepts and strengthen your skills using pivot tables and formulas, with a certification that adds valuable credentials to your portfolio.

Also Read: Data Analysis Using Python [Everything You Need to Know]

Mean, Median, Mode: Which One to Use? Pop Quiz

Now that you've explored the mean, median, and mode, it's time to test your understanding! This pop quiz will help you assess when and why to use each measure of central tendency in different scenarios.

Answer the following multiple-choice questions to see how well you grasp these fundamental concepts:

1. Which measure of central tendency is most affected by outliers?

A) Mean

B) Median

C) Mode

D) None of the above

2. Which of the following is the best measure of central tendency for skewed data?

A) Mean

B) Median

C) Mode

D) Standard deviation

3. Which measure of central tendency uses every value in the dataset and gives each one equal weight?

A) Mean

B) Median

C) Mode

D) Variance

4. In a perfectly symmetrical unimodal distribution, which measures of central tendency are equal?

A) Mean and Median

B) Mean and Mode

C) Median and Mode

D) All three

5. Which measure of central tendency is most useful for categorical data?

A) Mean

B) Median

C) Mode

D) Range

6. What happens to the mean if an extremely high value is added to the dataset?

A) The mean decreases

B) The mean increases

C) The mean stays the same

D) It becomes the median

7. Which of the following is not a measure of central tendency?

A) Mean

B) Median

C) Mode

D) Variance

8. When is the mode a better measure of central tendency than the mean or median?

A) When the data is normally distributed

B) When the data is skewed

C) When the data has multiple peaks (bimodal or multimodal)

D) When there are outliers

9. What is the relationship between the mean, median, and mode in a perfectly symmetrical distribution?

A) Mean > Median > Mode

B) Mean < Median < Mode

C) Mean = Median = Mode

D) None of the above

10. Which measure of central tendency is best for determining the "typical" score in a set of exam results with extreme outliers?

A) Mean

B) Median

C) Mode

D) Standard deviation

Understand The Basics And Application of Data Analysis Through upGrad!

Understanding the mean, median, and mode is essential for effectively summarizing and analyzing data. These measures of central tendency offer key insights into datasets, whether simple or complex, and mastering their applications is crucial for statistics, machine learning, and data science.

To build these skills, upGrad’s data science and machine learning programs offer practical training, including R programming techniques. With a hands-on, project-based approach, upGrad’s courses will help you build a strong foundation in data analysis.


If you're uncertain about which direction to take in your career, upGrad’s personalized career guidance can help you find the right path. Additionally, you can visit the nearest upGrad center to begin practical, hands-on training and kickstart your journey toward success!

FAQs

1. What is the difference between mean, median, and mode in terms of data distribution?

The mean is the average value of a dataset, calculated by summing all values and dividing by the count. The median is the middle value when data is sorted, offering a better measure when the dataset has outliers. The mode is the most frequently occurring value. While the mean is sensitive to extreme values, the median and mode provide more robust insights in skewed distributions.

2. Can mean, median, and mode be the same value in a dataset?

Yes. In a perfectly symmetric unimodal distribution, such as the normal distribution, the mean, median, and mode are equal. This occurs in ideal bell-shaped curves where data is evenly distributed around the center. In skewed or irregular datasets, these measures tend to differ, highlighting different aspects of the data's distribution.

3. How do outliers affect mean, median, and mode differently?

Outliers heavily influence the mean since it considers every value in the dataset. The median, being the middle value, is less affected by extreme values, making it more reliable for skewed data. The mode, representing the most common value, is generally unaffected by outliers unless they occur frequently.

4. In what situations is using the mode more informative than mean or median?

Mode is especially useful for categorical or nominal data where numerical averages don't make sense. It helps identify the most common category or value, such as the most popular product color or preferred customer choice. For numerical data, mode can highlight common repeated values but may not always be unique.

5. Why is median preferred over mean in income data analysis?

Income data is often skewed by a few extremely high earners, who inflate the mean. The median provides a better measure of the 'typical' income because it represents the middle point and is largely unaffected by extreme high or low values. This gives a more realistic picture of the overall population's income level.

6. How can mean, median, and mode be calculated in R programming?

In R, the mean can be calculated using the mean() function, the median using median(), and the mode requires a custom function since R does not have a built-in mode function. These functions allow easy computation of central tendency measures for vectors or datasets, facilitating statistical analysis.

7. What are the limitations of using mean in real-world data analysis?

The mean can be misleading when data is skewed or contains outliers. It doesn't reflect the spread or shape of the distribution and can be distorted by extreme values. Relying solely on the mean may therefore lead to incorrect interpretations in many real-world scenarios.

8. How does the concept of mean, median, and mode apply in machine learning?

These measures help understand the distribution of features and target variables, guiding data preprocessing and feature engineering. For example, median imputation can handle missing values in skewed datasets, while mode can be used to fill missing categorical data. Understanding these concepts supports better model accuracy and robustness.
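A minimal pandas sketch of both imputation strategies, using a small hypothetical DataFrame (the column names and values are made up for illustration). `Series.median()` and `Series.mode()` skip missing values by default, and `mode()` can return several tied values, so the first one is taken.

```python
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({
    "income": [30.0, 32.0, None, 40.0, 1000.0],    # skewed numeric column
    "color": ["Red", None, "Blue", "Blue", "Green"],  # categorical column
})

# Median imputation: robust to the outlier 1000 in the income column
df["income"] = df["income"].fillna(df["income"].median())

# Mode imputation: fill the missing category with the most frequent one
df["color"] = df["color"].fillna(df["color"].mode()[0])

print(df)
```

Here the missing income becomes 36.0 (the median of 30, 32, 40, and 1000) rather than the outlier-inflated mean of 275.5, and the missing color becomes "Blue".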

9. Can mean, median, and mode be used together to get a complete picture of data?

Yes, analyzing all three provides a comprehensive understanding of the data distribution. Mean offers the average tendency, median shows the central location resistant to outliers, and mode identifies the most frequent occurrences. Together, they reveal insights into data symmetry, skewness, and modality.

10. What types of data are best suited for each measure: mean, median, or mode?

Mean is best for interval or ratio data with symmetrical distributions. Median is preferred when data is skewed or contains outliers. Mode is ideal for nominal or categorical data where identifying the most common category is important. Selecting the right measure depends on the nature of the data and analysis goals.

11. How can knowledge of mean, median, and mode enhance data visualization?

Including these measures in visualizations like histograms or box plots helps highlight central tendencies and data distribution characteristics. For example, plotting median lines on boxplots emphasizes data skewness, while mode indicators on bar charts show popular categories. This enriches the interpretation and communication of data insights.
