DataFrames in Python: Why Every Data Scientist Is Obsessed!

Q: 1. How do I concatenate two DataFrames in Python?

You can concatenate two DataFrames in Python using the concat() function from Pandas. This allows you to stack DataFrames either vertically (along rows) or horizontally (along columns). For example, pd.concat([df1, df2], axis=0) combines df1 and df2 vertically, while axis=1 would stack them side by side horizontally. The ignore_index=True parameter resets the index in the concatenated DataFrame.
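
As a minimal sketch of both directions, the snippet below stacks two small frames and then joins one of them with a second frame column-wise. The frames and the City column are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Aman', 'Bhoomi'], 'Age': [24, 27]})
df2 = pd.DataFrame({'Name': ['Chetan'], 'Age': [22]})
cities = pd.DataFrame({'City': ['Delhi', 'Pune']})  # hypothetical extra column

# Vertical stack: without ignore_index the old labels repeat (0, 1, 0)
print(pd.concat([df1, df2]).index.tolist())

# ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)
print(stacked)

# Horizontal stack: rows are aligned on their index labels
side_by_side = pd.concat([df1, cities], axis=1)
print(side_by_side)
```

With axis=1, rows that exist in only one frame are kept and padded with missing values, so it helps to check that the indexes match before stacking side by side.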

Q: 2. How do I reset the index of a DataFrame in Python?

You can reset the index of a DataFrame in Python using the reset_index() method. This method moves the current index to a column and creates a new default integer index. For example, df.reset_index(drop=True, inplace=True) resets the index without adding the old index as a column. This is useful when you’ve made modifications to the DataFrame and need a clean, sequential index.
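
A short sketch of the difference between keeping and dropping the old index, assuming a small frame that has just been filtered (the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]})

# Filtering keeps the original labels, so the index now has a gap
subset = df[df['Age'] < 27]
print(subset.index.tolist())  # [0, 2]

# Default: the old index is preserved as a column named 'index'
kept = subset.reset_index()
print(kept)

# drop=True: the old index is discarded entirely
clean = subset.reset_index(drop=True)
print(clean)
```

Assigning the result to a new variable, as done here, is an alternative to inplace=True that leaves the original frame untouched.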

Q: 3. Can DataFrames in Python handle missing values?

Yes, DataFrames in Python provide robust tools for handling missing data. You can use the isnull() method to check for missing values and dropna() to remove them. Alternatively, you can use fillna() to replace missing values with a specified value, such as the mean or median of the column. These functions make working with incomplete data much easier.

Q: 4. How do I filter rows based on multiple conditions in a DataFrame in Python?

To filter rows based on multiple conditions, you can use the & (and) or | (or) operators, combining conditions in a single statement. For example, df[(df['age'] > 30) & (df['country'] == 'USA')] filters the DataFrame to show rows where the 'age' is greater than 30 and the 'country' is 'USA'. Make sure to wrap each condition in parentheses when using these operators.
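
The sketch below applies both operators to a small made-up frame. The column names follow the example in the answer; the rows are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Aman', 'Bhoomi', 'Chetan', 'Divya'],
    'age': [24, 35, 41, 29],
    'country': ['USA', 'USA', 'UK', 'India'],
})

# AND: both conditions must hold for a row to be kept
both = df[(df['age'] > 30) & (df['country'] == 'USA')]
print(both)

# OR: a row is kept if at least one condition holds
either = df[(df['age'] > 30) | (df['country'] == 'USA')]
print(either)
```

The parentheses matter because & and | bind more tightly than comparison operators such as > and ==. Python's and/or keywords do not work here, since they raise a ValueError when applied to a whole Series.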

Q: 5. How do I rename columns in a DataFrame in Python?

You can rename columns in a DataFrame in Python using the rename() method. For example, df.rename(columns={'old_name': 'new_name'}, inplace=True) changes the column name 'old_name' to 'new_name'. You can also rename multiple columns by passing a dictionary with old column names as keys and new names as values.

Q: 6. Can DataFrames in Python handle large datasets efficiently?

Yes, as long as the data fits in memory. Pandas runs operations such as filtering, grouping, and aggregation over whole columns at once, which is much faster than looping over rows in plain Python. For files that are too large to load in one go, read_csv() accepts a chunksize parameter that returns the data in smaller pieces, so you can process one chunk at a time instead of holding the whole file in memory.
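
Here is a minimal sketch of chunked reading. To keep it self-contained, a small in-memory string stands in for a large CSV file; the column names and the chunk size of 4 are arbitrary choices for illustration.

```python
import io
import pandas as pd

# Stand-in for a large file on disk: 10 rows of made-up data
csv_text = "id,amount\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

total = 0
rows_seen = 0

# With chunksize set, read_csv yields DataFrames of up to 4 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk['amount'].sum()
    rows_seen += len(chunk)

print(rows_seen, total)
```

Only one chunk is held in memory at a time, so this pattern suits running totals and other aggregations that can be built up piece by piece.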

Q: 7. How do I calculate summary statistics for a DataFrame in Python?

You can calculate summary statistics for a DataFrame in Python using the describe() method. For numeric columns it reports the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum; the 50% value is the median. For example, df.describe() returns a summary of statistics for all numerical columns in the DataFrame.

Q: 8. How do I apply a function to a DataFrame in Python?

You can apply a function to a DataFrame in Python using the apply() method. This allows you to apply a custom function to each element or along a specific axis (rows or columns). For example, df['age'].apply(lambda x: x + 1) adds 1 to each value in the 'age' column. The apply() method is flexible and works well for custom operations.

Q: 9. How do I merge two DataFrames in Python?

You can merge two DataFrames in Python using the merge() function. This is similar to SQL joins and allows you to combine DataFrames based on a common column or index. You can specify the type of join (inner, outer, left, or right) depending on how you want to combine the data. For example, pd.merge(df1, df2, on='id', how='inner') merges two DataFrames on the 'id' column using an inner join.

Q: 10. Can I group data in a DataFrame in Python?

Yes, you can group data in a DataFrame in Python using the groupby() method. This method allows you to group data by one or more columns and perform aggregation functions like sum, mean, count, etc. For example, df.groupby('country')['age'].mean() groups the data by 'country' and calculates the mean age for each country.

By Rohit Sharma

Updated on Jul 08, 2025 | 21 min read | 7.99K+ views

Table of Contents

DataFrames in Python: 6 Ways to Create Them with Pandas
DataFrame Basics: Viewing, Accessing, and Filtering Data
DataFrame Operations: Data Manipulation and Merging
How to Implement Advanced DataFrame Techniques in Python?
Enhance Your Python Skills with upGrad!

Did you know? Python’s popularity increased by 2.2% from April to May 2025, surpassing competitors like C++, C, and Java! This growing demand highlights the increasing reliance on Python’s powerful features, like DataFrames, making data analysis faster and more intuitive than ever.

DataFrames in Python are two-dimensional, size-mutable, and labeled data structures provided by the Pandas library. They store data in rows and columns, similar to tables in databases or spreadsheets, allowing efficient data manipulation and analysis.

DataFrames can hold various data types and support operations like filtering, grouping, and aggregating, making them indispensable in data science.

In this blog, you’ll learn about DataFrames in Python, focusing on their creation, manipulation, and advanced usage for practical analysis.

Interested in learning more about DataFrames in Python? Enrol in upGrad’s Online Software Development Courses, featuring an updated curriculum on generative AI and specializations like full-stack development.

DataFrames in Python: 6 Ways to Create Them with Pandas

A Pandas DataFrame is a 2D labeled structure that can store data of different types (e.g., integers, floats, strings) across rows and columns. It is similar to a table in a database or an Excel spreadsheet, with labeled axes (rows and columns). The main components of a DataFrame are:

Rows: These represent individual data points or records.
Columns: These define the data categories, such as "name" or "age".
Data: The actual values in the DataFrame, which can include numbers, text, or other types.

Understanding DataFrames is an essential step in working with data in Python. To start, let’s ensure that Pandas is installed:

pip install pandas

Once installed, import Pandas into your Python script:

import pandas as pd

Let’s now look at how to create DataFrames in Python using Pandas, using sources such as dictionaries, lists, or external files like CSV and Excel.

1. From a Dictionary

Creating a DataFrame from a dictionary maps the dictionary keys to column labels while the corresponding values fill the columns. This approach is beneficial when dealing with structured data, as it allows direct mapping of data attributes (keys) to columns.

Code Example:

import pandas as pd

# Create a DataFrame from a dictionary
data = {'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]}
df = pd.DataFrame(data)

print(df)

Explanation:

A dictionary is created where the keys ('Name', 'Age') represent the column labels, and the values (lists of names and ages) represent the data to be populated under each column.
pd.DataFrame(data) takes the dictionary and converts it into a DataFrame.

Output:

The Name and Age columns are populated based on the dictionary values, and each list entry corresponds to a row.
Row indices (0, 1, 2) are generated automatically by Pandas.

Name Age
0 Aman 24
1 Bhoomi 27
2 Chetan 22

Also Read: Python Challenges for Beginners

2. From a List of Lists

Using a list of lists allows you to create a DataFrame where each inner list represents a row. Since the list of lists does not contain column labels, the columns parameter must be specified separately to define the DataFrame's structure.

Code Example:

import pandas as pd

# Create a DataFrame from a list of lists
data = [['Aman', 24], ['Bhoomi', 27], ['Chetan', 22]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)

Explanation:

The data is structured as a list of lists, where each sublist represents a row in the DataFrame.
The columns parameter explicitly defines the labels for the columns, in this case, Name and Age.

Output:

Each inner list becomes one row, with its values placed under the Name and Age columns in order.
Row indices (0, 1, 2) are automatically assigned by Pandas, which makes the rows identifiable and easily accessible.

Name Age
0 Aman 24
1 Bhoomi 27
2 Chetan 22

Get a better understanding of Python with upGrad’s Learn Python Libraries: NumPy, Matplotlib & Pandas. Learn how to manipulate data using NumPy, visualize insights with Matplotlib, and analyze datasets with Pandas.

Also Read: A Comprehensive Guide to Pandas DataFrame astype()

3. From External Files (CSV, Excel, etc.)

Pandas allows you to read data directly from external files, such as CSV or Excel, into a DataFrame. This is particularly useful when working with large datasets stored externally.

Code Example:

import pandas as pd

# Create a DataFrame from a CSV file
df = pd.read_csv('data.csv')

print(df)

Explanation:

pd.read_csv('data.csv') reads the contents of the specified CSV file (data.csv) and loads it into a Pandas DataFrame.
Ensure the CSV file (data.csv) is in the same directory as the script, or provide the full file path if the file is located elsewhere.

Output (Assuming the CSV contains columns Name, Age, Country):

The columns and rows are automatically populated from the content of the CSV file.
Row indices (0, 1, and 2) are generated automatically by Pandas based on the data.
The columns in the CSV file (e.g., Name, Age, Country) become the column labels in the DataFrame.

Name Age Country
0 Aman 24 USA
1 Bhoomi 27 UK
2 Chetan 22 Canada

Note:

Excel Files: For Excel files, you would use pd.read_excel('data.xlsx') instead of pd.read_csv().
File Path: If the file is in a different directory, ensure that the full path to the file is provided, e.g., pd.read_csv('/path/to/your/data.csv').
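
Because the example above depends on a data.csv file being present, here is a self-contained sketch that reads the same kind of content from an in-memory string instead. The CSV text mirrors the assumed Name, Age, Country layout; usecols is a standard read_csv parameter for loading only some columns.

```python
import io
import pandas as pd

# Stand-in for the contents of data.csv (assumed layout)
csv_text = """Name,Age,Country
Aman,24,USA
Bhoomi,27,UK
Chetan,22,Canada
"""

# The header row supplies the column labels
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)

# Load only the columns you need
names_and_ages = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Age'])
print(names_and_ages)
```

Swapping io.StringIO(csv_text) for a file path gives the same result when the file has this content.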

Also Read: Career Opportunities in Python: Everything You Need To Know [2025]

4. From NumPy Arrays

You can create a DataFrame from a NumPy array, where each row in the array becomes a row in the DataFrame. Columns must be defined explicitly, and this method is beneficial when working with numerical data.

Code Example:

import pandas as pd
import numpy as np

# Create a DataFrame from a NumPy array
data = np.array([[1, 2], [3, 4], [5, 6]])
df = pd.DataFrame(data, columns=['Column1', 'Column2'])

print(df)

Explanation:

A NumPy array is created with shape (3, 2), which corresponds to three rows and two columns.
pd.DataFrame(data, columns=['Column1', 'Column2']) converts the NumPy array into a Pandas DataFrame.
The columns parameter specifies the column labels ('Column1', 'Column2'), while the NumPy array fills the rows.

Output:

The DataFrame is created using the NumPy array as the data source, with each row from the array becoming a row in the DataFrame.
Pandas automatically assigns row indices (0, 1, 2).
The specified column names ('Column1' and 'Column2') are used as headers for the DataFrame columns.

Column1 Column2
0 1 2
1 3 4
2 5 6

Also Read: Top 7 Data Types in Python: Examples, Differences, and Best Practices (2025)

5. From a List of Dictionaries

When data is structured as a list of dictionaries, each dictionary represents a row in the DataFrame, with the dictionary keys becoming the column labels.

Code Example:

import pandas as pd

# Create a DataFrame from a list of dictionaries
data = [{'Name': 'Aman', 'Age': 24}, {'Name': 'Bhoomi', 'Age': 27}]
df = pd.DataFrame(data)

print(df)

Explanation:

Each dictionary in the list represents a row in the DataFrame.
The keys in the dictionaries automatically become the column labels for the DataFrame.

Output:

Each dictionary is converted into a row, and the column labels are derived from the keys in the dictionaries.
Pandas assigns row indices (0, 1) automatically.

Name Age
0 Aman 24
1 Bhoomi 27

Start your Python learning journey with upGrad’s Learn Basic Python Programming course! Build expertise in Python and Matplotlib through hands-on exercises. Ideal for beginners, plus earn a certification to advance your career upon completion!

Also Read: Inheritance in Python | Python Inheritance [With Example]

6. From a Pandas Series

You can create a DataFrame from a Pandas Series by passing the series into the DataFrame() constructor. The Series’ name attribute becomes the column label in the resulting DataFrame.

Code Example:

import pandas as pd

# Create a DataFrame from a Series
series = pd.Series([1, 2, 3], name='Numbers')
df = pd.DataFrame(series)

print(df)

Explanation:

A Pandas Series is passed directly into the DataFrame() constructor.
The name attribute of the Series (in this case, 'Numbers') becomes the column label for the DataFrame.

Output:

The Series is converted into a single-column DataFrame, with the name attribute used as the column label.
Pandas automatically generates row indices (0, 1, 2).

Numbers
0 1
1 2
2 3

In fields like ML, AI, and data analytics, DataFrames in Python play a crucial role in structuring data for model training and generating insights. With Python libraries like Pandas, data manipulation becomes efficient, simplifying tasks such as cleaning, transforming, and visualizing data.

Are you a full-stack developer wanting to integrate AI into your Python Coding? upGrad’s AI-Driven Full-Stack Development can help you. You’ll learn how to build AI-powered software using OpenAI, GitHub Copilot, Bolt AI & more.

Also Read: Top 36+ Python Projects for Beginners and Students to Explore in 2025

Let's explore key ways for inspecting and accessing data in pandas, enabling efficient exploration and extraction of insights from your DataFrames in Python.

DataFrame Basics: Viewing, Accessing, and Filtering Data

DataFrame Basics involves understanding how to view and access data within a Pandas DataFrame. This includes viewing the structure, selecting specific rows and columns, and filtering data to extract meaningful insights.

Below are the key techniques for viewing and accessing data in DataFrames:

1. Viewing Data

Once you load a DataFrame, inspecting the data is the first step. You can use several methods to view and understand the structure of the DataFrame.

(a) head(): Displays the first 5 rows by default, allowing you to get a quick look at the top of the data. You can specify a number to display a custom number of rows.

Code Example:

import pandas as pd

data = {'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]}
df = pd.DataFrame(data)

# Display the first five rows (default behavior)
print(df.head())

Explanation: df.head() displays the first five rows of the DataFrame. You can specify the number of rows to display by passing an argument (e.g., df.head(2) to show the first two rows).

Output: This will show the first five rows of the DataFrame. If there are fewer than five rows, all of them will be displayed.

Name Age
0 Aman 24
1 Bhoomi 27
2 Chetan 22

(b) tail(): Displays the last five rows of the DataFrame, offering a look at the bottom of the data.

Code Example:

print(df.tail())  # Displays the last 5 rows

Explanation: df.tail() returns the last five rows. This is useful for inspecting the bottom of your dataset.

Output: Since the DataFrame has only three rows, the output will display all of them.

Name Age
0 Aman 24
1 Bhoomi 27
2 Chetan 22

(c) info(): Provides summary information about the DataFrame, including the number of non-null entries and the data type of each column.

Code Example:

df.info()

Explanation: The df.info() function provides a concise summary of the DataFrame, including the number of non-null entries and the data types of each column.

Output: This method helps you understand the structure of the DataFrame. Comparing each column's non-null count with the total number of entries shows how many values are missing, and the Dtype column shows the type of data stored in each column.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 3 non-null object
1 Age 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 143.0+ bytes

(d) describe(): Generates descriptive statistics for numerical columns, such as the mean, standard deviation, minimum, and maximum values.

Code Example:

print(df.describe())

Explanation: df.describe() returns statistical details like mean, standard deviation, and min/max values for numeric columns.

Output: It provides a quick summary of numeric data, including the count of non-null entries, the mean value, and the standard deviation.

             Age
count   3.000000
mean   24.333333
std     2.516611
min    22.000000
25%    23.000000
50%    24.000000
75%    25.500000
max    27.000000

2. Accessing Data

You can access data from a DataFrame using the column name or by index, making it easy to retrieve specific parts of the dataset for further analysis.

(a) By Column Name: You can access a column in a DataFrame directly by specifying its name in square brackets. This allows you to isolate specific features of the data.

Code Example:

print(df['Name'])  # Access the 'Name' column

Explanation: Accessing a DataFrame column by its name returns a Series with data from that column. This is useful when you need to work with specific variables in your dataset.

Output: This will return the data from the Name column.

0 Aman
1 Bhoomi
2 Chetan
Name: Name, dtype: object

(b) By Row Index (Using iloc and loc): You can access rows by their index position using iloc (integer location-based indexing) or loc (label-based indexing). This helps to isolate specific rows based on their position or label.

Using iloc: Access a specific row by its index (integer-based). This is particularly useful when working with rows based on their positional location in the DataFrame.

Code Example:

# Using iloc to access a specific row by index
print(df.iloc[0])  # Accesses the first row (index 0)

Explanation: iloc[0] accesses the first row (index 0) of the DataFrame, which is ideal when you need to retrieve data by row index.

Output: This function returns all data from the first row in a Series format.

Name Aman
Age 24
Name: 0, dtype: object

Using loc: loc accesses a specific row by its index label. This is helpful when working with DataFrames that have custom index labels rather than the default integer-based index.

Code Example:

print(df.loc[0])  # Accesses the row with index label 0

Explanation: loc[0] accesses the row with the index label 0, which is the default integer index.

Output: Similar to iloc, but loc can also be used with custom index labels.

Name Aman
Age 24
Name: 0, dtype: object
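
The difference between loc and iloc only becomes visible once the index is not the default 0, 1, 2. The sketch below assumes hypothetical string labels 'a', 'b', 'c' to show it.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]},
    index=['a', 'b', 'c'],  # hypothetical custom labels
)

print(df.loc['b'])   # row whose label is 'b'
print(df.iloc[1])    # second row by position (the same row here)

# Label slices include the end label; positional slices exclude the stop
by_label = df.loc['a':'b']   # rows 'a' and 'b'
by_position = df.iloc[0:1]   # row at position 0 only
print(len(by_label), len(by_position))

# Both accept a column selector as a second argument
print(df.loc['c', 'Age'], df.iloc[2, 1])
```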

3. Filtering Data

Pandas allows you to filter data based on specific conditions, enabling you to isolate rows that meet particular criteria. This functionality is essential for analyzing subsets of your data.

Code Example:

# Filter rows where the age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)

Explanation:

df['Age'] > 30 returns a Boolean series, which is used to filter rows based on the condition where the age is greater than 30.
The resulting DataFrame will only contain rows where the condition is true.

Output: Since none of the rows in this DataFrame have an age greater than 30, the output will be an empty DataFrame.

Empty DataFrame
Columns: [Name, Age]
Index: []
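
Since the threshold of 30 matches nothing in this frame, the following sketch uses a lower threshold (23, chosen for illustration) so the Boolean mask and its effect are visible. It also shows loc taking the mask together with a column name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]})

# The comparison yields one True/False value per row
mask = df['Age'] > 23
print(mask.tolist())  # [True, True, False]

# Keep the matching rows
print(df[mask])

# Keep the matching rows and select a single column in one step
names = df.loc[mask, 'Name']
print(names.tolist())
```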

The basic DataFrame operations are essential for efficient data exploration and analysis. Gaining proficiency in these techniques allows you to interact with large datasets and make informed decisions based on the data.

Ready to advance your Python skills? Gain expertise in Linux, Python foundation, AWS, Azure, and Google Cloud to create scalable solutions with upGrad’s Expert Cloud Engineer Bootcamp. Start building your job-ready portfolio today!

Also Read: Mastering Python Variables: Complete Guide with Examples

Let's explore how Pandas makes it easy to manipulate and merge data using a variety of built-in DataFrame operations.

DataFrame Operations: Data Manipulation and Merging

DataFrame operations in Pandas enable powerful data manipulation, including data transformation, merging, and aggregation. These operations help to clean, reshape, and combine datasets for deeper analysis and more insightful results.

Below are some common DataFrame operations that help efficiently manipulate and merge datasets for better analysis.

1. Adding and Dropping Columns

Adding or dropping columns is a common operation for modifying the structure of a DataFrame. This is useful when you need to either expand your dataset with new data or remove unnecessary columns for focused analysis.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

# Adding a new column
df['Salary'] = [50000, 60000, 70000]
print(df)

# Dropping the 'Salary' column
df = df.drop('Salary', axis=1)
print(df)

Explanation:

A new column is added by assigning a list or Series to the new column name (df['Salary']).
drop() removes a column from the DataFrame, specified by axis=1.

Output:

The first output shows the new Salary column.

Name Age Salary
0 Aman 24 50000
1 Bhoomi 27 60000
2 Chetan 22 70000

The second output displays the DataFrame after dropping the 'Salary' column.

Name Age
0 Aman 24
1 Bhoomi 27
2 Chetan 22

2. Renaming Columns

Renaming columns allows you to update column names to be more descriptive or standardized. This operation is crucial when you are cleaning data or preparing it for analysis.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

# Renaming the column 'Name' to 'Full Name'
df = df.rename(columns={'Name': 'Full Name'})
print(df)

Explanation: The rename() function is used to change column names by passing a dictionary where the keys are the old names and the values are the new names.

Output: The Name column is renamed to Full Name, and the updated DataFrame is printed.

Full Name Age
0 Aman 24
1 Bhoomi 27
2 Chetan 22

3. Sorting Data

Sorting data helps organize the DataFrame based on specific criteria. Sorting by one or more columns enables easier data analysis, particularly when searching for trends or arranging data in a logical order.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

# Sorting the DataFrame by the 'Age' column in descending order
df = df.sort_values(by='Age', ascending=False)
print(df)

Explanation: The sort_values() function sorts the DataFrame by a specified column, with the ascending=False parameter used to sort in descending order.

Output: The rows are sorted by the Age column in descending order.

Name Age
1 Bhoomi 27
0 Aman 24
2 Chetan 22

4. Handling Missing Data

Handling missing data is essential to ensure data integrity and avoid errors during analysis. Pandas provides multiple methods, such as fillna() and dropna(), to manage missing values effectively.

Code Example:

import pandas as pd

# Create a DataFrame with missing values
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', None],
    'Age': [24, None, 22]
})

# Identify missing values
print(df.isnull())

# Fill missing values in the 'Age' column with the mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)

# Drop rows with any missing values
df = df.dropna()
print(df)

Explanation:

isnull() identifies missing data.
fillna() is used to fill missing data with a specific value, like the mean of the Age column.
dropna() removes rows that contain missing values.

Output:

The first output indicates where the missing values are located.

Name Age
0 False False
1 False True
2 True False

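The second output fills the missing Age value with the mean of the remaining ages, (24 + 22) / 2 = 23.0. The Age column is stored as floats because it contained a missing value, and the Name in row 2 is still missing.

     Name   Age
0    Aman  24.0
1  Bhoomi  23.0
2    None  22.0

The third output drops every row that still has a missing value, which removes row 2 because its Name is missing.

     Name   Age
0    Aman  24.0
1  Bhoomi  23.0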
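
In practice you often want a different strategy per column. The sketch below reuses the frame from the example and assumes two illustrative choices: a placeholder string for names and the median for ages.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', None],
    'Age': [24, None, 22]
})

# Count the missing values in each column
print(df.isnull().sum())

# fillna accepts a dict mapping column names to fill values
filled = df.fillna({'Name': 'Unknown', 'Age': df['Age'].median()})
print(filled)

# dropna(subset=...) drops rows only when the listed columns are missing
only_named = df.dropna(subset=['Name'])
print(only_named)
```

Neither call modifies df, so the two strategies can be compared side by side before committing to one.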

5. Grouping Data

Grouping data is valuable when you want to aggregate information based on a specific feature. The groupby() method allows you to group data by a column and apply aggregation functions such as mean(), sum(), or count().

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30]
})

# Group by 'City' and calculate the mean of 'Age'
grouped_df = df.groupby('City').mean()
print(grouped_df)

Explanation: The groupby() function groups the data by the City column, and mean() function calculates the average value for the Age column in each group.

Output: The DataFrame is grouped by City, and the mean age for each city is calculated.

           Age
City
London    28.5
New York  23.0
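
The example works because Age is the only column left after grouping. When a frame also holds text columns, recent pandas versions raise a TypeError if mean() is called on the whole group, so selecting the column first is the safer habit. A sketch, with a Name column added as an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Name': ['Aman', 'Bhoomi', 'Chetan', 'Divya'],  # added for illustration
    'Age': [24, 27, 22, 30]
})

# Select the numeric column before aggregating
mean_age = df.groupby('City')['Age'].mean()
print(mean_age)

# size() counts the rows in each group regardless of column types
counts = df.groupby('City').size()
print(counts)
```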

6. Merging DataFrames

Merging DataFrames is a common operation when you need to combine two datasets based on a common column or index. This operation is similar to SQL joins and is used to combine related data from different sources.

Code Example:

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Aman', 'Bhoomi', 'Chetan']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 4],
    'Salary': [50000, 60000, 70000]
})

# Merge the DataFrames on the 'ID' column
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

Explanation:

The merge() function is used to combine two DataFrames based on a common column (ID).
The how='inner' argument performs an inner join, which keeps only rows with matching ID values in both DataFrames.

Output: The DataFrames are merged on the ID column, and the result contains only rows that have matching ID values in both DataFrames.

ID Name Salary
0 1 Aman 50000
1 2 Bhoomi 60000
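
For comparison, here is a sketch of the other join types on the same two frames. The indicator=True argument is a standard merge() option that adds a _merge column showing where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Aman', 'Bhoomi', 'Chetan']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Salary': [50000, 60000, 70000]})

# Left join: every row of df1 is kept; ID 3 gets a missing Salary
left = pd.merge(df1, df2, on='ID', how='left')
print(left)

# Outer join: rows from both sides are kept
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer)
```

In the outer result, ID 3 is marked left_only and ID 4 right_only, which makes unmatched keys easy to spot.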

These operations are essential for performing common data manipulations and merging tasks in Pandas. By using these techniques, you can easily clean, transform, and combine data to prepare it for analysis and interpretation.

Take the next step in your career with Python and Data Science! Enroll in upGrad's Professional Certificate Program in Data Science and AI. Gain expertise in Python, Excel, SQL, GitHub, and Power BI through 110+ hours of live sessions!

Also Read: Top 70 Python Interview Questions & Answers: Ultimate Guide 2025

Let's now explore advanced techniques and see how they can help streamline data processing and reveal more meaningful insights from complex datasets.

How to Implement Advanced DataFrame Techniques in Python?

Advanced DataFrame techniques in Python enable complex data manipulation, transformation, and analysis, allowing you to handle large datasets and optimize performance. These methods are vital for effectively managing and analyzing large-scale data.

Below are some methods for tackling intricate data tasks and gaining deeper insights:

1. Pivot Tables

Pivot tables are used to summarize and aggregate data based on specific criteria. This is particularly helpful when you want to group data by one or more columns and calculate aggregated values, such as sum, mean, or count.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30]
})

# Creating a pivot table to calculate the mean of 'Age' by 'City'
pivot_table = df.pivot_table(values='Age', index='City', aggfunc='mean')
print(pivot_table)

Explanation:

The pivot_table() function is used to create a pivot table.
values='Age' specifies that the column 'Age' should be aggregated.
index='City' means that the pivot table will group the data by 'City'.
aggfunc='mean' specifies the aggregation function to calculate the mean age for each city.

Output: The pivot table groups the data by 'City' and calculates the average age for each city.

           Age
City
London    28.5
New York  23.0

2. Reshaping DataFrames

Reshaping data with functions like melt() and pivot() is used when you need to transform the data between wide and long formats. These functions help make data easier to work with when applying aggregation or analysis.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30],
    'Salary': [50000, 60000, 70000, 80000]
})

# Reshaping data using melt
melted_df = df.melt(id_vars=['City'], value_vars=['Age', 'Salary'])
print(melted_df)

Explanation:

melt() is used to transform the DataFrame from wide format to long format.
id_vars=['City'] specifies that the 'City' column should remain as an identifier for the data.
value_vars=['Age', 'Salary'] indicates the columns to be unpivoted (i.e., turned into rows).

Output: The melt() function converts the 'Age' and 'Salary' columns into a single column of values, with each row now representing a different combination of 'City' and the corresponding values.

City variable value
0 New York Age 24
1 London Age 27
2 New York Age 22
3 London Age 30
4 New York Salary 50000
5 London Salary 60000
6 New York Salary 70000
7 London Salary 80000
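
The section mentions pivot() but only demonstrates melt(), so here is a sketch of the reverse direction, long to wide. It assumes a small long-format frame keyed by Name, because pivot() needs each index and column pair to appear only once.

```python
import pandas as pd

# Long format: one row per (Name, variable) pair (illustrative data)
long_df = pd.DataFrame({
    'Name': ['Aman', 'Aman', 'Bhoomi', 'Bhoomi'],
    'variable': ['Age', 'Salary', 'Age', 'Salary'],
    'value': [24, 50000, 27, 60000]
})

# Wide format: one row per Name, one column per variable
wide_df = long_df.pivot(index='Name', columns='variable', values='value')
print(wide_df)
```

If the same pair occurs more than once, as it would with the repeated City values in the melt example, pivot() raises a ValueError; pivot_table() is the usual alternative because it aggregates the duplicates.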

3. Handling Duplicates

In many datasets, you may encounter duplicate rows. Removing duplicates is essential to ensure the quality of the data before performing any analysis.

Code Example:

import pandas as pd

# Create a DataFrame with duplicate rows
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Aman', 'Chetan'],
    'Age': [24, 27, 24, 22]
})

# Drop duplicates based on all columns
df_unique = df.drop_duplicates()

print(df_unique)

Explanation:

The drop_duplicates() method is used to remove duplicate rows from a DataFrame.
By default, it removes rows that have identical values across all columns.

Output: The DataFrame now contains unique rows, removing the duplicate entry for 'Aman'.

Name Age
0 Aman 24
1 Bhoomi 27
3 Chetan 22
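
Duplicates are often partial, for example the same name recorded twice with different values. The sketch below assumes such a case (the second 'Aman' row has a different age) and uses the subset and keep parameters of drop_duplicates().

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Aman', 'Chetan'],
    'Age': [24, 27, 25, 22]  # the two 'Aman' rows differ in Age
})

# duplicated() flags repeats of the chosen columns
print(df.duplicated(subset=['Name']).tolist())  # [False, False, True, False]

# Compare only 'Name' and keep the last occurrence of each
latest = df.drop_duplicates(subset=['Name'], keep='last')
print(latest)
```

A plain drop_duplicates() would keep both 'Aman' rows here, since they are not identical across all columns.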

4. Applying Functions to DataFrames

You can apply custom functions to columns or rows in a DataFrame using the apply() method. This is particularly useful when you need to perform more complex operations or transformations on your data.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

# Apply a function to increase age by 5 years
df['Age'] = df['Age'].apply(lambda x: x + 5)

print(df)

Explanation: The apply() function is used to apply a lambda function to the Age column, increasing each value by 5.

Output: The Age values are updated by adding 5 to each value.

Name Age
0 Aman 29
1 Bhoomi 32
2 Chetan 27
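
apply() can also work across rows when a result depends on several columns. The sketch below uses a hypothetical helper, describe_person, with axis=1, and shows the vectorized form of the age arithmetic for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

def describe_person(row):
    # Hypothetical helper: receives one row as a Series
    return f"{row['Name']} ({row['Age']})"

# axis=1 passes each row to the function
df['Label'] = df.apply(describe_person, axis=1)

# Simple arithmetic does not need apply; this is typically faster
df['AgePlus5'] = df['Age'] + 5

print(df)
```

A reasonable rule of thumb is to reach for apply() only when no built-in column operation expresses the transformation.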

5. DataFrame Aggregation with Multiple Functions

You can aggregate data using multiple functions simultaneously. This is helpful when you need to compute several statistics on your data, such as the mean, sum, and count, at once.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30],
    'Salary': [50000, 60000, 70000, 80000]
})

# Aggregate data using multiple functions
agg_df = df.groupby('City').agg({
    'Age': ['mean', 'max', 'min'],
    'Salary': ['sum', 'mean']
})

print(agg_df)

Explanation:

The agg() function is used to apply multiple aggregation functions on the Age and Salary columns.
The groupby() method groups the data by City.

Output: The output displays the aggregated statistics for Age (mean, max, min) and Salary (sum, mean) for each city.

           Age          Salary
          mean max min     sum     mean
City
London    28.5  30  27  140000  70000.0
New York  23.0  24  22  120000  60000.0
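
The two-level column header above can be awkward to work with. One alternative is named aggregation, sketched below on the same data; the output column names (mean_age, max_age, total_salary) are arbitrary labels chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30],
    'Salary': [50000, 60000, 70000, 80000]
})

# Each keyword becomes a flat output column: name=(source column, function)
summary = df.groupby('City').agg(
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
    total_salary=('Salary', 'sum'),
)

print(summary)
```

The result has ordinary single-level columns, so a value can be read with summary.loc['London', 'mean_age'].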

These advanced techniques allow you to manipulate and reshape data for more detailed analysis and reporting. Pivot tables provide powerful aggregation, while reshaping methods, such as melt(), allow for easier handling of long-format data.

Also Read: Python Cheat Sheet: From Fundamentals to Advanced Concepts for 2025

Enhance Your Python Skills with upGrad!

A DataFrame in Python is a two-dimensional structure that organizes data into rows and columns. It’s widely used for managing and analyzing datasets, providing a simple and effective way to manipulate data. Yet, many individuals struggle with efficiently handling complex or large datasets due to the challenges of data cleaning and processing.

To address these challenges, upGrad offers programs designed to improve your proficiency in Python and data manipulation. These programs equip you with the tools needed to work confidently with data and refine your technical expertise.

Curious about which Python software development course best fits your goals in 2025? Contact upGrad for personalized counseling and valuable insights, or visit your nearest upGrad offline center for more details.

Reference:
https://content.techgig.com/technology/python-dominates-2025-programming-landscape-with-unprecedented-popularity/articleshow/121134781.cms