
DataFrames in Python: Why Every Data Scientist Is Obsessed!

By Rohit Sharma

Updated on Jul 08, 2025 | 21 min read | 7.99K+ views


Did you know? Python’s popularity increased by 2.2% from April to May 2025, surpassing competitors like C++, C, and Java! This growing demand highlights the increasing reliance on Python’s powerful features, like DataFrames, making data analysis faster and more intuitive than ever.

DataFrames in Python are two-dimensional, size-mutable, and labeled data structures provided by the Pandas library. They store data in rows and columns, similar to tables in databases or spreadsheets, allowing efficient data manipulation and analysis.

DataFrames can hold various data types and support operations like filtering, grouping, and aggregating, making them indispensable in data science.

In this blog, you’ll learn about DataFrames in Python, focusing on their creation, manipulation, and advanced usage for practical analysis.

Interested in learning more about DataFrames in Python? Enrol in upGrad’s Online Software Development Courses, featuring an updated curriculum on generative AI and specializations like full-stack development.

DataFrames in Python: 6 Ways to Create Them with Pandas

Pandas DataFrame is a 2D labeled structure that can store data of different types (e.g., integers, floats, strings) across rows and columns. It is similar to a table in a database or an Excel spreadsheet, with labeled axes (rows and columns). The main components of a DataFrame are:

  • Rows: These represent individual data points or records.
  • Columns: These define the data categories, such as "name" or "age".
  • Data: The actual values in the DataFrame, which can include numbers, text, or other types.

Understanding concepts like DataFrames is just the beginning. To advance in Python and build a successful tech career, continuous learning is essential. Here are some relevant courses that can help you in your learning journey:

Understanding DataFrames is an essential step in working with data in Python. To start, let’s ensure that Pandas is installed:

pip install pandas

Once installed, import Pandas into your Python script:

import pandas as pd

Let’s now look at how to create DataFrames in Python using Pandas, using sources such as dictionaries, lists, or external files like CSV and Excel.

1. From a Dictionary

Creating a DataFrame from a dictionary maps the dictionary keys to column labels while the corresponding values fill the columns. This approach is beneficial when dealing with structured data, as it allows direct mapping of data attributes (keys) to columns.

Code Example:

import pandas as pd

# Create a DataFrame from a dictionary
data = {'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]}
df = pd.DataFrame(data)

print(df)

Explanation:

  • A dictionary is created where the keys ('Name', 'Age') represent the column labels, and the values (lists of names and ages) represent the data to be populated under each column.
  • pd.DataFrame(data) takes the dictionary and converts it into a DataFrame.

Output:

  • The Name and Age columns are populated based on the dictionary values, and each list entry corresponds to a row.
  • Row indices (0, 1, 2) are generated automatically by Pandas.

     Name  Age
0    Aman   24
1  Bhoomi   27
2  Chetan   22

Also Read: Python Challenges for Beginners

2. From a List of Lists

Using a list of lists allows you to create a DataFrame where each inner list represents a row. Since the list of lists does not contain column labels, the columns parameter must be specified separately to define the DataFrame's structure.

Code Example:

import pandas as pd

# Create a DataFrame from a list of lists
data = [['Aman', 24], ['Bhoomi', 27], ['Chetan', 22]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)

Explanation:

  • The data is structured as a list of lists, where each sublist represents a row in the DataFrame.
  • The columns parameter explicitly defines the labels for the columns, in this case, Name and Age.

Output:

  • The DataFrame organizes the Name and Age labels based on the inner lists provided.
  • Row indices (0, 1, 2) are automatically assigned by Pandas, which makes the rows identifiable and easily accessible.

     Name  Age
0    Aman   24
1  Bhoomi   27
2  Chetan   22

Get a better understanding of Python with upGrad’s Learn Python Libraries: NumPy, Matplotlib & Pandas. Learn how to manipulate data using NumPy, visualize insights with Matplotlib, and analyze datasets with Pandas.

Also Read: A Comprehensive Guide to Pandas DataFrame astype()

3. From External Files (CSV, Excel, etc.)

Pandas allows you to read data directly from external files, such as CSV or Excel, into a DataFrame. This is particularly useful when working with large datasets stored externally.

Code Example:

import pandas as pd

# Create a DataFrame from a CSV file
df = pd.read_csv('data.csv')

print(df)

Explanation:

  • pd.read_csv('data.csv') reads the contents of the specified CSV file (data.csv) and loads it into a Pandas DataFrame.
  • Ensure the CSV file (data.csv) is in the same directory as the script, or provide the full file path if the file is located elsewhere.

Output (Assuming the CSV contains columns Name, Age, Country):

  • The columns and rows are automatically populated from the content of the CSV file.
  • Row indices (0, 1, and 2) are generated automatically by Pandas based on the data.
  • The columns in the CSV file (e.g., Name, Age, Country) become the column labels in the DataFrame.

     Name  Age Country
0    Aman   24     USA
1  Bhoomi   27      UK
2  Chetan   22  Canada

Note:

  • Excel Files: For Excel files, you would use pd.read_excel('data.xlsx') instead of pd.read_csv().
  • File Path: If the file is in a different directory, ensure that the full path to the file is provided, e.g., pd.read_csv('/path/to/your/data.csv').
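As a quick check of the file-handling behavior described above, here is a minimal round-trip sketch: write a small DataFrame out with to_csv() and read it back with read_csv(). The file name people.csv is a hypothetical example, not a file from this article.

```python
import pandas as pd

# Write a small DataFrame to a CSV file ('people.csv' is a hypothetical name)
pd.DataFrame({'Name': ['Aman', 'Bhoomi'], 'Age': [24, 27]}).to_csv('people.csv', index=False)

# Read it back; index=False above prevents an extra unnamed index column
df = pd.read_csv('people.csv')
print(df.shape)
```

Passing index=False when writing is a common choice; without it, the row index is saved as an extra column that reappears on the next read.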

Also Read: Career Opportunities in Python: Everything You Need To Know [2025]

4. From NumPy Arrays

You can create a DataFrame from a NumPy array, where each row in the array becomes a row in the DataFrame. Columns must be defined explicitly, and this method is beneficial when working with numerical data.

Code Example:

import pandas as pd
import numpy as np

# Create a DataFrame from a NumPy array
data = np.array([[1, 2], [3, 4], [5, 6]])
df = pd.DataFrame(data, columns=['Column1', 'Column2'])

print(df)

Explanation:

  • A NumPy array is created with shape (3, 2), which corresponds to three rows and two columns.
  • pd.DataFrame(data, columns=['Column1', 'Column2']) converts the NumPy array into a Pandas DataFrame.
  • The columns parameter specifies the column labels ('Column1', 'Column2'), while the NumPy array fills the rows.

Output:

  • The DataFrame is created using the NumPy array as the data source, with each row from the array becoming a row in the DataFrame.
  • Pandas automatically assigns row indices (0, 1, 2).
  • The specified column names ('Column1' and 'Column2') are used as headers for the DataFrame columns.

   Column1  Column2
0        1        2
1        3        4
2        5        6

Also Read: Top 7 Data Types in Python: Examples, Differences, and Best Practices (2025)

5. From a List of Dictionaries

When data is structured as a list of dictionaries, each dictionary represents a row in the DataFrame, with the dictionary keys becoming the column labels.

Code Example:

import pandas as pd

# Create a DataFrame from a list of dictionaries
data = [{'Name': 'Aman', 'Age': 24}, {'Name': 'Bhoomi', 'Age': 27}]
df = pd.DataFrame(data)

print(df)

Explanation:

  • Each dictionary in the list represents a row in the DataFrame.
  • The keys in the dictionaries automatically become the column labels for the DataFrame.

Output:

  • Each dictionary is converted into a row, and the column labels are derived from the keys in the dictionaries.
  • Pandas assigns row indices (0, 1) automatically.

     Name  Age
0    Aman   24
1  Bhoomi   27

Start your Python learning journey with upGrad’s Learn Basic Python Programming course! Build expertise in Python and Matplotlib through hands-on exercises. Ideal for beginners, plus earn a certification to advance your career upon completion!

Also Read: Inheritance in Python | Python Inheritance [With Example]

6. From a Pandas Series

You can create a DataFrame from a Pandas Series by passing the series into the DataFrame() constructor. The Series’ name attribute becomes the column label in the resulting DataFrame.

Code Example:

import pandas as pd

# Create a DataFrame from a Series
series = pd.Series([1, 2, 3], name='Numbers')
df = pd.DataFrame(series)

print(df)

Explanation:

  • A Pandas Series is passed directly into the DataFrame() constructor.
  • The name attribute of the Series (in this case, 'Numbers') becomes the column label for the DataFrame.

Output:

  • The Series is converted into a single-column DataFrame, with the name attribute used as the column label.
  • Pandas automatically generates row indices (0, 1, 2).

   Numbers
0        1
1        2
2        3

In fields like ML, AI, and data analytics, DataFrames in Python play a crucial role in structuring data for model training and generating insights. With Python libraries like Pandas, data manipulation becomes efficient, simplifying tasks such as cleaning, transforming, and visualizing data.

Are you a full-stack developer wanting to integrate AI into your Python Coding? upGrad’s AI-Driven Full-Stack Development can help you. You’ll learn how to build AI-powered software using OpenAI, GitHub Copilot, Bolt AI & more.

Also Read: Top 36+ Python Projects for Beginners and Students to Explore in 2025

Let's explore key ways for inspecting and accessing data in pandas, enabling efficient exploration and extraction of insights from your DataFrames in Python.

DataFrame Basics: Viewing, Accessing, and Filtering Data

DataFrame Basics involves understanding how to view and access data within a Pandas DataFrame. This includes viewing the structure, selecting specific rows and columns, and filtering data to extract meaningful insights.

Below are the key techniques for viewing and accessing data in DataFrames:

1. Viewing Data

Once you load a DataFrame, inspecting the data is the first step. You can use several methods to view and understand the structure of the DataFrame.

(a) head(): Displays the first 5 rows by default, allowing you to get a quick look at the top of the data. You can specify a number to display a custom number of rows.

Code Example:

import pandas as pd

data = {'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]}
df = pd.DataFrame(data)

# Display the first five rows (default behavior)
print(df.head())

Explanation: df.head() displays the first five rows of the DataFrame. You can specify the number of rows to display by passing an argument (e.g., df.head(2) to show the first two rows).

Output: This will show the first five rows of the DataFrame. If there are fewer than five rows, all of them will be displayed.

     Name  Age
0    Aman   24
1  Bhoomi   27
2  Chetan   22

(b) tail(): Displays the last five rows of the DataFrame, offering a look at the bottom of the data.

Code Example:

print(df.tail())  # Displays the last 5 rows

Explanation: df.tail() returns the last five rows. This is useful for inspecting the bottom of your dataset.

Output: Since the DataFrame has only three rows, the output will display all of them.

     Name  Age
0    Aman   24
1  Bhoomi   27
2  Chetan   22

(c) info(): Provides summary information about the DataFrame, including the number of non-null entries and the data type of each column.

Code Example:

df.info()

Explanation: The df.info() function provides a concise summary of the DataFrame, including the number of non-null entries and the data types of each column.

Output: This method helps you understand the structure of the DataFrame, including the number of missing values and the type of data in each column.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 143.0+ bytes

(d) describe(): Generates descriptive statistics for numerical columns, such as the mean, standard deviation, minimum, and maximum values.

Code Example:

print(df.describe())

Explanation: df.describe() returns statistical details like mean, standard deviation, and min/max values for numeric columns.

Output: It provides a quick summary of numeric data, including the count of non-null entries, the mean value, and the standard deviation.

             Age
count   3.000000
mean   24.333333
std     2.516611
min    22.000000
25%    23.000000
50%    24.000000
75%    25.500000
max    27.000000

2. Accessing Data

You can access data from a DataFrame using the column name or by index, making it easy to retrieve specific parts of the dataset for further analysis.

(a) By Column Name: You can access a column in a DataFrame directly by specifying its name in square brackets. This allows you to isolate specific features of the data.

Code Example:

print(df['Name'])  # Access the 'Name' column

Explanation: Accessing a DataFrame column by its name returns a Series with data from that column. This is useful when you need to work with specific variables in your dataset.

Output: This will return the data from the Name column.

0      Aman
1    Bhoomi
2    Chetan
Name: Name, dtype: object

(b) By Row Index (Using iloc and loc): You can access rows by their index position using iloc (integer location-based indexing) or loc (label-based indexing). This helps to isolate specific rows based on their position or label.

  • Using iloc: Access a specific row by its index (integer-based). This is particularly useful when working with rows based on their positional location in the DataFrame.

Code Example:

# Using iloc to access a specific row by index
print(df.iloc[0])  # Accesses the first row (index 0)

Explanation: iloc[0] accesses the first row (position 0) of the DataFrame; iloc always selects by integer position, regardless of the index labels.

Output: This function returns all data from the first row in a Series format.

Name    Aman
Age       24
Name: 0, dtype: object

  • Using loc: loc accesses a specific row by its index label. This is helpful when working with DataFrames that have custom index labels rather than the default integer-based index.

Code Example:

print(df.loc[0])  # Accesses the row with index label 0

Explanation: loc[0] accesses the row with the index label 0, which is the default integer index.

Output: Similar to iloc, but loc can also be used with custom index labels.

Name    Aman
Age       24
Name: 0, dtype: object
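To make the label-vs-position distinction concrete, here is a small sketch with custom index labels (the labels 'a', 'b', 'c' are illustrative, not from the examples above):

```python
import pandas as pd

# Hypothetical custom index labels 'a', 'b', 'c' for illustration
df = pd.DataFrame(
    {'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]},
    index=['a', 'b', 'c']
)

print(df.loc['b'])   # label-based lookup: the row labelled 'b'
print(df.iloc[1])    # position-based lookup: the same row, by position
```

With a custom index, df.loc[1] would raise a KeyError, while df.iloc[1] still works, since iloc only ever counts positions.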

3. Filtering Data

Pandas allows you to filter data based on specific conditions, enabling you to isolate rows that meet particular criteria. This functionality is essential for analyzing subsets of your data.

Code Example:

# Filter rows where the age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)

Explanation:

  • df['Age'] > 30 returns a Boolean series, which is used to filter rows based on the condition where the age is greater than 30.
  • The resulting DataFrame will only contain rows where the condition is true.

Output: Since none of the rows in this DataFrame have an age greater than 30, the output will be an empty DataFrame.

Empty DataFrame
Columns: [Name, Age]
Index: []
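Conditions can also be combined. A minimal sketch, assuming the same df as above, using & (and) with each condition wrapped in parentheses:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]})

# Each condition must be parenthesised; & means "and", | means "or"
filtered = df[(df['Age'] > 23) & (df['Name'] != 'Aman')]
print(filtered)  # only the 'Bhoomi' row satisfies both conditions
```

Note that the Python keywords and/or do not work here; pandas requires the element-wise operators & and |, and the parentheses are mandatory because those operators bind more tightly than comparisons.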

The basic DataFrame operations are essential for efficient data exploration and analysis. Gaining proficiency in these techniques allows you to interact with large datasets and make informed decisions based on the data.

Ready to advance your Python skills? Gain expertise in Linux, Python foundation, AWS, Azure, and Google Cloud to create scalable solutions with upGrad’s Expert Cloud Engineer Bootcamp. Start building your job-ready portfolio today!

Also Read: Mastering Python Variables: Complete Guide with Examples

Let's explore how Pandas makes it easy to manipulate and merge data using a variety of built-in DataFrame operations.

DataFrame Operations: Data Manipulation and Merging

DataFrame operations in Pandas enable powerful data manipulation, including data transformation, merging, and aggregation. These operations help to clean, reshape, and combine datasets for deeper analysis and more insightful results.

Below are some common DataFrame operations that help efficiently manipulate and merge datasets for better analysis.


1. Adding and Dropping Columns

Adding or dropping columns is a common operation for modifying the structure of a DataFrame. This is useful when you need to either expand your dataset with new data or remove unnecessary columns for focused analysis.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

# Adding a new column
df['Salary'] = [50000, 60000, 70000]
print(df)

# Dropping the 'Salary' column
df = df.drop('Salary', axis=1)
print(df)

Explanation:

  • A new column is added by assigning a list or Series to the new column name (df['Salary']).
  • drop() removes a column from the DataFrame, specified by axis=1.

Output:

  • The first output shows the new Salary column.

     Name  Age  Salary
0    Aman   24   50000
1  Bhoomi   27   60000
2  Chetan   22   70000

  • The second output displays the DataFrame after dropping the 'Salary' column.

     Name  Age
0    Aman   24
1  Bhoomi   27
2  Chetan   22

2. Renaming Columns

Renaming columns allows you to update column names to be more descriptive or standardized. This operation is crucial when you are cleaning data or preparing it for analysis.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

# Renaming the column 'Name' to 'Full Name'
df = df.rename(columns={'Name': 'Full Name'})
print(df)

Explanation: The rename() function is used to change column names by passing a dictionary where the keys are the old names and the values are the new names.

Output: The Name column is renamed to Full Name, and the updated DataFrame is printed.

  Full Name  Age
0      Aman   24
1    Bhoomi   27
2    Chetan   22

3. Sorting Data

Sorting data helps organize the DataFrame based on specific criteria. Sorting by one or more columns enables easier data analysis, particularly when searching for trends or arranging data in a logical order.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

# Sorting the DataFrame by the 'Age' column in descending order
df = df.sort_values(by='Age', ascending=False)
print(df)

Explanation: The sort_values() function sorts the DataFrame by a specified column, with the ascending=False parameter used to sort in descending order.

Output: The rows are sorted by the Age column in descending order.

     Name  Age
1  Bhoomi   27
0    Aman   24
2  Chetan   22

4. Handling Missing Data

Handling missing data is essential to ensure data integrity and avoid errors during analysis. Pandas provides multiple methods, such as fillna() and dropna(), to manage missing values effectively.

Code Example:

import pandas as pd

# Create a DataFrame with missing values
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', None],
    'Age': [24, None, 22]
})

# Identify missing values
print(df.isnull())

# Fill missing values in the 'Age' column with the mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)

# Drop rows with any missing values
df = df.dropna()
print(df)

Explanation:

  • isnull() identifies missing data.
  • fillna() is used to fill missing data with a specific value, like the mean of the Age column.
  • dropna() removes rows that contain missing values.

Output:

  • The first output indicates where the missing values are located.

    Name    Age
0  False  False
1  False   True
2   True  False

  • The second output fills the missing Age value with the column mean, (24 + 22) / 2 = 23.0; the Age column becomes float as a result.

     Name   Age
0    Aman  24.0
1  Bhoomi  23.0
2    None  22.0

  • The third output drops the remaining row with a missing value (the row whose Name is None).

     Name   Age
0    Aman  24.0
1  Bhoomi  23.0

5. Grouping Data

Grouping data is valuable when you want to aggregate information based on a specific feature. The groupby() method allows you to group data by a column and apply aggregation functions such as mean(), sum(), or count().

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30]
})

# Group by 'City' and calculate the mean of 'Age'
grouped_df = df.groupby('City').mean()
print(grouped_df)

Explanation: The groupby() function groups the data by the City column, and mean() function calculates the average value for the Age column in each group.

Output: The DataFrame is grouped by City, and the mean age for each city is calculated.

           Age
City          
London    28.5
New York  23.0

6. Merging DataFrames

Merging DataFrames is a common operation when you need to combine two datasets based on a common column or index. This operation is similar to SQL joins and is used to combine related data from different sources.

Code Example:

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Aman', 'Bhoomi', 'Chetan']
})

df2 = pd.DataFrame({
    'ID': [1, 2, 4],
    'Salary': [50000, 60000, 70000]
})

# Merge the DataFrames on the 'ID' column
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

Explanation:

  • The merge() function is used to combine two DataFrames based on a common column (ID).
  • The how='inner' argument performs an inner join, which keeps only rows with matching ID values in both DataFrames.

Output: The DataFrames are merged on the ID column, and the result contains only rows that have matching ID values in both DataFrames.

   ID    Name  Salary
0   1    Aman   50000
1   2  Bhoomi   60000
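Other join types follow the same pattern as their SQL counterparts. As a sketch with the same df1 and df2, how='left' keeps every row of df1 and fills unmatched salaries with NaN:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Aman', 'Bhoomi', 'Chetan']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Salary': [50000, 60000, 70000]})

# Left join: all rows from df1 survive; ID 3 has no match, so its Salary is NaN
left = pd.merge(df1, df2, on='ID', how='left')
print(left)
```

The remaining options are how='right' (keep all rows of df2) and how='outer' (keep all rows from both sides), with NaN filling any gaps.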

These operations are essential for performing common data manipulations and merging tasks in Pandas. By using these techniques, you can easily clean, transform, and combine data to prepare it for analysis and interpretation.

Take the next step in your career with Python and Data Science! Enroll in upGrad's Professional Certificate Program in Data Science and AI. Gain expertise in Python, Excel, SQL, GitHub, and Power BI through 110+ hours of live sessions!

Also Read: Top 70 Python Interview Questions & Answers: Ultimate Guide 2025

Let's now explore advanced techniques and see how they can help streamline data processing and reveal more meaningful insights from complex datasets.

How to Implement Advanced DataFrame Techniques in Python?

Advanced DataFrame techniques in Python enable complex data manipulation, transformation, and analysis, allowing you to handle large datasets and optimize performance. These methods are vital for effectively managing and analyzing large-scale data.

Below are some methods for tackling intricate data tasks and gaining deeper insights:

1. Pivot Tables

Pivot tables are used to summarize and aggregate data based on specific criteria. This is particularly helpful when you want to group data by one or more columns and calculate aggregated values, such as sum, mean, or count.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30]
})

# Creating a pivot table to calculate the mean of 'Age' by 'City'
pivot_table = df.pivot_table(values='Age', index='City', aggfunc='mean')
print(pivot_table)

Explanation:

  • The pivot_table() function is used to create a pivot table.
  • values='Age' specifies that the column 'Age' should be aggregated.
  • index='City' means that the pivot table will group the data by 'City'.
  • aggfunc='mean' specifies the aggregation function to calculate the mean age for each city.

Output: The pivot table groups the data by 'City' and calculates the average age for each city.

           Age
City          
London    28.5
New York  23.0

2. Reshaping DataFrames

Reshaping data with functions like melt() and pivot() is used when you need to transform the data between wide and long formats. These functions help make data easier to work with when applying aggregation or analysis.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30],
    'Salary': [50000, 60000, 70000, 80000]
})

# Reshaping data using melt
melted_df = df.melt(id_vars=['City'], value_vars=['Age', 'Salary'])
print(melted_df)

Explanation:

  • melt() is used to transform the DataFrame from wide format to long format.
  • id_vars=['City'] specifies that the 'City' column should remain as an identifier for the data.
  • value_vars=['Age', 'Salary'] indicates the columns to be unpivoted (i.e., turned into rows).

Output: The melt() function converts the 'Age' and 'Salary' columns into a single column of values, with each row now representing a different combination of 'City' and the corresponding values.

       City variable  value
0  New York      Age     24
1    London      Age     27
2  New York      Age     22
3    London      Age     30
4  New York   Salary  50000
5    London   Salary  60000
6  New York   Salary  70000
7    London   Salary  80000
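The reverse direction uses pivot(), which requires each (index, column) pair to be unique; the melted City data above has duplicate City/variable pairs, so this sketch instead uses a long-format frame keyed by a hypothetical Name column:

```python
import pandas as pd

# Long-format data with exactly one row per (Name, variable) pair
long_df = pd.DataFrame({
    'Name': ['Aman', 'Aman', 'Bhoomi', 'Bhoomi'],
    'variable': ['Age', 'Salary', 'Age', 'Salary'],
    'value': [24, 50000, 27, 60000]
})

# pivot() spreads the 'variable' entries back out into columns
wide = long_df.pivot(index='Name', columns='variable', values='value')
print(wide)
```

When the index/column pairs are not unique, pivot() raises an error; pivot_table() with an aggfunc is the usual alternative in that case.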

3. Handling Duplicates

In many datasets, you may encounter duplicate rows. Removing duplicates is essential to ensure the quality of the data before performing any analysis.

Code Example:

import pandas as pd

# Create a DataFrame with duplicate rows
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Aman', 'Chetan'],
    'Age': [24, 27, 24, 22]
})

# Drop duplicates based on all columns
df_unique = df.drop_duplicates()

print(df_unique)

Explanation:

  • The drop_duplicates() method is used to remove duplicate rows from a DataFrame.
  • By default, it removes rows that have identical values across all columns.

Output: The DataFrame now contains unique rows, removing the duplicate entry for 'Aman'.

     Name  Age
0    Aman   24
1  Bhoomi   27
3  Chetan   22

4. Applying Functions to DataFrames

You can apply custom functions to columns or rows in a DataFrame using the apply() method. This is particularly useful when you need to perform more complex operations or transformations on your data.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Aman', 'Bhoomi', 'Chetan'],
    'Age': [24, 27, 22]
})

# Apply a function to increase age by 5 years
df['Age'] = df['Age'].apply(lambda x: x + 5)

print(df)

Explanation: The apply() function is used to apply a lambda function to the Age column, increasing each value by 5.

Output: The Age values are updated by adding 5 to each value.

     Name  Age
0    Aman   29
1  Bhoomi   32
2  Chetan   27
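apply() also works row-wise with axis=1, which passes each row to the function so that several columns can be combined. A minimal sketch (the Label column is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Aman', 'Bhoomi', 'Chetan'], 'Age': [24, 27, 22]})

# axis=1 passes each row (as a Series) to the lambda
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)
print(df)
```

Row-wise apply is flexible but slower than vectorized column operations, so it is best reserved for logic that genuinely needs multiple columns at once.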

5. DataFrame Aggregation with Multiple Functions

You can aggregate data using multiple functions simultaneously. This is helpful when you need to compute several statistics on your data, such as the mean, sum, and count, at once.

Code Example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'City': ['New York', 'London', 'New York', 'London'],
    'Age': [24, 27, 22, 30],
    'Salary': [50000, 60000, 70000, 80000]
})

# Aggregate data using multiple functions
agg_df = df.groupby('City').agg({
    'Age': ['mean', 'max', 'min'],
    'Salary': ['sum', 'mean']
})

print(agg_df)

Explanation:

  • The agg() function is used to apply multiple aggregation functions on the Age and Salary columns.
  • The groupby() method groups the data by City.

Output: The output displays the aggregated statistics for Age (mean, max, min) and Salary (sum, mean) for each city.

           Age          Salary         
          mean max min     sum     mean
City                                   
London    28.5  30  27  140000  70000.0
New York  23.0  24  22  120000  60000.0

These advanced techniques allow you to manipulate and reshape data for more detailed analysis and reporting. Pivot tables provide powerful aggregation, while reshaping methods, such as melt(), allow for easier handling of long-format data.

Also Read: Python Cheat Sheet: From Fundamentals to Advanced Concepts for 2025


Enhance Your Python Skills with upGrad!

A DataFrame in Python is a two-dimensional structure that organizes data into rows and columns. It’s widely used for managing and analyzing datasets, providing a simple and effective way to manipulate data. Yet, many individuals struggle with efficiently handling complex or large datasets due to the challenges of data cleaning and processing.

To address these challenges, upGrad offers programs designed to improve your proficiency in Python and data manipulation. These programs equip you with the tools needed to work confidently with data and refine your technical expertise.

Here are some additional upGrad courses to help enhance your coding skills:

Curious about which Python software development course best fits your goals in 2025? Contact upGrad for personalized counseling and valuable insights, or visit your nearest upGrad offline center for more details.

Unlock the power of data with our popular Data Science courses, designed to make you proficient in analytics, machine learning, and big data!

Elevate your career by learning essential Data Science skills such as statistical modeling, big data processing, predictive analytics, and SQL!

Stay informed and inspired with our popular Data Science articles, offering expert insights, trends, and practical tips for aspiring data professionals!

Reference:
https://content.techgig.com/technology/python-dominates-2025-programming-landscape-with-unprecedented-popularity/articleshow/121134781.cms

Frequently Asked Questions (FAQs)

1. How do I concatenate two DataFrames in Python?

2. How do I reset the index of a DataFrame in Python?

3. Can DataFrames in Python handle missing values?

4. How do I filter rows based on multiple conditions in a DataFrame in Python?

5. How do I rename columns in a DataFrame in Python?

6. Can DataFrames in Python handle large datasets efficiently?

7. How do I calculate summary statistics for a DataFrame in Python?

8. How do I apply a function to a DataFrame in Python?

9. How do I merge two DataFrames in Python?

10. Can I group data in a DataFrame in Python?

11. How do I save a DataFrame in Python to an Excel file?

Rohit Sharma

763 articles published

Rohit Sharma shares insights, skill building advice, and practical tips tailored for professionals aiming to achieve their career goals.

