Python Pandas Tutorial: Everything Beginners Need to Know about Python Pandas
By Rohit Sharma
Updated on Oct 05, 2025 | 9 min read | 7.17K+ views
Python Pandas is one of the most popular libraries for data analysis in Python. If you want to work with datasets, clean and manipulate data, or perform statistical analysis, learning Pandas is essential. This Python Pandas tutorial is designed specifically for beginners who want a practical and easy-to-follow guide.
In this blog, you will learn how to install and set up Pandas, understand core data structures like Series and DataFrames, perform essential data operations, handle missing values, create pivot tables, and visualize data effectively. By the end of this tutorial, you’ll be able to manage real-world datasets confidently and start building your data analysis skills using Pandas.
Python Pandas is a library in Python that makes working with structured data easy and efficient. Whether you are analyzing small datasets or handling large volumes of data, Pandas provides tools to read, manipulate, and summarize information quickly. If you are starting your journey in data analysis, mastering Pandas will help you handle data like a pro.
At its core, Pandas provides two main data structures: Series and DataFrame. These structures allow you to store and manipulate data in a way that is both intuitive and powerful.
Here’s a simple comparison to understand Pandas vs basic Python data handling:
Feature | Basic Python | Pandas |
Data structure | List, Dictionary | Series, DataFrame |
Reading CSV | Manual parsing | pd.read_csv() |
Filtering data | Loops & conditions | .loc[], .iloc[] |
Aggregation | Custom code | .groupby(), .agg() |
Handling missing values | Manual checks | .isna(), .fillna() |
Pandas also comes with functions that make complex operations simple. For example, sorting, grouping, merging, and pivoting data can be done in just one or two lines of code. This reduces errors and saves time, especially when dealing with large datasets.
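To make the comparison concrete, here is a minimal sketch that computes the same average twice, once with plain Python and once with Pandas. The sample records and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample records, used only for illustration
records = [
    {"name": "Alice", "age": 25, "city": "Delhi"},
    {"name": "Bob", "age": 30, "city": "Mumbai"},
    {"name": "Charlie", "age": 35, "city": "Delhi"},
]

# Basic Python: filter with a loop-style comprehension, then average by hand
ages_in_delhi = [r["age"] for r in records if r["city"] == "Delhi"]
manual_mean = sum(ages_in_delhi) / len(ages_in_delhi)

# Pandas: the same filter and average as a single expression
df = pd.DataFrame(records)
pandas_mean = df.loc[df["city"] == "Delhi", "age"].mean()

print(manual_mean, pandas_mean)
```

Both approaches give the same number. The Pandas version simply needs less bookkeeping as the questions you ask of the data get more involved.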
To summarize, Pandas is your go-to library for data manipulation and analysis in Python. By learning Pandas, you’ll gain the ability to read, clean, manipulate, and summarize structured data in a few lines of code.
Next, you will learn how to install and set up Pandas on your system so you can start working with real datasets immediately.
Before you start working with datasets, you need to install and set up Pandas in Python. This section of the Python Pandas tutorial will guide you through the process step by step so you can start analyzing data without issues.
You can install Pandas using Python’s package manager, pip. Python itself must already be installed on your system before you proceed.
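Steps to install Pandas:
Run this command in a terminal or command prompt:
pip install pandas
Then verify the installation from Python:
import pandas as pd
print(pd.__version__)
If a version number is printed, Pandas is installed and ready to use.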
Once installed, you can import Pandas in any Python script or Jupyter Notebook. Using a standard alias makes your code cleaner:
import pandas as pd
Common Setup Tips for Beginners:
Here’s a simple overview of the setup process:
Step | Command/Action | Notes |
Install Pandas | pip install pandas | Works for most Python environments |
Verify installation | import pandas as pd, then print(pd.__version__) | Confirms correct installation |
Optional setup | pip install numpy matplotlib | Matplotlib adds plotting; NumPy is already installed as a Pandas dependency |
Pandas works in multiple Python environments. Here’s how to set it up depending on your choice:
By following these steps, you ensure that Pandas in Python is correctly installed and ready for data analysis. You can now move on to learning about Pandas data structures and start working with real datasets.
Understanding Pandas data structures is essential to working efficiently with datasets in Python. In this Python Pandas tutorial, we will cover the two primary structures, Series and DataFrame, and explain how they are used in real-world data analysis. By mastering these, you’ll be able to manipulate and analyze data effectively.
A Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, or floats. It is similar to a column in a spreadsheet or database table. Series are the building blocks of Pandas and are easy to create and manipulate.
Key points about Series:
Example of a Series:
import pandas as pd
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
Index | Value |
0 | 10 |
1 | 20 |
2 | 30 |
3 | 40 |
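Because a Series is labeled, you are not limited to the default 0, 1, 2, ... index. The sketch below uses hypothetical string labels to show label-based access, position-based access, and arithmetic that keeps the labels attached.

```python
import pandas as pd

# Hypothetical labels; any hashable values can serve as an index
scores = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = scores["b"]         # access by label
by_position = scores.iloc[0]   # access by integer position
doubled = scores * 2           # vectorized arithmetic keeps the labels

print(by_label, by_position)
print(doubled)
```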
A DataFrame is a two-dimensional labeled data structure, like a table with rows and columns. It is the most commonly used structure in Pandas and allows you to work with complex datasets easily.
Key points about DataFrames:
Example of a DataFrame:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)
Name | Age | City |
Alice | 25 | Delhi |
Bob | 30 | Mumbai |
Charlie | 35 | Bangalore |
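Once a DataFrame exists, a few attributes and methods give you a quick picture of what it contains. This is a small sketch that rebuilds the same example table and inspects it.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

print(df.shape)              # (number of rows, number of columns)
print(df.dtypes)             # data type of each column
print(df.head(2))            # first two rows
print(df["Age"].describe())  # summary statistics for one column
```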
By understanding Series and DataFrames, you lay a strong foundation for the rest of this Pandas in Python tutorial. In the next sections, you’ll learn how to perform basic operations, handle missing data, and manipulate datasets effectively.
Working with data in Pandas involves performing basic operations such as reading, writing, selecting, and modifying datasets. In this Python Pandas tutorial, you will learn these essential operations that form the foundation for any data analysis task. Understanding these basics allows you to handle data efficiently and prepare it for deeper analysis.
Pandas provides simple methods to load data from different sources and save your work.
Common operations include:
Reading CSV files:
import pandas as pd
df = pd.read_csv('data.csv')
Reading Excel files:
df = pd.read_excel('data.xlsx')
Writing DataFrames to files:
df.to_csv('output.csv', index=False)
Operation | Function | Notes |
Read CSV | pd.read_csv() | Most common for tabular data |
Read Excel | pd.read_excel() | Requires openpyxl library |
Save to CSV | df.to_csv() | Index optional |
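If you do not have a data.csv file at hand, you can still try the read and write calls. The sketch below uses an in-memory text buffer in place of a file; the CSV content is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'data.csv'
csv_text = "Name,Age,City\nAlice,25,Delhi\nBob,30,Mumbai\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Write back to an in-memory buffer; index=False leaves out the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)

print(df)
print(buffer.getvalue())
```

With a real file, you would pass the path string to pd.read_csv() and df.to_csv() instead of the buffers.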
Once your data is loaded, you need to select or filter specific rows or columns.
Key points:
Example of filtering:
Name | Age | City |
Bob | 30 | Mumbai |
Charlie | 35 | Bangalore |
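A result like the table above can be produced with a boolean condition on the Age column. The sketch below assumes the three-row example DataFrame from earlier and also shows .loc and .iloc selection.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Boolean mask: keep rows where Age is greater than 25
older = df[df["Age"] > 25]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_in_mumbai = df[(df["Age"] > 25) & (df["City"] == "Mumbai")]

# .loc selects by label, .iloc selects by integer position
names_and_cities = df.loc[:, ["Name", "City"]]
first_row = df.iloc[0]

print(older)
```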
Pandas lets you modify your dataset easily.
These operations are crucial in preparing your data before performing more advanced analysis.
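As a rough illustration of typical modifications, the sketch below adds a calculated column, renames a column, updates one value, and drops a column. The column names AgeNextYear and Location are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Add a calculated column
df["AgeNextYear"] = df["Age"] + 1

# Rename a column; rename returns a new DataFrame
df = df.rename(columns={"City": "Location"})

# Update a single value using a condition and a column label
df.loc[df["Name"] == "Bob", "Age"] = 31

# Drop a column that is no longer needed
df = df.drop(columns=["AgeNextYear"])

print(df)
```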
By mastering these basic operations, you build a strong foundation in Pandas and can handle most beginner-level datasets confidently. The next sections of this Python Pandas tutorial will cover handling missing data, merging datasets, and performing aggregation operations to extract meaningful insights.
Handling missing data is a critical step in any data analysis workflow. In this Python Pandas tutorial, you will learn how to detect, remove, and fill missing values in your datasets. Properly handling missing data ensures that your analysis is accurate and avoids errors in calculations or visualizations.
Before you can handle missing values, you need to identify them. Pandas provides simple methods to detect missing data in your Series or DataFrames.
Key points:
Example:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, None],
        'City': ['Delhi', 'Mumbai', None, 'Bangalore']}
df = pd.DataFrame(data)
print(df.isna())
print(df.isna().sum())
Column | Missing Values |
Name | 0 |
Age | 2 |
City | 1 |
Sometimes, dropping rows or columns with missing data is the easiest solution.
Common methods:
In other cases, filling missing data is better than removing it. Pandas provides flexible options:
df['Age'] = df['Age'].fillna(df['Age'].mean())
Method | Example | Description |
Drop rows | df.dropna() | Removes rows with missing values |
Drop columns | df.dropna(axis=1) | Removes columns with missing values |
Fill constant | df.fillna(0) | Replaces missing values with 0 |
Fill mean/median | df['Age'].fillna(df['Age'].mean()) | Replaces with average value |
Forward/Backward | df.ffill() / df.bfill() | Propagates previous/next value |
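The methods in the table can be tried side by side on the small dataset with gaps used above. This is a sketch; the "Unknown" placeholder is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, None],
    "City": ["Delhi", "Mumbai", None, "Bangalore"],
})

# Dropping: keep only complete rows, or only require Age to be present
complete_rows = df.dropna()
rows_with_age = df.dropna(subset=["Age"])

# Filling: assign the result back instead of relying on inplace=True
filled = df.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())  # mean of 25 and 35
filled["City"] = filled["City"].fillna("Unknown")

# Forward fill: carry the previous row's value into each gap
forward_filled = df.ffill()

print(filled)
```

Assigning the filled column back, as done here, behaves consistently across Pandas versions, whereas calling fillna with inplace=True on a selected column may not update the original DataFrame in newer releases.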
Handling missing data effectively helps you maintain dataset integrity and ensures that further operations, like filtering or aggregation, produce correct results. Learning these techniques in this Pandas in Python tutorial sets the stage for more advanced data analysis and manipulation tasks.
Data manipulation is one of the most important skills when working with datasets in Python. In this Python Pandas tutorial, you will learn how to organize, transform, and combine data to make it more useful for analysis. Pandas provides simple methods to sort, merge, group, and pivot your data efficiently.
Sorting data helps you organize your dataset based on column values or index. Ranking allows you to assign ranks to data points.
Key operations include:
Example table after sorting by Age:
Name | Age | City |
Alice | 25 | Delhi |
Bob | 30 | Mumbai |
Charlie | 35 | Bangalore |
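Here is a short sketch of sorting and ranking. It starts from a deliberately unsorted copy of the example data so that the effect of each call is visible; the AgeRank column name is hypothetical.

```python
import pandas as pd

# Rows intentionally out of order
df = pd.DataFrame({
    "Name": ["Charlie", "Alice", "Bob"],
    "Age": [35, 25, 30],
    "City": ["Bangalore", "Delhi", "Mumbai"],
})

by_age = df.sort_values(by="Age")                        # ascending by default
by_age_desc = df.sort_values(by="Age", ascending=False)  # oldest first
by_index = by_age.sort_index()                           # back to original row order

# Rank 1 goes to the highest age when ascending=False
ranked = df.assign(AgeRank=df["Age"].rank(ascending=False))

print(by_age)
print(ranked)
```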
Combining datasets is common in real-world analysis. Pandas makes it easy with merge, join, and concat functions.
Common methods:
Method | Use Case | Example |
Merge | Combine tables on common column | pd.merge(df1, df2, on='ID') |
Join | Add columns from another DataFrame | df1.join(df2) |
Concatenate | Stack DataFrames vertically/horiz. | pd.concat([df1, df2]) |
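The three approaches in the table can be compared on two tiny tables. The sketch below uses hypothetical people and cities DataFrames that share an ID column.

```python
import pandas as pd

# Hypothetical tables sharing an 'ID' column
people = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
cities = pd.DataFrame({"ID": [1, 2, 4], "City": ["Delhi", "Mumbai", "Pune"]})

# Merge: inner join by default, so only IDs present in both tables remain
inner = pd.merge(people, cities, on="ID")

# how='left' keeps every row of the left table; unmatched cities become NaN
left = pd.merge(people, cities, on="ID", how="left")

# Join: aligns on the index, so set 'ID' as the index first
joined = people.set_index("ID").join(cities.set_index("ID"))

# Concatenate: stack rows and rebuild a clean 0..n-1 index
more_people = pd.DataFrame({"ID": [5], "Name": ["Esha"]})
stacked = pd.concat([people, more_people], ignore_index=True)

print(inner)
print(left)
```

As a rule of thumb, merge is the usual choice when you match on a column, join when you match on the index, and concat when you only need to stack tables with the same columns.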
Grouping allows you to summarize data based on categories. Aggregation functions like sum, mean, or count help extract insights quickly.
Example usage:
grouped = df.groupby('City')['Age'].mean()
print(grouped)
City | Average Age |
Bangalore | 35.0 |
Delhi | 25.0 |
Mumbai | 30.0 |
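Grouping becomes more useful when a group holds several rows and you ask for more than one statistic. The sketch below uses a hypothetical five-row dataset and the .agg() method; the output column names avg_age and people are made up.

```python
import pandas as pd

# Hypothetical data with several people per city
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Esha"],
    "Age": [25, 30, 35, 40, 45],
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai", "Delhi"],
})

# Several aggregations on one column at once
summary = df.groupby("City")["Age"].agg(["mean", "min", "max", "count"])

# Named aggregation: choose the output column names yourself
named = df.groupby("City").agg(
    avg_age=("Age", "mean"),
    people=("Name", "count"),
)

print(summary)
print(named)
```

Note that groupby sorts the group keys by default, so the cities appear in alphabetical order in the result.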
By learning these data manipulation techniques in this Python Pandas tutorial, you can clean, organize, and prepare your datasets efficiently. Mastering sorting, merging, and grouping allows you to focus on analysis and insights rather than repetitive manual work.
That brings us to the end of this Python Pandas tutorial. We hope you found it useful and informative. Pandas is a large library, and becoming familiar with its many functions takes time and practice.
If you’re interested in learning more about Python, its various libraries, including Pandas, and its application in data science, check out IIIT-B & upGrad’s PG Diploma in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.
Pandas in Python is a library used for data manipulation and analysis. It provides data structures like Series and DataFrames to efficiently work with structured datasets, making data cleaning and processing easier for beginners.
Learning Pandas in Python helps you organize, filter, and analyze data quickly. It simplifies common tasks such as merging datasets, handling missing values, and performing basic statistics, making it easier to start practical data analysis projects.
You can install Pandas using pip install pandas in your terminal or command prompt. After installation, import it with import pandas as pd to begin loading and analyzing datasets efficiently in Python.
Pandas primarily uses Series (one-dimensional) and DataFrames (two-dimensional) for data storage. Series is like a single column, while DataFrame is a table of rows and columns. Both are essential for handling structured datasets in Python.
Use pd.read_csv('filename.csv') to load data into a DataFrame. Pandas can also read Excel and JSON files and query SQL databases, making it easy to start exploring and analyzing datasets without complex code.
Use df.head() to display the first five rows or df.tail() for the last five. This allows you to quickly inspect your dataset and understand its structure before performing further analysis.
Missing values can be removed with df.dropna() or filled with df.fillna(). You can use mean, median, or forward/backward fill techniques to handle gaps, ensuring your dataset remains complete for analysis.
Filtering is done using conditions like df[df['Age'] > 25]. You can combine multiple conditions to extract specific subsets of your dataset efficiently for further analysis.
Select a single column using df['ColumnName'] or multiple columns with df[['Col1', 'Col2']]. This helps focus on the relevant data needed for analysis or visualization.
Add a column using df['NewColumn'] = values. Pandas allows dynamic updates to your dataset, which is useful for calculated fields or storing results from operations applied to other columns.
Series is one-dimensional, suitable for single columns, while DataFrame is two-dimensional with multiple columns. Knowing this difference is important to structure your data correctly for analysis and operations.
Sort a DataFrame by column values with df.sort_values(by='ColumnName') or by index with df.sort_index(). Sorting helps organize your data for clearer analysis or reporting.
Use pd.merge(df1, df2, on='KeyColumn') to combine datasets on a common column. Merging is useful for integrating different sources of information for analysis.
Grouping is done with df.groupby('ColumnName') to aggregate values, like computing sums, averages, or counts. This helps summarize data for easier interpretation.
Pandas provides functions like df.mean(), df.sum(), and df.describe() to calculate descriptive statistics, allowing you to quickly understand your dataset.
Yes, you can plot data directly using df.plot(), which draws a line chart by default and supports bar charts and histograms through its kind parameter. It requires Matplotlib to be installed. For more advanced plots, Pandas works well with libraries like Matplotlib and Seaborn.
Use df.drop('ColumnName', axis=1) to remove a column or df.drop(index) to remove specific rows. This is helpful for cleaning unnecessary data from your dataset.
Use df['Col'].apply(function) to apply a custom or built-in function to each value in a column. This is useful for transformations like formatting, calculations, or text processing.
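As a small sketch of apply, the example below uses one built-in function and one custom function. The age_group helper and the new column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["alice", "bob"], "Age": [25, 30]})

# Built-in function applied to every value in the column
df["NameTitle"] = df["Name"].apply(str.title)

# Hypothetical custom function for illustration
def age_group(age):
    return "30+" if age >= 30 else "under 30"

df["AgeGroup"] = df["Age"].apply(age_group)

print(df)
```

For simple text operations, the vectorized string methods (for example df["Name"].str.title()) do the same job and are usually faster than apply.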
Pandas can handle reasonably large datasets efficiently. Techniques like chunking or using optimized data types help manage memory, making Pandas suitable for practical data analysis tasks.
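Chunking and smaller data types can be sketched as follows. The CSV content is generated in memory for illustration, and the chunk size of 4 is arbitrary; with a real file you would pass its path and a much larger chunk size.

```python
import io
import pandas as pd

# Hypothetical CSV with 10 rows, built in memory for illustration
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

# Chunking: process the file a few rows at a time instead of loading it all
total = 0
rows_seen = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())
    rows_seen += len(chunk)

# Optimized dtypes: request smaller integer types while reading
small = pd.read_csv(io.StringIO(csv_text), dtype={"id": "int32", "value": "int32"})

print(total, rows_seen)
print(small.dtypes)
```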
Use df.to_csv('output.csv', index=False) to export your DataFrame. This preserves your cleaned and processed dataset for sharing or further use in analysis.
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...