Python Pandas Tutorial: Everything Beginners Need to Know about Python Pandas
By Rohit Sharma
Updated on Oct 05, 2025 | 9 min read | 8.08K+ views
Python Pandas is one of the most popular libraries for data analysis in Python. If you want to work with datasets, clean and manipulate data, or perform statistical analysis, learning Pandas is essential. This Python Pandas tutorial is designed specifically for beginners who want a practical and easy-to-follow guide.
In this blog, you will learn how to install and set up Pandas, understand core data structures like Series and DataFrames, perform essential data operations, handle missing values, and sort, merge, and group data. By the end of this tutorial, you’ll be able to manage real-world datasets confidently and start building your data analysis skills using Pandas.
Introduction to Python Pandas
Pandas is a Python library that makes working with structured data easy and efficient. Whether you are analyzing small datasets or handling larger volumes of data, Pandas provides tools to read, manipulate, and summarize information quickly. If you are starting your journey in data analysis, mastering Pandas will help you handle data with confidence.
At its core, Pandas provides two main data structures: Series and DataFrame. These structures allow you to store and manipulate data in a way that is both intuitive and powerful.
Key Features of Pandas:
- Easy data handling: Load data from CSV, Excel, JSON, or SQL databases with just a few lines of code.
- Flexible data structures: Work with Series (1D) and DataFrames (2D) for various types of data.
- Data cleaning: Handle missing values, duplicates, and inconsistent data efficiently.
- Data analysis: Compute statistics, aggregate data, and summarize datasets quickly.
- Visualization: Plot data directly from DataFrames with df.plot(), which relies on Matplotlib being installed.
Why Pandas is Essential for Beginners:
- Allows you to focus on analysis rather than low-level data handling.
- Makes Python a strong tool for data science and machine learning workflows.
- Integrates smoothly with other Python libraries like NumPy, Matplotlib, and Seaborn.
Here’s a simple comparison to understand Pandas vs basic Python data handling:
| Feature | Basic Python | Pandas |
| Data structure | List, Dictionary | Series, DataFrame |
| Reading CSV | Manual parsing | pd.read_csv() |
| Filtering data | Loops & conditions | .loc[], .iloc[] |
| Aggregation | Custom code | .groupby(), .agg() |
| Handling missing values | Manual checks | .isna(), .fillna() |
Pandas also comes with functions that make complex operations simple. For example, sorting, grouping, merging, and pivoting data can be done in just one or two lines of code. This reduces errors and saves time, especially when dealing with large datasets.
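To make the comparison concrete, here is a minimal sketch that computes the average age per city twice, once with plain Python loops and once with Pandas. The records are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical records, invented for illustration only.
records = [
    {"name": "Alice", "city": "Delhi", "age": 25},
    {"name": "Bob", "city": "Mumbai", "age": 30},
    {"name": "Charlie", "city": "Delhi", "age": 35},
]

# Basic Python: loop over the rows and accumulate totals by hand.
totals, counts = {}, {}
for row in records:
    totals[row["city"]] = totals.get(row["city"], 0) + row["age"]
    counts[row["city"]] = counts.get(row["city"], 0) + 1
manual_means = {city: totals[city] / counts[city] for city in totals}

# Pandas: the same aggregation as a single groupby expression.
df = pd.DataFrame(records)
pandas_means = df.groupby("city")["age"].mean()

print(manual_means)
print(pandas_means.to_dict())
```

Both approaches should give the same averages; the Pandas version simply expresses the grouping in one line.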
To summarize, Pandas is your go-to library for data manipulation and analysis in Python. By learning Pandas, you’ll gain the ability to:
- Work efficiently with datasets that fit in your machine's memory.
- Prepare data for analysis or visualization.
- Perform statistical operations with minimal code.
- Integrate seamlessly with other Python tools for advanced analytics.
Next, you will learn how to install and set up Pandas on your system so you can start working with real datasets immediately.
Installing and Setting Up Pandas in Python
Before you start working with datasets, you need to install and set up Pandas in Python. This section of the Python Pandas tutorial will guide you through the process step by step so you can start analyzing data without issues.
Installing Pandas in Python
You can install Pandas using Python’s package manager, pip. Python must already be installed on your system before you proceed.
Steps to install Pandas:
- Open your terminal or command prompt.
- Run the following command:
pip install pandas
- Verify the installation with:
import pandas as pd
print(pd.__version__)
This confirms that Pandas in Python is ready to use.
Setting Up Pandas in Your Project
Once installed, you can import Pandas in any Python script or Jupyter Notebook. Using a standard alias makes your code cleaner:
import pandas as pd
Common Setup Tips for Beginners:
- Make sure your Python environment is up-to-date.
- Use virtual environments (venv or conda) to manage dependencies.
- Install optional libraries like NumPy and Matplotlib alongside Pandas for enhanced functionality.
Here’s a simple overview of the setup process:
| Step | Command/Action | Notes |
| Install Pandas | pip install pandas | Works for most Python environments |
| Verify installation | import pandas as pd + pd.__version__ | Confirms correct installation |
| Optional setup | pip install numpy matplotlib | Enhances data analysis and visualization |
Using Pandas in Different Environments
Pandas works in multiple Python environments. Here’s how to set it up depending on your choice:
- Jupyter Notebook: Install with pip install notebook and run jupyter notebook. Import Pandas using import pandas as pd.
- VS Code or PyCharm: Install Pandas in your project environment. Use the same import statement.
- Google Colab: Pandas is pre-installed, so you can directly import and start coding.
By following these steps, you ensure that Pandas in Python is correctly installed and ready for data analysis. You can now move on to learning about Pandas data structures and start working with real datasets.
Understanding Pandas Data Structures
Understanding Pandas data structures is essential to working efficiently with datasets in Python. In this Python Pandas tutorial, we will cover the two primary structures, Series and DataFrame, and explain how they are used in real-world data analysis. By mastering these, you’ll be able to manipulate and analyze data effectively.
Series in Pandas
A Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, or floats. It is similar to a column in a spreadsheet or database table. Series are the building blocks of Pandas and are easy to create and manipulate.
Key points about Series:
- Each element has a label (index), which allows for quick data access.
- Can be created from lists, dictionaries, or NumPy arrays.
- Supports vectorized operations, making calculations faster.
Example of a Series:
import pandas as pd
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
| Index | Value |
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
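The key points about labels and vectorized operations can be sketched as follows. The subject names and scores are made up for illustration.

```python
import pandas as pd

# Build a Series from a dictionary: the keys become the index labels.
# The labels and values here are made up for illustration.
scores = pd.Series({"math": 80, "physics": 70, "chemistry": 90})

# Label-based access.
math_score = scores["math"]

# Vectorized arithmetic applies to every element without an explicit loop.
scaled = scores / 10

print(math_score)
print(scaled)
```

Creating the Series from a dictionary gives meaningful labels instead of the default 0, 1, 2 positions.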
DataFrame in Pandas
A DataFrame is a two-dimensional labeled data structure, like a table with rows and columns. It is one of the most used structures in Pandas in Python and allows you to work with complex datasets easily.
Key points about DataFrames:
- Can be created from dictionaries, lists of lists, or CSV/Excel files.
- Columns can have different data types.
- Offers powerful indexing and selection tools for rows and columns.
Example of a DataFrame:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)
| Name | Age | City |
| Alice | 25 | Delhi |
| Bob | 30 | Mumbai |
| Charlie | 35 | Bangalore |
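Once a DataFrame exists, a few attributes are commonly used to inspect it. The following is a small sketch using the same example data; it is one way to get oriented, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # one data type per column
print(df.head(2))  # first two rows

ages = df["Age"]   # selecting a single column returns a Series
```

Each column of a DataFrame is itself a Series, which is why the two structures are usually learned together.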
Why Understanding Data Structures Matters
- Knowing whether to use a Series or DataFrame helps in efficient data manipulation.
- Makes operations like filtering, grouping, and aggregating easier.
- Supports integration with other Python libraries for data analysis and visualization.
By understanding Series and DataFrames, you lay a strong foundation for the rest of this Pandas in Python tutorial. In the next sections, you’ll learn how to perform basic operations, handle missing data, and manipulate datasets effectively.
Basic Operations in Pandas
Working with data in Pandas involves performing basic operations such as reading, writing, selecting, and modifying datasets. In this Python Pandas tutorial, you will learn these essential operations that form the foundation for any data analysis task. Understanding these basics allows you to handle data efficiently and prepare it for deeper analysis.
Reading and Writing Data
Pandas provides simple methods to load data from different sources and save your work.
Common operations include:
Reading CSV files:
import pandas as pd
df = pd.read_csv('data.csv')
Reading Excel files:
df = pd.read_excel('data.xlsx')
Writing DataFrames to files:
df.to_csv('output.csv', index=False)
| Operation | Function | Notes |
| Read CSV | pd.read_csv() | Most common for tabular data |
| Read Excel | pd.read_excel() | Requires the openpyxl library for .xlsx files |
| Save to CSV | df.to_csv() | Index optional |
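The read and write calls above can be tried without any file on disk. This sketch uses an in-memory string as a stand-in for 'data.csv'; the CSV content is hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a file such as 'data.csv'.
csv_text = "Name,Age,City\nAlice,25,Delhi\nBob,30,Mumbai\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# to_csv returns a string when no path is given; index=False omits the row labels.
out_text = df.to_csv(index=False)
print(out_text)
```

Without index=False, the output would gain an extra leading column containing the row labels.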
Selecting and Filtering Data
Once your data is loaded, you need to select or filter specific rows or columns.
Key points:
- Select columns: df['ColumnName'] or df[['Col1', 'Col2']]
- Select rows by position: df.iloc[0:5] (use df.loc[] to select by index label)
- Filter data with conditions: df[df['Age'] > 25]
Example of filtering with df[df['Age'] > 25] on the DataFrame created earlier:
| Name | Age | City |
| Bob | 30 | Mumbai |
| Charlie | 35 | Bangalore |
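The selection and filtering options above can be sketched together on the same example data. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Boolean mask: keep rows where Age is greater than 25.
older = df[df["Age"] > 25]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
older_in_mumbai = df[(df["Age"] > 25) & (df["City"] == "Mumbai")]

# .loc selects by label and can pick rows and columns together.
names_of_older = df.loc[df["Age"] > 25, "Name"]

# .iloc selects by integer position: the first two rows here.
first_two = df.iloc[0:2]

print(older)
```

Note that Python's `and` and `or` keywords do not work on whole columns; the `&` and `|` operators are needed instead.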
Adding, Updating, and Deleting Data
Pandas lets you modify your dataset easily.
- Add a new column: df['Salary'] = [50000, 60000, 70000]
- Update values: df.loc[df['Name']=='Alice', 'Age'] = 26
- Drop rows or columns: df.drop('Salary', axis=1, inplace=True)
These operations are crucial in preparing your data before performing more advanced analysis.
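Put together, the three modifications look like the sketch below. It assigns the result of drop() to a new variable rather than using inplace=True, which is a style choice that keeps the original DataFrame available.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Add a column; the list length must match the number of rows.
df["Salary"] = [50000, 60000, 70000]

# Update one cell selected by a condition and a column label.
df.loc[df["Name"] == "Alice", "Age"] = 26

# drop returns a new DataFrame by default, so assign the result.
without_salary = df.drop(columns="Salary")
without_first_row = df.drop(index=0)

print(without_salary)
```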
By mastering these basic operations, you build a strong foundation in Pandas and can handle most beginner-level datasets confidently. The next sections of this Python Pandas tutorial will cover handling missing data, merging datasets, and performing aggregation operations to extract meaningful insights.
Handling Missing Data in Pandas
Handling missing data is a critical step in any data analysis workflow. In this Python Pandas tutorial, you will learn how to detect, remove, and fill missing values in your datasets. Properly handling missing data ensures that your analysis is accurate and avoids errors in calculations or visualizations.
Detecting Missing Data
Before you can handle missing values, you need to identify them. Pandas provides simple methods to detect missing data in your Series or DataFrames.
Key points:
- Use isna() or isnull() to find missing values.
- notna() or notnull() identifies valid data.
- Summarize missing data using .sum() to see the total count per column.
Example:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, None, 35, None],
'City': ['Delhi', 'Mumbai', None, 'Bangalore']}
df = pd.DataFrame(data)
print(df.isna())
print(df.isna().sum())
| Column | Missing Values |
| Name | 0 |
| Age | 2 |
| City | 1 |
Removing Missing Data
Sometimes, dropping rows or columns with missing data is the easiest solution.
Common methods:
- Drop rows with any missing values: df.dropna()
- Drop columns with missing values: df.dropna(axis=1)
- Drop only if all values are missing: df.dropna(how='all')
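Applied to the example DataFrame from the detection step, the drop options behave as sketched below. The subset argument is an additional dropna() parameter not covered in the list above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, None],
    "City": ["Delhi", "Mumbai", None, "Bangalore"],
})

rows_complete = df.dropna()                 # only Alice has no missing values
cols_complete = df.dropna(axis=1)           # only the Name column is fully populated
rows_with_age = df.dropna(subset=["Age"])   # drop rows only when Age is missing
rows_not_empty = df.dropna(how="all")       # no row is entirely empty, so nothing is dropped

print(rows_complete)
```

Each call returns a new DataFrame and leaves df unchanged.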
Filling Missing Data
In other cases, filling missing data is better than removing it. Pandas provides flexible options:
- Fill with a constant value: df.fillna(0)
- Fill with the mean, median, or mode of the column:
df['Age'] = df['Age'].fillna(df['Age'].mean())
- Forward-fill or backward-fill: df.ffill() or df.bfill() (the older df.fillna(method='ffill') form is deprecated in pandas 2.x)
| Method | Example | Description |
| Drop rows | df.dropna() | Removes rows with missing values |
| Drop columns | df.dropna(axis=1) | Removes columns with missing values |
| Fill constant | df.fillna(0) | Replaces missing values with 0 |
| Fill mean/median | df['Age'].fillna(df['Age'].mean()) | Replaces with average value |
| Forward/Backward | df.ffill() / df.bfill() | Propagates previous/next value |
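The fill strategies in the table can be compared side by side on the same example data. The "Unknown" placeholder is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, None],
    "City": ["Delhi", "Mumbai", None, "Bangalore"],
})

# Fill numeric gaps with the column mean: (25 + 35) / 2 = 30.
age_filled = df["Age"].fillna(df["Age"].mean())

# Fill text gaps with a placeholder value (the label is an arbitrary choice).
city_filled = df["City"].fillna("Unknown")

# Forward fill copies the last valid value downward;
# backward fill copies the next valid value upward.
age_ffill = df["Age"].ffill()
age_bfill = df["Age"].bfill()

print(age_filled.tolist())
print(age_ffill.tolist())
```

Backward fill leaves the last Age value missing here because there is no later value to copy from.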
Handling missing data effectively helps you maintain dataset integrity and ensures that further operations, like filtering or aggregation, produce correct results. Learning these techniques in this Pandas in Python tutorial sets the stage for more advanced data analysis and manipulation tasks.
Data Manipulation in Pandas
Data manipulation is one of the most important skills when working with datasets in Python. In this Python Pandas tutorial, you will learn how to organize, transform, and combine data to make it more useful for analysis. Pandas provides simple methods to sort, rank, merge, and group your data efficiently.
Sorting and Ranking Data
Sorting data helps you organize your dataset based on column values or index. Ranking allows you to assign ranks to data points.
Key operations include:
- Sort by column: df.sort_values(by='Age')
- Sort by index: df.sort_index()
- Rank values: df['Score'].rank()
Example table after sorting by Age:
| Name | Age | City |
| Alice | 25 | Delhi |
| Bob | 30 | Mumbai |
| Charlie | 35 | Bangalore |
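A short sketch of sorting and ranking follows. The Score column is hypothetical, added only so that rank() has tied values to work with.

```python
import pandas as pd

# The Score column is hypothetical, added to illustrate ranking.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [88, 95, 88],
})

# Sort by one column in descending order.
by_age_desc = df.sort_values(by="Age", ascending=False)

# Sort by several columns: Score descending, then Name ascending to break ties.
by_score = df.sort_values(by=["Score", "Name"], ascending=[False, True])

# rank() averages tied positions by default; method="min" gives ties the lowest rank.
df["Rank"] = df["Score"].rank(ascending=False, method="min")

print(by_score)
```

sort_values() returns a new DataFrame, so the original row order in df is preserved.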
Merging, Joining, and Concatenating DataFrames
Combining datasets is common in real-world analysis. Pandas makes it easy with merge, join, and concat functions.
Common methods:
- Merge DataFrames: pd.merge(df1, df2, on='ID')
- Join DataFrames on their index: df1.join(df2)
- Concatenate DataFrames: pd.concat([df1, df2])
| Method | Use Case | Example |
| Merge | Combine tables on common column | pd.merge(df1, df2, on='ID') |
| Join | Add columns from another DataFrame, aligned on the index | df1.join(df2) |
| Concatenate | Stack DataFrames vertically (default) or horizontally (axis=1) | pd.concat([df1, df2]) |
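The three ways of combining data can be sketched with two small tables. Both tables and their contents are hypothetical.

```python
import pandas as pd

# Two small hypothetical tables sharing an 'ID' column.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "City": ["Mumbai", "Bangalore", "Pune"]})

# merge defaults to an inner join: only IDs present in both tables remain.
inner = pd.merge(df1, df2, on="ID")

# how="left" keeps every row of df1 and fills unmatched columns with NaN.
left = pd.merge(df1, df2, on="ID", how="left")

# join aligns on the index, so set 'ID' as the index on both sides first.
joined = df1.set_index("ID").join(df2.set_index("ID"))

# concat stacks rows by default; ignore_index=True renumbers the result.
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(left)
```

The how parameter of merge() also accepts "right" and "outer" when rows from the second table, or from both tables, should be kept.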
Grouping and Aggregation
Grouping allows you to summarize data based on categories. Aggregation functions like sum, mean, or count help extract insights quickly.
Example usage:
grouped = df.groupby('City')['Age'].mean()
print(grouped)
| City | Average Age |
| Bangalore | 35.0 |
| Delhi | 25.0 |
| Mumbai | 30.0 |
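Grouping is more informative when each group has more than one row. This sketch uses hypothetical data with repeated cities and computes several aggregates at once with agg().

```python
import pandas as pd

# Hypothetical data with repeated cities so each group has something to aggregate.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
})

# One aggregate per group; group keys are sorted in the result by default.
mean_age = df.groupby("City")["Age"].mean()

# Several aggregates at once with agg().
summary = df.groupby("City")["Age"].agg(["count", "mean", "max"])

print(mean_age)
print(summary)
```

The result of agg() with a list of function names is a DataFrame with one column per aggregate and one row per group.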
By learning these data manipulation techniques in this Python Pandas tutorial, you can clean, organize, and prepare your datasets efficiently. Mastering sorting, merging, and grouping allows you to focus on analysis and insights rather than repetitive manual work.
Conclusion
That brings us to the end of this Python Pandas tutorial. We hope you found it useful and informative. Pandas is a large library with many functions, so becoming fully familiar with it takes time and practice.
If you’re interested in learning more about Python, its various libraries, including Pandas, and its application in data science, check out IIIT-B & upGrad’s PG Diploma in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.
Frequently Asked Questions (FAQs)
1. What is Pandas in Python?
Pandas in Python is a library used for data manipulation and analysis. It provides data structures like Series and DataFrames to efficiently work with structured datasets, making data cleaning and processing easier for beginners.
2. Why should I learn Pandas in Python?
Learning Pandas in Python helps you organize, filter, and analyze data quickly. It simplifies common tasks such as merging datasets, handling missing values, and performing basic statistics, making it easier to start practical data analysis projects.
3. How do I install Pandas in Python?
You can install Pandas using pip install pandas in your terminal or command prompt. After installation, import it with import pandas as pd to begin loading and analyzing datasets efficiently in Python.
4. What are the main data structures in Pandas?
Pandas primarily uses Series (one-dimensional) and DataFrames (two-dimensional) for data storage. Series is like a single column, while DataFrame is a table of rows and columns. Both are essential for handling structured datasets in Python.
5. How do I read a CSV file in Pandas?
Use pd.read_csv('filename.csv') to load data into a DataFrame. Pandas also reads Excel and JSON files and SQL databases, making it easy to start exploring and analyzing datasets without complex code.
6. How can I view the first few rows of a DataFrame?
Use df.head() to display the first five rows or df.tail() for the last five. This allows you to quickly inspect your dataset and understand its structure before performing further analysis.
7. How do I handle missing data in Pandas?
Missing values can be removed with df.dropna() or filled with df.fillna(). You can use mean, median, or forward/backward fill techniques to handle gaps, ensuring your dataset remains complete for analysis.
8. How do I filter data in Pandas?
Filtering is done using conditions like df[df['Age'] > 25]. You can combine multiple conditions to extract specific subsets of your dataset efficiently for further analysis.
9. How do I select specific columns?
Select a single column using df['ColumnName'] or multiple columns with df[['Col1', 'Col2']]. This helps focus on the relevant data needed for analysis or visualization.
10. How can I add a new column to a DataFrame?
Add a column using df['NewColumn'] = values. Pandas allows dynamic updates to your dataset, which is useful for calculated fields or storing results from operations applied to other columns.
11. What is the difference between Series and DataFrame?
Series is one-dimensional, suitable for single columns, while DataFrame is two-dimensional with multiple columns. Knowing this difference is important to structure your data correctly for analysis and operations.
12. How do I sort data in Pandas?
Sort a DataFrame by column values with df.sort_values(by='ColumnName') or by index with df.sort_index(). Sorting helps organize your data for clearer analysis or reporting.
13. How do I merge two DataFrames?
Use pd.merge(df1, df2, on='KeyColumn') to combine datasets on a common column. Merging is useful for integrating different sources of information for analysis.
14. How do I group data in Pandas?
Grouping is done with df.groupby('ColumnName') to aggregate values, like computing sums, averages, or counts. This helps summarize data for easier interpretation.
15. How can I compute basic statistics in Pandas?
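Pandas provides functions like df.describe(), df.mean(numeric_only=True), and df.sum(numeric_only=True) to calculate descriptive statistics, allowing you to quickly understand your dataset. The numeric_only=True argument skips text columns, which would otherwise cause an error or a meaningless result.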
16. Can I visualize data using Pandas?
Yes, you can plot data directly using df.plot() for line charts, bar charts, and histograms, provided Matplotlib is installed. For more advanced plots, Pandas works well with libraries like Matplotlib and Seaborn.
17. How do I remove a column or row in Pandas?
Use df.drop('ColumnName', axis=1) to remove a column or df.drop(index) to remove specific rows. This is helpful for cleaning unnecessary data from your dataset.
18. How can I apply a function to a column?
Use df['Col'].apply(function) to apply a custom or built-in function to each value in a column. This is useful for transformations like formatting, calculations, or text processing.
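As a small sketch of apply(), the example below formats names and derives new columns. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["alice", "bob"], "Age": [25, 30]})

# apply() calls the function once per value in the column.
df["Name"] = df["Name"].apply(str.title)

# A lambda works too; for simple arithmetic, a vectorized expression is usually faster.
df["AgeNextYear"] = df["Age"].apply(lambda age: age + 1)
df["AgeInMonths"] = df["Age"] * 12

print(df)
```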
19. Is Pandas suitable for large datasets?
Pandas can handle reasonably large datasets efficiently. Techniques like chunking or using optimized data types help manage memory, making Pandas suitable for practical data analysis tasks.
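Chunking can be sketched with the chunksize parameter of pd.read_csv(). The in-memory CSV below is a tiny stand-in for a large file, and the chunk size of 4 is arbitrary.

```python
import io
import pandas as pd

# Stand-in for a large file: a small in-memory CSV built for illustration.
csv_text = "value\n" + "\n".join(str(n) for n in range(10))

total = 0
# chunksize makes read_csv yield DataFrames of at most 4 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()

print(total)
```

Only one chunk is held in memory at a time, so the same pattern applies to files larger than available RAM as long as the computation can be done piece by piece.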
20. How do I save a DataFrame to a CSV file?
Use df.to_csv('output.csv', index=False) to export your DataFrame. This preserves your cleaned and processed dataset for sharing or further use in analysis.
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...