Python Pandas Tutorial: Everything Beginners Need to Know about Python Pandas

By Rohit Sharma

Updated on Oct 05, 2025 | 9 min read | 7.53K+ views

Table of Contents

Introduction to Python Pandas
Installing and Setting Up Pandas in Python
Understanding Pandas Data Structures
Basic Operations in Pandas
Handling Missing Data in Pandas
Data Manipulation in Pandas
Conclusion

Python Pandas is one of the most popular libraries for data analysis in Python. If you want to work with datasets, clean and manipulate data, or perform statistical analysis, learning Pandas is essential. This Python Pandas tutorial is designed specifically for beginners who want a practical and easy-to-follow guide.

In this blog, you will learn how to install and set up Pandas, understand core data structures like Series and DataFrames, perform essential data operations, handle missing values, and sort, merge, and group data. By the end of this tutorial, you’ll be able to manage real-world datasets confidently and start building your data analysis skills using Pandas.

Introduction to Python Pandas

Pandas is an open-source Python library that makes working with structured data easy and efficient. Whether you are analyzing small datasets or handling large volumes of data, Pandas provides tools to read, manipulate, and summarize information quickly. If you are starting your journey in data analysis, mastering Pandas will help you handle data like a pro.

At its core, Pandas provides two main data structures: Series and DataFrame. These structures allow you to store and manipulate data in a way that is both intuitive and powerful.

Key Features of Pandas:

Easy data handling: Load data from CSV, Excel, JSON, or SQL databases with just a few lines of code.
Flexible data structures: Work with Series (1D) and DataFrames (2D) for various types of data.
Data cleaning: Handle missing values, duplicates, and inconsistent data efficiently.
Data analysis: Compute statistics, aggregate data, and summarize datasets quickly.
Visualization: Plot data directly from DataFrames with df.plot(), which uses Matplotlib under the hood.

Why Pandas is Essential for Beginners:

Allows you to focus on analysis rather than low-level data handling.
Makes Python a strong tool for data science and machine learning workflows.
Integrates smoothly with other Python libraries like NumPy, Matplotlib, and Seaborn.

Here’s a simple comparison to understand Pandas vs basic Python data handling:

Feature	Basic Python	Pandas
Data structure	List, Dictionary	Series, DataFrame
Reading CSV	Manual parsing	pd.read_csv()
Filtering data	Loops & conditions	Boolean masks, .loc[], .iloc[]
Aggregation	Custom code	.groupby(), .agg()
Handling missing values	Manual checks	.isna(), .fillna()

Pandas also comes with functions that make complex operations simple. For example, sorting, grouping, merging, and pivoting data can be done in just one or two lines of code. This reduces errors and saves time, especially when dealing with large datasets.
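To make the comparison above concrete, here is a minimal sketch that computes the average age per city twice: once with plain Python loops and once with Pandas. The sample records are made up for illustration and are not from a real dataset.

```python
import pandas as pd

# Hypothetical sample records, invented for this illustration
records = [
    {"name": "Alice", "age": 25, "city": "Delhi"},
    {"name": "Bob", "age": 30, "city": "Mumbai"},
    {"name": "Charlie", "age": 35, "city": "Delhi"},
]

# Basic Python: collect ages per city by hand, then average them
ages_by_city = {}
for row in records:
    ages_by_city.setdefault(row["city"], []).append(row["age"])
manual_means = {city: sum(ages) / len(ages) for city, ages in ages_by_city.items()}

# Pandas: the same aggregation in one line
df = pd.DataFrame(records)
pandas_means = df.groupby("city")["age"].mean().to_dict()

print(manual_means)
print(pandas_means)
```

Both approaches give the same averages; the Pandas version simply needs less code to get there.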

To summarize, Pandas is your go-to library for data manipulation and analysis in Python. By learning Pandas, you’ll gain the ability to:

Work efficiently with datasets that fit in your computer's memory.
Prepare data for analysis or visualization.
Perform statistical operations with minimal code.
Integrate seamlessly with other Python tools for advanced analytics.

Next, you will learn how to install and set up Pandas on your system so you can start working with real datasets immediately.

Installing and Setting Up Pandas in Python

Before you start working with datasets, you need to install and set up Pandas in Python. This section of the Python Pandas tutorial will guide you through the process step by step so you can start analyzing data without issues.

Installing Pandas in Python

You can install Pandas using Python’s package manager, pip. Python must already be installed on your system before you proceed.

Steps to install Pandas:

Open your terminal or command prompt.
Run the following command:

pip install pandas

Verify the installation with:

import pandas as pd 
print(pd.__version__)

This confirms that Pandas in Python is ready to use.

Setting Up Pandas in Your Project

Once installed, you can import Pandas in any Python script or Jupyter Notebook. Using a standard alias makes your code cleaner:

import pandas as pd

Common Setup Tips for Beginners:

Make sure your Python environment is up-to-date.
Use virtual environments (venv or conda) to manage dependencies.
NumPy is installed automatically as a Pandas dependency; add Matplotlib if you plan to plot your data.

Here’s a simple overview of the setup process:

Step	Command/Action	Notes
Install Pandas	pip install pandas	Works for most Python environments
Verify installation	import pandas as pd + pd.__version__	Confirms correct installation
Optional setup	pip install matplotlib	Adds plotting support (NumPy is installed with Pandas)

Using Pandas in Different Environments

Pandas works in multiple Python environments. Here’s how to set it up depending on your choice:

Jupyter Notebook: Install with pip install notebook and run jupyter notebook. Import Pandas using import pandas as pd.
VS Code or PyCharm: Install Pandas in your project environment. Use the same import statement.
Google Colab: Pandas is pre-installed, so you can directly import and start coding.

By following these steps, you ensure that Pandas in Python is correctly installed and ready for data analysis. You can now move on to learning about Pandas data structures and start working with real datasets.

Understanding Pandas Data Structures

Understanding Pandas data structures is essential to working efficiently with datasets in Python. In this Python Pandas tutorial, we will cover the two primary structures, Series and DataFrame, and explain how they are used in real-world data analysis. By mastering these, you’ll be able to manipulate and analyze data effectively.

Series in Pandas

A Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, or floats. It is similar to a column in a spreadsheet or database table. Series are the building blocks of Pandas and are easy to create and manipulate.

Key points about Series:

Each element has a label (index), which allows for quick data access.
Can be created from lists, dictionaries, or NumPy arrays.
Supports vectorized operations, making calculations faster.

Example of a Series:

import pandas as pd 
 
data = [10, 20, 30, 40] 
series = pd.Series(data) 
print(series)

Index	Value
0	10
1	20
2	30
3	40
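The key points above mention index labels and vectorized operations. The following sketch shows both, using a Series built from a dictionary. The product names and prices are made-up sample values.

```python
import pandas as pd

# A Series built from a dictionary: the keys become the index labels
# (the product names and prices are made-up sample values)
prices = pd.Series({"pen": 10, "notebook": 20, "bag": 30})

# Access a single value by its label
print(prices["notebook"])

# Vectorized operation: every element is doubled without writing a loop
doubled = prices * 2
print(doubled)
```

The result keeps the same labels, so doubled["bag"] can still be looked up by name.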

DataFrame in Pandas

A DataFrame is a two-dimensional labeled data structure, like a table with rows and columns. It is one of the most used structures in Pandas in Python and allows you to work with complex datasets easily.

Key points about DataFrames:

Can be created from dictionaries, lists of lists, or CSV/Excel files.
Columns can have different data types.
Offers powerful indexing and selection tools for rows and columns.

Example of a DataFrame:

data = { 
   'Name': ['Alice', 'Bob', 'Charlie'], 
   'Age': [25, 30, 35], 
   'City': ['Delhi', 'Mumbai', 'Bangalore'] 
} 
df = pd.DataFrame(data) 
print(df)

Name	Age	City
Alice	25	Delhi
Bob	30	Mumbai
Charlie	35	Bangalore
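Once a DataFrame exists, a few attributes help you check what it contains. This is a small sketch that rebuilds the same Name/Age/City table and inspects it.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
ages = df["Age"]   # selecting a single column gives a Series
print(type(ages))
```

This also shows how the two structures relate: each column of a DataFrame is a Series.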

Why Understanding Data Structures Matters

Knowing whether to use a Series or DataFrame helps in efficient data manipulation.
Makes operations like filtering, grouping, and aggregating easier.
Supports integration with other Python libraries for data analysis and visualization.

By understanding Series and DataFrames, you lay a strong foundation for the rest of this Pandas in Python tutorial. In the next sections, you’ll learn how to perform basic operations, handle missing data, and manipulate datasets effectively.

Basic Operations in Pandas

Working with data in Pandas involves performing basic operations such as reading, writing, selecting, and modifying datasets. In this Python Pandas tutorial, you will learn these essential operations that form the foundation for any data analysis task. Understanding these basics allows you to handle data efficiently and prepare it for deeper analysis.

Reading and Writing Data

Pandas provides simple methods to load data from different sources and save your work.

Common operations include:

Reading CSV files:

import pandas as pd 
df = pd.read_csv('data.csv')

Reading Excel files:

df = pd.read_excel('data.xlsx')

Writing DataFrames to files:

df.to_csv('output.csv', index=False)

Operation	Function	Notes
Read CSV	pd.read_csv()	Most common for tabular data
Read Excel	pd.read_excel()	Requires openpyxl library
Save to CSV	df.to_csv()	Pass index=False to leave out the row index
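If you do not have a data.csv file at hand, you can still try the read and write functions. The sketch below uses io.StringIO from the standard library as a stand-in for a file on disk; the CSV text is a hypothetical sample.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk
csv_text = "Name,Age,City\nAlice,25,Delhi\nBob,30,Mumbai\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Write the DataFrame back out as CSV text, leaving out the row index
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

With a real file, you would pass the file path to pd.read_csv() and df.to_csv() instead of the StringIO objects.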

Selecting and Filtering Data

Once your data is loaded, you need to select or filter specific rows or columns.

Key points:

Select columns: df['ColumnName'] or df[['Col1', 'Col2']]
Select rows by position: df.iloc[0:5] (the first five rows)
Filter data with conditions: df[df['Age'] > 25]

Example of filtering: df[df['Age'] > 25] applied to the Name/Age/City DataFrame from the previous section returns:

Name	Age	City
Bob	30	Mumbai
Charlie	35	Bangalore
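The selection techniques listed above can be sketched together as follows. The DataFrame is the same Name/Age/City sample used earlier; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Boolean mask: keep rows where Age is greater than 25
older = df[df["Age"] > 25]

# Combine conditions with & (and) or | (or); wrap each one in parentheses
older_in_mumbai = df[(df["Age"] > 25) & (df["City"] == "Mumbai")]

# .loc selects by label and can pick rows and a column together
names_of_older = df.loc[df["Age"] > 25, "Name"]

# .iloc selects by integer position: the first two rows here
first_two = df.iloc[0:2]

print(older)
print(older_in_mumbai)
```

Each of these returns a new object and leaves the original df unchanged.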

Adding, Updating, and Deleting Data

Pandas lets you modify your dataset easily.

Add a new column: df['Salary'] = [50000, 60000, 70000]
Update values: df.loc[df['Name']=='Alice', 'Age'] = 26
Drop rows or columns: df.drop('Salary', axis=1, inplace=True)

These operations are crucial in preparing your data before performing more advanced analysis.
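Here is a short sketch that runs the three modifications in sequence on a small sample DataFrame. It assigns the result of drop() back to df, which is one common alternative to inplace=True.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Add a column (the salary figures are sample values)
df["Salary"] = [50000, 60000, 70000]

# Update one cell: .loc takes a row condition and a column label
df.loc[df["Name"] == "Alice", "Age"] = 26

# Drop a column; assigning the result back is an alternative to inplace=True
df = df.drop(columns="Salary")

print(df)
```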

By mastering these basic operations, you build a strong foundation in Pandas and can handle most beginner-level datasets confidently. The next sections of this Python Pandas tutorial will cover handling missing data, merging datasets, and performing aggregation operations to extract meaningful insights.

Handling Missing Data in Pandas

Handling missing data is a critical step in any data analysis workflow. In this Python Pandas tutorial, you will learn how to detect, remove, and fill missing values in your datasets. Properly handling missing data ensures that your analysis is accurate and avoids errors in calculations or visualizations.

Detecting Missing Data

Before you can handle missing values, you need to identify them. Pandas provides simple methods to detect missing data in your Series or DataFrames.

Key points:

Use isna() or isnull() to find missing values.
notna() or notnull() identifies valid data.
Summarize missing data using .sum() to see the total count per column.

Example:

import pandas as pd 
 
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 
       'Age': [25, None, 35, None], 
       'City': ['Delhi', 'Mumbai', None, 'Bangalore']} 
 
df = pd.DataFrame(data) 
print(df.isna()) 
print(df.isna().sum())

Column	Missing Values
Name	0
Age	2
City	1

Removing Missing Data

Sometimes, dropping rows or columns with missing data is the easiest solution.

Common methods:

Drop rows with any missing values: df.dropna()
Drop columns with missing values: df.dropna(axis=1)
Drop rows only if all of their values are missing: df.dropna(how='all')
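To see how the three dropna() variants differ, the sketch below applies each of them to the sample DataFrame with missing ages and cities from the detection example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, None],
    "City": ["Delhi", "Mumbai", None, "Bangalore"],
})

rows_complete = df.dropna()             # keeps only rows with no missing values
cols_complete = df.dropna(axis=1)       # keeps only columns with no missing values
rows_not_empty = df.dropna(how="all")   # drops a row only if every value is missing

print(rows_complete)
print(cols_complete.columns.tolist())
print(len(rows_not_empty))
```

Only Alice's row is complete, only the Name column is complete, and no row is entirely empty, so the last call keeps all four rows. Note that dropna() returns a new DataFrame and leaves df itself untouched.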

Filling Missing Data

In other cases, filling missing data is better than removing it. Pandas provides flexible options:

Fill with a constant value: df.fillna(0)
Fill with the mean, median, or mode of the column:

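# Assign the result back to the column; fillna(..., inplace=True) on a selected column may not update df in recent Pandas versions
df['Age'] = df['Age'].fillna(df['Age'].mean())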

Forward-fill or backward-fill: df.ffill() or df.bfill() (the older df.fillna(method='ffill') form is deprecated in recent Pandas versions)

Method	Example	Description
Drop rows	df.dropna()	Removes rows with missing values
Drop columns	df.dropna(axis=1)	Removes columns with missing values
Fill constant	df.fillna(0)	Replaces missing values with 0
Fill mean/median	df['Age'].fillna(df['Age'].mean())	Replaces with average value
Forward/Backward	df.ffill() / df.bfill()	Propagates previous/next value
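The filling strategies summarized above can be tried on the same sample data. This sketch assigns each filled column back to the DataFrame and uses the ffill() method for forward filling, which recent Pandas versions recommend over fillna(method='ffill'). The "Unknown" placeholder and the readings Series are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, None],
    "City": ["Delhi", "Mumbai", None, "Bangalore"],
})

# Fill missing ages with the column mean: (25 + 35) / 2 = 30
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Fill missing cities with a placeholder label (an arbitrary choice)
df["City"] = df["City"].fillna("Unknown")

print(df)

# Forward fill copies the previous valid value into each gap
readings = pd.Series([1.0, None, None, 4.0])
print(readings.ffill().tolist())  # [1.0, 1.0, 1.0, 4.0]
```

Whether the mean, a placeholder, or a neighboring value is the right fill depends on what the column represents, so treat these as options rather than defaults.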

Handling missing data effectively helps you maintain dataset integrity and ensures that further operations, like filtering or aggregation, produce correct results. Learning these techniques in this Pandas in Python tutorial sets the stage for more advanced data analysis and manipulation tasks.

Data Manipulation in Pandas

Data manipulation is one of the most important skills when working with datasets in Python. In this Python Pandas tutorial, you will learn how to organize, transform, and combine data to make it more useful for analysis. Pandas provides simple methods to sort, merge, group, and pivot your data efficiently.

Sorting and Ranking Data

Sorting data helps you organize your dataset based on column values or index. Ranking allows you to assign ranks to data points.

Key operations include:

Sort by column: df.sort_values(by='Age')
Sort by index: df.sort_index()
Rank values: df['Score'].rank()

Example table after sorting by Age:

Name	Age	City
Alice	25	Delhi
Bob	30	Mumbai
Charlie	35	Bangalore
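As a small sketch of sorting and ranking together, the example below starts from an unsorted table. The Score column and its values are hypothetical, added only so that rank() has something to work on.

```python
import pandas as pd

# Hypothetical scores, invented for this illustration
df = pd.DataFrame({
    "Name": ["Charlie", "Alice", "Bob"],
    "Age": [35, 25, 30],
    "Score": [70, 90, 80],
})

by_age = df.sort_values(by="Age")                         # ascending by default
by_age_desc = df.sort_values(by="Age", ascending=False)   # oldest first

# rank() gives 1 to the smallest value by default;
# ascending=False gives 1 to the highest score instead
df["Rank"] = df["Score"].rank(ascending=False)

print(by_age)
print(df)
```

sort_values() returns a sorted copy, so the original row order in df is preserved unless you assign the result back.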

Merging, Joining, and Concatenating DataFrames

Combining datasets is common in real-world analysis. Pandas makes it easy with merge, join, and concat functions.

Common methods:

Merge DataFrames: pd.merge(df1, df2, on='ID')
Join DataFrames: df1.join(df2)
Concatenate DataFrames: pd.concat([df1, df2])

Method	Use Case	Example
Merge	Combine tables on common column	pd.merge(df1, df2, on='ID')
Join	Add columns from another DataFrame, matched on index	df1.join(df2)
Concatenate	Stack DataFrames vertically or horizontally	pd.concat([df1, df2])
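The three ways of combining DataFrames behave differently, which is easier to see on a tiny example. The two tables below are hypothetical; they share an ID column, but each has one ID the other lacks.

```python
import pandas as pd

# Two hypothetical tables that share an "ID" column
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "City": ["Delhi", "Mumbai", "Pune"]})

# merge: match rows on the ID column (inner join by default,
# so only IDs present in both tables are kept)
merged = pd.merge(df1, df2, on="ID")

# join: match rows on the index, so set ID as the index first
# (left join by default, so every row of the left table is kept)
joined = df1.set_index("ID").join(df2.set_index("ID"))

# concat: stack one table under the other and renumber the rows
stacked = pd.concat([df1, df1], ignore_index=True)

print(merged)
print(joined)
print(len(stacked))
```

In the joined result, ID 3 has no matching city, so its City value is missing. pd.merge() also accepts a how argument ('left', 'right', 'outer') if you want to keep unmatched rows.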

Grouping and Aggregation

Grouping allows you to summarize data based on categories. Aggregation functions like sum, mean, or count help extract insights quickly.

Example usage:

grouped = df.groupby('City')['Age'].mean() 
print(grouped)

City	Average Age
Bangalore	35.0
Delhi	25.0
Mumbai	30.0
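Grouping becomes more useful when each group holds several rows. The following sketch uses made-up data with two people per city and computes several aggregations in one call with agg().

```python
import pandas as pd

# Hypothetical data with two people per city
df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
    "Age": [25, 30, 35, 40],
})

# Several aggregations at once; the result has one row per city
summary = df.groupby("City")["Age"].agg(["mean", "count", "max"])
print(summary)
```

The group labels become the index of the result, sorted alphabetically by default.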

By learning these data manipulation techniques in this Python Pandas tutorial, you can clean, organize, and prepare your datasets efficiently. Mastering sorting, merging, and grouping allows you to focus on analysis and insights rather than repetitive manual work.

Conclusion

We have reached the end of this Python Pandas tutorial, and we hope you found it useful and informative. Pandas is a large library with many functions, so becoming fully familiar with it takes time and practice.

Frequently Asked Questions (FAQs)

1. What is Pandas in Python?

Pandas in Python is a library used for data manipulation and analysis. It provides data structures like Series and DataFrames to efficiently work with structured datasets, making data cleaning and processing easier for beginners.

2. Why should I learn Pandas in Python?

Learning Pandas in Python helps you organize, filter, and analyze data quickly. It simplifies common tasks such as merging datasets, handling missing values, and performing basic statistics, making it easier to start practical data analysis projects.

3. How do I install Pandas in Python?

You can install Pandas using pip install pandas in your terminal or command prompt. After installation, import it with import pandas as pd to begin loading and analyzing datasets efficiently in Python.

4. What are the main data structures in Pandas?

Pandas primarily uses Series (one-dimensional) and DataFrames (two-dimensional) for data storage. Series is like a single column, while DataFrame is a table of rows and columns. Both are essential for handling structured datasets in Python.

5. How do I read a CSV file in Pandas?

Use pd.read_csv('filename.csv') to load data into a DataFrame. Pandas also reads Excel files, JSON files, and SQL databases, making it easy to start exploring and analyzing datasets without complex code.

6. How can I view the first few rows of a DataFrame?

Use df.head() to display the first five rows or df.tail() for the last five. This allows you to quickly inspect your dataset and understand its structure before performing further analysis.

7. How do I handle missing data in Pandas?

Missing values can be removed with df.dropna() or filled with df.fillna(). You can use mean, median, or forward/backward fill techniques to handle gaps, ensuring your dataset remains complete for analysis.

8. How do I filter data in Pandas?

Filtering is done using conditions like df[df['Age'] > 25]. You can combine multiple conditions with & (and) and | (or), wrapping each condition in parentheses, to extract specific subsets of your dataset for further analysis.

9. How do I select specific columns?

Select a single column using df['ColumnName'] or multiple columns with df[['Col1', 'Col2']]. This helps focus on the relevant data needed for analysis or visualization.

10. How can I add a new column to a DataFrame?

Add a column using df['NewColumn'] = values. Pandas allows dynamic updates to your dataset, which is useful for calculated fields or storing results from operations applied to other columns.

11. What is the difference between Series and DataFrame?

Series is one-dimensional, suitable for single columns, while DataFrame is two-dimensional with multiple columns. Knowing this difference is important to structure your data correctly for analysis and operations.

12. How do I sort data in Pandas?

Sort a DataFrame by column values with df.sort_values(by='ColumnName') or by index with df.sort_index(). Sorting helps organize your data for clearer analysis or reporting.

13. How do I merge two DataFrames?

Use pd.merge(df1, df2, on='KeyColumn') to combine datasets on a common column. Merging is useful for integrating different sources of information for analysis.

14. How do I group data in Pandas?

Grouping is done with df.groupby('ColumnName') to aggregate values, like computing sums, averages, or counts. This helps summarize data for easier interpretation.

15. How can I compute basic statistics in Pandas?

Pandas provides functions like df.mean(numeric_only=True), df.sum(numeric_only=True), and df.describe() to calculate descriptive statistics, allowing you to quickly understand your dataset. The numeric_only=True argument skips text columns.

16. Can I visualize data using Pandas?

Yes. With Matplotlib installed, you can plot data directly using df.plot() for line charts, bar charts, and histograms. For more advanced plots, Pandas works well with libraries like Matplotlib and Seaborn.

17. How do I remove a column or row in Pandas?

Use df.drop('ColumnName', axis=1) to remove a column or df.drop(index) to remove specific rows. This is helpful for cleaning unnecessary data from your dataset.

18. How can I apply a function to a column?

Use df['Col'].apply(function) to apply a custom or built-in function to each value in a column. This is useful for transformations like formatting, calculations, or text processing.
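As a quick sketch of apply() with both a built-in and a custom function, the example below uses a hypothetical two-row DataFrame; the AgeNextYear column name is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Name": ["alice", "bob"], "Age": [25, 30]})

# Built-in function: capitalize each name
df["Name"] = df["Name"].apply(str.title)

# Custom function: add one to each age
df["AgeNextYear"] = df["Age"].apply(lambda age: age + 1)

print(df)
```

For simple arithmetic like this, a vectorized expression such as df["Age"] + 1 gives the same result and is usually faster; apply() is most helpful when no vectorized equivalent exists.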

19. Is Pandas suitable for large datasets?

Pandas works well for datasets that fit in your computer's memory. For larger files, techniques like reading in chunks or using smaller data types help manage memory, making Pandas suitable for practical data analysis tasks.
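Chunking can be sketched with the chunksize parameter of pd.read_csv(), which yields the file piece by piece instead of loading it all at once. The tiny CSV below is a hypothetical stand-in for a large file, and the chunk size of 4 is chosen only to keep the example small.

```python
import io
import pandas as pd

# Hypothetical CSV; io.StringIO stands in for a large file on disk
csv_text = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"

total = 0
# chunksize makes read_csv yield DataFrames of up to 4 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()

print(total)  # sum of 0 through 9, which is 45
```

Only one chunk is held in memory at a time, so the same pattern works for files much larger than this one.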

20. How do I save a DataFrame to a CSV file?

Use df.to_csv('output.csv', index=False) to export your DataFrame. This preserves your cleaned and processed dataset for sharing or further use in analysis.
