Python Pandas Tutorial: Everything Beginners Need to Know about Python Pandas

By Rohit Sharma

Updated on Oct 05, 2025 | 9 min read

Python Pandas is one of the most popular libraries for data analysis in Python. If you want to work with datasets, clean and manipulate data, or perform statistical analysis, learning Pandas is essential. This Python Pandas tutorial is designed specifically for beginners who want a practical and easy-to-follow guide. 

In this blog, you will learn how to install and set up Pandas, understand core data structures like Series and DataFrames, perform essential data operations, handle missing values, create pivot tables, and visualize data effectively. By the end of this tutorial, you’ll be able to manage real-world datasets confidently and start building your data analysis skills using Pandas. 

Introduction to Python Pandas 

Python Pandas is a library in Python that makes working with structured data easy and efficient. Whether you are analyzing small datasets or handling large volumes of data, Pandas provides tools to read, manipulate, and summarize information quickly. If you are starting your journey in data analysis, mastering Pandas will help you handle data like a pro. 

At its core, Pandas provides two main data structures: Series and DataFrame. These structures allow you to store and manipulate data in a way that is both intuitive and powerful. 

Key Features of Pandas: 

  • Easy data handling: Load data from CSV, Excel, JSON, or SQL databases with just a few lines of code. 
  • Flexible data structures: Work with Series (1D) and DataFrames (2D) for various types of data. 
  • Data cleaning: Handle missing values, duplicates, and inconsistent data efficiently. 
  • Data analysis: Compute statistics, aggregate data, and summarize datasets quickly. 
  • Visualization: Plot data directly from DataFrames using built-in functions. 

Why Pandas is Essential for Beginners: 

  • Allows you to focus on analysis rather than low-level data handling. 
  • Makes Python a strong tool for data science and machine learning workflows. 
  • Integrates smoothly with other Python libraries like NumPy, Matplotlib, and Seaborn.

Here’s a simple comparison to understand Pandas vs basic Python data handling: 

Feature  Basic Python  Pandas
Data structure  List, Dictionary  Series, DataFrame
Reading CSV  Manual parsing  pd.read_csv()
Filtering data  Loops & conditions  .loc[], .iloc[]
Aggregation  Custom code  .groupby(), .agg()
Handling missing values  Manual checks  .isna(), .fillna()

Pandas also comes with functions that make complex operations simple. For example, sorting, grouping, merging, and pivoting data can be done in just one or two lines of code. This reduces errors and saves time, especially when dealing with large datasets. 
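
To make the comparison concrete, here is a minimal sketch that filters and averages the same data twice, once with plain Python and once with Pandas. The records and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample records, used only for illustration.
records = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
    {"name": "Charlie", "age": 35},
]

# Basic Python: filter with a loop-style comprehension, then average by hand.
ages_over_25 = [r["age"] for r in records if r["age"] > 25]
mean_plain = sum(ages_over_25) / len(ages_over_25)

# Pandas: the same filter and average expressed as column operations.
df = pd.DataFrame(records)
mean_pandas = df.loc[df["age"] > 25, "age"].mean()

print(mean_plain, mean_pandas)
```

Both approaches give the same number. The Pandas version simply states the condition and the statistic without any explicit iteration.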

To summarize, Pandas is your go-to library for data manipulation and analysis in Python. By learning Pandas, you’ll gain the ability to: 

  • Work efficiently with datasets of any size. 
  • Prepare data for analysis or visualization. 
  • Perform statistical operations with minimal code. 
  • Integrate seamlessly with other Python tools for advanced analytics. 

Next, you will learn how to install and set up Pandas on your system so you can start working with real datasets immediately. 

Installing and Setting Up Pandas in Python 

Before you start working with datasets, you need to install and set up Pandas in Python. This section of the Python Pandas tutorial will guide you through the process step by step so you can start analyzing data without issues. 

Installing Pandas in Python 

You can install Pandas using Python’s package manager, pip. Python itself must already be installed on your system before you proceed.

Steps to install Pandas: 

  • Open your terminal or command prompt. 
  • Run the following command: 
pip install pandas 
 
  • Verify the installation by running the following in a Python session (not directly in the terminal):
import pandas as pd 
print(pd.__version__) 
 

If this prints a version number, Pandas is installed and ready to use.

Setting Up Pandas in Your Project 

Once installed, you can import Pandas in any Python script or Jupyter Notebook. Using a standard alias makes your code cleaner: 

import pandas as pd 
 

Common Setup Tips for Beginners: 

  • Make sure your Python environment is up-to-date. 
  • Use virtual environments (venv or conda) to manage dependencies. 
  • NumPy is installed automatically as a Pandas dependency. Install optional libraries like Matplotlib alongside Pandas if you want plotting support.

Here’s a simple overview of the setup process: 

Step  Command/Action  Notes
Install Pandas  pip install pandas  Works for most Python environments; also installs NumPy
Verify installation  import pandas as pd; print(pd.__version__)  Confirms correct installation
Optional setup  pip install matplotlib  Adds plotting support for visualization

Using Pandas in Different Environments 

Pandas works in multiple Python environments. Here’s how to set it up depending on your choice: 

  • Jupyter Notebook: Install with pip install notebook and run jupyter notebook. Import Pandas using import pandas as pd. 
  • VS Code or PyCharm: Install Pandas in your project environment. Use the same import statement. 
  • Google Colab: Pandas is pre-installed, so you can directly import and start coding. 

By following these steps, you ensure that Pandas in Python is correctly installed and ready for data analysis. You can now move on to learning about Pandas data structures and start working with real datasets. 

Understanding Pandas Data Structures 

Understanding Pandas data structures is essential to working efficiently with datasets in Python. In this Python Pandas tutorial, we will cover the two primary structures, Series and DataFrame, and explain how they are used in real-world data analysis. By mastering these, you’ll be able to manipulate and analyze data effectively.

Series in Pandas 

A Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, or floats. It is similar to a column in a spreadsheet or database table. Series are the building blocks of Pandas and are easy to create and manipulate. 

Key points about Series: 

  • Each element has a label (index), which allows for quick data access. 
  • Can be created from lists, dictionaries, or NumPy arrays.
  • Supports vectorized operations, making calculations faster. 

Example of a Series: 

import pandas as pd 
 
data = [10, 20, 30, 40] 
series = pd.Series(data) 
print(series) 
 

Index  Value
0  10
1  20
2  30
3  40
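
The key points above mention labels and vectorized operations. A small sketch of both follows, with made-up labels and values: a Series built from a dictionary uses the keys as its index, and arithmetic applies to every element at once.

```python
import pandas as pd

# A Series built from a dictionary: the keys become the index labels.
# The labels and values here are invented for illustration.
scores = pd.Series({"alice": 10, "bob": 20, "charlie": 30})

# Label-based access to a single element.
bob_score = scores["bob"]

# Vectorized arithmetic: every element is doubled without an explicit loop.
doubled = scores * 2

print(bob_score)
print(doubled)
```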

DataFrame in Pandas 

A DataFrame is a two-dimensional labeled data structure, like a table with rows and columns. It is one of the most used structures in Pandas in Python and allows you to work with complex datasets easily. 

Key points about DataFrames: 

  • Can be created from dictionaries, lists of lists, or CSV/Excel files. 
  • Columns can have different data types. 
  • Offers powerful indexing and selection tools for rows and columns. 

Example of a DataFrame: 

import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)
 

Name  Age  City
Alice  25  Delhi
Bob  30  Mumbai
Charlie  35  Bangalore
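
As a complement to the dictionary example, the sketch below builds the same table from a list of lists and inspects it. It reuses the sample values from above; the variable names are only illustrative.

```python
import pandas as pd

# Rows as a list of lists, with the column names supplied separately.
rows = [
    ["Alice", 25, "Delhi"],
    ["Bob", 30, "Mumbai"],
    ["Charlie", 35, "Bangalore"],
]
df = pd.DataFrame(rows, columns=["Name", "Age", "City"])

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # each column carries its own data type

# Selecting a single column returns a Series.
ages = df["Age"]
print(type(ages))
```

This shows how the two structures relate: a DataFrame is a collection of Series that share one index.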

Why Understanding Data Structures Matters 

  • Knowing whether to use a Series or DataFrame helps in efficient data manipulation. 
  • Makes operations like filtering, grouping, and aggregating easier. 
  • Supports integration with other Python libraries for data analysis and visualization. 

By understanding Series and DataFrames, you lay a strong foundation for the rest of this Pandas in Python tutorial. In the next sections, you’ll learn how to perform basic operations, handle missing data, and manipulate datasets effectively. 

Basic Operations in Pandas 

Working with data in Pandas involves performing basic operations such as reading, writing, selecting, and modifying datasets. In this Python Pandas tutorial, you will learn these essential operations that form the foundation for any data analysis task. Understanding these basics allows you to handle data efficiently and prepare it for deeper analysis. 

Reading and Writing Data 

Pandas provides simple methods to load data from different sources and save your work. 

Common operations include: 

Reading CSV files: 

import pandas as pd 
df = pd.read_csv('data.csv') 
 

Reading Excel files: 

df = pd.read_excel('data.xlsx') 
 

Writing DataFrames to files: 

df.to_csv('output.csv', index=False) 
 

Operation  Function  Notes
Read CSV  pd.read_csv()  Most common for tabular data
Read Excel  pd.read_excel()  Requires openpyxl library
Save to CSV  df.to_csv()  Index optional
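
If you don't have a data file at hand, you can still try reading and writing. The sketch below uses an in-memory text buffer (io.StringIO) in place of real files, which is an assumption made only so the example runs anywhere. The CSV content is invented.

```python
import io

import pandas as pd

# In-memory CSV text standing in for a file such as 'data.csv'.
csv_text = "Name,Age,City\nAlice,25,Delhi\nBob,30,Mumbai\n"

# read_csv accepts file paths as well as file-like objects.
df = pd.read_csv(io.StringIO(csv_text))

# Write the DataFrame back out without the index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the written text again should reproduce the same table.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip)
```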

Selecting and Filtering Data 

Once your data is loaded, you need to select or filter specific rows or columns. 

Key points: 

  • Select columns: df['ColumnName'] or df[['Col1', 'Col2']] 
  • Select rows by position: df.iloc[0:5] returns the first five rows
  • Filter data with conditions: df[df['Age'] > 25] 

Example of filtering: applying df[df['Age'] > 25] to the three-row DataFrame from the previous section returns:

Name  Age  City
Bob  30  Mumbai
Charlie  35  Bangalore
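
The selection and filtering options can be sketched together as follows, using the same sample data as the earlier DataFrame example. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Boolean filtering: keep rows where the condition is True.
older = df[df["Age"] > 25]

# Combine conditions with & (and) or | (or); wrap each one in parentheses.
older_in_mumbai = df[(df["Age"] > 25) & (df["City"] == "Mumbai")]

# .loc takes a row condition (or labels) plus a column name.
names_of_older = df.loc[df["Age"] > 25, "Name"]

# .iloc selects by integer position; the end of the slice is excluded.
first_two = df.iloc[0:2]

print(older)
print(older_in_mumbai)
```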

Adding, Updating, and Deleting Data 

Pandas lets you modify your dataset easily. 

  • Add a new column: df['Salary'] = [50000, 60000, 70000] 
  • Update values: df.loc[df['Name']=='Alice', 'Age'] = 26 
  • Drop rows or columns: df.drop('Salary', axis=1, inplace=True) 

These operations are crucial in preparing your data before performing more advanced analysis. 
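
Here is a short sketch of the three modifications applied in sequence to the sample data. It returns new DataFrames from drop() instead of using inplace=True, which is a style choice for this example and not a requirement.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Add a column: the list needs one value per row.
df["Salary"] = [50000, 60000, 70000]

# Update values: .loc with a row condition and a column name.
df.loc[df["Name"] == "Alice", "Age"] = 26

# Drop a column; drop() returns a new DataFrame and leaves df unchanged.
trimmed = df.drop(columns=["Salary"])

# Drop a row by its index label.
without_first_row = df.drop(index=0)

print(trimmed)
print(without_first_row)
```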

By mastering these basic operations, you build a strong foundation in Pandas and can handle most beginner-level datasets confidently. The next sections of this Python Pandas tutorial will cover handling missing data, merging datasets, and performing aggregation operations to extract meaningful insights. 

Handling Missing Data in Pandas 

Handling missing data is a critical step in any data analysis workflow. In this Python Pandas tutorial, you will learn how to detect, remove, and fill missing values in your datasets. Properly handling missing data ensures that your analysis is accurate and avoids errors in calculations or visualizations. 

Detecting Missing Data 

Before you can handle missing values, you need to identify them. Pandas provides simple methods to detect missing data in your Series or DataFrames. 

Key points: 

  • Use isna() or isnull() to find missing values. 
  • notna() or notnull() identifies valid data. 
  • Summarize missing data using .sum() to see the total count per column. 

Example: 

import pandas as pd 
 
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 
       'Age': [25, None, 35, None], 
       'City': ['Delhi', 'Mumbai', None, 'Bangalore']} 
 
df = pd.DataFrame(data) 
print(df.isna()) 
print(df.isna().sum()) 
 

Column  Missing Values
Name  0
Age  2
City  1
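
Beyond raw counts, it often helps to see what share of each column is missing and which rows are affected. The following sketch does both with the same sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, None],
    "City": ["Delhi", "Mumbai", None, "Bangalore"],
})

# Count of missing values per column.
missing_per_column = df.isna().sum()

# Fraction of missing values per column (the mean of True/False flags).
missing_share = df.isna().mean()

# Rows that contain at least one missing value.
rows_with_gaps = df[df.isna().any(axis=1)]

print(missing_per_column)
print(missing_share)
print(rows_with_gaps)
```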

Removing Missing Data 

Sometimes, dropping rows or columns with missing data is the easiest solution. 

Common methods: 

  • Drop rows with any missing values: df.dropna() 
  • Drop columns with missing values: df.dropna(axis=1) 
  • Drop only if all values are missing: df.dropna(how='all') 
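
The effect of each option is easier to see on a small table. The sketch below applies dropna() in a few ways to the sample data from the detection example. The subset argument is an extra option not listed above that limits the check to chosen columns.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, None],
    "City": ["Delhi", "Mumbai", None, "Bangalore"],
})

# Keep only rows with no missing values at all.
complete_rows = df.dropna()

# Keep only columns with no missing values at all.
complete_columns = df.dropna(axis=1)

# Drop rows only when 'Age' is missing; gaps in other columns are ignored.
age_known = df.dropna(subset=["Age"])

# dropna() returns a new DataFrame; the original still has all four rows.
print(len(df), len(complete_rows), len(age_known))
```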

Filling Missing Data 

In other cases, filling missing data is better than removing it. Pandas provides flexible options: 

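  • Fill with a constant value: df.fillna(0)
  • Fill with the mean, median, or mode of the column (assign the result back, since inplace=True on a selected column is unreliable in recent pandas versions):
df['Age'] = df['Age'].fillna(df['Age'].mean())

  • Forward-fill or backward-fill: df.ffill() or df.bfill() (the older df.fillna(method='ffill') form is deprecated)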

Method  Example  Description
Drop rows  df.dropna()  Removes rows with missing values
Drop columns  df.dropna(axis=1)  Removes columns with missing values
Fill constant  df.fillna(0)  Replaces missing values with 0
Fill mean/median  df['Age'].fillna(df['Age'].mean())  Replaces with average value
Forward/Backward  df.ffill() / df.bfill()  Propagates previous/next value
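
The filling strategies in the table can be sketched on the same sample data as follows. The placeholder text "Unknown" is an arbitrary choice for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, None],
    "City": ["Delhi", "Mumbai", None, "Bangalore"],
})

# Work on a copy so the original stays available for comparison.
filled = df.copy()

# Numeric column: replace gaps with the column mean (25 and 35 average to 30).
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())

# Text column: replace gaps with a placeholder label.
filled["City"] = filled["City"].fillna("Unknown")

# Forward-fill: each gap takes the last valid value above it.
forward = df.ffill()

print(filled)
print(forward)
```

Forward-fill is mainly meaningful for ordered data such as time series, where the previous observation is a sensible stand-in.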

Handling missing data effectively helps you maintain dataset integrity and ensures that further operations, like filtering or aggregation, produce correct results. Learning these techniques in this Pandas in Python tutorial sets the stage for more advanced data analysis and manipulation tasks. 

Data Manipulation in Pandas 

Data manipulation is one of the most important skills when working with datasets in Python. In this Python Pandas tutorial, you will learn how to organize, transform, and combine data to make it more useful for analysis. Pandas provides simple methods to sort, merge, group, and pivot your data efficiently. 

Sorting and Ranking Data 

Sorting data helps you organize your dataset based on column values or index. Ranking allows you to assign ranks to data points. 

Key operations include: 

  • Sort by column: df.sort_values(by='Age') 
  • Sort by index: df.sort_index() 
  • Rank values: df['Score'].rank() 

Example table after sorting by Age: 

Name  Age  City
Alice  25  Delhi
Bob  30  Mumbai
Charlie  35  Bangalore
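
Sorting and ranking can be sketched together on a small table with a tie. The 'Score' values below are invented, and method="min" is one of several tie-handling options that rank() offers.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Charlie", "Alice", "Bob"],
    "Score": [70, 90, 70],
})

# Sort by Score (highest first), then by Name to break ties alphabetically.
by_score = df.sort_values(by=["Score", "Name"], ascending=[False, True])

# Rank with the highest score as rank 1; tied scores share the lowest rank.
df["Rank"] = df["Score"].rank(ascending=False, method="min")

print(by_score)
print(df)
```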

Merging, Joining, and Concatenating DataFrames 

Combining datasets is common in real-world analysis. Pandas makes it easy with merge, join, and concat functions. 

Common methods: 

  • Merge DataFrames on a shared column: pd.merge(df1, df2, on='ID')
  • Join DataFrames on their index: df1.join(df2)
  • Concatenate DataFrames: pd.concat([df1, df2])

Method  Use Case  Example
Merge  Combine tables on common column  pd.merge(df1, df2, on='ID')
Join  Add columns from another DataFrame, matched on index  df1.join(df2)
Concatenate  Stack DataFrames vertically (default) or horizontally (axis=1)  pd.concat([df1, df2])
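
To see how the three methods differ, here is a sketch using two small, made-up tables that share an 'ID' column. The table names and values are hypothetical.

```python
import pandas as pd

employees = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
salaries = pd.DataFrame({"ID": [1, 2, 4], "Salary": [50000, 60000, 80000]})

# Merge (inner by default): keeps only IDs present in both tables.
inner = pd.merge(employees, salaries, on="ID")

# Left merge: keeps every employee; missing salaries become NaN.
left = pd.merge(employees, salaries, on="ID", how="left")

# Join matches on the index, so 'ID' is moved into the index first.
joined = employees.set_index("ID").join(salaries.set_index("ID"))

# Concatenate: stack rows and renumber the index.
stacked = pd.concat([employees, employees], ignore_index=True)

print(inner)
print(left)
print(joined)
```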

Grouping and Aggregation 

Grouping allows you to summarize data based on categories. Aggregation functions like sum, mean, or count help extract insights quickly. 

Example usage: 

grouped = df.groupby('City')['Age'].mean() 
print(grouped) 
 

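With the three-row DataFrame from the DataFrame section, the result is as follows (groupby sorts the group keys alphabetically by default):

City  Average Age
Bangalore  35.0
Delhi  25.0
Mumbai  30.0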
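
Grouping becomes more informative when a category holds several rows. The sketch below uses invented data with two people per city and computes several aggregates at once with .agg().

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
    "Age": [25, 30, 35, 40],
})

# One row per city, one column per aggregation function.
summary = df.groupby("City")["Age"].agg(["mean", "count", "max"])

print(summary)
```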

By learning these data manipulation techniques in this Python Pandas tutorial, you can clean, organize, and prepare your datasets efficiently. Mastering sorting, merging, and grouping allows you to focus on analysis and insights rather than repetitive manual work. 

Conclusion 

And now, we have reached the end of this Python Pandas tutorial. We hope you found it useful and informative. Python Pandas is a vast topic, and with the numerous functions it has, it would take some time for one to get familiar with it completely.  

If you’re interested in learning more about Python, its various libraries, including Pandas, and its application in data science, check out IIIT-B & upGrad’s PG Diploma in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.

Frequently Asked Questions (FAQs)

1. What is Pandas in Python?

Pandas in Python is a library used for data manipulation and analysis. It provides data structures like Series and DataFrames to efficiently work with structured datasets, making data cleaning and processing easier for beginners. 

2. Why should I learn Pandas in Python?

Learning Pandas in Python helps you organize, filter, and analyze data quickly. It simplifies common tasks such as merging datasets, handling missing values, and performing basic statistics, making it easier to start practical data analysis projects. 

3. How do I install Pandas in Python?

You can install Pandas using pip install pandas in your terminal or command prompt. After installation, import it with import pandas as pd to begin loading and analyzing datasets efficiently in Python. 

4. What are the main data structures in Pandas?

Pandas primarily uses Series (one-dimensional) and DataFrames (two-dimensional) for data storage. Series is like a single column, while DataFrame is a table of rows and columns. Both are essential for handling structured datasets in Python. 

5. How do I read a CSV file in Pandas?

Use pd.read_csv('filename.csv') to load data into a DataFrame. Pandas also reads Excel files, JSON, and SQL query results through pd.read_excel(), pd.read_json(), and pd.read_sql(), making it easy to start exploring and analyzing datasets without complex code.

6. How can I view the first few rows of a DataFrame?

Use df.head() to display the first five rows or df.tail() for the last five. This allows you to quickly inspect your dataset and understand its structure before performing further analysis. 

7. How do I handle missing data in Pandas?

Missing values can be removed with df.dropna() or filled with df.fillna(). You can use mean, median, or forward/backward fill techniques to handle gaps, ensuring your dataset remains complete for analysis. 

8. How do I filter data in Pandas?

Filtering is done using conditions like df[df['Age'] > 25]. You can combine multiple conditions to extract specific subsets of your dataset efficiently for further analysis. 

9. How do I select specific columns?

Select a single column using df['ColumnName'] or multiple columns with df[['Col1', 'Col2']]. This helps focus on the relevant data needed for analysis or visualization. 

10. How can I add a new column to a DataFrame?

Add a column using df['NewColumn'] = values. Pandas allows dynamic updates to your dataset, which is useful for calculated fields or storing results from operations applied to other columns. 

11. What is the difference between Series and DataFrame?

Series is one-dimensional, suitable for single columns, while DataFrame is two-dimensional with multiple columns. Knowing this difference is important to structure your data correctly for analysis and operations. 

12. How do I sort data in Pandas?

Sort a DataFrame by column values with df.sort_values(by='ColumnName') or by index with df.sort_index(). Sorting helps organize your data for clearer analysis or reporting. 

13. How do I merge two DataFrames?

Use pd.merge(df1, df2, on='KeyColumn') to combine datasets on a common column. Merging is useful for integrating different sources of information for analysis. 

14. How do I group data in Pandas?

Grouping is done with df.groupby('ColumnName') to aggregate values, like computing sums, averages, or counts. This helps summarize data for easier interpretation. 

15. How can I compute basic statistics in Pandas?

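Pandas provides functions like df.mean(numeric_only=True), df.sum(), and df.describe() to calculate descriptive statistics, allowing you to quickly understand your dataset. Passing numeric_only=True makes mean() skip text columns, which would otherwise raise an error in recent pandas versions.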

16. Can I visualize data using Pandas?

Yes, you can plot data directly using df.plot() for line charts, bar charts, and histograms. For more advanced plots, Pandas works well with libraries like Matplotlib and Seaborn. 

17. How do I remove a column or row in Pandas?

Use df.drop('ColumnName', axis=1) to remove a column or df.drop(index) to remove specific rows. This is helpful for cleaning unnecessary data from your dataset. 

18. How can I apply a function to a column?

Use df['Col'].apply(function) to apply a custom or built-in function to each value in a column. This is useful for transformations like formatting, calculations, or text processing. 
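
A minimal sketch of apply() with a built-in function and a small custom one follows. The column names and values are made up.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["alice", "bob"], "Age": [25, 30]})

# Built-in function: capitalize the first letter of each name.
df["Name"] = df["Name"].apply(str.title)

# Custom function: compute a new value from each age.
df["AgeNextYear"] = df["Age"].apply(lambda age: age + 1)

print(df)
```

For simple arithmetic like the second case, a vectorized expression such as df["Age"] + 1 does the same thing and is usually faster.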

19. Is Pandas suitable for large datasets?

Pandas can handle reasonably large datasets efficiently. Techniques like chunking or using optimized data types help manage memory, making Pandas suitable for practical data analysis tasks. 
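
As a rough sketch of both techniques, the example below reads a CSV in chunks through the chunksize parameter of pd.read_csv() and stores a column in a smaller integer type. The in-memory text stands in for a large file, and the chunk size of 4 is arbitrary.

```python
import io

import pandas as pd

# Ten rows of in-memory CSV text standing in for a large file on disk.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

# With chunksize set, read_csv yields DataFrames of at most that many rows.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()

# Optimized data types: small integers fit in int8 instead of int64.
small = pd.Series([1, 2, 3], dtype="int64").astype("int8")

print(total)
print(small.dtype)
```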

20. How do I save a DataFrame to a CSV file?

Use df.to_csv('output.csv', index=False) to export your DataFrame. This preserves your cleaned and processed dataset for sharing or further use in analysis. 

Rohit Sharma

834 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
