
Python Pandas Tutorial: Everything Beginners Need to Know about Python Pandas

Last updated:
26th Mar, 2020

In this article, we’ll take a look at Pandas, one of the most popular Python libraries for data professionals. You’ll learn its basics as well as its most common operations.

Let’s get started. 

What is Pandas?

Python Pandas is popular for many reasons. Its primary applications are data manipulation, analysis, and cleaning. You can use it with various data types and datasets, including unlabelled data and ordered time-series data. Put simply, Pandas is your data’s home: you can perform numerous operations on your data with this tool.

You can convert the format of a data file, merge two datasets, make calculations, visualize the data with help from Matplotlib, and more. With so many functionalities, it’s a popular choice among data professionals, which is why learning it is essential. You can’t use Pandas well without understanding how it works, so that is what this Python Pandas tutorial focuses on.


Role of Pandas in Data Science

The Pandas library is an integral part of any data professional’s arsenal. It’s based on NumPy, which is another popular Python library. A lot of NumPy’s structure is present in Pandas, so if you’re familiar with the former, you wouldn’t have any difficulty in getting familiar with the latter. 
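The link between the two libraries can be seen in a few lines. This is only a small sketch with made-up values (the prices and labels are assumptions for illustration): it shows that the data inside a Pandas Series can be read as a NumPy array, and that a NumPy function can be applied to a Series directly.

```python
import numpy as np
import pandas as pd

# A Series holds a one-dimensional array of values plus an index of labels.
prices = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# to_numpy() exposes the values as a plain NumPy array.
values = prices.to_numpy()
print(type(values))

# NumPy functions accept a Series; the result keeps the original labels.
roots = np.sqrt(prices)
print(roots)
```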

Experts often use Pandas to prepare data and feed it into SciPy for statistical analysis. They also pass this data to Matplotlib for plotting and to Scikit-learn for machine learning.


Prerequisites

Before we discuss how Python Pandas works and what operations it offers, we should make clear what you need to know to use it properly. You should first be familiar with the fundamentals of Python and with NumPy.

The first one, Python’s fundamentals, is vital for obvious reasons. You wouldn’t understand much of this tutorial without knowing how Python code works, and you wouldn’t be able to try out the examples yourself.

The second one, NumPy, is essential to learn because Pandas is based on it. Having an understanding of NumPy will help you considerably in getting familiar with Pandas. 

You can learn about Python through our blogs on data science and Python. We have many helpful guides and articles that can make you familiar with the basics. It’s free, and if you have any doubts, you can write them down in the comment section. 

If you’re familiar with both of the topics we mentioned, let’s take a closer look at Pandas:


Installing Pandas

To use Pandas, you first have to install it, and both installation and import are straightforward. Open a command line (the Terminal on macOS or Linux, the Command Prompt or PowerShell on Windows) and run one of the following commands. Which one you need depends on the package manager you use, not on your operating system:

With pip: pip install pandas

With conda (Anaconda or Miniconda): conda install pandas
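Once the installation finishes, a quick way to check that it worked is to import the library and print its version. The alias pd is only a widely used convention, not a requirement.

```python
import pandas as pd

# If the import succeeds, Pandas is installed; the version string confirms which release.
installed_version = pd.__version__
print(installed_version)
```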

 

In Pandas, you’ll be dealing with Series and DataFrames. A Series is a one-dimensional labelled array, which you can think of as a single column, while a DataFrame is a two-dimensional table made up of multiple Series that share the same index. Let’s now take a look at the operations you can perform in Pandas.
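The difference between the two structures is easier to see in code. The following is a minimal sketch with invented numbers: it builds a Series and a DataFrame and shows that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series: one labelled column of values.
visitors = pd.Series([200, 100, 230], name="Visitors")

# A DataFrame: a table whose columns are Series sharing one index.
df = pd.DataFrame({"Visitors": [200, 100, 230], "Bounce_Rate": [20, 45, 60]})

# Selecting a single column returns a Series.
column = df["Visitors"]
print(type(column))
print(visitors.ndim, df.ndim)
```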

Operations in Pandas

Now that we’ve discussed what Pandas is and why it matters, let’s look at the operations you can perform with it. Pandas provides a lot of functions, and we discuss some of the most common ones below:

Data viewing

You’ll want to print out some of the rows of your dataset at the beginning to keep them as a visual reference. You can do so with the .head() method. In the examples below, file1 stands for a DataFrame you have already loaded.

file1.head()

This method gives you the first five rows of the DataFrame. If you want more rows than that, just pass the required number to the method. For example, to get the first 15 rows of the DataFrame, you’d write the following code:

file1.head(15)

You also have the option of viewing the last five rows of the data frame. You can do so by using the .tail() function. And just like the .head() function, the .tail() function can also accept a number and give you the required quantity of rows.

file1.tail(20)

This code would give you the last 20 rows of your data frame. 
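To try these calls yourself, you need a DataFrame to run them on. The sketch below builds a 30-row stand-in for file1 (the column names and values are assumptions for illustration) and applies .head() and .tail() to it.

```python
import pandas as pd

# A hypothetical 30-row DataFrame standing in for file1.
file1 = pd.DataFrame({"Day": range(1, 31), "Visitors": range(100, 130)})

first_five = file1.head()      # default: first 5 rows
first_fifteen = file1.head(15) # first 15 rows
last_twenty = file1.tail(20)   # last 20 rows

print(first_five)
print(len(first_fifteen), len(last_twenty))
```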

Getting Information

One of the first functions data scientists use with Pandas is .info(). That’s because it displays information about the data frame and gives you a deeper understanding of what you’re working with. Here’s how you use it in Pandas:

file1.info()

It provides you with a lot of useful information about the dataset, such as the quantity of the non-null values, the number of rows, the type of data present in a column, etc. 

Knowing the data type of your DataFrame’s values is essential in many cases. Suppose you need to perform arithmetic operations on a column, but it contains strings. When you run your calculations, you may see an error or get unexpected results, because strings don’t behave like numbers. If, on the other hand, you use the .info() method before doing any operations, you’ll already know that the column holds strings.
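Here is a small sketch of that situation. The item and price columns are made up, and converting with pd.to_numeric is just one possible way to fix the column, not the only one.

```python
import pandas as pd

# Hypothetical data: the prices were stored as text, not as numbers.
file1 = pd.DataFrame({"item": ["pen", "book", "bag"],
                      "price": ["10", "25", "40"]})

# info() prints the column types and non-null counts.
file1.info()

# Check whether the column is numeric before doing arithmetic on it.
was_numeric = pd.api.types.is_numeric_dtype(file1["price"])
print(was_numeric)

# Convert the strings to numbers, after which arithmetic behaves as expected.
file1["price"] = pd.to_numeric(file1["price"])
total = file1["price"].sum()
print(total)
```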


While the .info() method shows you general information about your dataset, the .shape attribute gives you the dimensions of your DataFrame as a tuple of the form (rows, columns). You can use it in the following way:

file1.shape

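There are no parentheses because .shape is an attribute, not a method: it simply holds the tuple of rows and columns. You’ll be using the .shape attribute quite often while cleaning your data, for example to check how many rows remain after removing duplicates or missing values.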
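As a rough illustration of using .shape during cleaning, the sketch below compares the shape before and after dropping a row with a missing value. The data is invented, and dropna() is used only as an example of a cleaning step.

```python
import pandas as pd

# Hypothetical data with one missing value in the Visitors column.
file1 = pd.DataFrame({"Day": [1, 2, 3, 4],
                      "Visitors": [200, None, 230, 300]})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
rows, columns = file1.shape

# Drop rows that contain missing values and compare the shapes.
cleaned = file1.dropna()
print(file1.shape, cleaned.shape)
```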


Concatenation

Let’s now discuss concatenation in this Python Pandas tutorial. Concatenation refers to joining two or more things together. With it, you can combine two datasets without modifying their values or data points in any way; they are stacked together as they are. You’ll have to use the pd.concat() function for this purpose. Here’s how:

result = pd.concat([file1, file2])

It’ll combine the file1 and file2 DataFrames and return them as a single DataFrame. Here is a complete example:

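import pandas as pd

# Two DataFrames with the same columns but different index labels (years)
df1 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]}, index=[2001, 2002, 2003, 2004])

df2 = pd.DataFrame({"HPI": [80, 90, 70, 60], "Int_Rate": [2, 1, 2, 3], "IND_GDP": [50, 45, 45, 67]}, index=[2005, 2006, 2007, 2008])

# Stack df2 below df1
concat = pd.concat([df1, df2])

print(concat)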


The output of the above code:

      HPI  Int_Rate  IND_GDP
2001   80         2       50
2002   90         1       45
2003   70         2       45
2004   60         3       67
2005   80         2       50
2006   90         1       45
2007   70         2       45
2008   60         3       67

Notice how the pd.concat() function has stacked the rows of the two DataFrames into one.
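In the example above, the two DataFrames had different index labels. When they don’t, the combined result repeats the labels. The sketch below, with made-up data, shows this and uses the ignore_index parameter of pd.concat() to renumber the rows instead.

```python
import pandas as pd

# Two hypothetical DataFrames that both use the default index 0, 1.
jan = pd.DataFrame({"Visitors": [200, 100]})
feb = pd.DataFrame({"Visitors": [230, 300]})

# By default the original index labels are kept, so they repeat.
stacked = pd.concat([jan, feb])
print(list(stacked.index))

# ignore_index=True discards the old labels and numbers the rows afresh.
renumbered = pd.concat([jan, feb], ignore_index=True)
print(list(renumbered.index))
```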

Changing the Index

You can change the index of your DataFrame as well. For that purpose, you’ll need to use the .set_index() method and pass it the name of the column you want to use as the new index. Take a look at the following example to understand it better.

import pandas as pd

df = pd.DataFrame({"Day": [1, 2, 3, 4], "Visitors": [200, 100, 230, 300], "Bounce_Rate": [20, 45, 60, 10]})

# Use the "Day" column as the index; inplace=True modifies df directly
df.set_index("Day", inplace=True)

print(df)

The output of the above code:

     Visitors  Bounce_Rate
Day
1         200           20
2         100           45
3         230           60
4         300           10

You can see that the Day column has replaced the default numeric index, so each row is now labelled by its day.
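If you’d rather not modify the original DataFrame, .set_index() can also be called without inplace=True, in which case it returns a new DataFrame. The sketch below, using a cut-down version of the same made-up data, shows this along with label-based lookup via .loc and undoing the change with .reset_index().

```python
import pandas as pd

df = pd.DataFrame({"Day": [1, 2, 3, 4], "Visitors": [200, 100, 230, 300]})

# Without inplace=True, set_index returns a new DataFrame and leaves df unchanged.
indexed = df.set_index("Day")

# With Day as the index, rows can be looked up by their day label.
day_three = indexed.loc[3, "Visitors"]
print(day_three)

# reset_index moves the index back into an ordinary column.
restored = indexed.reset_index()
print(list(restored.columns))
```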

Changing the Column Headers

You can change the column headers in Python Pandas as well. All you have to do is use the .rename() method. Pass it a dictionary that maps each existing column name to the new name you want.

Suppose you have a table with a column header ‘Time,’ and you want to change it to ‘Hours.’ You can rename this column with the following code:

df = df.rename(columns={"Time": "Hours"})

This code changes the column header from ‘Time’ to ‘Hours’ and leaves every other column as it is. Clear, consistent column names make the rest of your code easier to read and maintain.
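The renaming step above can be tried on a small, made-up table like the one in this sketch. It also shows that .rename() returns a new DataFrame, which is why the result has to be assigned to a variable.

```python
import pandas as pd

# Hypothetical table with a "Time" column we want to call "Hours".
df = pd.DataFrame({"Time": [1, 2, 3], "Visitors": [200, 100, 230]})

# rename returns a new DataFrame; columns not listed in the mapping are untouched.
renamed = df.rename(columns={"Time": "Hours"})

print(list(df.columns))
print(list(renamed.columns))
```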

Data Munging

Data munging means transforming data from one form into another, and converting between file formats is one common case. For example, you can read a .csv file and write it out as an .html file. Here’s an example of how you can do so:

import pandas as pd

# Read the CSV file, using its first column as the index
country = pd.read_csv(r"D:\Users\User1\Downloads\world-bank-youth-unemployment\API_ILO_country_YU.csv", index_col=0)

# Write the DataFrame out as an HTML table
country.to_html("file1.html")

After you’ve run this code, it’ll create an HTML file for you, which you can open in your browser. Format conversion like this is a handy part of data munging, and you’ll find uses for it in many situations.
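If you don’t have that CSV file at hand, the same conversion can be tried entirely in memory. The sketch below is an illustration under stated assumptions: the CSV text and its column names are invented, io.StringIO stands in for a file on disk, and .to_html() is called without a file name so that it returns the HTML as a string.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "Country,Rate\nA,10.5\nB,7.2\n"

# read_csv accepts file-like objects as well as paths.
country = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Called without a path, to_html returns the table markup as a string.
html = country.to_html()
print(html[:60])
```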


Conclusion

And now, we have reached the end of this Python Pandas tutorial. We hope you found it useful and informative. Python Pandas is a vast topic, and with the numerous functions it has, it would take some time for one to get familiar with it completely. 

If you’re interested in learning more about Python, its various libraries, including Pandas, and its application in data science, check out IIIT-B & upGrad’s PG Diploma in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.

Profile

Rohit Sharma

Blog Author
Rohit Sharma is the Program Director for the UpGrad-IIIT Bangalore, PG Diploma Data Analytics Program.

Frequently Asked Questions (FAQs)

1. Do I need to know Python for using Pandas?

Before you get started with Pandas, you need to understand that it is a package built for Python. So, you definitely need a firm grip on the basics and the syntax of Python programming to use Pandas with ease. When it comes to working with tabular data in Python, Pandas is widely considered the best choice.

You don’t need to spend a huge amount of time on Python first, though. Put in just enough time to get comfortable with the basic syntax so that you can start on tasks involving Pandas.

2. How long does it take to learn Pandas in Python?

Pandas is the most widely used Python library for dealing with tabular data. You can use Pandas for most of the tasks that you might use Excel for. If you already know Python programming and its syntax, you can get familiar with how Pandas works within about two weeks. When you are beginning with Pandas, you should start with basic data manipulation projects in order to get a grip on it.

As you progress further, you’ll notice that Pandas is a very useful data science tool that can be a key factor driving business decisions in several industries.

3. Should I prefer learning NumPy or Pandas first?

It is better to learn NumPy before Pandas because NumPy is the most fundamental module in Python for scientific computing. It provides highly optimized multidimensional arrays, which are the basic data structure behind most machine learning algorithms.

Once you are done learning NumPy, you should move on to Pandas, which is often described as an extension of NumPy because its underlying code uses the NumPy library extensively.
