Mastering Pandas: Important Pandas Functions For Your Next Project

Last updated:
30th Nov, 2020

The pandas library is a long-standing favorite of data scientists and analysts because it is easy to use, offers a wide range of functionality, and makes results easy to interpret. Anyone starting out in data science is advised to get a good command of pandas and to build pipelines that reduce the manual effort of cleaning and preprocessing data.

Pandas is built on top of NumPy, which lets many operations run as fast, vectorized array computations. In this article, we share some underrated pandas functions that can improve your project’s code quality.

Before moving ahead, here is a quick legend:

  • All the commands assume that the data frame is named ‘df’ and is an instance of pd.DataFrame.
  • The pandas library has been imported under the alias ‘pd’.

String Accessors

String or text data makes up a major part of many datasets. Whether it is the author, title, and publisher of a book or the tweets posted under a particular hashtag, there is a lot of text data around, and it comes in handy once it has been cleaned properly and fed to a classifier such as Naive Bayes. Here are some tricks you can apply:

  • To work with string data, use the ‘str’ accessor. For example, df['column_name'].str
  • This makes all the string operations available on the selected column, applied to each value.
  • Some common operations include:
    • df['column_name'].str.len(): Returns the length of each string.
    • .str.split(): Splits each string at a particular character.
    • .str.contains(): Returns True/False for each row depending on whether the given word or pattern is present in the string.
    • .str.count(): Returns, for each string, the number of times the regular expression passed occurs in it.
    • .str.findall(): Returns, for each string, a list of all the matches of the expression passed.
    • .str.replace(): Finds matches like findall, but replaces each matched item with the given replacement.
    • The usual string methods such as .str.title(), .str.isalpha(), .str.isalnum(), .str.isdecimal(), etc. are supported.
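The operations listed above can be tried on a small column. The following is a minimal sketch; the 'title' column and its values are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"title": ["data science 101", "Pandas in action", "learn numpy"]})

lengths = df["title"].str.len()                               # characters in each string
words = df["title"].str.split(" ")                            # list of words per row
has_pandas = df["title"].str.contains("pandas", case=False)   # boolean mask per row
a_count = df["title"].str.count("a")                          # occurrences of "a" in each string
digits = df["title"].str.findall(r"\d+")                      # list of digit runs per row
cleaned = df["title"].str.replace(r"\d+", "", regex=True).str.strip()  # drop digits
titled = df["title"].str.title()                              # capitalize each word

print(pd.DataFrame({"length": lengths, "has_pandas": has_pandas, "a_count": a_count}))
```

Note that .str.count() works per string, so a_count holds one number for each row instead of a single total.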

Datetime Accessors

Dates and times commonly appear in datasets as timestamps, start times, end times, or any other timing associated with an event. Parsing this data properly is useful because it reveals trends along a timeline that can be used to predict future events, which is known as time-series analysis. Let’s see some useful commands:

  • To work with datetime data, first convert the column from its current data type (date values are usually read in as strings, i.e. the object type) to datetime using the pd.to_datetime() function.
  • Now, using the ‘.dt’ accessor, we can access any datetime information required, such as:
    • df['column_name'].dt.day: Returns the day of the month.
    • .dt.time: Returns the time component.
    • .dt.year: Returns the year of the date.
    • .dt.month: Returns the month of the date.
    • .dt.weekday: Returns the day of the week in numerical form, where 0 represents Monday and 6 represents Sunday. If you want day names, use .dt.day_name().
    • .dt.is_month_start: Returns True/False depending on whether the date is the first day of the month.
    • .dt.is_month_end: Same as is_month_start, but checks for the last day of the month.
    • .dt.quarter: Returns the quarter (1 to 4) in which the date lies.
    • .dt.is_quarter_start: Returns True/False depending on whether the date is the first day of a quarter.
    • .dt.is_quarter_end: Returns True/False depending on whether the date is the last day of a quarter.
    • .dt.normalize(): When the time component adds nothing to the analysis, it can be ignored. This method resets the time of every value to midnight, i.e., 00:00:00, and keeps the date.
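As a rough illustration of the accessors above, here is a small sketch. The 'event_time' column and its two timestamps are invented for the example.

```python
import pandas as pd

# Hypothetical sample data: timestamps stored as plain strings
df = pd.DataFrame({"event_time": ["2020-11-30 14:45:00", "2021-01-01 09:30:00"]})

# Convert from string (object) to datetime before using the .dt accessor
df["event_time"] = pd.to_datetime(df["event_time"])

days = df["event_time"].dt.day                     # day of the month
weekdays = df["event_time"].dt.weekday             # 0 = Monday ... 6 = Sunday
day_names = df["event_time"].dt.day_name()         # method call, note the parentheses
quarters = df["event_time"].dt.quarter             # 1 to 4
month_end = df["event_time"].dt.is_month_end       # last day of the month?
quarter_start = df["event_time"].dt.is_quarter_start
midnight = df["event_time"].dt.normalize()         # time reset to 00:00:00

print(pd.DataFrame({"weekday": weekdays, "day_name": day_names, "quarter": quarters}))
```

Most entries are attributes, while day_name() and normalize() are methods and need parentheses.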

Pandas Plotting

Plotting visualizations is one of the key components of data analysis and plays a major role in feature engineering. For example, outliers in a dataset can be detected using a box plot, which shows the median and interquartile range and draws outliers as individual points beyond the whiskers.

Plotting is mostly done with other libraries such as seaborn, plotly, bokeh, or matplotlib, but what if you want to visualize data instantly without explicitly importing them? Pandas has a solution. Using the df.plot() method, you can plot graphs directly; by default, pandas draws them with matplotlib internally. Here are the options available:

  • Call df.plot() or df['column_name'].plot(), depending upon the type of graph.
  • df.plot() has a parameter ‘kind’ which defines the graph. By default, it is a ‘line’ plot, but other options available are ‘bar’, ‘barh’, ‘box’, ‘hist’, ‘kde’, etc.
  • Because it uses the matplotlib backend, .plot() returns a matplotlib Axes object that can be customized further, and it accepts an existing Axes through the ‘ax’ parameter.
  • The .plot() function can also take arguments such as ‘title’, ‘xticks’, ‘xlim’, ‘xlabel’, ‘fontsize’, and ‘colormap’, which reduces the need to call other libraries to some extent.
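The options above might be used as in the sketch below. The data frame and its 'sales' and 'returns' columns are made up, and the "Agg" backend is an assumption chosen only so that the example runs without a display; in a notebook you would normally leave that line out.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"sales": [10, 12, 9, 40, 11], "returns": [1, 2, 1, 3, 2]})

# Whole data frame: one line per numeric column; .plot() returns a matplotlib Axes
line_ax = df.plot(kind="line", title="Sales and returns", fontsize=9)
line_ax.set_xlabel("week")

# Single column drawn onto an Axes we created ourselves, passed via `ax`
fig, box_ax = plt.subplots()
df["sales"].plot(kind="box", ax=box_ax, title="Sales distribution")
box_ax.set_ylabel("units")
```

The value 40 in 'sales' lies far from the rest, so it is the kind of point a box plot draws separately as an outlier.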

Miscellaneous Functions

  • pd.get_dummies(): While preprocessing data, we sometimes encounter categorical data that needs to be converted into numerical form before it can be fed to a model. When the number of categories is fairly low, one-hot encoding is preferred, but doing this manually takes a long time. When called on a data frame, this function replaces each categorical column with one indicator column per category. If drop_first is set to True, the indicator for the first category is dropped, leaving k-1 columns for k categories.
  • df.query(): This method filters the data frame using a condition written as a string, for example df.query('age > 30'). Unlike normal masking, where you first build a boolean mask and then apply it to the data frame, this method directly returns the matching rows.
  • df.select_dtypes(): Sometimes we need to perform a task on columns of one particular data type. For example, when data is read from external files, some columns are loaded with the generic object type. The cleaned dataset must have correct data types throughout, and converting columns one by one with df['column_name'].astype('data-type') becomes tedious when there are many of them. This function selects the columns of the specified data types, and it can be combined with the .apply() function. A sample code would look like this:

df.select_dtypes(include="object").apply(lambda col: col.astype(str))
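To see the three functions side by side, here is a minimal sketch on an invented data frame. The column names and values are assumptions for illustration; 'income' is deliberately stored as text to mimic numbers read from a file as strings.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
    "age": [25, 32, 47, 51],
    "income": ["40000", "52000", "61000", "58000"],  # numbers stored as strings
})

# One-hot encode 'city'; drop_first=True drops the first category (Delhi)
encoded = pd.get_dummies(df, columns=["city"], drop_first=True)

# Filter rows with a string expression instead of building a boolean mask
over_30 = df.query("age > 30 and city != 'Pune'")

# Select columns by data type: only 'age' is numeric at this point
numeric_before = df.select_dtypes(include="number").columns.tolist()

# Fix the wrongly typed column, then check again
df["income"] = df["income"].astype(int)
numeric_after = df.select_dtypes(include="number").columns.tolist()

print(encoded.columns.tolist())
print(over_30)
```

With drop_first=True, a row whose remaining city indicators are all zero belongs to the dropped category.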

Conclusion

The sample code above, where one method is called directly on the result of another, is an example of method chaining. It is very common in data science tasks because it avoids defining an intermediate variable for every step.
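A longer chain might look like the sketch below. The data frame, its 'name' and 'joined' columns, and the choice of steps are all made up for illustration; the point is only that each call works on the result of the previous one.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "name": [" alice ", "BOB", "carol"],
    "joined": ["2020-01-15", "2020-06-30", "2019-12-31"],
})

result = (
    df.assign(
        name=lambda d: d["name"].str.strip().str.title(),   # clean up the text
        joined=lambda d: pd.to_datetime(d["joined"]),        # parse the dates
    )
    .loc[lambda d: d["joined"].dt.year == 2020]              # keep rows from 2020
    .sort_values("joined", ascending=False)                  # newest first
    .reset_index(drop=True)
)

print(result)
```

Wrapping the chain in parentheses lets each step sit on its own line, which keeps long chains readable.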

Rohit Sharma

Blog Author
Rohit Sharma is the Program Director for the UpGrad-IIIT Bangalore, PG Diploma Data Analytics Program.
