Mastering Pandas: Important Pandas Functions For Your Next Project

Updated on Nov 25, 2022 | 6 min read | 5.71K+ views

Table of Contents

String Accessors
Datetime Accessors
Pandas Plotting
Miscellaneous Functions
Conclusion

The pandas library is a long-standing favorite among data scientists and analysts because it is easy to use, offers a wide range of functionality, and makes results easy to interpret. Anyone starting out in data science is advised to get a good command of pandas and to build pipelines that reduce the manual effort of cleaning and preprocessing data.

Pandas is built on top of NumPy, which allows commands to execute quickly on large arrays of data. In this article, we share some underrated pandas functions that can improve the quality of your project's code.

Before moving ahead, here is a quick legend:

All the commands assume that the data frame is named 'df' and is an instance of pd.DataFrame.
The pandas library has been imported under the alias 'pd'.

String Accessors

String or text data makes up a major part of many datasets. Whether it is the author, title, and publisher of a book or the tweets posted under a particular hashtag, there is a lot of text data, and it becomes useful once it is cleaned properly and fed to a classifier such as Naive Bayes. Here are some tricks you can apply:

To access string data, use the 'str' accessor, for example df['column_name'].str
This makes all the vectorized string operations available on the selected column.
Some common operations include:
- df['column_name'].str.len(): Returns the length of each string.
- .str.split(): Splits each string at a given separator.
- .str.contains(): Returns True/False for each row depending on whether the given pattern is present in the string.
- .str.count(): Returns, for each string, the number of times the given regular expression occurs in it.
- .str.findall(): Returns, for each string, a list of all matches of the given regular expression.
- .str.replace(): Matches a pattern like findall, but replaces each match with the given string.
- Many of Python's built-in string methods, such as .title(), .isalpha(), .isalnum(), and .isdecimal(), are also available.
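
The operations listed above can be sketched as follows. This is a minimal illustration only: the data frame and its 'title' column are hypothetical sample data, not part of any real dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"title": ["data science 101", "Pandas in action", "numpy basics"]}
)

lengths = df["title"].str.len()            # number of characters per string
words = df["title"].str.split(" ")         # list of words per row
has_pandas = df["title"].str.contains("pandas", case=False)  # boolean per row
a_counts = df["title"].str.count("a")      # occurrences of "a" in each string
digits = df["title"].str.findall(r"\d+")   # list of digit runs per row
renamed = df["title"].str.replace("basics", "essentials", regex=False)
titled = df["title"].str.title()           # capitalize the first letter of each word
```

Each call returns a new Series aligned with the original rows, so the results can be assigned back as new columns or used as masks for filtering.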

Datetime Accessors

Dates and times are commonly present in datasets as timestamps, start times, end times, or any other timing associated with an event. Parsing this data properly is useful because it exposes trends along a timeline that can be used to predict future events, which is known as time-series analysis. Let's see some useful commands:

To work with datetime data, first convert the column from its current data type (date values are often read in as string or object) to datetime using the pd.to_datetime() function.
Then, using the '.dt' accessor, we can access any datetime information required, such as:
- df['column_name'].dt.day: Returns the day of the month.
- .dt.time: Returns the time component.
- .dt.year: Returns the year of the date.
- .dt.month: Returns the month of the date.
- .dt.weekday: Returns the day of the week in numerical form, where 0 represents Monday and 6 represents Sunday. If you want day names, use .dt.day_name().
- .dt.is_month_start: Returns True/False depending on whether the date is the first day of the month.
- .dt.is_month_end: Same as is_month_start, but checks for the last day of the month.
- .dt.quarter: Returns the quarter in which the date lies.
- .dt.is_quarter_start: Returns True/False depending on whether the date is the first day of a quarter.
- .dt.is_quarter_end: Returns True/False depending on whether the date is the last day of a quarter.
- .dt.normalize(): When the time component does not add value to the analysis, it can be ignored. This method sets the time to midnight, i.e., 00:00:00.
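
As a rough sketch of the workflow above, the snippet below converts a string column to datetime and then reads a few attributes through '.dt'. The 'event_time' column and its values are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: timestamps stored as plain strings.
df = pd.DataFrame(
    {"event_time": ["2022-01-01 09:30:00",
                    "2022-03-31 18:45:00",
                    "2022-11-25 00:15:00"]}
)

# Convert the string column to datetime before using the .dt accessor.
df["event_time"] = pd.to_datetime(df["event_time"])

days = df["event_time"].dt.day                    # day of the month
weekdays = df["event_time"].dt.weekday            # 0 = Monday ... 6 = Sunday
day_names = df["event_time"].dt.day_name()        # method, so it needs ()
quarters = df["event_time"].dt.quarter            # 1 to 4
month_start = df["event_time"].dt.is_month_start  # boolean per row
quarter_end = df["event_time"].dt.is_quarter_end  # boolean per row
dates_only = df["event_time"].dt.normalize()      # time set to 00:00:00
```

Note that day_name() and normalize() are methods and need parentheses, while the other entries are plain attributes.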

Pandas Plotting

Plotting visualizations is a key component of data analysis and plays a major role in feature engineering. For example, outliers in a dataset can be detected using box plots, which show the median and interquartile range and draw outliers as individual points beyond the whiskers.

Plotting is mostly done with other libraries such as seaborn, plotly, bokeh, or matplotlib, but what if you want to visualize data instantly without explicitly calling them? Pandas has a solution. Using the df.plot() method, you can plot graphs directly, and by default pandas uses matplotlib internally to draw them. Various options are available:

df.plot() or df['column_name'].plot() (depending upon the type of graph)
df.plot() has a 'kind' parameter that defines the type of graph. By default, it is a 'line' plot, but other options include 'bar', 'barh', 'box', 'hist', 'kde', etc.
It uses the matplotlib backend by default, so the call returns a matplotlib Axes object that can be customized further, and an existing Axes can be passed in via the 'ax' argument.
The .plot() method also takes arguments such as 'title', 'xticks', 'xlim', 'xlabel', 'fontsize', and 'colormap', which reduces the need to call external libraries directly.
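
A minimal sketch of these options is shown below, assuming matplotlib is installed (pandas needs it for the default plotting backend). The 'sales' and 'returns' columns are hypothetical sample data, and the non-interactive "Agg" backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; the last sales value is a deliberate outlier.
df = pd.DataFrame(
    {"sales": [10, 12, 9, 15, 40], "returns": [1, 2, 1, 3, 2]}
)

# Create two axes explicitly and hand them to pandas via the 'ax' argument.
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot of every numeric column (kind="line" is the default).
df.plot(kind="line", title="Sales and returns", ax=left)

# Box plot of a single column, useful for spotting the outlier.
df["sales"].plot(kind="box", title="Sales distribution", ax=right)

# The axes are ordinary matplotlib objects, so they can be customized further.
left.set_ylabel("units")
```

Passing 'ax' explicitly keeps each plot on its own axes. Without it, a Series plot may be drawn on whatever axes are currently active.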

Miscellaneous Functions

pd.get_dummies(): While preprocessing data, we sometimes encounter categorical data that needs to be converted into numerical form before it can be fed to a model. When the number of categories is fairly low, one-hot encoding is preferred, but doing this manually takes a long time. This function replaces each categorical column with one indicator column per category and, if drop_first is set to True, drops the indicator column for the first category, since it can be inferred from the others.
df.query(): This function lets you filter the data frame with a condition written as a string expression. The basic difference between this and normal masking is that it directly returns the matching rows instead of a boolean mask, which saves the effort of creating the mask and then applying it to the data frame.
df.select_dtypes(): Sometimes we need to perform specific tasks on columns of one data type. For example, when reading data from external files, some columns are read in as object. After cleaning, the dataset must have the correct data types, and converting columns one by one with df['column_name'].astype('data-type') would be tedious when there are many such columns. This function selects the columns of the specified data type, and it can be combined with the .apply() function. A sample would look like this:

df.select_dtypes(include='object').apply(lambda col: col.astype(str))
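
The three functions in this section can be sketched together as follows. The data frame, its column names, and the filter thresholds are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data with one categorical and two numeric columns.
df = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", "Pune", "Mumbai"],
        "age": [25, 32, 41, 29],
        "income": [40.0, 55.5, 61.0, 48.0],
    }
)

# One-hot encode 'city'. drop_first=True drops the indicator for the first
# category in sorted order ("Delhi"), which the other columns already imply.
encoded = pd.get_dummies(df, columns=["city"], drop_first=True)

# Filter rows with a string expression instead of building a boolean mask.
filtered = df.query("age > 28 and income < 60")

# Split the columns by data type.
numeric = df.select_dtypes(include="number")
non_numeric = df.select_dtypes(exclude="number")
```

The equivalent mask for the query() call would be df[(df["age"] > 28) & (df["income"] < 60)], which returns the same rows but is harder to read as conditions pile up.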

Conclusion

Calling one method directly on the result of another, as in the select_dtypes() sample above, is referred to as chaining. It is very common in data science tasks because it removes the need to define a variable for every intermediate step.
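
As a small illustration of chaining, the sketch below filters, groups, aggregates, and sorts in a single expression. The 'team' and 'score' columns are hypothetical, and groupby() is used here only as a familiar example of a chainable step.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"team": ["a", "b", "a", "b", "a"], "score": [10, 20, 30, 40, 50]}
)

# Each method is called on the result of the previous one,
# so no intermediate variables are needed.
mean_scores = (
    df.query("score > 10")           # keep rows with score above 10
      .groupby("team")["score"]      # group the remaining scores by team
      .mean()                        # average score per team
      .sort_values(ascending=False)  # highest average first
)
```

Wrapping the chain in parentheses allows one step per line, which keeps longer pipelines readable.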

Rohit Sharma

763 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
