Python has helped turn data analysis into a field of study in its own right. If you are a data analyst who works in Python, you almost certainly use the Pandas library, and this article is for you. This Pandas cheat sheet goes through the essential methods that come in handy while analyzing data.
You might have encountered situations where it is hard to remember the specific syntax for doing something in Pandas. The commands in this cheat sheet will help you remember and look up the most common Pandas operations. If you are a beginner in Python and data science, upGrad’s data science courses can help you dive deeper into the world of data and analytics.
Using the Pandas Cheatsheet
Before using this Pandas cheat sheet, you should work through a Pandas tutorial and then come back to the cheat sheet for recall and clarification. It will help you quickly look up methods you have already learned, and it can come in handy when you are preparing for an exam or interview. We have collected and grouped the commands a data analyst uses most frequently in Pandas so that they are easy to find. In this Pandas cheat sheet, we use the following shorthand for different objects.
- df: For representing any Pandas DataFrame object
- ser: For representing any Pandas Series object
You need the following imports to run the methods mentioned in this article.
- import pandas as pd
- import numpy as np
1. Import data from different files
- To read all data from a CSV file: pd.read_csv(file_name)
- To read all data from a delimited text file (like TSV): pd.read_table(file_name)
- To read from an Excel sheet: pd.read_excel(file_name)
- To read data from a SQL database: pd.read_sql(query, connectionObject)
- To fetch data from a JSON file, URL, or JSON-formatted string (recent pandas versions expect a raw string to be wrapped in io.StringIO): pd.read_json(jsonString)
- To take the contents of your clipboard: pd.read_clipboard()
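The readers above can be tried without any files on disk. The following is a minimal sketch, assuming small made-up samples (csv_text, tsv_text, and json_text are hypothetical) that are wrapped in io.StringIO so they behave like open files.

```python
import io

import pandas as pd

# Hypothetical sample data kept in memory so the example needs no files.
csv_text = "name,score\nAda,0.9\nLinus,0.4\n"
tsv_text = "name\tscore\nAda\t0.9\nLinus\t0.4\n"
json_text = '[{"name": "Ada", "score": 0.9}, {"name": "Linus", "score": 0.4}]'

# read_csv splits on commas by default.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table splits on tabs by default, which suits TSV data.
df_tsv = pd.read_table(io.StringIO(tsv_text))

# read_json accepts a file-like object holding the JSON text.
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json.columns.tolist())
```

With real data you would pass a file path or URL in place of the io.StringIO wrapper.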
2. Export DataFrames in different file formats
- To write a DataFrame to a CSV file: df.to_csv(file_name)
- To write a DataFrame to an Excel file: df.to_excel(file_name)
- To write a DataFrame to a SQL table: df.to_sql(tableName, connectionObject)
- To write a DataFrame to a file in JSON format: df.to_json(file_name)
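As a rough illustration of the export methods, the sketch below writes a small made-up DataFrame to an in-memory CSV buffer, a JSON string, and an in-memory SQLite database. The table name "scores" and the sample data are assumptions for the example; Excel export is left out because it needs an extra engine package.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample DataFrame.
df = pd.DataFrame({"name": ["Ada", "Linus"], "score": [0.9, 0.4]})

# CSV: write to an in-memory buffer; index=False leaves out the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# JSON: with no path argument, to_json returns the JSON text as a string.
json_text = df.to_json(orient="records")

# SQL: an in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
round_trip = pd.read_sql("SELECT name, score FROM scores", conn)
conn.close()

print(csv_text)
print(json_text)
print(round_trip)
```

Passing a file name instead of a buffer, as in the list above, writes the same content to disk.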
3. Inspect a particular section of your DataFrame or Series
- To fetch all the information related to index, datatype, and memory: df.info()
- To extract the starting ‘n’ rows of your DataFrame: df.head(n)
- To extract the ending ‘n’ rows of your DataFrame: df.tail(n)
- To extract the number of rows and columns available in your DataFrame: df.shape
- To summarize the statistics for numerical columns: df.describe()
- To view unique values along with their counts: ser.value_counts(dropna=False)
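To see what each inspection command returns, here is a short sketch on a made-up DataFrame with one missing value per column. The column names "city" and "temp" are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in each column.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", None, "Delhi", "Pune"],
    "temp": [31.0, 35.5, 30.2, 28.9, np.nan, 32.1],
})

df.info()                 # prints index range, dtypes, and memory usage
print(df.shape)           # (rows, columns) as a tuple
print(df.head(2))         # first two rows
print(df.tail(2))         # last two rows

summary = df.describe()   # statistics for the numeric column only
print(summary)

# dropna=False keeps the missing value as its own category in the counts.
counts = df["city"].value_counts(dropna=False)
print(counts)
```

Note that df.shape is an attribute, not a method, so it takes no parentheses.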
4. Selecting a specific subset of your data
- Extract the first row: df.iloc[0,:]
- To extract the first element of your DataFrame’s first column: df.iloc[0,0]
- To return the column with label col as a Series: df[col]
- To return several columns as a new DataFrame: df[[col1,col2]]
- To select data by position: ser.iloc[0]
- To select data by index label: ser.loc['index_one']
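The difference between position-based and label-based selection is easier to see on a concrete object. The sketch below assumes a small made-up DataFrame whose index labels ("index_one" and so on) are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with string index labels.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "c": ["x", "y", "z"]},
    index=["index_one", "index_two", "index_three"],
)

ser = df["a"]                    # single label -> Series
subset = df[["a", "b"]]          # list of labels -> new DataFrame

first_row = df.iloc[0, :]        # first row, all columns (a Series)
first_cell = df.iloc[0, 0]       # first row of the first column

by_position = ser.iloc[0]        # integer position
by_label = ser.loc["index_one"]  # index label

print(first_row)
print(first_cell, by_position, by_label)
```

Here iloc always counts positions from zero, while loc looks up whatever labels the index holds.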
5. Data Cleaning Commands
- To rename all columns with a function (this example assumes numeric column labels): df.rename(columns = lambda x: x + 1)
- To rename columns selectively: df.rename(columns = {'oldName': 'newName'})
- To rename all index labels with a function (this example assumes a numeric index): df.rename(index = lambda x: x + 1)
- To rename columns in sequence: df.columns = ['x', 'y', 'z']
- To check whether null values exist, returning a boolean array accordingly: pd.isnull(df)
- The reverse of pd.isnull(df): pd.notnull(df)
- Drops all rows containing null values: df.dropna()
- Drops all columns containing null values: df.dropna(axis=1)
- To replace each null value with ‘n’: df.fillna(n)
- To convert all the datatypes of the series into float: ser.astype(float)
- To replace every 1 with 'one' and every 3 with 'three': ser.replace([1,3], ['one','three'])
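The cleaning commands can be sketched on a small example as follows. The DataFrame, its column names, and the replacement values are made up for illustration, and every call returns a new object rather than changing df in place.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value in column "x".
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

renamed = df.rename(columns={"x": "first"})             # selective rename
upper = df.rename(columns=lambda name: name.upper())    # rename via function

mask = df.isnull()             # boolean DataFrame, True where a value is missing
rows_kept = df.dropna()        # drop rows that contain a null
cols_kept = df.dropna(axis=1)  # drop columns that contain a null
filled = df.fillna(0)          # replace each null with 0

ser = pd.Series([1, 2, 3])
as_float = ser.astype(float)                    # convert the dtype to float
words = ser.replace([1, 3], ["one", "three"])   # map 1 -> 'one', 3 -> 'three'

print(mask)
print(filled)
print(words.tolist())
```

Assign the result back (for example df = df.dropna()) if you want to keep the cleaned version.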
6. Groupby, Sort, and Filter Data
- To return a groupby object for column values: df.groupby(colm)
- To return groupby object for multiple column values: df.groupby([colm1, colm2])
- To sort values in ascending order (by column): df.sort_values(colm1)
- To sort values in descending order (by column): df.sort_values(colm2, ascending=False)
- Extract rows where the column value is greater than 0.6: df[df[colm] > 0.6]
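A groupby object does nothing on its own until you aggregate it. The sketch below, which assumes a made-up DataFrame with hypothetical "team" and "score" columns, pairs grouping with a mean and then shows sorting and boolean filtering.

```python
import pandas as pd

# Hypothetical scores for two teams.
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "score": [0.7, 0.2, 0.9, 0.65],
})

# Group by one column, then aggregate the groups with mean().
team_means = df.groupby("team")["score"].mean()

ascending = df.sort_values("score")                    # smallest first
descending = df.sort_values("score", ascending=False)  # largest first

# Boolean mask: keep only rows where score is greater than 0.6.
high = df[df["score"] > 0.6]

print(team_means)
print(descending)
print(high)
```

Other aggregations such as sum(), count(), or max() can be used in place of mean() in the same way.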
7. Others
- Add the rows of the second DataFrame to the end of the first DataFrame (DataFrame.append was removed in pandas 2.0, so use pd.concat): pd.concat([df1,df2])
- Add the columns of the second DataFrame to the end of the first DataFrame: pd.concat([df1,df2],axis=1)
- To return the mean of all numeric columns: df.mean(numeric_only=True)
- To return the number of non-null values in each column: df.count()
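Here is a minimal sketch of combining DataFrames and summarizing the result, assuming two small made-up frames. It uses pd.concat for both rows and columns because DataFrame.append is no longer available in pandas 2.0 and later; the "_2" suffix is only there to keep the column names distinct.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with the same columns; df1 has one missing value.
df1 = pd.DataFrame({"a": [1, 2], "b": [3.0, np.nan]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7.0, 8.0]})

# Stack rows: df2's rows go after df1's; ignore_index renumbers the rows.
stacked = pd.concat([df1, df2], ignore_index=True)

# Place columns side by side; add_suffix avoids duplicate column names.
side_by_side = pd.concat([df1, df2.add_suffix("_2")], axis=1)

means = stacked.mean()     # nulls are skipped when averaging
counts = stacked.count()   # non-null values per column

print(stacked)
print(side_by_side)
print(means)
print(counts)
```

Column "b" has a count of 3 rather than 4 because count() leaves out the missing value.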
Conclusion
This Pandas cheat sheet is meant only for rapid recall. It is always a good idea to practice the commands yourself before relying on the cheat sheet.
If you are curious to learn about Pandas, check out IIIT-B & upGrad’s Executive PG Programme in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.