60 Most Asked Pandas Interview Questions and Answers [ANSWERED + CODE]

By Rohit Sharma

Updated on Jul 15, 2025 | 22 min read | 61.27K+ views

Did you know? According to the State of Data & AI Literacy Report 2025, 86% of business leaders say data literacy is essential for their teams’ daily work. With Pandas being a core tool for handling and analyzing data, building expertise in it is no longer optional; it’s essential.

If you’re preparing for Pandas interview questions, expect to cover data manipulation, cleaning, merging, and analysis with Python’s pandas library. Interviewers often ask how you manage DataFrames, handle missing values, and perform group operations. They also look for clear, efficient code. Strong pandas skills show you can handle real data challenges with confidence.

This blog will not only cover 60 of the most frequently asked Pandas interview questions and answers (with code examples) but will also provide you with practical tips on how to tackle them with confidence. 

Build your Python and Pandas skills with upGrad’s online software development course. Learn to work with data, write better code, and choose the right tools for your projects!

Top Pandas Interview Questions for Beginners and Professionals

Top Pandas interview questions for beginners and professionals often focus on practical tasks, such as filtering data, combining datasets, managing missing values, and applying group operations. You can also expect questions that test how well you write clear, fast code with Pandas, or how you might handle larger datasets efficiently using techniques like chunked reading and memory-efficient data types.

Before these technical rounds, most companies start by reviewing your resume to see if your projects and experience fit their data requirements. Once you move past that stage, you’ll face a mix of theory-based and hands-on coding questions. 

Now that you’re familiar with the steps of the interview process, let’s discuss some of the top Pandas interview questions for both beginners and professionals. 

Pandas Interview Questions for Freshers

This section is your starting point, packed with fundamental Python pandas interview questions designed for freshers and entry-level professionals. These questions lay the groundwork, helping you understand core concepts that are essential to building confidence in tackling real-world problems. Interviewers often ask these to test your familiarity with pandas basics and ensure you can handle simple data tasks.

Get ready to dive into freshers pandas interview questions, each designed to strengthen your grasp on this indispensable library.

1. What is pandas in Python, and why is it used?

How to Answer:
Explain that pandas is an open-source Python library designed for working with structured data. Mention that it’s widely used for data cleaning, manipulation, and analysis. Highlight how it helps handle large datasets and makes tasks like reading files, filtering, and aggregating data easier.

Sample Answer:
Pandas is an open-source library in Python used for data manipulation and analysis. It’s mainly used to work with structured data, offering powerful tools to clean, transform, and analyze large datasets easily.

Example:

import pandas as pd

# Creating a simple DataFrame
data = {'Name': ['Rohit', 'Veer'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

 

Output:

    Name  Age
0  Rohit   25
1   Veer   30

Here, pandas helps create a DataFrame to organize data in a tabular format, making it easier to process.

2. What are the primary data structures in pandas?

How to Answer:
This pandas interview question tests your knowledge of the core components of pandas. Explain that pandas mainly uses two data structures: Series for one-dimensional labeled data, and DataFrame for two-dimensional tabular data. Also, mention that these structures are built on top of NumPy arrays, which makes operations fast and reliable.

Sample Answer:
Pandas provides two primary data structures:

  • Series: A one-dimensional labeled array that can hold any data type.
  • DataFrame: A two-dimensional table with labeled rows and columns, similar to an Excel sheet or SQL table.

These structures support powerful indexing and are built to integrate well with NumPy.

Example:

import pandas as pd

# Creating a Series
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print("Series:\n", s)

# Creating a DataFrame
data = {'Name': ['Rohit', 'Veer'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)

Output:

Series:
a    10
b    20
c    30
dtype: int64

DataFrame:
    Name  Age
0  Rohit   25
1   Veer   30

Here, the Series holds a one-dimensional list with labels, while the DataFrame organizes data in a table with rows and columns.

Also Read: Mastering Pandas: Important Pandas Functions For Your Next Project

3. How do you create a Series in pandas?

How to Answer:
Explain that a Series in pandas is a one-dimensional labeled array capable of holding any data type. You can create it using lists, dictionaries, or NumPy arrays, and highlight how you can also set custom indices.

Sample Answer:
A Series in pandas is a one-dimensional data structure like a column in a spreadsheet. It can be created from a Python list, dictionary, or NumPy array, with optional custom labels for the index.

Example:

import pandas as pd

# Creating a Series from a list with custom index
temps = pd.Series([72, 75, 78], index=['Monday', 'Tuesday', 'Wednesday'])
print(temps)

Output:

Monday       72
Tuesday      75
Wednesday    78
dtype: int64

Build strong data handling skills with the upGrad’s Master’s in Data Science. Work on real data projects, sharpen your Python and Pandas expertise, and step confidently into data roles. 

4. How do you create a DataFrame in pandas?

How to Answer:
Explain that a DataFrame is a two-dimensional labeled data structure similar to a table. Mention that it can be created from dictionaries, lists of dictionaries, NumPy arrays, or by reading external files.

Sample Answer:
A DataFrame in pandas is a two-dimensional data structure with rows and columns. It’s typically created from a dictionary of lists or other data structures, making it ideal for representing tabular data.

Example:

import pandas as pd

data = {'Name': ['Rohit', 'Veer'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age
0  Rohit   25
1   Veer   30

Also Read: A Comprehensive Guide to Pandas DataFrame astype()

5. How do you read data from a CSV file into a DataFrame?

How to Answer:
Say you use the read_csv() function, which is the most common way to load data into pandas. You can specify delimiters, handle headers, and deal with missing values while reading.

Sample Answer:
You can load data from a CSV file into a DataFrame using the pd.read_csv() function. This helps bring external datasets into pandas for analysis and cleaning.

Example:

import pandas as pd

df = pd.read_csv('sales_data.csv')
print(df.head())
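
The options mentioned above (delimiters, headers, missing values) can be sketched as follows. This is a minimal illustration that reads from an in-memory string instead of a file; the column names and the "unknown" missing-value marker are made up for the example.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited data; "unknown" marks a missing amount
raw = io.StringIO(
    "order_id;region;amount\n"
    "1;North;250\n"
    "2;South;unknown\n"
    "3;East;410\n"
)

df = pd.read_csv(
    raw,
    sep=";",                # delimiter other than the default comma
    header=0,               # the first line holds the column names
    na_values=["unknown"],  # treat this string as a missing value
)
print(df)
print(df.isna().sum())      # missing values per column
```

Declaring the missing-value marker at read time keeps the amount column numeric instead of leaving it as text.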

Also Read: Pandas vs NumPy in Data Science: Top 15 Differences

6. How can you view the first few rows of a DataFrame?

How to Answer:
Say you use the head() method to look at the top rows of a DataFrame. You can specify how many rows to display, which is helpful for quick inspection.

Sample Answer:
Use the head() method to see the first few rows of your DataFrame. It’s useful to confirm that your data loaded correctly or to understand its structure quickly.

Example:

print(df.head(3))  # Displays first 3 rows

7. How do you check the data types of columns in a DataFrame?

How to Answer:
Explain that each column in a DataFrame has a data type, and you can check them using the dtypes attribute to verify or troubleshoot your dataset.

Sample Answer:
Use the dtypes attribute on a DataFrame to see the data type of each column. It’s helpful for spotting unexpected types before analysis.

Example:

print(df.dtypes)

Learn how to check column data types in a DataFrame using Pandas, and pick up countless other practical tricks through upGrad’s Executive Diploma in Data Science & AI, complete with hands-on projects and career support.

8. How do you select a single column or multiple columns in pandas?

How to Answer:
Say that you can use square brackets to select a single column by name or a list of column names for multiple columns.

Sample Answer:
Select one column with df['column'] or multiple columns by passing a list like df[['col1', 'col2']]. You can also use .loc and .iloc for more complex selections.

Example:

single = df['Name']
multiple = df[['Name', 'Age']]
print(single)
print(multiple)

9. What is the difference between loc and iloc in pandas?

How to Answer:
Clarify that loc selects by labels (like row or column names), while iloc selects by integer index positions.

Sample Answer:
loc is used for label-based indexing, so you select data by row and column names. iloc uses integer positions to access data by index numbers.

Example:

print(df.loc[0, 'Name'])  # By label
print(df.iloc[0, 0])      # By position
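
With the default integer index, loc and iloc can look identical. The difference shows up once the row labels are not positions. The sketch below uses made-up string labels to illustrate this, including the slicing behavior.

```python
import pandas as pd

# Illustrative data with string labels as the index
df = pd.DataFrame(
    {"Name": ["Rohit", "Veer", "Asha"], "Age": [25, 30, 28]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r2", "Age"]     # row label "r2", column label "Age"
by_position = df.iloc[1, 1]        # second row, second column

# Slices differ too: loc includes the end label, iloc excludes the end position
label_slice = df.loc["r1":"r2"]    # rows r1 and r2
position_slice = df.iloc[0:1]      # row r1 only
print(by_label, by_position, len(label_slice), len(position_slice))
```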

10. How do you add a new column to an existing DataFrame?

How to Answer:
Say that you can create a new column by direct assignment using square brackets, or by applying a function to existing columns.

Sample Answer:
You can add a new column by assigning a list or computed values to a new column name, or by using apply() for operations based on existing columns.

Example:

df['Salary'] = [50000, 60000]
print(df)

Also Read: Python for Data Science Cheat Sheet: Pandas, NumPy, Matplotlib & Key Functions

11. How do you delete a column or row in a DataFrame?

How to Answer:
Explain that you use the drop() method, specifying axis=1 for columns and axis=0 for rows.

Sample Answer:
Use drop() to remove rows or columns from a DataFrame. This helps in cleaning up unnecessary data.

Example:

df = df.drop('Salary', axis=1)  # Delete column
df = df.drop(0, axis=0)         # Delete row
print(df)

Also Read: Data Analysis Using Python [Everything You Need to Know]

12. How do you handle missing data in pandas?

How to Answer:
Mention common methods like dropna() to remove, fillna() to replace, and isna() to detect missing values.

Sample Answer:
You handle missing data using methods like dropna() to remove them or fillna() to replace them. Use isna() to check where data is missing.

Example:

df.fillna(0, inplace=True)
print(df)
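
The three methods named in the answer can be compared side by side. The data here is invented for the example, and the fill values are arbitrary choices rather than recommendations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Rohit", "Veer", None], "Age": [25, np.nan, 30]})

missing_per_column = df.isna().sum()   # count of missing values per column
dropped = df.dropna()                  # keep only rows with no missing values
filled = df.fillna({"Name": "Unknown", "Age": 0})  # per-column replacements

print(missing_per_column)
print(dropped)
print(filled)
```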

13. How do you rename columns in a DataFrame?

How to Answer:
Say that you use the rename() method with the columns parameter or set_axis() to rename columns.

Sample Answer:
Rename columns using rename() by passing a dictionary that maps old names to new names. set_axis() can also be used to set all column names at once.

Example:

df.rename(columns={'Name': 'CustomerName'}, inplace=True)

14. What is reindexing in pandas, and how is it used?

How to Answer:
Explain that reindexing means changing the order or labels of rows or columns, often to align data with another dataset.

Sample Answer:
Use the reindex() method to rearrange or change the index labels of your DataFrame. This helps align datasets.

Example:

new_index = [1, 0]
df = df.reindex(new_index)
print(df)
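
Reindexing does more than reorder rows: labels that do not exist in the original index produce new rows of missing values. A small sketch with made-up names shows this, along with the fill_value option.

```python
import pandas as pd

df = pd.DataFrame({"Score": [90, 85]}, index=["Rohit", "Veer"])

# "Asha" is not in the original index, so her row is created with NaN
aligned = df.reindex(["Veer", "Rohit", "Asha"])

# fill_value replaces the NaN placeholder for labels that had no data
aligned_filled = df.reindex(["Veer", "Rohit", "Asha"], fill_value=0)

print(aligned)
print(aligned_filled)
```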

15. How do you sort data in a DataFrame by a specific column?

How to Answer:
Say that you use the sort_values() method to sort by column values in ascending or descending order.

Sample Answer:
Sort data using sort_values(by='ColumnName'). This orders the rows based on that column, either in ascending or descending order.

Example:

df = df.sort_values(by='Age', ascending=True)
print(df)

Intermediate Pandas Interview Questions

With the basics out of the way, it’s time to raise the bar. This section covers Python pandas interview questions that focus on intermediate concepts like indexing, grouping, merging, and transforming data. These are essential for anyone with experience working with pandas in real-world scenarios.

Now, dive into these important questions to expand your skill set.

16. What is a pandas Index, and how does it work?

How to answer:
Explain that an Index holds the labels for rows (or columns) in Series/DataFrames. It enables fast lookups and alignment, and supports multi-level indexing. Labels are usually unique, but pandas does not require them to be.

Sample Answer:
An Index in pandas is an immutable array-like object that labels axes. It powers fast data selection, alignment, and reshaping.

Example:

import pandas as pd
df = pd.DataFrame({'Score': [90, 85]}, index=['Rohit', 'Veer'])
print(df.index)

Output:

Index(['Rohit', 'Veer'], dtype='object')

17. How do you set or reset the index of a DataFrame?

How to answer:
Show how set_index() turns a column into the row labels, and reset_index() restores the default integer index.

Sample Answer:
Use set_index() to make a column the index, and reset_index() to bring it back as a column.

Example:

df = pd.DataFrame({'Name': ['Rohit', 'Veer'], 'Score': [90, 85]})
df.set_index('Name', inplace=True)
print(df)
df.reset_index(inplace=True)
print(df)

Output:

       Score
Name        
Rohit     90
Veer      85

    Name  Score
0  Rohit     90
1   Veer     85

18. What is multi-indexing in pandas, and how do you create it?

How to answer:
Explain it’s hierarchical indexing that lets you work with multiple index levels, which is useful for complex datasets.

Sample Answer:
Multi-indexing allows multiple levels of indexing for rows or columns. You can create it via set_index() with multiple columns, or directly with constructors such as pd.MultiIndex.from_arrays().

Example:

arrays = [('Math', 'Math', 'Sci', 'Sci'), (2020, 2021, 2020, 2021)]
# from_arrays() takes one sequence per level; from_tuples() would expect one tuple per row
index = pd.MultiIndex.from_arrays(arrays, names=['Subject', 'Year'])
df = pd.DataFrame([88, 92, 79, 85], index=index, columns=['Score'])
print(df)

Output:

              Score
Subject Year       
Math    2020     88
        2021     92
Sci     2020     79
        2021     85

19. How do you filter rows based on a condition in pandas?

How to answer:
State that you use boolean indexing or query() to filter rows by logical conditions.

Sample Answer:
Use conditions like df[df['Col'] > value] or query() to filter data based on rules.

Example:

df = pd.DataFrame({'Name': ['Rohit', 'Veer'], 'Score': [90, 80]})
high_scores = df[df['Score'] > 85]
print(high_scores)

Output:

    Name  Score
0  Rohit     90
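
The answer also mentions query(), which expresses the same filter as a string. The sketch below, with a third made-up row added, checks that both forms select the same rows.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rohit", "Veer", "Asha"], "Score": [90, 80, 88]})

# Boolean indexing and query() express the same condition
mask_result = df[df["Score"] > 85]
high_scores = df.query("Score > 85")

print(high_scores)
print(high_scores.equals(mask_result))
```

query() tends to read more cleanly when several conditions are combined, while boolean indexing works with any column name, including names that contain spaces.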

Boost your Pandas expertise and prepare for your next interview with confidence. upGrad’s Executive Post Graduate Certificate Programme in Data Science & AI, in partnership with IIIT Bangalore, builds solid skills in Python, Machine Learning, Big Data, and more, perfect for sharpening your data analysis with Pandas.

20. How do you handle duplicate data in a DataFrame?

How to answer:
Mention using duplicated() to detect and drop_duplicates() to remove duplicates.

Sample Answer:
Use duplicated() to check for duplicates and drop_duplicates() to clean them.

Example:

df = pd.DataFrame({'Name': ['Rohit', 'Veer', 'Rohit'], 'Score': [90, 85, 90]})
df_cleaned = df.drop_duplicates()
print(df_cleaned)

Output:

    Name  Score
0  Rohit     90
1   Veer     85

21. How do you group data using the groupby() function?

How to answer:
Describe grouping by one or more columns and applying aggregate functions.

Sample Answer:
groupby() groups rows by a key and applies functions like sum() or mean().

Example:

df = pd.DataFrame({'Class': ['A', 'A', 'B'], 'Score': [85, 90, 88]})
avg_scores = df.groupby('Class').mean()
print(avg_scores)

Output:

       Score
Class       
A       87.5
B       88.0

22. What are pivot tables in pandas, and how do you create them?

How to answer:
State that they reshape data for summaries by aggregating values across multiple dimensions.

Sample Answer:
Use pivot_table() to summarize data by index/columns with an aggregation function.

Example:

df = pd.DataFrame({'Class': ['A', 'A', 'B'], 'Subject': ['Math', 'Sci', 'Math'], 'Score': [85, 90, 88]})
pivot = df.pivot_table(values='Score', index='Class', columns='Subject', aggfunc='mean')
print(pivot)

Output:

Subject  Math   Sci
Class              
A        85.0  90.0
B        88.0   NaN

23. How do you concatenate or append DataFrames?

How to answer:
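Say you use concat() for combining along an axis. Mention that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so concat() is the way to add rows in current versions.

Sample Answer:
Use concat() for combining along rows or columns. Older code may use append() to add rows, but that method no longer exists in pandas 2.0 and later; pass ignore_index=True to concat() if you want a fresh sequential index.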

Example:

df1 = pd.DataFrame({'Name': ['Rohit'], 'Score': [90]})
df2 = pd.DataFrame({'Name': ['Veer'], 'Score': [85]})
combined = pd.concat([df1, df2])
print(combined)

Output:

    Name  Score
0  Rohit     90
0   Veer     85

24. What is the difference between merge() and join() in pandas?

How to answer:
Explain that merge() aligns by columns, similar to SQL joins, whereas join() aligns by index.

Sample Answer:
merge() joins on columns explicitly (like SQL), join() joins on the index by default.

Example:

df1 = pd.DataFrame({'ID': [1, 2], 'Score': [90, 85]})
df2 = pd.DataFrame({'ID': [1, 2], 'Grade': ['A', 'B']})
merged = pd.merge(df1, df2, on='ID')
print(merged)

Output:

   ID  Score Grade
0   1     90     A
1   2     85     B

25. How do you perform different types of joins (inner, outer, left, right) in pandas?

How to answer:
Show using merge() with the how parameter to specify the join type.

Sample Answer:
Use merge(how='inner'), left, right, or outer to control join behavior.

Example:

df1 = pd.DataFrame({'ID': [1, 2], 'Score': [90, 85]})
df2 = pd.DataFrame({'ID': [2, 3], 'Grade': ['B', 'C']})
joined = pd.merge(df1, df2, on='ID', how='outer')
print(joined)

Output:

   ID  Score Grade
0   1   90.0   NaN
1   2   85.0     B
2   3    NaN     C

26. How do you apply a function to every element in a DataFrame using applymap()?

How to answer:
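Point out that applymap() is for element-wise operations on entire DataFrames. Note that pandas 2.1 deprecated applymap() in favor of DataFrame.map(), which does the same job.

Sample Answer:
Use applymap() to apply a function to each cell, typically for transformations. On pandas 2.1 and later, DataFrame.map() is the preferred name for the same operation.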

Example:

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
squared = df.applymap(lambda x: x**2)
print(squared)

Output:

   A   B
0  1   9
1  4  16

27. What is the difference between apply(), map(), and applymap()?

How to answer:
Outline that map() is for Series element-wise, apply() for rows/columns, applymap() for all DataFrame elements.

Sample Answer:

  • map() works on Series elements,
  • apply() on rows/columns or a Series,
  • applymap() on each DataFrame element (renamed DataFrame.map() in pandas 2.1).

Example:

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(df['A'].map(lambda x: x*10))
print(df.apply(sum))
print(df.applymap(str))

Output:

0    10
1    20
Name: A, dtype: int64
A    3
B    7
dtype: int64
   A  B
0  1  3
1  2  4

28. How do you handle categorical data in pandas?

How to answer:
Show using astype('category') or pd.Categorical() for efficient storage and operations.

Sample Answer:
Convert text columns to category dtype to save memory and optimize operations.

Example:

df = pd.DataFrame({'Grade': ['A', 'B', 'A']})
df['Grade'] = df['Grade'].astype('category')
print(df['Grade'].dtypes)
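
To make the memory claim concrete, the sketch below compares the footprint of a repeated text column before and after conversion. The data is synthetic, and the exact byte counts depend on the pandas version, so only the comparison matters.

```python
import pandas as pd

# Synthetic column with many rows but only three distinct labels
grades = pd.Series(["A", "B", "A", "C"] * 1000)
as_category = grades.astype("category")

print(grades.memory_usage(deep=True))        # one stored string per row
print(as_category.memory_usage(deep=True))   # small integer codes plus the category list
print(as_category.cat.categories.tolist())   # the distinct labels
```

The saving comes from storing each distinct label once, so it shrinks as the number of unique values approaches the number of rows.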

29. How do you perform one-hot encoding using pandas?

How to answer:
Explain using pd.get_dummies() to create binary columns for each category.

Sample Answer:
Use get_dummies() to convert categorical columns into multiple binary columns.

Example:

df = pd.DataFrame({'Grade': ['A', 'B', 'A']})
encoded = pd.get_dummies(df['Grade'], dtype=int)  # dtype=int gives 1/0; pandas 2.0+ defaults to True/False
print(encoded)

Output:

   A  B
0  1  0
1  0  1
2  1  0

Advanced Pandas Interview Questions

With intermediate concepts mastered, it's time to tackle the more challenging aspects. These pandas interview questions are designed for experienced professionals and dive deep into optimization, time series data, advanced transformations, and file handling. 

This section provides comprehensive answers, accompanied by practical examples and efficient code snippets, to solidify your expertise.

Get ready to sharpen your skills further with these advanced topics.

30. How do you change the data type of a column in a DataFrame?

How to answer:
State you use astype() to convert to int, float, str, or category.

Sample Answer:
Change types with astype(), e.g. convert strings to integers or floats.

Example:

df = pd.DataFrame({'Value': ['1', '2']})
df['Value'] = df['Value'].astype(int)
print(df.dtypes)

Output:

Value    int64
dtype: object

31. How do you optimize pandas performance with large datasets?

How to answer:
Explain that you optimize pandas performance by minimizing memory usage, using vectorized operations, and processing data in chunks. Emphasize using appropriate data types and filtering early.

Sample answer:
You can optimize pandas by converting columns to more efficient data types, reading and processing data in chunks with chunksize, and leveraging NumPy’s vectorized operations. Always clean and filter data early to reduce unnecessary computations.

Example:

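# Downcast dtypes while reading, and process the file one chunk at a time
total_rows = 0
for chunk in pd.read_csv('large_file.csv', dtype={'id': 'int32'}, chunksize=10000):
    total_rows += len(chunk)  # placeholder: replace with your own per-chunk processing
print(total_rows)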

Output:
Processes data in smaller pieces, reducing memory overhead and improving speed.

32. What is the SettingWithCopyWarning in pandas, and how can you avoid it?

How to answer:
Mention that it warns you may be assigning to a temporary copy of a slice, so the change might never reach the original DataFrame. Highlight avoiding chained indexing, using .loc[], and making explicit copies.

Sample answer:
SettingWithCopyWarning occurs when pandas is unsure if you’re modifying a view or a copy. To avoid it, use .loc[] for assignments and call .copy() when working with slices.

Example:

df_slice = df.loc[df['col'] > 0].copy()
df_slice['col2'] = 5

Output:
Data is modified without ambiguity or warnings.
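
The example above covers the explicit-copy case. When the goal is to update the original DataFrame, a single .loc[] assignment avoids the chained indexing that triggers the warning. A minimal sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"col": [-1, 2, 3], "col2": [0, 0, 0]})

# Chained indexing such as df[df["col"] > 0]["col2"] = 5 may write to a
# temporary copy. One .loc call selects rows and column together, so the
# assignment lands in the original DataFrame.
df.loc[df["col"] > 0, "col2"] = 5
print(df)
```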

33. How do you work with time series data in pandas?

How to answer:
Explain converting date columns with pd.to_datetime(), setting them as index, and using pandas’ time-based operations like resample(). Also, mention adjusting time zones if needed.

Sample answer:
You work with time series by parsing dates using pd.to_datetime(), setting the datetime column as the index, and applying functions like resample() to aggregate data. Handle time zones when necessary to keep data consistent.

Example:

df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
# 'M' means month-end; pandas 2.2+ deprecates it in favor of 'ME'
monthly = df.resample('M').mean()

Output:
Aggregates data to monthly averages based on the datetime index.

34. How do you resample time series data in pandas?

How to answer:
Say you use resample() to change the frequency of time series data and then apply aggregation methods like sum or mean.

Sample answer:
Use resample() to group data into new time intervals, such as daily or monthly, and apply aggregations like sum or mean. This allows analysis at different granularities.

Example:

daily_sales = df.resample('D').sum()

Output:
Creates a DataFrame with sales summed for each day.
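
The one-liner above assumes df already has a datetime index. A self-contained sketch, using invented sales records, shows the full path from raw timestamps to daily totals.

```python
import pandas as pd

# Hypothetical records: two sales on 1 Jan, one on 2 Jan, none on 3 Jan, one on 4 Jan
df = pd.DataFrame(
    {
        "date": ["2025-01-01 09:00", "2025-01-01 15:30", "2025-01-02 11:00", "2025-01-04 10:00"],
        "sales": [100, 50, 75, 20],
    }
)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# One bucket per calendar day; days without records still appear, with a sum of 0
daily_sales = df["sales"].resample("D").sum()
print(daily_sales)
```

Because resample() fills in every interval between the first and last timestamp, gaps in the data become visible instead of being silently skipped.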

35. What is the rolling() function, and how do you use it?

How to answer:
Explain that it computes rolling statistics over a moving window. Mention applying functions like mean or sum to smooth data.

Sample answer:
Use rolling() to calculate moving averages or sums over a specified window of observations. This is especially useful for smoothing fluctuations in time series data.

Example:

df['7d_avg'] = df['sales'].rolling(window=7).mean()

Output:
Adds a new column with the 7-day moving average of sales.
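
A detail interviewers often probe is what happens at the start of the series, before the window is full. The sketch below uses a window of 3 on made-up numbers to show the default behavior and the min_periods option.

```python
import pandas as pd

sales = pd.Series([10, 20, 30, 40, 50])

# The first window-1 results are NaN because the window is not yet full
rolling_mean = sales.rolling(window=3).mean()

# min_periods=1 produces a value as soon as one observation is available
partial_mean = sales.rolling(window=3, min_periods=1).mean()

print(rolling_mean.tolist())
print(partial_mean.tolist())
```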

36. How do you interpolate missing data in a DataFrame?

How to answer:
Explain how to use interpolate() to estimate and fill missing values, with options such as linear or time-based interpolation.

Sample answer:
Handle missing data with interpolate(), which estimates missing values based on surrounding data. You can specify methods such as linear, polynomial, or time, depending on the characteristics of the data.

Example:

df['temp'] = df['temp'].interpolate(method='linear')

Output:
Fills missing temperature values based on a linear trend.

37. What is the difference between fillna() and interpolate() methods?

How to answer:
Highlight that fillna() fills missing data with static values or nearby observations, while interpolate() estimates values based on data trends.

Sample answer:
fillna() replaces missing values with constants or uses forward/backward filling without considering trends. interpolate() estimates missing values by looking at existing data patterns, which is better for continuous or time series data.

Example:

filled = df['col'].fillna(0)                            # returns a new Series
interpolated = df['col'].interpolate(method='linear')   # returns a new Series

Output:
fillna() replaces NaNs with 0; interpolate() fills based on a linear interpolation of existing data.
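
Running both approaches on the same small series makes the contrast easier to see. The values are made up, with a two-point gap between 10 and 40.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, np.nan, 40.0])

constant = s.fillna(0)                    # gap becomes 0, 0
forward = s.ffill()                       # gap repeats the last known value: 10, 10
linear = s.interpolate(method="linear")   # gap follows the line from 10 to 40: 20, 30

print(constant.tolist())
print(forward.tolist())
print(linear.tolist())
```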

38. How do you use pivot() and melt() functions in pandas?

How to answer:
Explain that pivot() reshapes data from long to wide by spreading unique values across columns. Then note melt() does the reverse, turning wide data into a long format for analysis or visualization.

Sample answer:
You use pivot() to transform unique values into columns for comparison, and melt() to unpivot wide data back to long form. This is essential for preparing data for stats or plots.

Example:

wide_df = df.pivot(index='date', columns='product', values='sales')
long_df = wide_df.reset_index().melt(id_vars='date', var_name='product', value_name='sales')

Output:
Reshapes data from long to wide and then back to long format.
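
The round trip can be tried end to end on a small made-up dataset, using the same column names as the snippet above.

```python
import pandas as pd

# Long format: one row per (date, product) pair
df = pd.DataFrame(
    {
        "date": ["2025-01-01", "2025-01-01", "2025-01-02", "2025-01-02"],
        "product": ["A", "B", "A", "B"],
        "sales": [10, 20, 30, 40],
    }
)

# Wide format: one row per date, one column per product
wide_df = df.pivot(index="date", columns="product", values="sales")

# Back to long format
long_df = wide_df.reset_index().melt(id_vars="date", var_name="product", value_name="sales")

print(wide_df)
print(long_df)
```

pivot() raises a ValueError when an index/column pair appears more than once; in that case pivot_table() with an aggregation function is the usual alternative.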

39. How do you perform vectorized operations in pandas?

How to answer:
Say you perform operations directly on Series or DataFrames without loops. This approach is faster and takes advantage of pandas’ optimized computation.

Sample answer:
Use vectorized operations to apply calculations across entire columns at once. This is cleaner and more efficient than iterating through rows one by one.

Example:

df['revenue'] = df['price'] * df['quantity']

Output:
Creates a revenue column by multiplying the columns element-wise.

40. How do you read data from SQL databases using pandas?

How to answer:
Mention connecting with libraries like sqlite3 or SQLAlchemy and using read_sql() to load results straight into a DataFrame.

Sample answer:
You connect to a database and use pd.read_sql() or read_sql_query() to pull query results into pandas. This streamlines working with relational data.

Example:

import sqlite3
conn = sqlite3.connect('db.sqlite3')
df = pd.read_sql('SELECT * FROM sales', conn)

Output:
Loads SQL table data directly into a DataFrame.
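
The snippet above needs an existing database file. For practice, an in-memory SQLite database works just as well. The table and its rows below are invented for the example, and the params argument shows one way to pass values into a query without building the SQL string by hand.

```python
import sqlite3
import pandas as pd

# In-memory database with a made-up sales table, so no file is needed
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("South", 250), ("North", 50)],
)
conn.commit()

# The ? placeholder is filled from params by the database driver
df = pd.read_sql_query(
    "SELECT region, amount FROM sales WHERE amount >= ?",
    conn,
    params=(100,),
)
conn.close()
print(df)
```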

41. How do you merge multiple DataFrames on a key column?

How to answer:
Explain using merge() to join DataFrames like SQL joins, specifying keys and join type with how.

Sample answer:
Use merge() to combine DataFrames on shared keys. Control the join style by selecting inner, left, right, or outer merges to suit the data relationship.

Example:

merged_df = pd.merge(df1, df2, on='customer_id', how='left')

Output:
Joins data, keeping all customers from df1 and matching info from df2.

42. How do you concatenate DataFrames vertically?

How to answer:
Mention using concat() with axis=0 to stack DataFrames on top of each other. It works best when the DataFrames share the same columns; any column missing from one of them is filled with NaN.

Sample answer:
Call pd.concat() along axis=0 to append rows from multiple DataFrames into one. This is helpful when combining monthly or partitioned data files.

Example:

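# jan_df, feb_df and mar_df are assumed to be monthly DataFrames with the same columns
combined = pd.concat([jan_df, feb_df, mar_df], axis=0, ignore_index=True)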

Output:
Creates a single DataFrame with all rows from January to March.

43. What does the groupby() function do in pandas?

How to answer:
The groupby() function splits data into groups based on key columns, allowing you to apply aggregation or transformation functions.

Sample answer:
Use groupby() to organize data into groups by key(s), then apply functions like sum() or mean() to summarize each group. It works similarly to SQL GROUP BY.

Example:

summary = df.groupby('region')['sales'].sum()

Output:
Shows total sales aggregated by each region.

44. How do you apply a custom function to rows or columns?

How to answer:
Explain using apply() to run a function across columns (axis=0) or rows (axis=1), allowing for complex, flexible operations.

Sample answer:
Call apply() with axis=1 to process row-wise or axis=0 for column-wise logic. This is useful for creating derived metrics or applying conditional calculations.

Example:

df['discount'] = df.apply(lambda row: row['price']*0.1 if row['category']=='A' else 0, axis=1)

Output:
Adds a discount column calculated based on category.

Strengthen your analytical skills with upGrad’s Introduction to Data Analysis using Excel program. Learn to clean, analyze, and visualize data confidently with pivot tables, formulas, and more. Enroll today and start building your data expertise in just 9 hours.

45. How do you create dummy variables for categorical data?

How to answer:
Mention using get_dummies() to one-hot encode categorical columns, which prepares data for machine learning models.

Sample answer:
Use pd.get_dummies() to convert categories into separate binary columns. This lets algorithms handle non-numeric data effectively.

Example:

encoded = pd.get_dummies(df, columns=['gender', 'city'])

Output:
Generates binary columns such as gender_male and city_NY.

Pandas Coding Interview Questions

Ready to flex those coding muscles? This section dives into practical challenges that often appear in pandas interview questions. You’ll learn how to handle real-world scenarios like data transformations, outlier removal, and SQL-like operations in Python pandas. 

These tasks not only test your technical skills but also your problem-solving approach. Now, it’s time to tackle these coding scenarios one by one.

46. How do you sort a DataFrame by multiple columns?

How to answer:
Say you use sort_values() with a list of columns and can control sort direction for each.

Sample answer:
Call sort_values() with multiple columns to sort hierarchically. Pass a list to ascending to fine-tune the sort order at each level.

Example:

df.sort_values(by=['region', 'sales'], ascending=[True, False])

Output:
Sorts data by region alphabetically, then by sales in descending order.

47. How do you reset the index of a DataFrame?

How to answer:
Explain that reset_index() moves the index back into a column and resets to the default integer index.

Sample answer:
Use reset_index() to turn the existing index into a column. With drop=True, it removes the old index entirely and creates a new sequential index.

Example:

df_reset = df.reset_index(drop=True)

Output:
DataFrame indexed from 0, dropping the old index.

48. How do you rename columns in pandas?

How to answer:
Say you use rename() with a dictionary mapping old names to new ones.

Sample answer:
Call rename(columns={old: new}) to update specific column names without altering the rest of the DataFrame.

Example:

df.rename(columns={'old_col': 'new_col'}, inplace=True)

Output:
Renames old_col to new_col.

49. How do you drop duplicate rows?

How to answer:
Explain using drop_duplicates() to remove repeated rows, optionally checking only specific columns.

Sample answer:
Use drop_duplicates() to keep only the first occurrence of duplicates. You can pass a subset to compare on selected columns.

Example:

df_unique = df.drop_duplicates(subset=['customer_id'])

Output:
Ensures each customer_id appears only once.

50. How do you filter rows based on multiple conditions?

How to answer:
Mention combining conditions with & for AND and | for OR, and always wrap conditions in parentheses.

Sample answer:
Build complex filters by combining multiple boolean conditions with & (and) or | (or), using parentheses to ensure correct logical grouping.

Example:

filtered = df[(df['sales'] > 1000) & (df['region'] == 'North')]

Output:
Filters data to include only high sales in the North region.

51. How do you drop rows or columns from a DataFrame?

How to answer:
Say you use drop() with axis=0 for rows (default) and axis=1 for columns. You can drop by index or by column names.

Sample answer:
Use drop() to remove rows by index label or columns by name. axis=0 (the default) targets rows and axis=1 targets columns; the index= and columns= keywords do the same job and are usually easier to read.

Example:

df = df.drop(columns=['unnecessary_col'])
df = df.drop(index=[0, 1])

Output:
DataFrame without the specified column or rows.

52. How do you fill missing values with different values for each column?

How to answer:
Mention passing a dictionary to fillna() that maps each column to its fill value. This way, each column gets a tailored replacement.

Sample answer:
Use fillna() with a dictionary to fill different columns with specific values, like median for age and a string for city.

Example:

df.fillna({'age': df['age'].median(), 'city': 'Unknown'}, inplace=True)

Output:
Fills missing age with median and city with 'Unknown'.

53. How do you create a new column based on conditions?

How to answer:
Explain using np.where() for fast vectorized if-else or apply() for more complex row-wise logic.

Sample answer:
Use np.where() for quick conditional columns or apply() when you need detailed checks across multiple columns.

Example:

import numpy as np
df['high_value'] = np.where(df['sales'] > 5000, 'Yes', 'No')

Output:
Adds high_value column with 'Yes' or 'No'.
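
The answer also mentions apply() for row-wise logic. Here is a rough sketch of that route; the label_row function, the tier column, and the business rule are all invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'sales': [6000, 1200, 5500],
    'region': ['North', 'South', 'South'],
})

def label_row(row):
    # Hypothetical rule that depends on two columns at once
    if row['sales'] > 5000 and row['region'] == 'North':
        return 'Priority'
    if row['sales'] > 5000:
        return 'High'
    return 'Standard'

# axis=1 passes each row to the function as a Series
df['tier'] = df.apply(label_row, axis=1)
print(df['tier'].tolist())  # ['Priority', 'Standard', 'High']
```

apply() with axis=1 runs Python code once per row, so it tends to be much slower than np.where() on large data. It is worth reaching for only when the logic does not vectorize easily.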

54. How do you check the correlation between numerical columns?

How to answer:
Say you use corr() to compute correlation coefficients, typically Pearson by default, across all numeric columns.

Sample answer:
Call df.corr() to obtain a correlation matrix that shows the strength of relationships between numeric columns, which is helpful for feature analysis.

Example:

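# numeric_only=True skips text columns; pandas 2.0+ raises an error on them otherwise
correlation_matrix = df.corr(numeric_only=True)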

Output:
Matrix with pairwise correlations between numeric columns.

55. How do you read only specific columns from a CSV?

How to answer:
Mention using the usecols parameter in read_csv() to limit which columns are loaded, reducing memory usage.

Sample answer:
Use pd.read_csv() with usecols to read only needed columns from a CSV, speeding up load time and saving memory.

Example:

import pandas as pd
df = pd.read_csv('data.csv', usecols=['id', 'name', 'sales'])

Output:
A DataFrame with just the id, name, and sales columns.

56. How do you iterate over rows in pandas?

How to answer:
Explain how to use iterrows() to retrieve each row as a Series, or itertuples() for faster iteration as namedtuples.

Sample answer:
Use iterrows() when you need rows as Series with labels, or itertuples() when performance matters for looping over large data.

Example:

for index, row in df.iterrows():
    print(row['name'], row['sales'])

Output:
Prints the name and sales for each row.
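
For comparison, the itertuples() version might look like the sketch below. The sample data and the lines list are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Asha', 'Ben'], 'sales': [1200, 800]})

lines = []
# index=False leaves the index out of each namedtuple
for row in df.itertuples(index=False):
    # Columns are read as attributes instead of by key
    lines.append(f"{row.name}: {row.sales}")

print(lines)  # ['Asha: 1200', 'Ben: 800']
```

Attribute access only works for column names that are valid Python identifiers. pandas replaces other names with positional ones.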

57. How do you query a DataFrame using a string expression?

How to answer:
Say you use query() for filtering with a string expression, which is often more readable than boolean indexing and can be faster on large DataFrames.

Sample answer:
Call query() to filter DataFrames using string conditions. It simplifies complex filters and handles multiple conditions cleanly.

Example:

filtered = df.query("sales > 1000 and region == 'North'")

Output:
Returns rows where sales exceed 1000 in the North region.

58. How do you change the data type of a column?

How to answer:
Mention using astype() to cast columns to a new type, whether converting to int, float, string, or category.

Sample answer:
Use astype() to explicitly change a column’s data type, which is important for memory optimization or preparing data for models.

Example:

df['customer_id'] = df['customer_id'].astype(int)

Output:
Converts customer_id to an integer type.
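
One follow-up worth preparing for: astype(int) raises an error if the column contains missing values. A possible workaround, sketched here on made-up data, is the nullable 'Int64' dtype. The segment column is an assumption added to show the category conversion mentioned above.

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': [101.0, 102.0, None],  # stored as float because of the gap
    'segment': ['a', 'b', 'a'],
})

# The nullable integer dtype keeps the missing value as <NA>
df['customer_id'] = df['customer_id'].astype('Int64')

# category suits repeated labels and usually saves memory
df['segment'] = df['segment'].astype('category')

print(df.dtypes)
```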

59. How do you get the cumulative sum of a column?

How to answer:
Say you use cumsum() to calculate running totals across a Series or DataFrame column.

Sample answer:
Call cumsum() to compute a cumulative sum, which builds a running total often used for financial or sequential analyses.

Example:

df['running_total'] = df['sales'].cumsum()

Output:
Adds a running_total column showing progressive sums.

60. How do you rank data within groups?

How to answer:
Explain combining groupby() with rank() to rank values within each subgroup, which is helpful for competitions or sales tiers.

Sample answer:
Use groupby() on a category and then rank() to assign rankings within each group, such as identifying top sales by region.

Example:

df['rank'] = df.groupby('region')['sales'].rank(ascending=False)

Output:
Ranks sales within each region, with the highest sales ranked 1.
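
Interviewers sometimes probe how ties are handled. By default, rank() gives tied values the average of their ranks, and method='dense' is one alternative. The sketch below uses invented data to show both.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['A', 'A', 'A', 'B'],
    'sales': [100, 200, 200, 50],
})

grouped = df.groupby('region')['sales']

# Default method='average': the two 200s share ranks 1 and 2, so each gets 1.5
df['rank_avg'] = grouped.rank(ascending=False)

# method='dense': ties share a rank and the next rank is not skipped
df['rank_dense'] = grouped.rank(ascending=False, method='dense')

print(df)
```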

Pandas Interview Question and Answer Tips

When preparing for pandas interview questions, it’s important to build a strong understanding of both fundamental pandas operations and more advanced data analysis techniques. A key area to focus on is data manipulation, as pandas questions frequently test your ability to clean, transform, and explore data using efficient methods.

Being clear on best practices, knowing how to handle large datasets, and avoiding common mistakes will help you stand out. Below are some focused tips to guide your preparation.

Tip | Explanation
Read the official documentation | Review the pandas docs thoroughly. Review examples and parameter options to gain a deep understanding.
Solve coding problems often | Use platforms like LeetCode or HackerRank to practice pandas problems regularly. This builds speed and accuracy.
Build data projects | Try analyzing public datasets from Kaggle or similar sites. Use pandas for cleaning, transformation, and summary statistics.
Know how pandas works with other libraries | Practice using pandas with NumPy, Matplotlib, and scikit-learn. Many pandas interview questions will involve more than one library.
Keep up with pandas updates | Check release notes to learn about new methods or improvements that could simplify your solutions.
Explain what you’ve learned to someone else | Teaching a concept or writing it out often reveals what you don’t fully understand yet.
Measure performance and memory use | Use pandas tools like info() and memory_usage() to see how your code affects resources. Try optimizing by selecting the right data types or using vectorized operations.
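
As a quick illustration of the last tip, this sketch compares memory use before and after converting a repetitive text column to category. The city column and its values are made up for the example.

```python
import pandas as pd

# Hypothetical column with only two distinct labels repeated many times
df = pd.DataFrame({'city': ['Pune', 'Delhi', 'Pune', 'Delhi'] * 1000})

# deep=True counts the actual string storage, not just the pointers
before = df['city'].memory_usage(deep=True)
after = df['city'].astype('category').memory_usage(deep=True)

print(f"as text: {before} bytes, as category: {after} bytes")
```

The exact byte counts depend on your pandas version and platform, so none are quoted here.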

This kind of preparation will help you handle a wide range of pandas interview questions confidently and show that you can use pandas effectively in practical work.

How Can upGrad Help You Develop Relevant Pandas Skills?

Pandas interview questions often focus on your ability to handle data cleaning, transformation, and analysis tasks. Employers look for skills like working with data frames, merging datasets, handling missing values, grouping data, and managing time series. 

It also helps to know how pandas works alongside libraries such as NumPy, scikit-learn, and Matplotlib, since many data problems require more than just pandas. upGrad’s courses help you build data science skills through hands-on projects and real-world datasets. You’ll practice data handling, analysis, and machine learning tasks, so you’re ready to tackle the same types of problems asked in interviews and faced on the job.

For personalized career guidance, reach out to upGrad’s counselors or visit a nearby upGrad career center. With expert counseling and a curriculum designed around data analysis and pandas, you’ll be ready to handle pandas interview questions and grow your skills for data-driven roles.

Reference:
https://www.datacamp.com/report/data-ai-literacy-report-2025

Frequently Asked Questions (FAQs)

1. What are some common data cleaning tasks performed using pandas?

Data cleaning with pandas often includes handling missing values, correcting data types, renaming columns, and removing duplicates. You might also normalize text, filter outliers, or apply transformations to align data formats. Understanding how to use functions like fillna(), dropna(), astype(), and drop_duplicates() is essential. These tasks help prepare your dataset for accurate analysis.

2. How can you merge or join datasets in pandas?

pandas provides powerful tools for combining datasets through functions like merge(), join(), and concat(). You can perform inner, left, right, or outer joins based on keys or indexes. Knowing when to use each method depends on your data structure and the relationships you want to preserve. This is widely used in preparing data for multi-source analysis.
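
A small sketch can make the join types concrete. The orders and customers tables below are invented for illustration.

```python
import pandas as pd

orders = pd.DataFrame({'customer_id': [1, 2, 4], 'amount': [250, 100, 80]})
customers = pd.DataFrame({'customer_id': [1, 2, 3], 'name': ['Asha', 'Ben', 'Chen']})

# Inner join keeps only customer_ids present in both tables (1 and 2)
inner = orders.merge(customers, on='customer_id', how='inner')

# Left join keeps every order; customer 4 has no match, so name is missing
left = orders.merge(customers, on='customer_id', how='left')

# concat stacks frames vertically instead of matching on keys
stacked = pd.concat([orders, orders], ignore_index=True)

print(len(inner), len(left), len(stacked))  # 2 3 6
```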

3. What is the difference between loc and iloc in pandas?

loc is label-based indexing, meaning it uses the row and column labels to access data. iloc is integer position-based, using numerical indices to locate rows and columns. This distinction is crucial for writing clear, bug-free code, especially when working with datasets where labels are not sequential numbers.
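
The difference is easiest to see when the index labels are not 0, 1, 2. Here is a minimal sketch with made-up data.

```python
import pandas as pd

df = pd.DataFrame({'sales': [100, 250, 75]}, index=[10, 20, 30])

by_label = df.loc[10, 'sales']   # row whose label is 10
by_position = df.iloc[0, 0]      # first row, first column

# Slicing differs too: loc includes the end label, iloc excludes the end position
loc_slice = df.loc[10:20]        # labels 10 and 20
iloc_slice = df.iloc[0:1]        # position 0 only

print(by_label, by_position, len(loc_slice), len(iloc_slice))  # 100 100 2 1
```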

4. How can pandas handle time series data?

pandas offers specialized functionality for time series, such as DatetimeIndex, resample(), and rolling windows. You can parse dates during CSV reading, perform time-based indexing, and run operations like moving averages. These capabilities make pandas well-suited for analyzing trends over time or preparing data for forecasting.
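
As a rough illustration, the sketch below builds a small daily series (the values are invented), then resamples it into two-day totals and computes a three-day moving average.

```python
import pandas as pd

idx = pd.date_range('2025-01-01', periods=6, freq='D')
ts = pd.Series([10, 20, 30, 40, 50, 60], index=idx)

# Group the daily values into 2-day bins and sum each bin
two_day_totals = ts.resample('2D').sum()

# 3-day moving average; the first two entries are NaN (window not yet full)
rolling_mean = ts.rolling(window=3).mean()

print(two_day_totals.tolist())  # [30, 70, 110]
```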

5. What are some strategies to improve pandas performance on large datasets?

To handle large data efficiently, consider loading only the columns you need with usecols, using the category data type for repetitive text, or reading the file in chunks with the chunksize parameter of read_csv(). Profiling with df.info() and memory_usage() helps you see where memory goes. Libraries like Dask can also scale pandas-like operations across larger datasets.
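
Chunked reading can be sketched as follows. An in-memory string stands in for a large file here, and the column names are assumptions for illustration.

```python
import io
import pandas as pd

# Stand-in for a large CSV on disk
csv_text = "id,sales\n1,100\n2,200\n3,300\n4,400\n5,500\n"

total = 0
# chunksize makes read_csv yield DataFrames of up to 2 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), usecols=['sales'], chunksize=2):
    total += chunk['sales'].sum()

print(total)  # 1500
```

Only one chunk is held in memory at a time, so this pattern suits aggregations that can be accumulated piece by piece.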

6. How can you export pandas data to different file formats?

pandas can export dataframes to multiple formats such as CSV (to_csv()), Excel (to_excel()), JSON (to_json()), and SQL databases (to_sql()). Understanding these export options allows you to integrate pandas workflows with other tools or store processed data for reporting and collaboration.

7. What is the use of groupby() in pandas?

groupby() splits data into groups based on column values, then applies aggregate functions like sum(), mean(), or custom operations. This helps you summarize data, create pivot-like tables, or analyze patterns across categories. Knowing how to chain groupby() with aggregation functions is important for any data analysis task.
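
A minimal sketch of that split-apply-combine pattern, using made-up sales data:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'North', 'South'],
    'sales': [100, 300, 50],
})

# One row per region, one column per aggregate
summary = df.groupby('region')['sales'].agg(['sum', 'mean'])

print(summary)
```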

8. How do you handle duplicate data in pandas?

Use duplicated() to identify duplicate rows and drop_duplicates() to remove them. You can specify subsets of columns to check for duplicates and decide whether to keep the first, last, or none. Handling duplicates properly ensures data accuracy and avoids skewed analyses.
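
The keep options are easier to remember with a tiny example. The data below is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'customer_id': [1, 1, 2], 'sales': [10, 20, 30]})

# Flags every repeat after the first occurrence
mask = df.duplicated(subset=['customer_id'])

# keep='last' retains the final occurrence of each customer_id
last_kept = df.drop_duplicates(subset=['customer_id'], keep='last')

# keep=False drops every row that has a duplicate
none_kept = df.drop_duplicates(subset=['customer_id'], keep=False)

print(mask.tolist())  # [False, True, False]
```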

9. How can you apply custom functions to pandas columns?

You can use apply(), map(), or applymap() to run custom functions on Series or DataFrames. Note that applymap() was renamed DataFrame.map() in pandas 2.1, and the old name is deprecated. These methods are useful for creating new calculated columns or cleaning text data, and they give you the flexibility to tailor transformations beyond the built-in functions.
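
Here is a short sketch of both methods on a Series. The cleaning rule and the size labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'name': [' asha ', 'BEN'], 'sales': [100, 250]})

# map: element-wise text cleanup on a Series
df['name'] = df['name'].map(lambda s: s.strip().title())

# apply: a custom function producing a new calculated column
df['size'] = df['sales'].apply(lambda x: 'high' if x > 200 else 'low')

print(df)
```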

10. How does pandas handle categorical data?

pandas provides the Categorical type to represent text values that repeat often, which saves memory and speeds up operations. This is especially useful in large datasets with many repeated string labels, such as country codes or product categories. You can also order categories for sorting or analysis.
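
Ordered categories can be sketched like this. The size labels are made up for the example.

```python
import pandas as pd

sizes = pd.Series(['M', 'S', 'L', 'M'])

# Declare the logical order explicitly; alphabetical order would give L, M, S
size_type = pd.CategoricalDtype(categories=['S', 'M', 'L'], ordered=True)
ordered = sizes.astype(size_type)

sorted_sizes = ordered.sort_values().tolist()
bigger_than_small = (ordered > 'S').tolist()

print(sorted_sizes)       # ['S', 'M', 'M', 'L']
print(bigger_than_small)  # [True, False, True, True]
```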

11. How can you visualize data directly with pandas?

pandas integrates with Matplotlib to let you create plots directly from dataframes using the plot() method. You can make line charts, bar charts, histograms, and more without extensive setup. This feature is helpful for quick exploratory analysis or verifying patterns before building more detailed visualizations.

Rohit Sharma

834 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
