Adding New Column To Existing Dataframe In Pandas [2021]

Python, an interpreted, general-purpose, high-level programming language, has become one of the most widely used languages thanks to its vast collection of libraries and its ease of use. Its popularity grew sharply with the rise of data science and data analytics. Thousands of libraries can be used with Python to apply it efficiently in almost any domain.

Pandas is one such library, designed for data manipulation and analysis in Python. It provides data structures and operations for working with numerical tables and time series. In this article, you will learn how to add columns to an existing DataFrame in Pandas.

What is a DataFrame?

Before looking at how to add a new column to an existing DataFrame, let us take a quick look at DataFrames in Pandas. A DataFrame is a mutable, two-dimensional data structure with labeled axes (rows and columns) that can store heterogeneous values arranged in tabular form. Its three major components are rows, columns, and data. Creating a DataFrame in Python is very easy.

import pandas as pd

l = ['This', 'is', 'a', 'List', 'preparing', 'for', 'DataFrame']
datfr = pd.DataFrame(l)
print(datfr)

The above program will create a DataFrame of 7 rows and one column.
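The three components mentioned above (rows, columns, and data) can be inspected directly. The following is a minimal sketch; the names `words`, `frame`, and the column label `'word'` are illustrative and not part of the original example.

```python
import pandas as pd

# Hypothetical names: 'words', 'frame', and the 'word' label are for illustration only.
words = ['This', 'is', 'a', 'List', 'preparing', 'for', 'DataFrame']
frame = pd.DataFrame(words, columns=['word'])

n_rows, n_cols = frame.shape       # number of rows and columns
row_labels = list(frame.index)     # default integer row labels
col_labels = list(frame.columns)   # column labels
values = frame.to_numpy()          # the underlying data as a NumPy array

print(frame.shape)
print(col_labels)
```

Naming the column when the DataFrame is created avoids the default integer label 0 that the earlier program produces.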

How to Add Columns to Existing DataFrames?

There are various ways of adding new columns to a DataFrame in Pandas. We have already seen how to create a basic DataFrame using the Pandas library. Let us now prepare an existing DataFrame to work on.

import pandas as pd

# Define a dictionary containing professionals' data
datfr = {'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
         'Height': [6.2, 5.7, 6.1, 5.9],
         'Designation': ['Scientist', 'Professor', 'Data Analyst', 'Security Analyst']}

df = pd.DataFrame(datfr)
print(df)


Technique 1: insert() Method 

To add a new column at a specific position in an existing DataFrame, we can use the insert() method. Before using it, let us see how it works. DataFrame.insert() adds a column in place at the integer position given as its first argument, so the data analyst controls where the column appears. The column values can be supplied as a scalar, a list or array, or a Series.

import pandas as pd

# Define a dictionary containing professionals' data
datfr = {'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
         'Height': [6.2, 5.7, 6.1, 5.9],
         'Designation': ['Scientist', 'Professor', 'Data Analyst', 'Security Analyst']}

df = pd.DataFrame(datfr)

# Arguments: position, column label, values; the last one allows a duplicate label
df.insert(3, "Age", [40, 33, 27, 26], allow_duplicates=True)
print(df)

It will add the 'Age' column at index position 3, as given by the first parameter of insert(). Because positions are zero-based, 'Age' becomes the fourth column, after 'Designation'.
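As a further sketch of how the position argument behaves, the example below inserts one column at the front, one at the end using a scalar value, and then tries to insert a label that already exists. The 'ID' and 'Active' columns and their values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
                   'Height': [6.2, 5.7, 6.1, 5.9]})

# Position 0 places the new column before every existing one.
# 'ID' and its values are hypothetical.
df.insert(0, 'ID', [101, 102, 103, 104])

# A scalar value is repeated for every row; len(df.columns) is the end position.
df.insert(len(df.columns), 'Active', True)

# Inserting a label that already exists raises ValueError
# unless allow_duplicates=True is passed.
try:
    df.insert(1, 'Name', ['a', 'b', 'c', 'd'])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(list(df.columns))
print(duplicate_rejected)
```

Note that insert() modifies the DataFrame in place and returns None, so its result should not be assigned back to a variable.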

Technique 2: assign() Method 

Another way to add a column to a DataFrame is the assign() method of the Pandas library. DataFrame.assign() takes a different approach: it returns a new DataFrame that contains all the original columns plus the new one, and leaves the existing DataFrame unchanged.

import pandas as pd

datfr = {'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
         'Height': [6.2, 5.7, 6.1, 5.9],
         'Designation': ['Scientist', 'Professor', 'Data Analyst', 'Security Analyst']}

dfI = pd.DataFrame(datfr)
dfII = dfI.assign(Location=['Noida', 'Amsterdam', 'Cambridge', 'Bangaluru'])
print(dfII)

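Because assign() returns a new DataFrame, the original stays as it was. The sketch below checks this and also shows that assign() accepts a callable, which lets the new column be computed from existing ones. The 'Tall' column and the 6.0 threshold are assumptions made for illustration.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
                       'Height': [6.2, 5.7, 6.1, 5.9]})

# The callable receives the DataFrame and returns the new column's values.
# 'Tall' and the 6.0 cutoff are hypothetical.
with_flag = people.assign(Tall=lambda d: d['Height'] > 6.0)

print('Tall' in people.columns)     # the original DataFrame has no new column
print('Tall' in with_flag.columns)  # the returned DataFrame does
```

This non-mutating behaviour makes assign() convenient in method chains, where each step produces a new DataFrame.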

Technique 3: Creating New List as Column 

The last method is to create a list holding the new column's values and assign it to the DataFrame under a new column label. The column is added in place at the end of the existing DataFrame, and the list must contain one value per row.

import pandas as pd

datfr = {'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
         'Height': [6.2, 5.7, 6.1, 5.9],
         'Designation': ['Scientist', 'Professor', 'Data Analyst', 'Security Analyst']}

df = pd.DataFrame(datfr)

loc = ['Noida', 'Amsterdam', 'Cambridge', 'Bangaluru']
df['Location'] = loc
print(df)

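Bracket assignment is not limited to lists. As a rough sketch, the example below assigns a scalar, derives a column from an existing one, and shows what happens when a list has the wrong length. The 'Country' and 'NameLength' columns are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
                   'Height': [6.2, 5.7, 6.1, 5.9]})

# A scalar is repeated for every row ('Country' is a hypothetical column).
df['Country'] = 'Unknown'

# A column can be derived from an existing one.
df['NameLength'] = df['Name'].str.len()

# A list whose length differs from the number of rows raises ValueError.
try:
    df['Location'] = ['Noida', 'Amsterdam']
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(df)
print(mismatch_rejected)
```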

Conclusion

Adding an extra set of data as a column is one of the most basic operations data analysts perform. There are several ways to add a new column to an existing DataFrame in Pandas: insert() places it at a chosen position, assign() returns a new DataFrame with the column added, and bracket assignment appends it in place. With these methods, programmers can add columns at any point while analyzing data with Pandas.

Why is Pandas one of the most preferred libraries to create data frames in Python?

Pandas is widely considered well suited for working with data frames because of several features. Its DataFrame and Series structures allow both efficient data representation and manipulation. It provides alignment and indexing features that offer intelligent ways of labelling and organizing data. Its concise syntax helps keep code clean and readable. It can read multiple file formats, including JSON, CSV, HDF5, and Excel. Merging multiple datasets has been a real challenge for many programmers, and Pandas handles it efficiently. Pandas also integrates with other important Python libraries such as Matplotlib and NumPy.
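Two of the features above, reading a file format and merging datasets, can be sketched together. In the example below the CSV content is an in-memory string standing in for a file, and the data values are made up for illustration.

```python
import io
import pandas as pd

# In-memory CSV text used in place of a real file (hypothetical data).
csv_text = "Name,Location\nKarl,Noida\nGaurav,Amsterdam\nRay,Cambridge\n"
locations = pd.read_csv(io.StringIO(csv_text))

people = pd.DataFrame({'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
                       'Height': [6.2, 5.7, 6.1, 5.9]})

# A left merge keeps every row of 'people'; names without a match
# in 'locations' get a missing value in the 'Location' column.
merged = people.merge(locations, on='Name', how='left')

print(merged)
```

Merging is another way a new column can arrive in an existing DataFrame, here keyed on the shared 'Name' column rather than on row position.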

What are the other Python libraries that Pandas works with?

Pandas not only serves as a central library for creating data frames, it also works with other Python libraries and tools. Pandas is built on top of the NumPy package, so much of its underlying data storage and computation relies on NumPy arrays. Data held in Pandas can be passed to SciPy for statistical analysis, to Matplotlib for plotting, and to Scikit-learn for machine learning algorithms. Jupyter Notebook is a web-based interactive environment that works much like an IDE and offers a good environment for Pandas.
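The link with NumPy can be seen in a small sketch: a column is converted to a NumPy array, and a NumPy function then builds the values for a new column. The 'AboveMean' column name is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
                   'Height': [6.2, 5.7, 6.1, 5.9]})

# Extract the column as a plain NumPy array.
heights = df['Height'].to_numpy()

# np.where picks 'yes' or 'no' element by element
# ('AboveMean' is a hypothetical column name).
df['AboveMean'] = np.where(heights > heights.mean(), 'yes', 'no')

print(df)
```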

Along with insertion, what are the fundamental operations of a DataFrame?

Selecting rows by index or selecting a column comes first, before operations such as addition or deletion. Once you know how to access values and select columns from a DataFrame, you can learn to add an index, row, or column. If the index of the DataFrame is not what you want, you can reset it with the reset_index() method.
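A short sketch of selection followed by an index reset is given below. The filter condition and variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Karl', 'Gaurav', 'Ray', 'Mimo'],
                   'Height': [6.2, 5.7, 6.1, 5.9]})

# Selecting a single column returns a Series.
names = df['Name']

# Filtering rows keeps their original index labels (the 6.0 cutoff is hypothetical).
tall = df[df['Height'] > 6.0]

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels.
tall_reset = tall.reset_index(drop=True)

print(list(tall.index))
print(list(tall_reset.index))
```

Without drop=True, reset_index() keeps the old labels as a new column named 'index', which is itself one more way a column can be added.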
