Pandas Dataframe Astype: Syntax, Data Types, Creating Dataframe

Python is one of the most used languages across various industries for data manipulation and analysis purposes. The biggest reason behind Python’s popularity is its vast set of libraries that makes it simple for developers to maintain and monitor data. One such library written for Python is Pandas. The Pandas library, in particular, is used for manipulating time series and tables.

The Pandas DataFrame.astype() method, often referred to simply as astype(), casts a pandas object to a specified dtype. It is particularly useful when we need to convert the data type of one or more columns of a table to another type.

Syntax of Pandas DataFrame.astype()

Firstly, before discussing the syntax, we need to import the Pandas library, which is done by:

import pandas as pd

The syntax for DataFrame.astype() method is:

DataFrame.astype(dtype, copy=True, errors='raise', **kwargs)

| Parameter | Description | Default value |
| --- | --- | --- |
| dtype | A numpy.dtype or Python type to cast the entire object to the same type. Alternatively, a mapping {col: dtype, ...}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame's columns to column-specific types. | (none, required) |
| copy | Returns a copy when set to True (setting copy=False can propagate changes in values to other pandas objects). | True |
| errors | Controls whether exceptions are raised on invalid data for the given dtype. | 'raise' |
| kwargs | Keyword arguments to pass on to the constructor. | (none) |

Returns: casted, an object of the same type as the caller.
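As a minimal sketch of how the dtype and errors parameters behave, the snippet below uses a small made-up DataFrame (the column names and values are assumptions for illustration, not data from this article). It shows three equivalent ways to spell the target type and what happens when a value cannot be converted.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# The dtype argument accepts a Python type, a NumPy dtype, or a dtype string.
via_python_type = df.astype(float)
via_numpy_dtype = df.astype(np.float64)
via_string = df.astype("float64")

# With errors="raise" (the default), a value that cannot be converted
# raises an exception instead of being silently skipped.
text = pd.DataFrame({"c": ["x", "y"]})
try:
    text.astype("int64", errors="raise")
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("conversion failed:", exc)

print(via_string.dtypes)
```

All three spellings should produce the same float64 result, so which one to use is mostly a matter of style.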

Data Types in Pandas library

Since the DataFrame.astype() method is about casting and changing data types in tables, let’s look at the main data types in the Pandas library and what they are used for.

1. object: Used for text or mixed alphanumeric values.

2. int64: Used for integer numbers.

3. float64: Used for floating-point numbers.

4. bool: Used for True/False values.

5. datetime64[ns]: Used for date and time values.

6. timedelta64[ns]: Used for differences between two datetimes.

7. category: Used for a finite list of text values.
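To make the list above more concrete, here is a small sketch that casts columns of a made-up DataFrame to several of these types. The column names and values are hypothetical and were chosen only so that each conversion succeeds.

```python
import pandas as pd

# Hypothetical data: every column starts out as text or plain integers.
df = pd.DataFrame({
    "name": ["a", "b"],
    "count": ["1", "2"],
    "price": ["1.5", "2.5"],
    "flag": [1, 0],
    "when": ["2021-01-01", "2021-01-02"],
    "group": ["x", "y"],
})

# Cast each column to one of the types listed above.
typed = df.astype({
    "count": "int64",
    "price": "float64",
    "flag": "bool",
    "when": "datetime64[ns]",
    "group": "category",
})

# Subtracting two datetime columns yields a timedelta column.
typed["elapsed"] = typed["when"] - typed["when"].min()

print(typed.dtypes)
```

The "name" column is left untouched, so it keeps whatever text dtype pandas chose when the DataFrame was built.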

Creating a DataFrame in Pandas library

There are two common ways to create a DataFrame in Pandas. We can either build a table from data in Python or load an existing CSV file. The code to load an existing file is:

df = pd.read_csv("file_name.csv")

The syntax to create a new table for the data frame is:

t = {'col 1': [1, 2], 'col 2': [3, 4]}

df = pd.DataFrame(data=t)
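Both approaches can be tried side by side without a file on disk. In the sketch below, an in-memory string stands in for the CSV file (the csv_text content is an assumption made up to match the table above), so the example runs as-is.

```python
import io

import pandas as pd

# An in-memory stand-in for a CSV file; the contents are hypothetical.
csv_text = "col 1,col 2\n1,3\n2,4\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# The same table built directly from a dictionary of columns.
t = {"col 1": [1, 2], "col 2": [3, 4]}
df_from_dict = pd.DataFrame(data=t)

print(df_from_csv)
print(df_from_dict)
```

With a real file, the io.StringIO(...) argument would simply be replaced by the file path.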

Using Pandas Dataframe.astype() Method

Once the data is loaded into a DataFrame, we can start converting the data types of one or more of its columns. We can check the current data types before converting them by using df.dtypes or df.info(). Both display the data type of each column of the table.
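A short sketch of the two inspection options, using the same small example table as an assumed input:

```python
import pandas as pd

df = pd.DataFrame(data={"col 1": [1, 2], "col 2": [3, 4]})

# dtypes is an attribute: it returns a Series that maps each column to its dtype.
column_types = df.dtypes
print(column_types)

# info() is a method: it prints a summary (columns, non-null counts, dtypes)
# and returns None, so there is nothing to assign.
df.info()
```

df.dtypes is the handier of the two when the types need to be used in code, while df.info() is convenient for a quick look at a table.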

Another thing to note is that the DataFrame.astype() method can raise an error if the DataFrame contains NaN or NA values, for example when a float column that holds NaN is cast to an integer type. So before proceeding, we need to deal with the missing values in the table. The simplest option is to drop the rows that contain them:

df.dropna(inplace=True)
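The following sketch illustrates the problem on a made-up column (the data is an assumption for illustration). It uses dropna() without inplace=True so that the original table stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"col 1": [1.0, np.nan, 3.0]})

# NaN has no integer representation, so this cast is expected to fail.
try:
    df.astype("int64")
    raised = False
except ValueError as exc:
    raised = True
    print("cast failed:", exc)

# After dropping the rows with missing values, the same cast goes through.
cleaned = df.dropna()
converted = cleaned.astype("int64")
print(converted.dtypes)
```

Dropping rows discards data, so whether it is the right fix depends on the table; it is shown here only because it is the approach described above.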

Converting All the Columns of a Dataframe

Syntax: df.astype('data_type').dtypes

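Every column of the DataFrame is converted to the type we put in place of 'data_type'. astype() returns a new DataFrame rather than changing df, and the trailing .dtypes simply displays the resulting types.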
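A minimal sketch of a whole-table conversion, assuming the small two-column table from earlier and float64 as the target type:

```python
import pandas as pd

df = pd.DataFrame(data={"col 1": [1, 2], "col 2": [3, 4]})

# Display the types that a whole-table cast would produce.
print(df.astype("float64").dtypes)

# The call above did not modify df; assign the result to keep the conversion.
original_types = df.dtypes.astype(str).tolist()
df = df.astype("float64")
new_types = df.dtypes.astype(str).tolist()

print(original_types, new_types)
```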

Converting Specific Columns of a Dataframe

Syntax: df.astype({"col_name": 'data_type'}).dtypes

Here "col_name" stands for the name of the column to convert. That column's data type is changed to the type we provide in place of 'data_type', while the other columns keep their types.
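For instance, a sketch that converts only one column of the example table (the choice of "col 1" and float64 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame(data={"col 1": [1, 2], "col 2": [3, 4]})

# Only "col 1" is named in the mapping, so only "col 1" changes type.
result = df.astype({"col 1": "float64"})
print(result.dtypes)

# An equivalent column-level form: cast the Series and assign it back.
df["col 1"] = df["col 1"].astype("float64")
print(df.dtypes)
```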

Converting Multiple Columns at a Time

Syntax: df.astype({"col_name_1": 'data_type', "col_name_2": 'data_type', "col_name_3": 'data_type'}).dtypes

All we did here was separate the columns that we want to convert with commas. Each column name and 'data_type' in the syntax takes the same kind of value as when converting a single column.
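A sketch of a multi-column conversion on a made-up table (the column names id, price and in_stock are hypothetical). It also shows that naming a column that does not exist in the mapping is rejected.

```python
import pandas as pd

# Hypothetical table whose columns all need a different target type.
df = pd.DataFrame({
    "id": ["1", "2"],
    "price": ["9.99", "19.5"],
    "in_stock": [1, 0],
})

# One mapping entry per column, each with its own target type.
converted = df.astype({"id": "int64", "price": "float64", "in_stock": "bool"})
print(converted.dtypes)

# Keys in the mapping must be existing column names.
try:
    df.astype({"missing_column": "int64"})
    bad_key_rejected = False
except KeyError as exc:
    bad_key_rejected = True
    print("unknown column:", exc)
```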

Summarizing It

This is how the Pandas DataFrame.astype() method is used. Python is currently one of the most preferred programming languages, in part because it has also gained a foothold in machine learning and data science.
