Businesses in the Age of Big Data are overwhelmed by large volumes of data on a day-to-day basis. However, it is not the sheer amount of relevant data but what is done with the data that matters. Hence, Big Data needs to be analyzed to gain insights that will ultimately dictate better decisions and influence strategic business moves.
Still, it is not enough to analyze data and leave it there. The next step is data visualization, which presents the data in a visual format so that patterns, trends, and outliers become easy to see and understand. The heatmap is one of many data visualization techniques, and it is straightforward to build in Python.
Data visualization refers to the graphical representation of data and may include graphs, charts, maps, and other visual elements. It is essential for analyzing large amounts of information and making data-driven decisions.
This article will walk you through the concept of a heatmap in Python and how to create one using Seaborn.
What is a Heatmap?
A heatmap in Python is a data visualization technique where colours represent how a value of interest changes with the values of two other variables. It is a two-dimensional graphical representation of data with values encoded in colours, thereby giving a simplified, insightful, and visually appealing view of information. The image below is a simplified representation of a heatmap.
Typically, a heatmap is a data table with rows and columns representing different sets of categories. Each cell in the table contains a logical or numerical value that determines the colour of the cell based on a given colour palette. Thus, heat maps use colours to emphasize the relationship between data values that would be otherwise challenging to understand if arranged in a regular table using raw numbers.
Heatmaps find applications in several real-world scenarios. For instance, consider the heat map below. It is a stock index heatmap that identifies prevailing trends in the stock market. The heatmap uses a cold-to-hot colour scheme to show which stocks are bearish and which are bullish. The former is represented using the colour red, while the latter is depicted in green.
Heatmaps are used in several other areas as well. Some examples include website heatmaps, geographical heatmaps, and sports heatmaps. For instance, you could use a heatmap to understand how rainfall varies by month of the year across a set of cities. Heatmaps also come in extremely handy for studying human behaviour.
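The rainfall example can be sketched in a few lines. This is a minimal illustration only: the city names are placeholders and the rainfall figures are made up, and the colour map "Blues" is simply one reasonable choice.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up monthly rainfall (in mm) for three placeholder cities.
rain = pd.DataFrame(
    {
        "Jan": [10, 80, 45],
        "Apr": [25, 60, 50],
        "Jul": [300, 20, 55],
        "Oct": [90, 70, 40],
    },
    index=["City A", "City B", "City C"],
)

fig, ax = plt.subplots(figsize=(6, 3))
# Each cell is coloured by its value; annot=True writes the value in the cell.
sns.heatmap(rain, annot=True, fmt="d", cmap="Blues", ax=ax)
ax.set_xlabel("Month")
ax.set_ylabel("City")
```

Calling plt.show() afterwards displays the figure; darker cells correspond to wetter city-month combinations.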
A correlation heatmap is a two-dimensional matrix showing the pairwise correlation between a set of variables. The same variables appear along both the rows and the columns, and each cell holds the correlation coefficient between the variable in that row and the variable in that column. Like a regular heatmap, a correlation heatmap also comes with a colour bar to read and understand the data.
The colour scheme is chosen so that one end represents the lowest values (strong negative correlation) and the other end the highest values (strong positive correlation). Hence, correlation heatmaps are ideal for data analysis since they present patterns in an easily readable form while also highlighting the variation in the data.
Given below is a classic representation of a correlation heatmap.
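A correlation heatmap of this kind can be produced with a short sketch like the one below. The data is randomly generated purely for illustration, and the column names (x, y_pos, y_neg, noise) are hypothetical; with real data you would call corr() on your own DataFrame.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Randomly generated illustration data (not real measurements).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = pd.DataFrame(
    {
        "x": x,
        "y_pos": 2 * x + rng.normal(scale=0.5, size=200),  # moves with x
        "y_neg": -x + rng.normal(scale=0.5, size=200),     # moves against x
        "noise": rng.normal(size=200),                     # unrelated to x
    }
)

# Pairwise Pearson correlation between all columns.
corr = data.corr()

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the colour scale to [-1, 1] so that the colours are comparable across plots.
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
```

The diagonal is always 1 because every variable is perfectly correlated with itself, and the matrix is symmetric about that diagonal.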
Creating a Seaborn Heatmap in Python
Seaborn is a Python library used for data visualization and is based on matplotlib. It provides a high-level interface for drawing informative and visually attractive statistical graphics. In a heatmap created using seaborn, a colour palette portrays the variation in related data. If you are a beginner and would like to gain expertise in data science, check out our data science courses.
Steps to Create a heatmap in Python
The following steps give a rough outline of how to create a simple heatmap in Python:
- Import all the required packages
- Import the file where you have stored your data
- Plot the heatmap
- Display the heatmap using matplotlib
Now, let us show you how seaborn, along with matplotlib and pandas, can be used to generate a heatmap.
In this example, we will construct a seaborn heatmap in Python for 30 pharmaceutical company stocks. The resulting heatmap will show the stock symbols and their respective single-day percentage price change. We will begin by collecting the market data on pharma stocks and creating a CSV (comma-separated values) file that holds the stock symbols and their corresponding percentage price change in its first two columns.
Since we are working with 30 pharma companies, we will construct a heatmap matrix comprising 6 rows and 5 columns. In addition, we want the heatmap to depict the percentage price change in descending order. So, we will arrange the stocks in the CSV file in descending order and add two more columns to indicate the position of each stock on the X and Y axes of the seaborn heatmap.
Step 1: Importing the Python packages.
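A typical set of imports for this walkthrough might look as follows. The aliases np, pd, plt, and sns are the usual community conventions rather than requirements.

```python
import numpy as np                # array handling and reshaping
import pandas as pd               # reading the CSV file and pivoting
import matplotlib.pyplot as plt   # figure creation and display
import seaborn as sns             # the heatmap function itself
```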
Step 2: Loading the dataset.
The dataset is read using the read_csv function from pandas. We then use the print function to display the first 10 rows.
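The loading step can be sketched as below. Because the original CSV file is not available here, the sketch builds a stand-in in memory: the symbols (SYM01 to SYM30) are placeholders, the percentage changes are made up, and the position column names "Yrows" and "Xcols" are assumptions. With a real file you would pass its path to read_csv instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
rows = ["Symbol,Change,Yrows,Xcols"]
for i in range(30):
    change = round(4.5 - 0.3 * i, 2)   # made-up values, already descending
    rows.append(f"SYM{i + 1:02d},{change},{i // 5 + 1},{i % 5 + 1}")
csv_text = "\n".join(rows)

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Show the first 10 rows.
print(df.head(10))
```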
Step 3: Creating a Python NumPy array.
Keeping the 6 x 5 matrix in mind, we will create a two-dimensional NumPy array for each of the “Symbol” and “Change” columns, reshaped to 6 rows and 5 columns.
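A minimal sketch of this step, using placeholder symbols and made-up changes in place of the real CSV data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV data: placeholder symbols, made-up changes.
df = pd.DataFrame(
    {
        "Symbol": [f"SYM{i:02d}" for i in range(1, 31)],
        "Change": np.round(np.linspace(4.5, -4.2, 30), 2),
    }
)

# Reshape each 30-element column into a 6 x 5 array (filled row by row).
symbol = df["Symbol"].to_numpy().reshape(6, 5)
percentage = df["Change"].to_numpy().reshape(6, 5)

print(symbol[0])       # first row of symbols
print(percentage[0])   # first row of percentage changes
```

Because reshape fills the array row by row, the stocks keep their descending order from left to right and top to bottom.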
Step 4: Creating a pivot in Python.
From the given data frame object “df,” the pivot method creates a new derived table. It takes three arguments: index, columns, and values. The index and columns arguments take the two position columns added to the CSV file, and the values of the cells of the new table are taken from the “Change” column.
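The pivot can be sketched as follows. The position column names "Yrows" and "Xcols" are assumptions, and the data is again a synthetic stand-in.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; "Yrows"/"Xcols" are assumed names for the
# columns that hold each stock's row and column position in the grid.
df = pd.DataFrame(
    {
        "Symbol": [f"SYM{i:02d}" for i in range(1, 31)],
        "Change": np.round(np.linspace(4.5, -4.2, 30), 2),
        "Yrows": np.repeat(np.arange(1, 7), 5),   # 1,1,1,1,1,2,2,...
        "Xcols": np.tile(np.arange(1, 6), 6),     # 1,2,3,4,5,1,2,...
    }
)

# Rows come from "Yrows", columns from "Xcols", cell values from "Change".
result = df.pivot(index="Yrows", columns="Xcols", values="Change")
print(result)
```

The result is a 6 x 5 table of percentage changes, which is exactly the shape the heatmap will be drawn in.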
Step 5: Creating an array to annotate the heatmap.
The next step is to create an array of labels for annotating the seaborn heatmap. For this, we call the flatten method on the arrays “percentage” and “symbol” to turn each 6 x 5 array into a one-dimensional sequence. The zip function then pairs each stock symbol with its percentage price change. We loop over these pairs with a Python for loop, use string formatting to build the label text for each cell, and reshape the result to 6 x 5 so that it matches the shape of the data.
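One way to build the annotation array is sketched below. The label layout (symbol on one line, percentage with two decimals on the next) is an assumption, and the data is a synthetic stand-in.

```python
import numpy as np

# Synthetic stand-ins for the "symbol" and "percentage" arrays from Step 3.
symbol = np.array([f"SYM{i:02d}" for i in range(1, 31)]).reshape(6, 5)
percentage = np.round(np.linspace(4.5, -4.2, 30), 2).reshape(6, 5)

# Flatten both arrays, pair the elements up with zip, format one label
# per stock, then restore the 6 x 5 shape expected by the heatmap.
labels = np.asarray(
    [
        "{0}\n{1:.2f}%".format(sym, pct)
        for sym, pct in zip(symbol.flatten(), percentage.flatten())
    ]
).reshape(6, 5)

print(labels[0, 0])
```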
Step 6: Creating the matplotlib figure and defining the plot.
In this step, we will create an empty matplotlib plot and define the figure’s size. In addition, we will add the title of the plot, set the font size of the title, and fix its distance from the plot by using the set_position method. Finally, since we only want to display the stock symbols and their corresponding single-day percentage price change, we will hide the ticks for the X and Y axes and remove the axes from the plot.
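A sketch of the figure setup is shown below. The figure size, title text, and font size are assumed values. Note one deviation from the description: recent Matplotlib versions may reposition a title automatically when the figure is drawn, so this sketch passes y= to set_title rather than calling set_position on the title afterwards.

```python
import matplotlib.pyplot as plt

# Empty figure with a single set of axes; the size is in inches (assumed values).
fig, ax = plt.subplots(figsize=(12, 7))

# Title text and font size are illustrative; y=1.05 lifts the title
# slightly above the top edge of the plot area.
ax.set_title("Pharma Sector Heat Map", fontsize=18, y=1.05)

# Hide the tick marks on both axes, then remove the axes frame entirely.
ax.set_xticks([])
ax.set_yticks([])
ax.axis("off")
```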
Step 7: Creating the heatmap
In the last step, we will use the heatmap function from the seaborn package to create the heatmap. Among others, the function takes the following arguments:
- data: a two-dimensional dataset that can be coerced into an array. Given a Pandas DataFrame, the rows and columns will be labeled using the index/column information.
- annot: an array of the same shape as the data, used to annotate the heatmap.
- cmap: a matplotlib colourmap object or colourmap name that maps the data values to the colour space.
- fmt: a string formatting code used when adding annotations.
- linewidths: the width of the lines that divide each cell.
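Putting the pieces together, the final call might look like the sketch below. It rebuilds the synthetic stand-in data so that it runs on its own; the "RdYlGn" colour map, the line width, the title, and the column names "Yrows" and "Xcols" are all assumptions rather than values taken from the original example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data: placeholder symbols and made-up changes.
df = pd.DataFrame(
    {
        "Symbol": [f"SYM{i:02d}" for i in range(1, 31)],
        "Change": np.round(np.linspace(4.5, -4.2, 30), 2),
        "Yrows": np.repeat(np.arange(1, 7), 5),
        "Xcols": np.tile(np.arange(1, 6), 6),
    }
)
result = df.pivot(index="Yrows", columns="Xcols", values="Change")
labels = np.asarray(
    [f"{s}\n{c:.2f}%" for s, c in zip(df["Symbol"], df["Change"])]
).reshape(6, 5)

fig, ax = plt.subplots(figsize=(12, 7))
ax.set_title("Pharma Sector Heat Map", fontsize=18, y=1.05)

# fmt="" because the labels are already formatted strings;
# "RdYlGn" runs from red (low values) to green (high values).
sns.heatmap(result, annot=labels, fmt="", cmap="RdYlGn", linewidths=0.30, ax=ax)

# Hide the axes so that only the annotated cells remain visible.
ax.axis("off")
```

Calling plt.show() afterwards displays the figure, and fig.savefig with a file name of your choice writes it to disk.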
The final output of the seaborn heatmap for the chosen pharma companies will look like this:
Way Forward: Learn Python with upGrad’s Professional Certificate Program in Data Science
The Professional Certificate Program in Data Science for Business Decision Making is a rigorous, 8-month online program focusing on data science and machine learning concepts with particular emphasis on their real-world business applications. The program is specifically designed for managers and working professionals who want to develop the practical knowledge and skills of data science that will help them make strategic and data-driven business decisions.
Here are some course highlights:
- Prestigious recognition from IIM Kozhikode
- 200+ hours of content
- 3 industry projects and a capstone
- 20+ live learning sessions
- 5+ expert coaching sessions
- Coverage of Excel, Tableau, Python, R, and Power BI
- One-on-one with industry mentors
- 360-degree career support
- Job assistance with top firms
Sign up with upGrad and hone your Python heatmap skills for all your data visualisation needs!
Statisticians and data analysts use a plethora of tools and techniques to sort the collated data and present them in an easily understandable and user-friendly manner. In this regard, heatmaps as a data visualization technique have helped businesses across all sectors to visualize and understand data better.
To sum up, heatmaps remain one of the most widely used statistical and analytical tools. They offer a visually appealing and accessible mode of data presentation, are readily understandable, versatile, and adaptable, and reduce the tedium of traditional data analysis and interpretation by presenting all the values in a single frame.
How do you plot a heatmap?
A heatmap is a standard way to plot grouped data in a two-dimensional graphical format. The basic idea behind plotting a heatmap is that the graph is divided into squares or rectangles, each representing one cell of the data table, that is, one combination of a row and a column. The square or rectangle is colour-coded according to the value of that cell in the table.
Does a heatmap show correlation?
A correlation heatmap is a graphical representation of a correlation matrix depicting the correlation between different variables. Correlation heatmaps are very effective if used properly since highly correlated variables can be easily identified.
Why is seaborn used in Python?
Seaborn is an open-source Python library based on matplotlib. It is used for exploratory data analysis and visualization and easily works with data frames and the Pandas library. Plus, the graphs created using seaborn are easily customisable.