
Top Python Data Visualization Libraries You Should Know About

Python offers many ways to work with data, and visualization is one of its strengths. Several libraries serve this purpose. In this article, we’ll look at some of the most prominent ones and the graphs you can plot with them.

Python Data Visualization

We have shared multiple examples in this article. Be sure to try them out on a dataset of your own.

Python Data Visualization Libraries

Python has many libraries for creating beautiful graphs. Each offers a different balance between control and convenience, and together they cover all skill levels, so you can perform data visualization in Python whether you’re a beginner or an advanced programmer. The following are some prominent libraries:

  • Seaborn
  • Matplotlib
  • Pandas

There are many other Python libraries for data science, but we’ve focused on the prominent ones for now. We’ll discuss each of these libraries in turn and see how you can plot graphs with them. Let’s get started.
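The snippets in this article refer to data frames such as ‘iris’ and ‘wine_reviews’ without showing how they are created. Here is a minimal setup sketch. The hand-typed rows are illustrative stand-ins (an assumption, not the real datasets); in practice you would load the full data from your own file, for example with ‘pd.read_csv’.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts; drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in for the Iris dataset (a few hand-typed rows, not the real data)
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Illustrative stand-in for the wine review data: only the columns the examples use
wine_reviews = pd.DataFrame({
    "country": ["Italy", "France", "Italy", "Spain", "France", "US"],
    "points": [87, 90, 87, 85, 90, 88],
    "price": [20.0, 45.0, 15.0, 12.0, 60.0, 30.0],
})

# Quick sanity check of what was created
print(iris.shape, wine_reviews.shape)
```

With data frames like these in place, the Matplotlib snippets below can be pasted in and run as they are.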


Matplotlib

The most popular Python library for plotting graphs is Matplotlib. It doesn’t require much experience, which makes it a good starting point for beginners. You can start learning data visualization through this library and master a variety of graphs and visualizations. It gives you a lot of freedom, but you’ll often have to write more code than with higher-level libraries.

People use Matplotlib for simple visualizations such as bar charts and histograms. 


Line Chart

To create a line chart, use the ‘plot’ method. By looping over the columns, you can draw multiple lines on the same graph. Use the following code for this purpose:

# import the plotting interface (the iris data frame is assumed to be loaded already)
import matplotlib.pyplot as plt

# get columns to plot
columns = iris.columns.drop(['class'])

# create x data
x_data = range(0, iris.shape[0])

# create figure and axis
fig, ax = plt.subplots()

# plot each column
for column in columns:
    ax.plot(x_data, iris[column], label=column)

# set title and legend
ax.set_title('Iris Dataset')
ax.legend()


Scatter Plot

You can create a scatter plot using the ‘scatter’ method. You should create an axis and a figure through ‘plt.subplots’ to give your plot labels and a title. 

Use the following code:

# create a figure and axis
fig, ax = plt.subplots()

# scatter the sepal_length against the sepal_width
ax.scatter(iris['sepal_length'], iris['sepal_width'])

# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')

You can color the data points according to their classes. To do this, create a dictionary that maps each class to a color, then scatter each point in a for-loop, looking up its color in the dictionary.

# create color dictionary
colors = {'Iris-setosa': 'r', 'Iris-versicolor': 'g', 'Iris-virginica': 'b'}

# create a figure and axis
fig, ax = plt.subplots()

# plot each data-point
for i in range(len(iris['sepal_length'])):
    ax.scatter(iris['sepal_length'][i], iris['sepal_width'][i], color=colors[iris['class'][i]])

# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
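The per-point loop above calls ‘scatter’ once for every row, which can get slow on larger datasets. One possible alternative, sketched here under the assumption of a small hand-typed stand-in for the Iris data, is to map the class column to colors first and pass the whole list to a single ‘scatter’ call through its ‘c’ argument.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in rows, not the real Iris dataset
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

colors = {"Iris-setosa": "r", "Iris-versicolor": "g", "Iris-virginica": "b"}

# Look up the color of every row in one step instead of inside a loop
point_colors = iris["class"].map(colors).tolist()

# A single scatter call draws all points, each with its own color
fig, ax = plt.subplots()
ax.scatter(iris["sepal_length"], iris["sepal_width"], c=point_colors)
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
```

The result should look the same as the looped version; the difference is one drawing call instead of one per row.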


Histogram

You can use the ‘hist’ method to create a histogram in Matplotlib. It groups the values into bins and counts how many fall into each one, so you don’t need to calculate the frequencies yourself. Here’s the code you’d need to plot a histogram in Matplotlib:

# create figure and axis
fig, ax = plt.subplots()

# plot histogram
ax.hist(wine_reviews['points'])

# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
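How the histogram looks depends on how many bins the values are grouped into. The sketch below uses made-up scores (an assumption, not the real wine data) to show the ‘bins’ argument of ‘hist’ and the counts and bin edges the method returns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import matplotlib.pyplot as plt
import pandas as pd

# Made-up review scores for illustration only
points = pd.Series([80, 82, 85, 85, 87, 87, 87, 90, 92, 95])

fig, ax = plt.subplots()

# bins=5 splits the range of the data into five equal-width intervals;
# hist returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, _ = ax.hist(points, bins=5)

ax.set_title("Scores in 5 bins")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")

print(counts, bin_edges)
```

Fewer bins give a smoother but coarser picture, while more bins show finer detail at the cost of noise.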

Bar Chart

Matplotlib has straightforward methods for plotting different graphs. To create a bar chart, use the ‘bar’ method. It can’t calculate the frequency of categories automatically, so you’ll need the Pandas ‘value_counts’ method to count them first. A bar chart works best when your data doesn’t have many distinct categories.

# create a figure and axis
fig, ax = plt.subplots()

# count the occurrence of each class
data = wine_reviews['points'].value_counts()

# get x and y data
points = data.index
frequency = data.values

# create bar chart
ax.bar(points, frequency)

# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
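When the categories are text rather than numbers, Matplotlib draws the bars in the order it receives them, and ‘value_counts’ returns the most frequent value first. The following sketch, which assumes a hypothetical ‘country’ column with made-up rows, sorts the counts by label so the bars appear alphabetically.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; the 'country' column is a hypothetical example
wine_reviews = pd.DataFrame({
    "country": ["Italy", "France", "Italy", "Spain", "France", "Italy"],
})

# Count each category, then sort by label instead of by frequency
data = wine_reviews["country"].value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(list(data.index), data.values)
ax.set_title("Reviews per Country")
ax.set_xlabel("Country")
ax.set_ylabel("Frequency")
```

Leaving out ‘sort_index()’ would instead order the bars from most to least frequent, which is often what you want for a ranking.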


Pandas

Pandas is a Python library that’s popular for data analysis and manipulation. It’s open source, so you can use it for free. Its development began in 2008, and since then it has become one of the most popular libraries for working with structured data.

With a Pandas data frame, you can easily create plots of your data. Its plotting API is built on top of Matplotlib and works at a higher level, which means you can create graphs with less code in Pandas than you would in Matplotlib.

Bar Chart

In Pandas, use the ‘plot.bar()’ method to plot a bar chart. First, count the occurrences of each value in the column with ‘value_counts()’, then sort them with ‘sort_index()’. Here’s example code to create a bar chart:

random_reviews['points'].value_counts().sort_index().plot.bar()

You can use the ‘plot.barh()’ method to create a horizontal bar chart in Pandas:

random_reviews['points'].value_counts().sort_index().plot.barh()

You can also plot an aggregated value instead of a count. For example, this plots the average price for the five countries with the highest average:

random_reviews.groupby("country").price.mean().sort_values(ascending=False)[:5].plot.bar()
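The chained call above packs several steps into one line. Here is the same idea broken into steps on a tiny made-up table (the rows and the ‘reviews’ name are assumptions for illustration), so that each intermediate result can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import pandas as pd

# Made-up rows for illustration only
reviews = pd.DataFrame({
    "country": ["Italy", "Italy", "France", "France", "Spain", "US"],
    "price": [20.0, 30.0, 50.0, 70.0, 15.0, 40.0],
})

# Step 1: average price per country
mean_price = reviews.groupby("country")["price"].mean()

# Step 2: most expensive countries first
mean_price = mean_price.sort_values(ascending=False)

# Step 3: keep the top two and draw them as bars
top = mean_price.head(2)
ax = top.plot.bar(title="Highest average price")

print(top)
```

On this sample, France (average 60.0) and the US (40.0) come out on top.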

Line Chart

Use ‘<dataframe>.plot.line()’ to create a line chart in Pandas. You don’t need to loop through the columns you want to plot, because Pandas draws one line per numeric column automatically. In the Matplotlib example, we wrote that loop ourselves. Here’s the code:

random.drop(['class'], axis=1).plot.line(title='Random Dataset')
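To see the one-line-per-column behavior in isolation, here is a small self-contained sketch. The data frame is a hand-typed stand-in (an assumption, not a real dataset), and the returned Matplotlib axes object is used to count the lines that were drawn.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import pandas as pd

# Hand-typed stand-in with four numeric columns and one label column
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4],
    "sepal_width": [3.5, 3.0, 3.2, 3.2],
    "petal_length": [1.4, 1.4, 4.7, 4.5],
    "petal_width": [0.2, 0.2, 1.4, 1.5],
    "class": ["a", "a", "b", "b"],
})

# Drop the label column; Pandas then plots every remaining column as its own line
ax = df.drop(columns=["class"]).plot.line(title="Sample Dataset")

# The axes object holds one Line2D per plotted column
print(len(ax.get_lines()))
```

Because the call returns a regular Matplotlib axes object, you can keep customizing the chart with the ‘set_xlabel’ and ‘set_ylabel’ methods used earlier.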


Scatter Plot

You can create a scatter plot in Pandas by using ‘<dataframe>.plot.scatter()’. You need to pass it two arguments: the names of the x-column and the y-column.

Here’s an example:

random.plot.scatter(x='sepal_length', y='sepal_width', title='Random Dataset')


Histogram

Use ‘plot.hist’ to create a histogram in Pandas. The method has no required arguments, and you can create either a single histogram or multiple histograms.

To create one histogram, use the following code:

random_reviews['points'].plot.hist()

To create multiple Histograms, use this:

random.plot.hist(subplots=True, layout=(2,2), figsize=(10, 10), bins=20)
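The ‘subplots’ and ‘layout’ arguments decide how the histograms are arranged. As a rough sketch on a hand-typed stand-in data frame (an assumption, not a real dataset), the call below returns a grid of axes, one per numeric column.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import pandas as pd

# Hand-typed stand-in with four numeric columns
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# subplots=True gives every column its own panel;
# layout=(2, 2) arranges the four panels in two rows and two columns
axes = df.plot.hist(subplots=True, layout=(2, 2), figsize=(10, 10), bins=20)

# axes is an array of Matplotlib axes, one entry per panel
print(axes.size)
```

Without ‘subplots=True’, all columns would be drawn on top of each other in a single panel.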

Seaborn

Seaborn is built on Matplotlib and is another popular Python library for data visualization. It provides a high-level interface for plotting your data, along with attractive default styles. This lets you create polished graphs with far fewer lines of code than you’d need with Matplotlib.

Histogram

Line Chart

You can use the ‘sns.lineplot’ method to create a line chart in Seaborn. A related method, ‘sns.kdeplot’, draws a smoothed density curve of a column’s distribution, which keeps the plot clean when the data has a lot of outliers.

import seaborn as sns

sns.lineplot(data=random.drop(['class'], axis=1))

Scatter Plot

In Seaborn, you can create a scatter plot with the ‘sns.scatterplot’ method. You pass the names of the x and y columns, just like we did with Pandas. But there’s a difference: we can’t call the function on the data frame as we did in Pandas, so we pass the data as an additional argument.

sns.scatterplot(x='sepal_length', y='sepal_width', data=iris)

With the ‘hue’ argument, you can color the points by the values of a column, such as the class. This takes more work in Matplotlib.

sns.scatterplot(x='sepal_length', y='sepal_width', hue='class', data=iris)
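By default Seaborn picks the colors for each ‘hue’ level itself. If you want to fix them, for instance to match the red, green, and blue used in the Matplotlib example, one option is the ‘palette’ argument, which accepts a dictionary from level to color. The sketch below assumes a small hand-typed stand-in for the Iris data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import pandas as pd
import seaborn as sns

# Illustrative stand-in rows, not the real Iris dataset
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Map each class to a fixed color
palette = {"Iris-setosa": "r", "Iris-versicolor": "g", "Iris-virginica": "b"}

ax = sns.scatterplot(data=iris, x="sepal_length", y="sepal_width",
                     hue="class", palette=palette)

# Seaborn adds a legend with one entry per class automatically
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

The automatic legend is part of what makes ‘hue’ convenient: in plain Matplotlib you would have to build it yourself.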

Bar Chart

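You can use the ‘sns.countplot’ method to create a bar chart of value counts in Seaborn. Pass the column through the ‘x’ keyword, since recent Seaborn versions expect keyword arguments here:

sns.countplot(x=random_reviews['points'])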

Now that we’ve discussed the critical libraries for data visualization in Python, we can take a look at other forms of graphs. Python and its libraries enable you to create various kinds of figures to plot your data. 

Other Kinds of Data Visualization in Python

Pie Chart

Pie charts show data as sections of a circle. Each segment represents a share of the whole as a percentage, so the segments should sum to 100%. Here is the example code:

# one label per row of the data frame (ten rows assumed); a list keeps them in order
labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

plt.pie(df['Age'], labels=labels, autopct='%1.1f%%', shadow=True)
plt.show()

plt.pie(df['Income'], labels=labels, autopct='%1.1f%%', shadow=True)
plt.show()

plt.pie(df['Sales'], labels=labels, autopct='%1.1f%%', shadow=True)
plt.show()
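The percentages on a pie chart are simply each value divided by the total. The sketch below uses made-up sales figures (an assumption for illustration) to compute those shares by hand and then draw the same data with the ‘autopct’ format string.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only
sales = pd.Series([30, 50, 20], index=["A", "B", "C"])

# Each wedge's share of the whole, in percent
shares = sales / sales.sum() * 100

fig, ax = plt.subplots()

# autopct='%1.1f%%' prints each share with one decimal place and a percent sign
ax.pie(sales, labels=list(sales.index), autopct="%1.1f%%")
ax.set_title("Sales share")

print(shares)
```

Because the shares are relative, the raw values don’t need to add up to 100 themselves; the chart normalizes them.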

Box Plots

Box plots are based on the minimum, first quartile, median, third quartile, and maximum of the data. The graph looks like a box (more specifically, a rectangle), which is where the name ‘box plot’ comes from. Here’s example code for creating a box plot:

# box plot for each numeric attribute of the data frame
df.plot.box()

# box plot for an individual attribute
plt.boxplot(df['Income'])
plt.show()
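The five numbers behind a box plot can be computed directly with the Pandas ‘quantile’ method. Here is a small sketch on made-up income values (an assumption for illustration). Keep in mind that Matplotlib’s default whiskers reach at most 1.5 times the interquartile range beyond the box and draw anything further out as individual points, so they match the minimum and maximum only when there are no such outliers.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only
income = pd.Series([20, 30, 40, 50, 60])

# Minimum, first quartile, median, third quartile, maximum
summary = income.quantile([0, 0.25, 0.5, 0.75, 1])

# Draw the box plot of the same values
fig, ax = plt.subplots()
ax.boxplot(income.values)
ax.set_title("Income")

print(summary)
```

For this sample, the box spans 30 to 50 with the median line at 40, and the whiskers reach 20 and 60.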


Conclusion

We hope you found this article useful. There are many kinds of graphs you can plot with Python and its libraries. If you haven’t performed Python data visualization before, you should start with Matplotlib. After mastering it, you can move on to higher-level data visualization libraries such as Pandas and Seaborn.


Which are the best Data Visualization libraries in Python?

Data visualization is an extremely important part of data analysis, because a visual format is often the clearest way to understand trends in data. If you present your company's data as written text, people might find it boring. If you present the same data visually, people are far more likely to pay attention to it.

Several Python libraries simplify the data visualization process. No single one is the best, because the right choice depends on your requirements. Some of the best data visualization libraries in Python are Matplotlib, Plotly, Seaborn, ggplot, and Altair.

Which is one of the best plotting libraries in Python?

Plenty of libraries make data visualization and plotting easier. Among them, Matplotlib is the one many users regard as the better choice.

Matplotlib is often described as lightweight and quick to run. It also provides an object-oriented API that lets users embed plots in their own applications. Matplotlib supports many output formats, and it is free and open source.
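As a small illustration of the object-oriented API and the different output types, the sketch below builds a figure and writes it out as both PNG and SVG. The in-memory buffers are an assumption made to keep the example self-contained; passing a file name to ‘savefig’ works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts
import matplotlib.pyplot as plt

# Build the figure through the object-oriented interface
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("Sample plot")

# Write the same figure in two formats
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")

svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")

print(len(png_buffer.getvalue()), len(svg_buffer.getvalue()))
```

PNG is a raster format suited to web pages and slides, while SVG is a vector format that stays sharp at any zoom level.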

Which is the default data visualization library for data scientists?

If you work in data science, chances are high that you have already used the Matplotlib library. Everyone from beginners to experienced professionals uses it for building complex data visualizations.

The main reason for its wide adoption is the flexibility it provides as a 2D plotting library. If you have a MATLAB background, the Pyplot interface of Matplotlib will feel familiar, so you won't need much time to produce your first visualization. Matplotlib also lets you control the entire visualization at the most granular level.
