Python can do many things with data, and one of its many capabilities is visualization. It has multiple libraries that you can use for this purpose. In this article, we’ll take a look at some of the prominent ones and the various graphs you can plot with them.
Python is a popular choice among data scientists, analysts, and researchers due to its extensive libraries and tools for data visualization. Data visualization is the practice of portraying data and information visually to help people gain insights, uncover trends, and convey findings effectively. Python’s data visualization capabilities are mostly driven by libraries like Matplotlib, Seaborn, Plotly, and Bokeh, each with its own features and functions.
Python Data Visualization
We have shared multiple examples in this article. Be sure to try them out on a dataset of your own. Let’s get started:
Python Data Visualization Libraries
Python has many libraries for creating graphs, each with its own features and strengths, and there are options for every skill level. This means you can perform data visualization in Python whether you’re a beginner or an advanced programmer. The following are some prominent libraries:
- Seaborn
- Matplotlib
- Pandas
There are many other Python libraries for data science, but we’ve focused on the prominent ones for the time being. We’ll now discuss these libraries and see how you can plot graphs with each of them.
Matplotlib
Matplotlib is a versatile and adaptable Python data visualization library for creating visualizations such as bar charts, histograms, line charts, and scatter plots. Because of its comprehensive functionality, customization options, and interoperability with other Python visualization libraries, it is a popular choice for making high-quality plots in disciplines such as scientific research, data analysis, and data exploration.
Matplotlib is the most popular Python library for plotting graphs. It doesn’t require much experience, which makes it a good starting point for beginners: you can learn data visualization through it and work up to a wide variety of graphs. It gives you a lot of freedom, but you’ll have to write more code than with higher-level libraries.
Line Chart
To create a line chart, use the ‘plot’ method. By looping over the columns, you can draw multiple lines in the same graph. Use the following code for this purpose:
import matplotlib.pyplot as plt

# iris is assumed to be a pandas DataFrame that has already been loaded
# get columns to plot
columns = iris.columns.drop(['class'])
# create x data
x_data = range(0, iris.shape[0])
# create figure and axis
fig, ax = plt.subplots()
# plot each column
for column in columns:
    ax.plot(x_data, iris[column], label=column)
# set title and legend
ax.set_title('Iris Dataset')
ax.legend()
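The line chart above assumes that a DataFrame named iris already exists. As a minimal, self-contained sketch, the same loop can be tried on a tiny hand-made stand-in. The values below are made up for illustration and are not the real Iris measurements; only the column names come from the examples in this article. The Agg backend is selected so that the sketch also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris DataFrame (illustrative values only)
iris = pd.DataFrame({
    "sepal_length": [5.0, 4.8, 6.4, 6.0, 6.9, 7.1],
    "sepal_width": [3.4, 3.0, 3.0, 2.8, 3.1, 3.0],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# keep only the numeric columns
columns = iris.columns.drop(["class"])
# use the row position as the x axis
x_data = range(0, iris.shape[0])

fig, ax = plt.subplots()
# draw one line per numeric column
for column in columns:
    ax.plot(x_data, iris[column], label=column)
ax.set_title("Iris Dataset")
ax.legend()
```

In an interactive session you would typically finish with plt.show() to display the figure.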
Scatter Plot
You can create a scatter plot using the ‘scatter’ method. You should create an axis and a figure through ‘plt.subplots’ to give your plot labels and a title.
Use the following code:
# create a figure and axis
fig, ax = plt.subplots()
# scatter the sepal_length against the sepal_width
ax.scatter(iris['sepal_length'], iris['sepal_width'])
# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
You can color the data points according to their classes. For this purpose, you’ll need a dictionary that maps each class to a color, and a for-loop that scatters each point with the color of its class.
# create color dictionary
colors = {'Iris-setosa': 'r', 'Iris-versicolor': 'g', 'Iris-virginica': 'b'}
# create a figure and axis
fig, ax = plt.subplots()
# plot each data-point
for i in range(len(iris['sepal_length'])):
    ax.scatter(iris['sepal_length'][i], iris['sepal_width'][i], color=colors[iris['class'][i]])
# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
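The per-point loop works, but it makes one ‘scatter’ call for every row. As an alternative sketch (not the approach used above), the class column can be mapped to colors in one step and passed to a single ‘scatter’ call through its ‘c’ argument. The small iris DataFrame below is a hypothetical stand-in with made-up values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris DataFrame (illustrative values only)
iris = pd.DataFrame({
    "sepal_length": [5.0, 4.8, 6.4, 6.0, 6.9, 7.1],
    "sepal_width": [3.4, 3.0, 3.0, 2.8, 3.1, 3.0],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

colors = {"Iris-setosa": "r", "Iris-versicolor": "g", "Iris-virginica": "b"}

# translate every class label into its color in one step
point_colors = iris["class"].map(colors).tolist()

fig, ax = plt.subplots()
# a single call draws all points, each with its own color
ax.scatter(iris["sepal_length"], iris["sepal_width"], c=point_colors)
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
```

The result looks the same as the loop version, and a single call is usually faster on larger datasets.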
Histogram
You can use the ‘hist’ method to create a histogram in Matplotlib. It groups the values into bins and counts how many fall into each one, so it can also show how frequently each class occurs if we give it categorical data. Here’s the code you’d need to plot a histogram in Matplotlib:
# create figure and axis
fig, ax = plt.subplots()
# plot histogram
ax.hist(wine_reviews['points'])
# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
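Besides drawing the bars, ‘hist’ also returns the numbers behind them: the count in each bin, the bin edges, and the drawn rectangles. The sketch below shows this on a small, made-up wine_reviews DataFrame. The scores and the choice of bins=4 are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for wine_reviews (illustrative scores only)
wine_reviews = pd.DataFrame(
    {"points": [80, 82, 85, 85, 88, 90, 90, 90, 93, 97]}
)

fig, ax = plt.subplots()
# hist returns the per-bin counts, the bin edges, and the bar patches
counts, bin_edges, patches = ax.hist(wine_reviews["points"], bins=4)
ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")

# every value lands in exactly one bin, so the counts add up to the row count
total = int(counts.sum())
```

Inspecting counts and bin_edges is a quick way to check that the bins say what you think they say before you share the chart.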
Bar Chart
Matplotlib has easy methods for plotting different graphs. To create a bar chart, you’ll need to use ‘bar’. It can’t calculate the frequency of categories automatically, so you’ll need to count them first with the Pandas ‘value_counts’ function. A bar chart works best when your data doesn’t have too many distinct categories.
# create a figure and axis
fig, ax = plt.subplots()
# count the occurrence of each class
data = wine_reviews['points'].value_counts()
# get x and y data
points = data.index
frequency = data.values
# create bar chart
ax.bar(points, frequency)
# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
Pandas
Pandas is a Python library that’s popular for data analysis and manipulation. It’s open source, so you can use it for free. Its development began in 2008, and since then it has become one of the most popular libraries for working with structured data.
It includes data structures and methods that make it easier and more efficient to deal with structured data, such as tabular data. Because of its extensive capabilities and ease of use, Pandas is frequently used in data science, machine learning, and data analytics.
By using the Pandas DataFrame, you can easily create plots for your data. Its plotting API is built on Matplotlib and is higher-level, which means you can create bar charts, line charts, scatter plots, and histograms with less code in Pandas than you would need in Matplotlib.
Bar Chart
In Pandas, you’ll need to use the ‘plot.bar()’ method to plot a bar chart. First, count the occurrences of each value in your data through ‘value_counts()’, then sort them with ‘sort_index()’. Here’s example code to create a bar chart:
random_reviews['points'].value_counts().sort_index().plot.bar()
You can use the ‘plot.barh()’ method to create a horizontal bar chart in Pandas:
random_reviews['points'].value_counts().sort_index().plot.barh()
You can also plot an aggregated value instead of a count, such as the mean price of the five most expensive countries:
random_reviews.groupby("country").price.mean().sort_values(ascending=False)[:5].plot.bar()
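To see what that last one-liner does, it helps to split it into steps. The sketch below uses a tiny, made-up random_reviews DataFrame; the country names and prices are hypothetical. It groups by country, averages the price, sorts from highest to lowest, and plots the top rows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import pandas as pd

# Hypothetical stand-in for random_reviews (illustrative values only)
random_reviews = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "price": [10.0, 20.0, 30.0, 50.0, 5.0],
})

# step 1: mean price per country
mean_price = random_reviews.groupby("country")["price"].mean()
# step 2: most expensive countries first
mean_price = mean_price.sort_values(ascending=False)
# step 3: keep at most five countries and draw one bar for each
ax = mean_price.head(5).plot.bar()
ax.set_ylabel("Mean price")
```

Keeping the intermediate Series in a variable makes it easy to print and check the numbers before plotting them.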
Line Chart
You’ll need to use ‘<dataframe>.plot.line()’ to create a line chart in Pandas. You don’t need to loop through every column you want to plot, as Pandas does so automatically, unlike the explicit loop in the Matplotlib example. Here’s the code:
random.drop(['class'], axis=1).plot.line(title='Random Dataset')
Scatter Plot
You can create a scatter plot in Pandas by using ‘<dataset>.plot.scatter()’. You’ll need to pass it two arguments: the names of the x-column and the y-column.
Here’s an example:
random.plot.scatter(x='sepal_length', y='sepal_width', title='Random Dataset')
Histogram
Use ‘plot.hist’ to create a histogram in Pandas. It works without any arguments, and you have the option to create a single histogram or multiple histograms.
To create one histogram, use the following code:
random_reviews['points'].plot.hist()
To create multiple Histograms, use this:
random.plot.hist(subplots=True, layout=(2,2), figsize=(10, 10), bins=20)
Seaborn
Seaborn is a Python data visualization library that makes it easy to produce attractive and informative charts. It is based on Matplotlib and extends it with a high-level interface and with dedicated functions for statistical and categorical visualizations. Its emphasis on aesthetics and its specialized features make it a good choice for data scientists, analysts, and researchers who want to explore and communicate patterns and relationships in their data.
Seaborn is also a quite popular Python library for data visualization. Because its interface is higher-level, you can create great graphs with far fewer lines of code than you’d need with Matplotlib.
Line Chart
You can use the ‘sns.lineplot’ method to create a line chart in Seaborn. If you are interested in the shape of a distribution rather than in the individual values, you can use the ‘sns.kdeplot’ method instead, which draws a smoothed density curve and keeps the plot clean when the data is noisy.
sns.lineplot(data=random.drop(['class'], axis=1))
Scatter Plot
In Seaborn, you can create a scatter plot through the ‘sns.scatterplot’ method. You’ll need to give it the names of the x and y columns, just like we did with Pandas. But there’s a difference: we can’t call the function on the data as we did in Pandas, so we need to pass the data as an additional argument.
sns.scatterplot(x='sepal_length', y='sepal_width', data=iris)
By using the ‘hue’ argument, you can color the points according to a column, such as the class. This takes noticeably more code in Matplotlib.
sns.scatterplot(x='sepal_length', y='sepal_width', hue='class', data=iris)
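As a self-contained sketch of the ‘hue’ argument, the example below builds a small stand-in iris DataFrame with made-up values and lets Seaborn pick one color per class. Seaborn also adds a legend that lists the classes, which the Matplotlib version would need extra code for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the iris DataFrame (illustrative values only)
iris = pd.DataFrame({
    "sepal_length": [5.0, 4.8, 6.4, 6.0, 6.9, 7.1],
    "sepal_width": [3.4, 3.0, 3.0, 2.8, 3.1, 3.0],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# hue colors each point by the value in the class column
ax = sns.scatterplot(x="sepal_length", y="sepal_width", hue="class", data=iris)

# collect the legend entries that Seaborn generated
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```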
Bar Chart
You can use the ‘sns.countplot’ method to create a bar chart of how often each value occurs in Seaborn:
sns.countplot(x=random_reviews['points'])
Now that we’ve discussed the critical libraries for data visualization in Python, we can take a look at other forms of graphs. Python and its libraries enable you to create various kinds of figures to plot your data.
Other Kinds of Data Visualization in Python
Python and its libraries also support other types of visualizations, such as pie charts and box plots, which are useful for representing categorical data or displaying statistical information. Pie charts are well suited to showing the proportion of different categories within a dataset, with each category represented by a slice of a circle. Box plots, on the other hand, give a brief overview of a dataset’s statistical distribution by presenting key statistics such as the minimum, maximum, median, and quartiles. These additional options broaden the range of Python tools available, allowing users to present information better and obtain deeper insights into their data.
Pie Chart
Pie charts show data as sections of a circle. You’ve probably seen plenty of pie charts in school. Pie charts represent data in percentages, so the segments of a pie chart should add up to 100%. Here is the example code:
# slice labels; a list keeps them in a fixed order (a set does not)
labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
# draw one pie chart per column
plt.pie(df['Age'], labels=labels, autopct='% 1.1f %%', shadow=True)
plt.show()
plt.pie(df['Income'], labels=labels, autopct='% 1.1f %%', shadow=True)
plt.show()
plt.pie(df['Sales'], labels=labels, autopct='% 1.1f %%', shadow=True)
plt.show()
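You don’t have to convert your values to percentages yourself. By default, ‘pie’ divides each value by the total, so the slices always add up to 100%. The sketch below illustrates this with three made-up values and hypothetical labels, and reads back the percentage text that ‘autopct’ generates for each slice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt

# Illustrative raw values that do not add up to 100
values = [3, 5, 2]
labels = ["A", "B", "C"]  # hypothetical category names

fig, ax = plt.subplots()
# pie scales the values by their total; autopct formats each slice's share
patches, texts, autotexts = ax.pie(values, labels=labels, autopct="%1.1f%%")

# the percentage strings drawn inside the slices
shares = [text.get_text() for text in autotexts]
```

Here the raw values 3, 5, and 2 become shares of 30%, 50%, and 20% of the circle.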
Box Plots
Box plots are based on the minimum, first quartile, median, third quartile, and maximum of the data. The graph looks like a box (more specifically, a rectangle), which is where the name ‘box plot’ comes from. Here’s example code for creating a box plot:
# box plot for each numeric column of the data frame
df.plot.box()
# box plot for an individual column
plt.boxplot(df['Income'])
plt.show()
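To connect the picture to the numbers, the sketch below computes the quartiles directly with NumPy and compares them with the statistics Matplotlib uses when it draws the box, which are available through ‘matplotlib.cbook.boxplot_stats’. The income values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

# Illustrative values standing in for df['Income']
income = np.array([10, 20, 30, 40, 50])

# quartiles computed directly
q1, median, q3 = np.percentile(income, [25, 50, 75])

# the statistics Matplotlib computes for the same data
stats = cbook.boxplot_stats(income)[0]

fig, ax = plt.subplots()
ax.boxplot(income)
ax.set_title("Income")
```

The edges of the box sit at stats['q1'] and stats['q3'], the line inside the box is stats['med'], and the whiskers end at stats['whislo'] and stats['whishi'].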
Conclusion
We hope you found this article useful. There are many kinds of graphs you can plot through Python and its various libraries. If you haven’t performed Python data visualization before, you should start with Matplotlib. Once you’re comfortable with it, you can move on to the higher-level plotting interfaces of Pandas and Seaborn, which build on it.
Python provides a strong set of data visualization capabilities that are essential for obtaining insights, revealing trends, and effectively presenting findings. Python is an excellent resource for programmers of all skill levels because of these capabilities. Python’s wide library set provides users with the tools they need according to their individual data visualization requirements. Python enables you to improve your data visualization abilities and present information with clarity and precision, whether you are a novice or an experienced programmer.