Pandas is a favourite library among data science enthusiasts. It covers most data-processing needs: structured tabular data, date-time handling, and a Matplotlib-backed plotting API that lets you plot directly inside a chain of pandas operations. You can also load data from websites directly into DataFrames. The library is especially handy for exploratory data analysis, which reveals insights about a dataset and the distributions its features follow.
As more tools are built to enhance data exploration, PandasGUI stands out among them: it uses pandas as its core component and wraps it in a windowed GUI that offers many functions usually performed manually in code.
Let’s explore this utility and look at some of its best features.
Best Features of PandasGUI
1. Basic Setup
PandasGUI is a Python package, so it can be installed from PyPI using pip, the Python package manager:
pip install pandasgui
This command also installs its dependencies, such as PyQt and Plotly. Once the installation completes, import pandas along with the show function from pandasgui:
import pandas as pd
from pandasgui import show
The show function is the main entry point of the GUI. It takes the dataset you want to analyse as a pandas DataFrame. The package also ships with preloaded datasets for trying out its features, including iris, titanic, pokemon, car crashes, mpg, stock data, tips, mi_manufacturing, and gapminder. For illustration, we will use the tips dataset, which can be loaded with:
from pandasgui.datasets import tips
Finally, call the show function to launch the GUI:
gui = show(tips)
As soon as you run this, an application window opens, showing the data in tabular form with a row of tabs across the top. See the image below (all images in this article are provided by the author):
2. Various On-Screen Functions
Before exploring the program's tabs, let’s go over some of its key on-screen functions:
- Clicking a column header (total_bill, day, …) sorts the data in ascending order of that column; clicking again sorts it in descending order, and a third click resets the sorting. This makes sorting your data effortless. Here, we have sorted the data in descending order of size:
- You can add multiple CSV files to the GUI simply by dragging and dropping them. All the files are listed in the left panel, which makes it easy to switch between them.
- Clicking any cell lets you edit its value directly, much as you would in an Excel sheet, which is a large part of what makes PandasGUI useful.
- You can select a block of cells by holding the left mouse button and dragging across them. The selected cells are highlighted in blue, and the selection can be copied as is and pasted into Excel sheets or text editors such as Notepad.
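For readers who want to reproduce these on-screen actions in code, here is a rough pandas equivalent of sorting by a column, editing a cell, and selecting a block of cells. It is a sketch, not what PandasGUI runs internally, and the small `tips` frame holds a few made-up rows that only borrow the tips column names.

```python
import pandas as pd

# Made-up sample rows using the tips column names (not the real dataset)
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "size": [2, 3, 3, 4],
})

# Header clicks cycle through: ascending -> descending -> original order
ascending = tips.sort_values("size")
descending = tips.sort_values("size", ascending=False)
reset = descending.sort_index()  # back to the original row order

# Editing a cell: overwrite one value by row label and column name
edited = tips.copy()
edited.at[0, "tip"] = 2.00

# Selecting a block of cells: rows 1-2 and the first two columns, by position
block = tips.iloc[1:3, 0:2]
print(block)
```

Working on a copy for the edit keeps the original frame untouched, which mirrors how you might want to compare before and after.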
3. Filters
The first tab after the DataFrame view is Filters, which filters the data based on conditions you define. It uses the underlying pandas DataFrame query() function, so you can extract exactly the section of the dataset you need. To use it, click the Filters tab and create filters that match your dataset. For example, we can apply these three filters:
sex == 'Female'
day == 'Fri'
time == 'Lunch'
The resultant dataset looks like this:
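The same filtering can be sketched with DataFrame.query() directly. This assumes the GUI applies each enabled filter one after another, so the three conditions combine like a logical `and`; the rows below are made up for illustration.

```python
import pandas as pd

# Made-up sample rows using the tips column names (not the real dataset)
tips = pd.DataFrame({
    "total_bill": [16.99, 21.01, 24.59, 25.29, 12.46],
    "sex": ["Female", "Male", "Female", "Female", "Male"],
    "day": ["Sun", "Fri", "Fri", "Fri", "Fri"],
    "time": ["Dinner", "Lunch", "Lunch", "Dinner", "Lunch"],
})

# One query() expression per filter (note the straight quotes around strings)
filters = ["sex == 'Female'", "day == 'Fri'", "time == 'Lunch'"]

# Assumption: filters are applied in sequence, each narrowing the previous result
result = tips
for expr in filters:
    result = result.query(expr)

# Applying them in sequence is equivalent to one expression joined with "and"
combined = tips.query(" and ".join(filters))
print(result)
```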
4. Statistics
Before moving on to advanced analysis, it is good practice to look at the data types of the features, their counts, minimum and maximum values, and so on. In pandas, describe() summarises the numeric statistics, while dtypes and nunique() give the data types and unique-value counts. In PandasGUI, the Statistics tab gathers all of this in one place: it displays the data type, count, number of unique values, mean, standard deviation, minimum, and maximum of each column.
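A rough pandas counterpart of the Statistics tab is sketched below. The `summary` table and its column names (`type`, `n_unique`, and so on) are our own choice for illustration, not PandasGUI's code, and the data is a made-up sample.

```python
import pandas as pd

# Made-up sample rows using the tips column names (not the real dataset)
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "day": ["Sun", "Sun", "Fri", "Sat"],
})

# describe() covers count, mean, std, min and max (plus quartiles) for numeric columns
print(tips.describe())

# One table per column in the spirit of the Statistics tab.
# Data types and unique counts need separate calls; numeric stats are NaN for text columns.
numeric = tips.select_dtypes("number")
summary = pd.DataFrame({
    "type": tips.dtypes.astype(str),
    "count": tips.count(),
    "n_unique": tips.nunique(),
    "mean": numeric.mean(),
    "std": numeric.std(),
    "min": numeric.min(),
    "max": numeric.max(),
})
print(summary)
```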
5. Grapher
As the name suggests, this tab is for plotting different types of graphs. Plotting the data helps uncover patterns that inform later analysis and can help you decide which features to select for model training. PandasGUI supports histogram, scatter, line, bar, box, violin, heatmap, pie, and even word cloud plots.
Configuring a plot in this GUI is simply a matter of dragging and dropping columns. Suppose you want a scatter plot of total bill against tip, colored by time. Click on Grapher, select the scatter plot, drag total_bill into x, tip into y, and time into color in the panel immediately to the right of the column names, and then click Finish to render the plot.
All the plots generated by this tab are interactive because they are built using the Plotly library.
6. Reshaper
This tab offers two functions: pivot and melt. A pivot table is a powerful summarising tool that turns the unique values of one column into separate columns and aggregates a values column for each combination. Melt is the reverse operation: it unpivots several columns into rows of variable/value pairs, converting wide data into long data. Both come in handy when you need to reshape data to summarise it.
pandas provides separate functions for these (pivot() and pivot_table() for pivoting, melt() for melting), and the GUI lets you drag and drop columns into the corresponding arguments: index, columns, and values for pivot, and id_vars and value_vars for melt.
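The two reshaping operations can be sketched with pivot_table() and melt(), using the same argument names the Reshaper exposes. The sample data and the `mean_tip` column name are made up for illustration. Note that melting the pivot returns the averaged values in long format, not the original rows.

```python
import pandas as pd

# Made-up sample rows (not the real tips dataset)
tips = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "time": ["Lunch", "Dinner", "Dinner", "Dinner"],
    "tip": [2.0, 3.0, 4.0, 6.0],
})

# Pivot: unique values of "time" become columns; tips are averaged per (day, time)
wide = tips.pivot_table(index="day", columns="time", values="tip", aggfunc="mean")

# Melt: turn the Dinner/Lunch columns back into (time, mean_tip) rows per day
long = wide.reset_index().melt(
    id_vars="day",
    value_vars=["Dinner", "Lunch"],
    var_name="time",
    value_name="mean_tip",
)
print(wide)
print(long)
```

Combinations with no data (Saturday lunch here) show up as NaN in the pivot and carry through to the melted result.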
PandasGUI is a great project that lets users process a dataset visually without writing any code. The modified dataset can be exported through the Edit option in the top menu. The project still lacks features such as regular-expression search and filling null values, which may be added in future versions, but as an open-source tool it is already very useful. If you are looking for an industry-ready tool, you can try Google Dataflow.
If you are curious to learn about Python and data science, check out IIIT-B & upGrad’s PG Diploma in Data Science, which is designed for working professionals and offers 10+ case studies and projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 sessions with industry mentors, 400+ hours of learning, and job assistance with top firms.