Data analysis is an essential skill for young graduates, engineers, and managers in today’s technology-led work environment. In this article, we will cover how to fetch data from a database in Python and get you up to speed on some fundamental concepts.
Data Extraction with Python Database
Data extraction entails retrieving data from various sources, sometimes processing it further, and migrating it to repositories for further analysis, so some kind of data transformation usually happens in the process. Python is one of the leading programming languages for such data science tasks. There are about 8.2 million users of this general-purpose scripting language across the world.
In the following guide, we will discuss extraction methods using PostgreSQL, an open-source relational database system. It provides a ROW_TO_JSON function that returns each row of a result set as a JSON object, which is surrounded by curly braces {}. JSON data types help you manipulate query results more conveniently. But before we begin, make sure that you have set up a virtual environment and installed a PostgreSQL adapter for Python, such as psycopg2-binary, in it.
Python Database Basics
Suppose you have a PostgreSQL database of the American National Football League (NFL). It includes tables for the players, coaches, and teams. Also, note the following details to get clued up about the stored data:
- The players’ table (named athletes in the queries below) houses details like athlete_id, which is the primary key, players’ first and last names, jersey numbers, weight (in kg), height (in m), and their country of origin. It also holds the team_id, a foreign key indicating each athlete’s team.
- The data table on coaches has coach_id (primary key), along with the first and last names, and team_id (a foreign key referencing the teams’ table field).
- Finally, there is the teams’ table that describes every football team with a name, conference, their rank, and total wins and losses (bifurcated into ‘home’ and ‘away’). Here, the primary key is team_id, which is referenced in the tables above.
Now that you are familiar with the dataset, let us explore how to write an SQL query to retrieve a list of teams. For example, you need football teams ordered according to their conference and rank. You also want to extract the number of athletes or players in each team along with the names of their coaches. You may also want to know the number of the teams’ wins and losses, both at home and away.
The following query retrieves this information:
SELECT
f.name,
f.city,
f.conference,
f.conference_rank,
COUNT(a.athlete_id) AS number_of_athletes,
CONCAT(c.first_name, ' ', c.last_name) AS coach,
f.home_wins,
f.away_wins
FROM athletes a, teams f, coaches c
WHERE a.team_id = f.team_id
AND c.team_id = f.team_id
GROUP BY f.name, c.first_name, c.last_name, f.city, f.conference, f.conference_rank, f.home_wins, f.away_wins
ORDER BY f.conference, f.conference_rank
After this, you can wrap the query inside the JSON function we mentioned earlier (ROW_TO_JSON), as shown below. Save the wrapped query to a file called query.sql in your current directory.
SELECT ROW_TO_JSON(team_info) FROM (
SELECT
f.name,
f.city,
f.conference,
f.conference_rank,
COUNT(a.athlete_id) AS number_of_athletes,
CONCAT(c.first_name, ' ', c.last_name) AS coach,
f.home_wins,
f.away_wins
FROM athletes a, teams f, coaches c
WHERE a.team_id = f.team_id
AND c.team_id = f.team_id
GROUP BY f.name, c.first_name, c.last_name, f.city, f.conference, f.conference_rank, f.home_wins, f.away_wins
ORDER BY f.conference, f.conference_rank
) AS team_info
You would observe that each row has the structure of a Python dictionary. The keys are just the field names returned by your query.
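The wrapping step can also be done in Python rather than by hand. The following is a minimal sketch, not part of the original walkthrough: the helper names wrap_in_row_to_json and save_query are hypothetical, and the alias defaults to team_info only because that is the alias used in the query above.

```python
from pathlib import Path


def wrap_in_row_to_json(inner_query: str, alias: str = "team_info") -> str:
    """Wrap a SELECT statement so PostgreSQL returns each row as a JSON object."""
    # Drop surrounding whitespace and a trailing semicolon so the
    # statement is still valid when used as a subquery.
    inner = inner_query.strip().rstrip(";").strip()
    return f"SELECT ROW_TO_JSON({alias}) FROM (\n{inner}\n) AS {alias}"


def save_query(sql: str, path: str = "query.sql") -> Path:
    """Write the SQL text to a file (query.sql by default) and return its path."""
    target = Path(path)
    target.write_text(sql + "\n", encoding="utf-8")
    return target


# Small demonstration with a shortened query.
wrapped = wrap_in_row_to_json("SELECT name, conference FROM teams;")
print(wrapped)
```

Calling save_query(wrapped) would then write the text to query.sql in the current directory.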
Moreover, to avoid exposing your database credentials in plain sight in your code, you can store them in environment variables. Choose one of the following methods, depending on your operating system:
- For Windows: Control Panel → System → Advanced System Settings → Advanced tab → Environment Variables.
- For a Unix-like environment: Append two export lines, one for your username and one for your password, to your shell initialization file (for example, ~/.bashrc).
With this, you are all set to write Python code. At the very outset, we will import the modules and the exception class we need for reading the environment, connecting to the database, and handling errors. These statements accomplish that:
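Once the variables are set, Python can read them through os.environ. The sketch below is one possible way to collect them; the variable names (NFL_DB_USER, NFL_DB_PASSWORD, and so on) and the default database name are assumptions made for illustration, so substitute whatever names you actually exported.

```python
import os


def load_db_settings(env=os.environ):
    """Collect connection settings from environment variables.

    All variable names here are hypothetical. The username and password
    are required; the other values fall back to common defaults.
    """
    return {
        "dbname": env.get("NFL_DB_NAME", "nfl"),
        "user": env["NFL_DB_USER"],          # raises KeyError if unset
        "password": env["NFL_DB_PASSWORD"],  # raises KeyError if unset
        "host": env.get("NFL_DB_HOST", "localhost"),
        "port": int(env.get("NFL_DB_PORT", "5432")),  # 5432 is PostgreSQL's default port
    }


# Demonstration with a plain dictionary standing in for the real environment.
demo = load_db_settings({"NFL_DB_USER": "coach", "NFL_DB_PASSWORD": "secret"})
print(demo["user"], demo["host"], demo["port"])
```

The keys of the returned dictionary match the keyword arguments accepted by psycopg2's connect function, so the dictionary can be unpacked directly into that call.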
import os
import psycopg2 as p
from psycopg2 import Error
Then, we will load the contents of query.sql and instantiate the connection. Open the SQL query file with open and read its contents with read, then connect to the NFL database using the connect function by specifying your database name, user, password, host, and port number.
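The two steps just described might look like the sketch below. The helper names read_query and open_connection are hypothetical, and the environment variable names in the usage comment are assumptions; psycopg2.connect and its dbname, user, password, host, and port keywords are the library's own.

```python
from pathlib import Path


def read_query(path="query.sql"):
    """Return the SQL text stored in the given file."""
    return Path(path).read_text(encoding="utf-8")


def open_connection(settings):
    """Open a PostgreSQL connection from a dictionary of settings.

    The expected keys are dbname, user, password, host, and port.
    """
    # Imported inside the function so that read_query can still be used
    # on a machine where the driver is not installed.
    import psycopg2

    return psycopg2.connect(**settings)


# Typical usage (requires a running PostgreSQL server and psycopg2):
#
#   import os
#   query = read_query("query.sql")
#   conn = open_connection({
#       "dbname": "nfl",                         # hypothetical database name
#       "user": os.environ["NFL_DB_USER"],       # hypothetical variable names
#       "password": os.environ["NFL_DB_PASSWORD"],
#       "host": "localhost",
#       "port": 5432,
#   })
```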
How to Fetch Data From a Database in Python?
Once you have established the database connection, you can proceed with query execution. You need to use a control structure called a ‘cursor’. It is as easy as writing “cursor = conn.cursor()” and subsequently, “cursor.execute(query)”. After the fetchall call below, the result contains a list of one-element tuples, each holding one row in dictionary format.
result = cursor.fetchall()
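Because every tuple in the result holds exactly one element, it is often convenient to unpack it into a plain list of dictionaries first. The sketch below assumes the shape described above; the helper name unpack_rows is hypothetical and the sample values are made up purely for illustration. psycopg2 normally decodes JSON columns into Python objects on its own, and the json.loads branch is only a fallback for drivers that return the raw JSON text.

```python
import json


def unpack_rows(result):
    """Turn [(row,), (row,), ...] into [row, row, ...] with every row as a dict."""
    rows = []
    for (value,) in result:
        # Decode only if the driver handed back JSON text instead of a dict.
        rows.append(json.loads(value) if isinstance(value, str) else value)
    return rows


# Stand-in for cursor.fetchall(); these values are invented for the example.
sample_result = [
    ({"name": "Team A", "conference": "AFC", "conference_rank": 1},),
    ({"name": "Team B", "conference": "NFC", "conference_rank": 2},),
]

for team in unpack_rows(sample_result):
    print(team["conference"], team["conference_rank"], team["name"])
```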
At this stage, you can iterate over the result. You can manipulate the contents as you want, or feed them into spreadsheets, HTML tables, etc. Don’t forget to clean up when you finish: wrap the database calls in a try-except block and add a ‘finally’ clause that closes the cursor and the connection.
When you are handling large datasets, relational or otherwise, you need some basic tools to query the tables, especially when you also want to manipulate the results. Such data transformation is easy to achieve with Python.
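A try-except-finally structure for this could be sketched as follows. The function run_query and its parameters are hypothetical; it is written against the generic Python database API (cursor, execute, fetchall, close), so it should work with psycopg2 by passing a function that calls psycopg2.connect and db_error=psycopg2.Error. The demonstration uses the built-in sqlite3 module only so that it runs without a database server.

```python
import sqlite3


def run_query(connect, query, db_error=Exception):
    """Run a query and return all rows, always releasing the connection.

    connect  -- a zero-argument function that returns an open connection
    db_error -- the driver's base exception class to catch
    """
    conn = None
    cursor = None
    try:
        conn = connect()
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    except db_error as exc:
        print(f"Database error: {exc}")
        return []
    finally:
        # Runs whether the query succeeded or failed.
        if cursor is not None:
            cursor.close()
        if conn is not None:
            conn.close()


# Demonstration with an in-memory SQLite database.
rows = run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 AS one", db_error=sqlite3.Error)
print(rows)
```

Returning an empty list on failure is a design choice made for this sketch; re-raising the exception after logging it may suit production code better.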
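As one example of such a transformation, rows held as dictionaries can be turned into CSV text that a spreadsheet can open. This is a minimal sketch using only the standard library; the function name rows_to_csv and the sample values are assumptions made for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Render a list of dictionaries as CSV text.

    The keys of the first row become the header line.
    """
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample rows standing in for real query results.
csv_text = rows_to_csv([
    {"name": "Team A", "home_wins": 5, "away_wins": 3},
    {"name": "Team B", "home_wins": 4, "away_wins": 6},
])
print(csv_text)
```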
Therefore, most postgraduate programs of study include the knowledge of these techniques as a part of the curriculum. Some examples include the Associate Diploma in Data Science (IIIT-Bangalore) and Global Master Certificate in Business Analytics (Michigan State University).
Conclusion
In this Python database tutorial, we learned how to connect to a relational database, execute queries, and import the results. You can do much more with Python and adapt your code to do the things you need.
We hope this guide helped you find some clarity and kickstarted your curiosity!
How do you pull data from an API using Python requests?
When you wish to receive data from an API, you must send a request to the server, just as your browser does when you visit a conventional website. To get data from an API using Python, we can use the requests package. Requests is a third-party library that has become the de facto standard for making HTTP requests in Python. Because of its abstractions, it's really easy to use, especially when working with APIs.
When we use the requests library to run a request, we get a response object that contains the data we want to extract as well as a status code. The status code informs us about the outcome of the request, and it is part of every response we receive. The codes are three-digit numbers grouped into five classes by their first digit: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).
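A minimal sketch of this flow is shown below. The helper names status_class and fetch_json are hypothetical, and any URL you pass in is your own; requests.get, status_code, raise_for_status, and json are part of the requests library, which must be installed separately.

```python
def status_class(code):
    """Map an HTTP status code to the name of its class (by first digit)."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


def fetch_json(url, timeout=10):
    """Request a URL and return the decoded JSON body."""
    # Third-party package; imported here so status_class works without it.
    import requests

    response = requests.get(url, timeout=timeout)
    print(response.status_code, status_class(response.status_code))
    response.raise_for_status()  # raises an exception for 4xx and 5xx responses
    return response.json()


print(status_class(200), "|", status_class(404))
```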
How to connect SQLite with Python?
a. We must import sqlite3 in order to use SQLite.
b. Then, using the connect method, make a connection and provide the name of the database file you would like to access; if a file with that name exists, it will be opened. If no such file exists, SQLite will create one with the provided name.
c. Following that, a cursor object is created that can send SQL commands. A cursor is a control structure for traversing and retrieving database records. It is really important when dealing with databases in Python, because the cursor object will be used to execute all commands.
d. Write the SQL statement that creates a table in the database, with comments if needed, and store it as a string in a variable. Example: sql_comm = SQL statement.
e. Running the command is a breeze: call the cursor's execute method, passing the sql_comm variable as an argument. To run several commands, store each one in sql_comm in turn and execute it. After you've completed all of your tasks, save the modifications to the file by committing them, and then close the connection.
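Steps a to e can be put together as in the sketch below. It uses an in-memory database (":memory:") so that nothing is written to disk; pass a file name instead to get the open-or-create behaviour described in step b. The table and its contents are invented for illustration.

```python
import sqlite3  # step a

# Step b: connect; ":memory:" keeps the database in RAM for this demo.
conn = sqlite3.connect(":memory:")

# Step c: create a cursor to send SQL commands.
cursor = conn.cursor()

# Step d: store the SQL statement in a variable.
sql_comm = """
CREATE TABLE teams (
    team_id INTEGER PRIMARY KEY,  -- SQL comments start with two dashes
    name TEXT NOT NULL
)
"""

# Step e: execute the command, then reuse the variable for the next one.
cursor.execute(sql_comm)

sql_comm = "INSERT INTO teams (name) VALUES (?)"
cursor.execute(sql_comm, ("Example Team",))

# Save the changes.
conn.commit()

# Read the row back to confirm it was stored.
cursor.execute("SELECT team_id, name FROM teams")
saved_rows = cursor.fetchall()
print(saved_rows)

# Disconnect.
conn.close()
```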
Is Python good for databases?
Python is especially well suited for structured tabular data that can be obtained with SQL but then requires additional manipulation that would be difficult to accomplish with SQL alone.
