Normalization is a systematic process for making a relational database model efficient, suitable for general-purpose querying, and free of undesirable characteristics such as insertion, update, and deletion anomalies, which can compromise the integrity of the data. Normalization also helps eliminate data redundancy and reduces the chance of inconsistency after insert, update, or delete operations.
For a better understanding, consider the following schema: Student (Name, Address, Subject, Grade)
There are a few problems or inefficiencies in this schema.
1) Redundancy: The student’s Address is repeated for each Subject the student is registered for.
2) Update anomaly: We may update the Address in one tuple (row) while leaving it unchanged in the others. The student would then no longer have a single, consistent address.
3) Insertion anomaly: We cannot record a student’s Address until the student registers for at least one Subject. Similarly, when a student enrols in a new Subject, a different Address may be inserted with it.
4) Deletion anomaly: If a student discontinues all enrolled subjects, the student’s Address is lost along with the deleted rows.
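To make these anomalies concrete, here is a minimal Python sketch that holds a few rows of the Student relation in memory. The names, addresses, and grades are made-up sample data, and `addresses_by_student` is a hypothetical helper written only for this illustration.

```python
# Rows of the unnormalized Student(Name, Address, Subject, Grade) relation.
# All values below are invented sample data.
students = [
    {"Name": "Asha", "Address": "12 Park St", "Subject": "Math", "Grade": "A"},
    {"Name": "Asha", "Address": "12 Park St", "Subject": "Physics", "Grade": "B"},
    {"Name": "Ravi", "Address": "7 Lake Rd", "Subject": "Math", "Grade": "C"},
]


def addresses_by_student(rows):
    """Map each student name to the set of distinct addresses stored for it."""
    result = {}
    for row in rows:
        result.setdefault(row["Name"], set()).add(row["Address"])
    return result


# Redundancy: Asha's address is stored once per subject.
asha_rows = [row for row in students if row["Name"] == "Asha"]

# Update anomaly: change the address in only one of Asha's rows.
students[0]["Address"] = "99 Hill Ave"
inconsistent = {
    name: addresses
    for name, addresses in addresses_by_student(students).items()
    if len(addresses) > 1
}

# Deletion anomaly: dropping Ravi's only enrolment also drops his address.
remaining = [row for row in students if row["Name"] != "Ravi"]
ravi_known = "Ravi" in addresses_by_student(remaining)
```

After the partial update, `inconsistent` contains Asha with two different addresses, and after the deletion `ravi_known` is false: the address disappeared together with the enrolment.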
Thus, it is important to represent the user data by relations that do not create anomalies following tuple add, delete, or update operations. This can only be achieved by a careful analysis of the integrity constraints, especially the database’s data dependencies.
The relations should be designed so that only those attributes are grouped that exist naturally together. This can mostly be done by a basic understanding of the meaning of all data attributes. However, we still need some formal measure to ensure our design goal.
Normalization is that formal measure. It answers the question of why a particular grouping of attributes will be better than any other.
Seven normal forms exist as of today:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- Sixth Normal Form (6NF)
First Normal Form (1NF or Minimal Form)
A table is in 1NF when it meets the following conditions:
- There’s no top-to-bottom ordering to the rows and no left-to-right ordering to the columns.
- There are no duplicate rows.
- Every row-and-column intersection contains exactly one value from the applicable domain (or a null value). In other words, every column value must be atomic: a single scalar value rather than a list. Repeating groups spread across multiple columns are not allowed either.
- All columns are regular (i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps).
Let’s take an example of a schema that is not normalized. Suppose a designer wishes to record the names and telephone numbers of customers. They define a customer table as shown:
Customer ID | First Name | Surname | Telephone Numbers |
123 | Bimal | Saha | 555-861-2025 |
456 | Kapil | Khanna | 555-403-1659, 555-776-4100 |
789 | Kabita | Roy | 555-808-9633 |
This table is not in 1NF. The Telephone Numbers column is not atomic: the row for customer 456 holds more than one value, which 1NF does not allow.
To Make It 1NF
- We’ll first break (decompose) our single table into two.
- Each table should have information about only one entity.
Customer ID | First Name | Surname |
123 | Bimal | Saha |
456 | Kapil | Khanna |
789 | Kabita | Roy |
Customer ID | Telephone Numbers |
123 | 555-861-2025 |
456 | 555-403-1659 |
456 | 555-776-4100 |
789 | 555-808-9633 |
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record.
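The decomposition above can be sketched in a few lines of Python. This is only an illustration: it assumes the phone numbers in the original column are separated by commas, and `to_first_normal_form` is a hypothetical helper name.

```python
# Unnormalized customer rows copied from the example table.
customers_raw = [
    (123, "Bimal", "Saha", "555-861-2025"),
    (456, "Kapil", "Khanna", "555-403-1659, 555-776-4100"),
    (789, "Kabita", "Roy", "555-808-9633"),
]


def to_first_normal_form(rows):
    """Split rows into a customer table and a one-number-per-row phone table."""
    customers = []
    phones = []
    for customer_id, first_name, surname, numbers in rows:
        customers.append((customer_id, first_name, surname))
        # Assumption: multiple numbers are comma-separated in the source column.
        for number in numbers.split(","):
            phones.append((customer_id, number.strip()))
    return customers, phones


customers, phones = to_first_normal_form(customers_raw)
```

Customer 456 ends up with two rows in `phones`, one per number, which matches the second table shown above.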
Second Normal Form
Each normal form has more constraining criteria than its predecessor. So any table that is in second normal form (2NF) or higher is, by definition, also in 1NF. On the other hand, a table that is in 1NF may or may not be in 2NF; if it is in 2NF, it may or may not be in 3NF, and so on.
A 1NF table is said to be in 2NF if and only if none of its nonprime attributes is functionally dependent on a part (proper subset) of a candidate key. (A nonprime attribute does not belong to any candidate key.)
Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.
Check whether a relation R (A, B, C, D, E) with the FD set { BC → D, AC → BE, B → E } is in 2NF.
- Computing the attribute closure of AC gives (AC)+ = {A, C, B, E, D}, so AC determines every attribute of R. Neither of its proper subsets does ((A)+ = {A} and (C)+ = {C}), so AC is a candidate key. Moreover, neither A nor C appears on the right-hand side of any FD, so both must belong to every candidate key, and {AC} is the only one.
- Here {A, C} are the prime attributes and {B, D, E} are the nonprime attributes.
- The relation R is already in 1NF, because a relational DBMS does not allow multi-valued or composite attributes.
BC → D does not violate 2NF because BC is not a proper subset of the candidate key AC,
AC → BE does not violate 2NF because AC itself is the candidate key, and
B → E does not violate 2NF because B is not a proper subset of the candidate key AC.
Thus the given relation R is in the 2nd Normal Form.
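The same check can be reproduced mechanically. The sketch below is one possible way to do it, not a standard library routine: `closure`, `candidate_keys`, and `partial_dependencies` are hypothetical helpers, and the key search is a brute-force enumeration that is only practical for small relations like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets that determine everything."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def partial_dependencies(attributes, fds):
    """List (key part, nonprime attributes) pairs that would violate 2NF."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    nonprime = attributes - prime
    violations = []
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets only
            for part in combinations(sorted(key), size):
                determined = (closure(set(part), fds) - set(part)) & nonprime
                if determined:
                    violations.append((set(part), determined))
    return violations


R = set("ABCDE")
FDS = [(set("BC"), set("D")), (set("AC"), set("BE")), (set("B"), set("E"))]

keys = candidate_keys(R, FDS)
violations = partial_dependencies(R, FDS)
```

For this FD set, `keys` holds the single candidate key {A, C} and `violations` is empty, which agrees with the conclusion that R is in 2NF.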
Third Normal Form
A table is said to be in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:
- X contains A (that is, X → A is a trivial functional dependency), or
- X is a super key, or
- A is a prime attribute (i.e., A is present within a candidate key)
Another definition of 3NF states that every non-key attribute of R is non-transitively dependent (i.e. directly dependent) on the primary key of R. This means no nonprime attribute (one that is not part of any candidate key) is functionally dependent on other nonprime attributes. If there are two dependencies A → B and B → C, then from these FDs we may derive A → C. This dependency A → C is transitive.
Example of 3NF:
Consider the following relation Order (Order#, Part, Supplier, UnitPrice, QtyOrdered) with the given set of FDs:
Order# → (Part, Supplier, QtyOrdered) and (Supplier, Part) → UnitPrice
Here Order# is the key of the relation.
Using Armstrong’s axioms, we get
Order# → Part, Order# → Supplier, and Order# → QtyOrdered.
Order# → (Part, Supplier) and (Supplier, Part) → UnitPrice together give Order# → UnitPrice.
Thus, all nonprime attributes depend on the key (Order#). However, UnitPrice depends on Order# only transitively, through (Part, Supplier). So this relation is not in 3NF. How do we bring it into 3NF?
We cannot store the UnitPrice of any Part supplied by any Supplier unless someone places an order for that Part. So we will have to decompose the table to make it follow 3NF as follows.
Order (Order#, Part, Supplier, QtyOrdered) and Price Master (Part, Supplier, UnitPrice).
Neither relation now contains a transitive dependency, so both are in 3NF.
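As a rough check of this example, the three 3NF conditions can be tested in Python. The helper names (`closure`, `candidate_keys`, `third_nf_violations`) are hypothetical, and the FDs listed for the two decomposed relations are assumed to be the ones that carry over from the original set.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets that determine everything."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def third_nf_violations(attributes, fds):
    """Return the FDs X -> A for which none of the three 3NF conditions holds."""
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attributes
        for attr in sorted(rhs - lhs):  # rhs - lhs skips the trivial part
            if not is_superkey and attr not in prime:
                violations.append((lhs, attr))
    return violations


ORDER = {"Order#", "Part", "Supplier", "UnitPrice", "QtyOrdered"}
ORDER_FDS = [
    ({"Order#"}, {"Part", "Supplier", "QtyOrdered"}),
    ({"Supplier", "Part"}, {"UnitPrice"}),
]
before = third_nf_violations(ORDER, ORDER_FDS)

# The two relations after decomposition, each with its assumed FDs.
NEW_ORDER = {"Order#", "Part", "Supplier", "QtyOrdered"}
NEW_ORDER_FDS = [({"Order#"}, {"Part", "Supplier", "QtyOrdered"})]
PRICE_MASTER = {"Part", "Supplier", "UnitPrice"}
PRICE_MASTER_FDS = [({"Supplier", "Part"}, {"UnitPrice"})]
after = third_nf_violations(NEW_ORDER, NEW_ORDER_FDS) + third_nf_violations(
    PRICE_MASTER, PRICE_MASTER_FDS
)
```

Before the decomposition, the only violation reported is (Supplier, Part) → UnitPrice; after it, the list of violations is empty.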
Conclusion
There’s more to normalization, such as BCNF, 4NF, 5NF and 6NF. In short, BCNF is a stricter version of 3NF in which the last rule of 3NF (A is a prime attribute) no longer applies: for every nontrivial functional dependency X → A, X must be a super key. (BCNF is sometimes called 3.5NF.) Normal forms from 4NF onward are rarely applied in everyday practice.
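To see how BCNF differs from 3NF, consider a small sketch. The relation Enrol (Student, Course, Instructor) and its FDs are a hypothetical example that does not come from the schemas above, and `closure` and `bcnf_violations` are illustrative helper names.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


# Hypothetical relation: each instructor teaches exactly one course.
ENROL = {"Student", "Course", "Instructor"}
ENROL_FDS = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]
violations = bcnf_violations(ENROL, ENROL_FDS)
```

Here Instructor → Course is flagged because Instructor alone is not a super key. The relation still satisfies 3NF, since Course is a prime attribute (it belongs to the candidate key (Student, Course)), and that is exactly the rule BCNF drops.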
What is database normalization?
What are the different types of normal forms?
The first three normal forms were introduced by Edgar F. Codd, the originator of the relational model. Each normal form places stricter requirements on a relation than the one before it. The first normal form (1NF) requires every column to hold a single atomic value, with no repeating groups and no duplicate rows. The second normal form (2NF) removes partial dependencies, where a nonprime attribute depends on only part of a candidate key. The third normal form (3NF) removes transitive dependencies, where a nonprime attribute depends on the key only through another nonprime attribute. The fourth normal form (4NF) builds on these by addressing multivalued dependencies.
How to normalize a database?
Normalizing a database means decomposing its tables into smaller, well-structured tables, one normal form at a time. In the end, the tables have no repeating groups and no partial or transitive dependencies. The purpose is to store each fact in exactly one place, so that a change needs to be made in only one record and every related record sees it consistently.
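As a small illustration of that last point, the sketch below uses Python's built-in sqlite3 module with a normalized version of the Student schema from the start of the article. The table names `student` and `enrolment`, the column names, and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        name    TEXT PRIMARY KEY,
        address TEXT NOT NULL
    );
    CREATE TABLE enrolment (
        name    TEXT NOT NULL REFERENCES student(name),
        subject TEXT NOT NULL,
        grade   TEXT,
        PRIMARY KEY (name, subject)
    );
    """
)
conn.execute("INSERT INTO student VALUES ('Asha', '12 Park St')")
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?, ?)",
    [("Asha", "Math", "A"), ("Asha", "Physics", "B")],
)

# One UPDATE touches the single row that stores the address.
updated = conn.execute(
    "UPDATE student SET address = '99 Hill Ave' WHERE name = 'Asha'"
).rowcount

# Every enrolment sees the same address through the join.
rows = conn.execute(
    """
    SELECT e.subject, s.address
    FROM enrolment AS e JOIN student AS s ON s.name = e.name
    ORDER BY e.subject
    """
).fetchall()

# Dropping all enrolments no longer loses the address.
conn.execute("DELETE FROM enrolment WHERE name = 'Asha'")
address_after_delete = conn.execute(
    "SELECT address FROM student WHERE name = 'Asha'"
).fetchone()[0]
conn.close()
```

Because the address lives in one row of `student`, a single update keeps every enrolment consistent, and deleting enrolments leaves the address in place.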