Normalization in SQL: 1NF, 2NF, 3NF & BCNF

Normalization is a systematic process for ensuring that a relational database model is efficient, suitable for general-purpose querying, and free of undesirable characteristics such as insertion, update, and deletion anomalies, which can compromise the integrity of the data. Normalization also helps eliminate data redundancy and reduces the chance of inconsistency after insert, update, or delete operations.

For a better understanding, consider the following schema: Student (Name, Address, Subject, Grade)

There are a few problems or inefficiencies in this schema.

1) Redundancy: The student’s Address is repeated for every Subject the student is registered for.

2) Update anomaly: We may update the Address in one tuple (row) while leaving it unchanged in the other rows. The student would then no longer have a single, consistent address.

3) Insertion anomaly: We cannot record a student’s Address until the student registers for at least one Subject. Similarly, when a student enrols in a new Subject, a different Address may be inserted for the same student.

4) Deletion anomaly: If a student drops all enrolled subjects, the student’s Address is lost along with the deleted rows.

Thus, it is important to represent the user data by relations that do not create anomalies following tuple add, delete, or update operations. This can only be achieved by a careful analysis of the integrity constraints, especially the database’s data dependencies.
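The anomalies listed above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the student name, addresses, and grades are made-up sample data, not part of the original schema description.

```python
import sqlite3

# In-memory database holding the unnormalized Student relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT, Address TEXT, Subject TEXT, Grade TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        # Illustrative rows: the address is stored once per subject (redundancy).
        ("Asha", "12 Park Street", "Maths", "A"),
        ("Asha", "12 Park Street", "Physics", "B"),
    ],
)

# Update anomaly: the address is changed in only one of the two rows.
conn.execute(
    "UPDATE Student SET Address = '7 Lake Road' WHERE Name = 'Asha' AND Subject = 'Maths'"
)
addresses = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT Address FROM Student WHERE Name = 'Asha'")
)
print(addresses)  # the same student now has two different addresses

# Deletion anomaly: dropping every enrolment also drops the address.
conn.execute("DELETE FROM Student WHERE Name = 'Asha'")
remaining = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE Name = 'Asha'"
).fetchone()[0]
print(remaining)  # no row is left that records the address
```

Nothing in the single-table design prevents either outcome, which is why the schema has to be restructured rather than policed by application code.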

The relations should be designed so that only those attributes are grouped that exist naturally together. This can mostly be done by a basic understanding of the meaning of all data attributes. However, we still need some formal measure to ensure our design goal.

Normalization is that formal measure. It answers the question of why a particular grouping of attributes will be better than any other.

The following normal forms are commonly cited:

  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)
  • Domain-Key Normal Form (DKNF)
  • Sixth Normal Form (6NF)


First Normal Form (1NF or Minimal Form)

A table is in 1NF if it satisfies the following conditions:

  • There is no top-to-bottom ordering to the rows and no left-to-right ordering to the columns.
  • There are no duplicate rows.
  • Every row-and-column intersection contains exactly one value from the applicable domain or a null value. This condition means that every column value must be atomic (scalar), holding only a single value. Repeating the same kind of information across multiple columns is not allowed either.
  • All columns are regular (i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps).

Let’s take an example of a schema that is not normalized. Suppose a designer wishes to record the names and telephone numbers of customers. They define a customer table as shown:

Customer ID | First Name | Surname | Telephone Numbers
123         | Bimal      | Saha    | 555-861-2025
456         | Kapil      | Khanna  | 555-403-1659, 555-776-4100
789         | Kabita     | Roy     | 555-808-9633

This table is not in 1NF. The Telephone Numbers column is not atomic: the row for customer 456 holds more than one value, which 1NF does not allow.

To Make It 1NF

  • We’ll first break (decompose) our single table into two.
  • Each table should have information about only one entity.

Customer ID | First Name | Surname
123         | Bimal      | Saha
456         | Kapil      | Khanna
789         | Kabita     | Roy

Customer ID | Telephone Number
123         | 555-861-2025
456         | 555-403-1659
456         | 555-776-4100
789         | 555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record.
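As a rough sketch, the two-table design could be declared as follows using Python's built-in sqlite3 module. The table and column names (Customer, CustomerTelephone, CustomerID, TelephoneNumber) are assumptions chosen for illustration, and the composite primary key on the telephone table is one possible way to rule out duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    FirstName  TEXT NOT NULL,
    Surname    TEXT NOT NULL
);
-- One row per customer-to-number link, so every value is atomic.
CREATE TABLE CustomerTelephone (
    CustomerID      INTEGER NOT NULL REFERENCES Customer(CustomerID),
    TelephoneNumber TEXT NOT NULL,
    PRIMARY KEY (CustomerID, TelephoneNumber)
);
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [
    (123, "Bimal", "Saha"),
    (456, "Kapil", "Khanna"),
    (789, "Kabita", "Roy"),
])
conn.executemany("INSERT INTO CustomerTelephone VALUES (?, ?)", [
    (123, "555-861-2025"),
    (456, "555-403-1659"),
    (456, "555-776-4100"),
    (789, "555-808-9633"),
])

# Both numbers for customer 456 come back as separate rows.
kapil_numbers = [row[0] for row in conn.execute(
    "SELECT TelephoneNumber FROM CustomerTelephone "
    "WHERE CustomerID = ? ORDER BY TelephoneNumber",
    (456,),
)]
print(kapil_numbers)  # ['555-403-1659', '555-776-4100']
```

With this layout, adding a third number for a customer is a plain INSERT into the telephone table and never requires editing a comma-separated string.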


Second Normal Form

Each normal form has more constraining criteria than its predecessor. So any table that is in second normal form (2NF) or higher is, by definition, also in 1NF. On the other hand, a table that is in 1NF may or may not be in 2NF; if it is in 2NF, it may or may not be in 3NF, and so on.

A 1NF table is said to be in 2NF if and only if none of its nonprime attributes is functionally dependent on a part (proper subset) of a candidate key. (A nonprime attribute does not belong to any candidate key.)

Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.

Check whether a relation R (A, B, C, D, E) with FD set { BC → D, AC → BE, B → E } is in 2NF.

  • The closure of AC is (AC)+ = {A, C, B, E, D}, computed with the membership algorithm. Neither A nor C alone determines every attribute of the relation, so AC is a candidate key. Moreover, neither A nor C can be derived from any other attributes of the relation (neither appears on the right-hand side of any FD), so every key must contain both, and {A, C} is the only candidate key.
  • Here {A, C} are the prime attributes and {B, D, E} are the nonprime attributes.
  • The relation R is assumed to be in 1NF already, since a relation in 1NF has no multi-valued or composite attributes.

BC → D does not violate 2NF because BC is not a proper subset of the candidate key AC,

AC → BE does not violate 2NF because AC is itself the candidate key, and

B → E does not violate 2NF because B is not a proper subset of the candidate key AC.

The only proper subsets of AC are A and C, and neither determines any nonprime attribute. Thus the given relation R is in 2NF.
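The reasoning in this example can be checked mechanically. The following is a minimal sketch, not a production tool: the function names closure, candidate_keys, and is_2nf are illustrative, and the key search is brute force, which is only practical for small relations like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets whose closure is every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

def is_2nf(attributes, fds):
    """True if no nonprime attribute depends on a proper subset of a candidate key."""
    keys = candidate_keys(attributes, fds)
    prime = frozenset().union(*keys)
    nonprime = attributes - prime
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - frozenset(part)) & nonprime:
                    return False
    return True

R = frozenset("ABCDE")
FDS = [
    (frozenset("BC"), frozenset("D")),
    (frozenset("AC"), frozenset("BE")),
    (frozenset("B"), frozenset("E")),
]

print(sorted(closure("AC", FDS)))                    # ['A', 'B', 'C', 'D', 'E']
print([sorted(k) for k in candidate_keys(R, FDS)])   # [['A', 'C']]
print(is_2nf(R, FDS))                                # True
```

The check confirms the manual result: AC is the only candidate key, and neither A nor C alone determines any nonprime attribute.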

Third Normal Form

A table is said to be in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:

  • X contains A (that is, X → A is a trivial functional dependency), or
  • X is a super key, or
  • A is a prime attribute (i.e., A is present within a candidate key)

Another definition of 3NF states that every non-key attribute of R is non-transitively dependent (i.e. directly dependent) on the primary key of R. This means no nonprime attribute (one that is not part of any candidate key) is functionally dependent on other nonprime attributes. If there are two dependencies A → B and B → C, then from these FDs we may derive A → C. This dependency A → C is transitive.

Example of 3NF:

Consider the following relation Order (Order#, Part, Supplier, UnitPrice, QtyOrdered) with the given set of FDs:

Order# → Part, Supplier, QtyOrdered and Supplier, Part → UnitPrice

Here Order# is the key of the relation.

Using Armstrong’s axioms, we get

Order# → Part, Order# → Supplier, and Order# → QtyOrdered.

Order# → Part, Supplier and Supplier, Part → UnitPrice together give Order# → UnitPrice.

Thus, all nonprime attributes depend on the key (Order#). However, UnitPrice depends on Order# only transitively, through (Supplier, Part), which is not a superkey. So this relation is not in 3NF. How do we bring it into 3NF?
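The three 3NF conditions can be tested against each functional dependency of the Order relation. This is a sketch under stated assumptions: the helper name third_nf_violations is illustrative, the candidate key is passed in by hand rather than computed, and only the listed FDs are checked.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def third_nf_violations(attributes, fds, candidate_keys):
    """List (lhs, attribute) pairs for which none of the three 3NF conditions holds."""
    prime = frozenset().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attributes
        for attr in sorted(rhs):
            trivial = attr in lhs
            if not (trivial or is_superkey or attr in prime):
                violations.append((sorted(lhs), attr))
    return violations

ORDER = frozenset({"Order#", "Part", "Supplier", "UnitPrice", "QtyOrdered"})
ORDER_FDS = [
    (frozenset({"Order#"}), frozenset({"Part", "Supplier", "QtyOrdered"})),
    (frozenset({"Supplier", "Part"}), frozenset({"UnitPrice"})),
]
ORDER_KEYS = [frozenset({"Order#"})]  # supplied by hand, as stated in the text

violations = third_nf_violations(ORDER, ORDER_FDS, ORDER_KEYS)
print(violations)  # [(['Part', 'Supplier'], 'UnitPrice')]
```

The single violation is the dependency Supplier, Part → UnitPrice: its left side is not a superkey, it is not trivial, and UnitPrice is not a prime attribute.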

The transitive dependency also causes an insertion anomaly: we cannot store the UnitPrice of a Part supplied by a Supplier unless someone places an order for that Part. So we decompose the table as follows to make it satisfy 3NF.

Order (Order#, Part, Supplier, QtyOrdered) and Price Master (Part, Supplier, UnitPrice).

Now no transitive dependencies remain, and both relations are in 3NF.
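A possible rendering of the decomposed design, again using Python's built-in sqlite3 module, is sketched below. The names PriceMaster and OrderNo are assumptions (spaces and # are awkward in SQL identifiers), "Order" is quoted because ORDER is a reserved word, and the parts, supplier, and prices are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PriceMaster (
    Part      TEXT NOT NULL,
    Supplier  TEXT NOT NULL,
    UnitPrice REAL NOT NULL,
    PRIMARY KEY (Part, Supplier)
);
CREATE TABLE "Order" (
    OrderNo    INTEGER PRIMARY KEY,
    Part       TEXT NOT NULL,
    Supplier   TEXT NOT NULL,
    QtyOrdered INTEGER NOT NULL,
    FOREIGN KEY (Part, Supplier) REFERENCES PriceMaster (Part, Supplier)
);
""")

# A price can now be recorded before any order exists for that part.
conn.execute("INSERT INTO PriceMaster VALUES ('Bolt', 'Acme', 0.25)")
conn.execute("INSERT INTO PriceMaster VALUES ('Nut', 'Acme', 0.10)")
conn.execute('INSERT INTO "Order" VALUES (1, ?, ?, ?)', ("Bolt", "Acme", 200))

# UnitPrice is recovered by joining on (Part, Supplier).
rows = conn.execute("""
    SELECT o.OrderNo, o.Part, o.Supplier, o.QtyOrdered, p.UnitPrice
    FROM "Order" AS o
    JOIN PriceMaster AS p ON p.Part = o.Part AND p.Supplier = o.Supplier
""").fetchall()
print(rows)  # [(1, 'Bolt', 'Acme', 200, 0.25)]
```

Each price is stored exactly once, so changing it is a single-row update, and the original five-column view of an order is still available through the join.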


Conclusion

There’s more to normalization, such as BCNF, 4NF, 5NF and 6NF. In short, BCNF is a stricter version of 3NF in which the last rule of 3NF (A is a prime attribute) no longer applies: every nontrivial functional dependency X → A must have a superkey X on its left-hand side. (BCNF is also called 3.5NF.) Normal forms from 4NF onwards are rarely applied in everyday practice.

