Normalization in SQL: 1NF, 2NF, 3NF & BCNF

Last updated:
23rd Jun, 2023

Normalization is a systematic process for ensuring that a relational database model is efficient, suitable for general-purpose querying, and free of undesirable characteristics such as insertion, update, and deletion anomalies, which can compromise the integrity of the data. The normalization process also helps eliminate data redundancy and reduces the chance of inconsistency after insert, update, or delete operations.

Understanding Normalization in SQL 

Normalization is a process in database design that organizes data into logical and efficient structures. It ensures that data is stored in a way that reduces redundancy and minimizes update, insertion, and deletion anomalies. SQL (Structured Query Language) is a popular language used to manage and manipulate relational databases. Normalization in SQL Server, as in any relational database, is a way of organizing the data stored in tables so that it stays accurate and can be queried reliably.

Uses of Normalization in SQL 

Normalization involves breaking data into its smallest logical units and creating relationships between them. This reduces duplication and makes the data easier to update correctly. It also helps protect the integrity of the database by ensuring that unrelated facts are not stored together in one table. For example, if the same address is repeated in many rows of a single table, updating it becomes error-prone: every row holding the old address must be correctly identified and changed. With normalization, the address is stored once, in its own table, and other tables refer to it, making it easier to update and manage.

Normalization in SQL Server can also help reduce data storage costs, as redundant data is eliminated. Although a normalized design usually has more tables, each fact is stored only once, so the database remains better organized. As an example, consider a table that stores customer information such as name, address, phone number and email address. If a customer can have several addresses or several contact details, applying the principles of normalization would split this table into separate tables, one for customers, one for addresses and one for contact details, eliminating redundancy and duplication. This reduces the risk of update errors caused by inconsistent copies of the same data, at the cost of joins when related data is read together. Understanding how normalization works in SQL is essential for creating databases that stay consistent and perform well when retrieving data.

For a better understanding, consider the following schema: Student (Name, Address, Subject, Grade)

There are a few problems or inefficiencies in this schema.

1) Redundancy: The student’s Address is repeated for every Subject the student is registered for.

2) Update anomaly: If the Address is updated in one tuple (row) but left unchanged in the others, the table no longer holds a single consistent address for that student.

3) Insertion anomaly: We cannot record a student’s Address until the student registers for at least one Subject. Similarly, when a student enrols for a new Subject, nothing prevents a different Address from being inserted.

4) Deletion Anomaly: If a student decides to discontinue all the enrolled subjects, then the student’s address will also be lost in the process of deletion.
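The update and deletion anomalies above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table name, the student "Asha" and the sample rows are invented for illustration and are not part of the original schema.

```python
import sqlite3

# In-memory database holding the unnormalized Student relation from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, address TEXT, subject TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        ("Asha", "12 Hill Road", "Maths", "A"),    # invented sample rows
        ("Asha", "12 Hill Road", "Physics", "B"),  # address repeated per subject
    ],
)

# Update anomaly: only one of Asha's two rows is changed.
conn.execute(
    "UPDATE student SET address = '9 Lake View' "
    "WHERE name = 'Asha' AND subject = 'Maths'"
)
addresses = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT address FROM student WHERE name = 'Asha'")
)

# Deletion anomaly: removing every enrolment also removes the address.
conn.execute("DELETE FROM student WHERE name = 'Asha'")
remaining = conn.execute(
    "SELECT COUNT(*) FROM student WHERE name = 'Asha'"
).fetchone()[0]

print(addresses)  # two different addresses are now stored for one student
print(remaining)  # no row is left that records Asha's address
```

After the partial update the table holds two addresses for the same student, and after the delete nothing about the student survives.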

Thus, it is important to represent the user data by relations that do not create anomalies following tuple add, delete, or update operations. This can only be achieved by a careful analysis of the integrity constraints, especially the database’s data dependencies.

The relations should be designed so that only those attributes are grouped that exist naturally together. This can mostly be done by a basic understanding of the meaning of all data attributes. However, we still need some formal measure to ensure our design goal.

Normalization is that formal measure. It answers the question of why a particular grouping of attributes will be better than any other.

The commonly cited normal forms are:

  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)
  • Sixth Normal Form (6NF), which is distinct from Domain-Key Normal Form (DKNF)

First Normal Form (1NF or Minimal Form)

  • There’s no top-to-bottom ordering to the rows and left-to-right ordering to the columns.
  • There are no duplicate rows.
  • Every row-and-column intersection contains exactly one value from the applicable domain or null value. This condition indicates that all column values should be atomic, scalar, or holding only a single value. No repetition of information or values in multiple columns is allowed here.
  • All columns are regular (i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps).

Let’s take an example of a schema that is not normalized. Suppose a designer wishes to record the names and telephone numbers of customers. They define a customer table as shown:

Customer ID   First Name   Surname   Telephone Numbers
123           Bimal        Saha      555-861-2025
456           Kapil        Khanna    555-403-1659, 555-776-4100
789           Kabita       Roy       555-808-9633

This table is not in 1NF. The Telephone Numbers column is not atomic: for customer 456 it holds more than one value, which 1NF does not allow.

To Make It 1 NF

  • We’ll first break (decompose) our single table into two.
  • Each table should have information about only one entity.

Customer ID   First Name   Surname
123           Bimal        Saha
456           Kapil        Khanna
789           Kabita       Roy

Customer ID   Telephone Number
123           555-861-2025
456           555-403-1659
456           555-776-4100
789           555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record.
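As a rough sketch of how the two tables could be declared and queried, the following uses Python's sqlite3 module. The table and column names (customer, customer_phone, customer_id, telephone) are assumptions chosen for this example; the rows are the ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        surname     TEXT NOT NULL
    )
""")
# One row per customer-to-telephone link, so every column value is atomic.
conn.execute("""
    CREATE TABLE customer_phone (
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        telephone   TEXT NOT NULL,
        PRIMARY KEY (customer_id, telephone)
    )
""")

conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (123, "Bimal", "Saha"),
    (456, "Kapil", "Khanna"),
    (789, "Kabita", "Roy"),
])
conn.executemany("INSERT INTO customer_phone VALUES (?, ?)", [
    (123, "555-861-2025"),
    (456, "555-403-1659"),
    (456, "555-776-4100"),
    (789, "555-808-9633"),
])

# All numbers for one customer are recovered with a join.
kapil_numbers = [row[0] for row in conn.execute("""
    SELECT p.telephone
    FROM customer c
    JOIN customer_phone p ON p.customer_id = c.customer_id
    WHERE c.first_name = 'Kapil'
    ORDER BY p.telephone
""")]
print(kapil_numbers)
```

The composite primary key on customer_phone is one possible design choice; it prevents the same number from being linked to the same customer twice.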

Second Normal Form

Each normal form has more constraining criteria than its predecessor. So any table that is in second normal form (2NF) or higher is, by definition, also in 1NF. On the other hand, a table that is in 1NF may or may not be in 2NF; if it is in 2NF, it may or may not be in 3NF, and so on.

A 1NF table is said to be in 2NF if and only if none of its nonprime attributes is functionally dependent on a part (proper subset) of a candidate key. (A nonprime attribute does not belong to any candidate key.)

Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.

Benefits of Normalization in SQL Server

  • Reduces redundancy and data anomalies
  • Keeps tables smaller by eliminating duplicate data, which can speed up updates and some queries
  • Protects the integrity of the database by storing each fact in exactly one place
  • Reduces storage costs because redundant copies of data are no longer stored
  • Makes updating easier, as a change needs to be made in only one place.

Overall, normalization in SQL Server is a vital part of creating an efficient and accurate database. By understanding how normalization works and applying it correctly, developers can keep their databases consistent and well organized. Normalization reduces redundant storage and helps maintain the system’s integrity, which are important considerations for any business organization or application developer.

Check whether a relation R (A, B, C, D, E) with FD set { BC → D, AC → BE, B → E } is in 2NF

  • Applying the closure (membership) algorithm gives (AC)+ = {A, C, B, E, D}, which is every attribute of R. Neither A nor C alone determines all attributes, so AC is a candidate key. Moreover, neither A nor C appears on the right-hand side of any FD, so both must belong to every candidate key; {A, C} is therefore the only candidate key.
  • Here {A, C} are the prime attributes and {B, D, E} are the nonprime attributes.
  • R is assumed to be in 1NF already, since a relation in a relational DBMS does not hold multi-valued or composite attributes.

BC → D does not violate 2NF because BC is not a proper subset of the candidate key AC,

AC → BE does not violate 2NF because AC is itself the candidate key, and

B → E does not violate 2NF because B is not a proper subset of the candidate key AC.

No nonprime attribute depends on A alone or on C alone. Thus the given relation R is in the 2nd Normal Form.
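The same check can be sketched in code. The helper names closure, candidate_keys and is_2nf are hypothetical, written for this example; attributes are single letters, each FD is a (left side, right side) pair of strings, and the key search is brute force, so it only suits small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute-force search for minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # combo contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(relation):
                keys.append(combo)
    return keys


def is_2nf(relation, fds):
    """True if no nonprime attribute depends on a proper subset of a candidate key."""
    keys = candidate_keys(relation, fds)
    prime = {attr for key in keys for attr in key}
    nonprime = set(relation) - prime
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(key, size):
                if (closure(part, fds) - set(part)) & nonprime:
                    return False
    return True


R = "ABCDE"
FDS = [("BC", "D"), ("AC", "BE"), ("B", "E")]

keys = candidate_keys(R, FDS)
print(keys)            # [('A', 'C')]
print(is_2nf(R, FDS))  # True
```

For contrast, a relation R (A, B, C) with the single FD A → C has candidate key AB, and C depends on A alone, so is_2nf("ABC", [("A", "C")]) returns False.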

Third Normal Form

A table is said to be in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:

  • X contains A (that is, X → A is a trivial functional dependency), or
  • X is a super key, or
  • A is a prime attribute (i.e., A is present within a candidate key)

Another definition of 3NF states that every non-key attribute of R is non-transitively dependent (i.e. directly dependent) on each candidate key of R. Informally, this means no nonprime attribute (one that is not part of any candidate key) is functionally dependent on other nonprime attributes. If there are two dependencies A → B and B → C, then from these FDs we may derive A → C. This dependency A → C is transitive.
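The three conditions can be turned into a small checker. This is a sketch under stated assumptions: the function names are hypothetical, attributes are single letters, and only the dependencies that are explicitly listed are tested. It is applied to the transitive chain A → B, B → C mentioned above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute-force search for minimal attribute sets that determine the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # not minimal
            if closure(combo, fds) == set(relation):
                keys.append(combo)
    return keys


def violations_3nf(relation, fds):
    """List the dependencies X -> A that satisfy none of the three 3NF conditions."""
    all_attrs = set(relation)
    prime = {attr for key in candidate_keys(relation, fds) for attr in key}
    bad = []
    for lhs, rhs in fds:
        for attr in rhs:  # treat X -> AB as X -> A and X -> B
            trivial = attr in lhs
            lhs_is_superkey = closure(lhs, fds) == all_attrs
            if not (trivial or lhs_is_superkey or attr in prime):
                bad.append((lhs, attr))
    return bad


chain = violations_3nf("ABC", [("A", "B"), ("B", "C")])
direct = violations_3nf("ABC", [("A", "BC")])
print(chain)   # [('B', 'C')]
print(direct)  # []
```

In the first call the only candidate key is A, so B → C fails all three conditions: it is not trivial, B is not a superkey, and C is not prime.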

Example of 3NF:

Consider the following relation Order (Order#, Part, Supplier, UnitPrice, QtyOrdered) with the given set of FDs:

Order# → Part, Supplier, QtyOrdered and (Supplier, Part) → UnitPrice

Here Order# is the key of the relation.

Using Armstrong’s axioms, we get

Order# → Part, Order# → Supplier, and Order# → QtyOrdered.

Order# → Part, Supplier and (Supplier, Part) → UnitPrice together give Order# → UnitPrice.

Thus, all nonprime attributes depend on the key (Order#). However, UnitPrice depends on Order# only transitively, through (Supplier, Part), which is not a superkey. So this relation is not in 3NF. How do we bring it into 3NF?

In the current design we cannot store the UnitPrice of any Part supplied by any Supplier unless someone places an order for that Part. To remove this problem, we decompose the table as follows.

Order (Order#, Part, Supplier, QtyOrdered) and Price Master (Part, Supplier, UnitPrice).

Now there are no transitive dependencies present. Both relations are in 3NF.
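A possible SQL rendering of this decomposition, run through Python's sqlite3 module, is sketched below. The table names orders and price_master, the column names and the sample rows are assumptions made for the example (ORDER is a reserved word in SQL, so the table is named orders here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Price Master (Part, Supplier, UnitPrice): one price per part and supplier.
conn.execute("""
    CREATE TABLE price_master (
        part       TEXT NOT NULL,
        supplier   TEXT NOT NULL,
        unit_price REAL NOT NULL,
        PRIMARY KEY (part, supplier)
    )
""")
# Order (Order#, Part, Supplier, QtyOrdered): the price is no longer stored here.
conn.execute("""
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        part        TEXT NOT NULL,
        supplier    TEXT NOT NULL,
        qty_ordered INTEGER NOT NULL,
        FOREIGN KEY (part, supplier) REFERENCES price_master (part, supplier)
    )
""")

# A price can now be recorded before anyone orders the part.
conn.executemany("INSERT INTO price_master VALUES (?, ?, ?)", [
    ("bolt", "Acme", 0.25),  # invented sample data
    ("nut", "Acme", 0.10),
])
conn.execute("INSERT INTO orders VALUES (1, 'bolt', 'Acme', 200)")

# Joining the two tables rebuilds the original Order row, including UnitPrice.
order_rows = conn.execute("""
    SELECT o.order_no, o.part, o.supplier, p.unit_price, o.qty_ordered
    FROM orders o
    JOIN price_master p ON p.part = o.part AND p.supplier = o.supplier
""").fetchall()

# Prices that exist without any matching order.
unordered_prices = conn.execute("""
    SELECT COUNT(*) FROM price_master p
    WHERE NOT EXISTS (
        SELECT 1 FROM orders o
        WHERE o.part = p.part AND o.supplier = p.supplier
    )
""").fetchone()[0]

print(order_rows)
print(unordered_prices)
```

The join shows that no information is lost by the split, and the price of the unordered part shows that the insertion problem described above is gone.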

Conclusion

There’s more to normalization, such as BCNF, 4NF, 5NF and 6NF. In short, BCNF is a stricter version of 3NF in which the last rule of 3NF (A is a prime attribute) does not apply: for every nontrivial functional dependency X → A, X must be a superkey. (BCNF is sometimes called 3.5NF.) Normal forms from 4NF onwards are rarely applied in regular practice.
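To make the difference between 3NF and BCNF concrete, here is a minimal sketch. The relation R (A, B, C) with AB → C and C → B is a textbook-style example chosen for illustration, not one taken from this article, and the function names are hypothetical. Its candidate keys are AB and AC, so every attribute is prime and the relation is in 3NF, yet C → B has a left side that is not a superkey.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the nontrivial listed dependencies whose left side is not a superkey."""
    all_attrs = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


FDS = [("AB", "C"), ("C", "B")]
violations = bcnf_violations("ABC", FDS)
print(violations)  # [('C', 'B')]
```

Only C → B is reported, because the closure of C is {B, C} rather than the whole relation.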

Profile

Rohan Vats

Blog Author
Software Engineering Manager @ upGrad. Passionate about building large scale web apps with delightful experiences. In pursuit of transforming engineers into leaders.

Frequently Asked Questions (FAQs)

1. What is database normalization?

When you store data in a database, you are confronted with the issue of organizing that data. Data can often be reorganized to reduce the amount of space used and to make updates safer. This is known as normalization. Data is normalized by removing redundant data, which avoids storing multiple copies of the same fact. This reduces storage space and simplifies updates. Database normalization is a process of organizing and restructuring databases to improve efficiency and reduce redundancy. The basic concept is to eliminate "duplicate" information by ensuring each fact is stored in one place, and that that place is clearly identified.

2. What are the different types of normal forms?

Edgar F. Codd, the father of the relational model, introduced the first three normal forms; later forms were added by Codd and other researchers. Each normal form places a stricter condition on the design of a table. The first normal form, 1NF, requires that every column holds a single atomic value, with no repeating groups. The second normal form removes partial dependencies, where a nonprime attribute depends on only part of a candidate key. The third normal form removes transitive dependencies, where a nonprime attribute depends on the key only through another non-key attribute. BCNF tightens 3NF by requiring the left side of every nontrivial functional dependency to be a superkey. The fourth normal form goes beyond functional dependencies and deals with multivalued dependencies.

3. How to normalize a database?

Normalizing a database is the process of decomposing its tables so that each table stores facts about a single entity or relationship. In the end, the database will have no repeating groups and no attributes that depend on only part of a key or on other non-key attributes. The purpose is to ensure that each fact is stored once and linked to all other relevant data through keys, so that a change needs to be made in only one place.