# What is normalization? Why normalization is required? Explain 1NF, 2NF, and 3NF with example.

Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Update, and Deletion Anomalies. Normalization rules divide larger tables into smaller tables and link them using relationships. The purpose of Normalisation in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.

Normalization is required because of the following reasons:

• By using normalization redundancy of database or data duplication can be resolved.
• We can minimize null values by using normalization.
• Results in a more compact database (due to less data redundancy/zero).
• Minimize/avoid data modification problems.
• It simplifies the query.is
• The database structure is clearer and easier to understand.
• The database can be expanded without affecting existing data.
• Finding, sorting, and indexing can be faster because the table is small and more rows can be accommodated on the data page.

Here are the most commonly used normal forms:

• First normal form(1NF)
• Second normal form(2NF)
• Third normal form(3NF)
• Boyce & Codd normal form (BCNF)
• Fourth normal form(4NF)

1. First Normal Form (1NF):

For a table to be in the First Normal Form, it should follow the following 4 rules:

1. It should only have single(atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain
3. All the columns in a table should have unique names.
4. And the order in which data is stored does not matter.

Example: Let’s say a company wants to store the names and contact details of its employees. It creates a table in the database that looks like this:

 Emp_Id Emp_Name Emp_Address Emp_Mobile 101 Herschel New Delhi 8912312390 102 Jon Kanpur 8812121212, 9900012222 103 Ron Chennai 7778881212 104 Lester Bangalore 9990000123, 8123450987

This table is not in 1NFas the rule says “each attribute of a table must have atomic (single) values”, the Emp_Mobile values for employees Jon & Lester violates that rule.

To make the table complies with 1NF we need to create separate rows for each mobile number in such a way so that none of the attributes contains multiple values.

 Emp_Id Emp_Name Emp_Address Emp_Mobile 101 Herschel New Delhi 8912312390 102 Jon Kanpur 8812121212 102 Jon Kanpur 9900012222 103 Ron Chennai 7778881212 104 Lester Bangalore 9990000123 104 Lester Bangalore 8123450987

2. Second Normal form (2NF):

For a table to be in the Second Normal Form,

1. It should be in the First Normal form.
2. And, it should not have Partial Dependency.

Example: Let’s say a school wants to store the data of teachers and the subjects they teach. They create a table Teacher that looks like this: Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.

 Teacher_Id Subject Teacher_Age 111 Maths 38 111 Physics 38 222 Biology 38 333 Physics 40 333 Chemistry 40

Candidate Keys: {Teacher_IdSubject}
Non-prime attribute: Teacher_Age

This table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute Teacher_Age is dependent on Teacher_Id alone which is a proper subset of the candidate key. This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.

To make the table complies with 2NF we can disintegrate it in two tables like this:

Teacher_Details table:

 Teacher_Id Teacher_Age 111 38 222 38 333 40

Teacher_Subject table:

 Teacher_Id Subject 111 Maths 111 Physics 222 Biology 333 Physics 333 Chemistry

Now the tables are in the Second normal form (2NF).

3. Third Normal Form (3NF):

A table is said to be in the Third Normal Form when,

1. It is in the Second Normal form.
2. And, it doesn’t have Transitive Dependency.

Example: Let’s say a company wants to store the complete address of each employee, they create a table named Employee_Details that looks like this:

 Emp_Id Emp_Name Emp_Zip Emp_State Emp_City Emp_District 1001 John 282005 UP Agra Dayal Bagh 1002 Ajeet 222008 TN Chennai M-City 1006 Lora 282007 TN Chennai Urrapakkam 1101 Lilly 292008 UK Pauri Bhagwan 1201 Steve 222999 MP Gwalior Ratan

Super keys: {Emp_Id}, {Emp_IdEmp_Name}, {Emp_IdEmp_NameEmp_Zip}…so on
Candidate Keys: {Emp_Id}

Non-prime attributes: all attributes except Emp_Id are non-prime as they are not part of any candidate keys.

Here, Emp_StateEmp_City & Emp_District dependent on Emp_Zip. Further Emp_zip is dependent on Emp_Id what makes non-prime attributes (Emp_State, Emp_City & Emp_District) transitively dependent on super key (Emp_Id). This violates the rule of 3NF.

To make this table complies with 3NF we have to disintegrate the table into two tables to remove the transitive dependency:

Employee Table:

 Emp_Id Emp_Name Emp_Zip 1001 John 282005 1002 Ajeet 222008 1006 Lora 282007 1101 Lilly 292008 1201 Steve 222999

Employee_Zip table:

 Emp_Zip Emp_State Emp_City Emp_District 282005 UP Agra Dayal Bagh 222008 TN Chennai M-City 282007 TN Chennai Urrapakkam 292008 UK Pauri Bhagwan 222999 MP Gwalior Ratan