SQL: Normalization

Normalization:

If a database design is not perfect, it may contain anomalies. Managing a database with anomalies is next to impossible. Some of the most prominent anomalies are:

  • Update anomalies − If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
  • Deletion anomalies − We tried to delete a record, but parts of it was left undeleted because of unawareness, the data is also saved somewhere else.
  • Insert anomalies – We tried to insert data in a record that does not exist at all.

Normalization is a method to remove all these anomalies and bring the database to a consistent state.

Database normalization is the process of organizing data into tables in such a way that the results of using the database are always unambiguous and as intended. The main aim of Normalization is to add, delete or modify field that can be made in a single table. It is set of rules that are to be applied while designing the database tables which are to be connected with each other by relationships. It may have the effect of duplicating data within the database and often results in the creation of additional tables.

Types of Normalizations:

Here are the most commonly used normal forms:

1. First Normal Form (1NF):

As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should hold only atomic values. 1NF removes all the duplicate columns from the table, creates tables for the related data and identifies of unique columns.

2. Second Normal Form (2NF):

A table is said to be in 2NF if  it meets all requirements of the 1NF, places the subsets of data in separate tables and creates relationships between the tables using primary keys. So, basically, both the following conditions should hold:

a. Table is in 1NF (First normal form)

b. No non-prime attribute is dependent on the proper subset of any candidate key of table. (An attribute that is not part of any candidate key is known as non-prime attribute.)

3. Third Normal Form (3NF):

A table design is said to be in 3NF if both the following conditions hold:

Table must be in 2NF

Transitive functional dependency of non-prime attribute on any super key should be removed. An attribute that is not part of any candidate key is known as non-prime attribute. A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies. For e.g.

X -> Z is a transitive dependency if the following three functional dependencies hold true:

a. X->Y

b. Y does not ->X

c. Y->Z

Note: A transitive dependency can only occur in a relation of three of more attributes. This dependency helps us normalizing the database in 3NF (3rd Normal Form).

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional dependency X-> Y at least one of the following conditions hold:

a. X is a super key of table

b. Y is a prime attribute of table (An attribute that is a part of one of the candidate keys is known as prime attribute.)

4. Boyce & Codd normal form (BCNF 0r 3.5NF):

It is an advance version of 3NF that’s why it is also referred as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.

Benefits of normalizing the database are:

  1. No need to restructure existing tables for new data.
  2. Reducing repetitive entries.
  3. Reducing required storage space
  4. Increased speed and flexibility of queries.

 

Denormalization:

DeNormalization is a technique used to access the data from higher to lower normal forms of database. It is an approach to speeding up read performance (data retrieval) in which the administrator selectively adds back specific instances of redundant data after the data structure has been normalized. It is also process of introducing redundancy into a table by incorporating data from the related tables. A denormalized database should not be confused with a database that has never been normalized.

 

 

No comments:

Post a Comment