Normalization:




Normalization is a process in which simple and more reliable database structure is implemented. It is used to create a suitable set of relations for storing the data. This process works through different stages known as normal forms. These stages are 1NF, 2NF, 3NF, etc.

stages

Each normal form has certain requirements or condition. These conditions must be fulfilled to bring the database in that particular normal form. If a relation satisfies the conditions of a normal form, it is said to be in that normal form.




The task of database design starts with un-normalized set of relations. Normalization process identifies and corrects problems and complexities of database design. It produces a new set of consistent relations. The new design is as free of processing problems as possible.

Function Dependency

A functional dependency is a relationship between two attributes. It means that if the value of one attribute is known, it is possible to obtain the value of another attribute.

Suppose there is a relation STUDENT with following fields:

STUDENT (RegNo, StudentName, Class, Email)

If value of RegNo is known, it is possible to obtain the value of StudentName. It means that StudentName is functionally dependent on RegNo,. An attribute B is functionally dependent on attribute A if the value of A determines the value of B.

Functional dependency is written as follows:

RegNo     StudentName

The above expression means RegNo determines StudentName or StudeName is functionally dependent on RegNo. The attribute on left side is called determinant.

An attribute can also be functionally dependent on two or more attributes instead of a single attribute. The attribute CRSDATE is functionally dependent on both STUDID and CRSNO in the following example:

COURSE (STUDID, CRSNO, CRSDATE)

The functional dependency in the relation is represented as follows:

STUDID, DRSNO         CRSDATE

The course completion date CRSDATE can be determined only if both STUDID and CRSNO are known. No single attribute can be used to determine CRSDATE.

First Normal Form

A relation is in first normal form (1NF) if every intersection of row and column contains atomic values only. It means that the relation does not contain any repeating group. A repeating group is a set of one or more data items that may occur a variable number of times in a tuple. Each cell in a relation should contain only one value.

A relation can be converted into 1NF using two methods:

Method 1

In this method, the repeating groups are removed by entering proper data in blank fields that contain repeating data.

Method 2

In this method, the repeating groups are removed by creating a separate relation for the repeating group. The repeating data is stored in the new relation..

Example of Method 2

Suppose we have the following un-normalized relation DEPARTMENT:

DEPARTMENT (Dept_No, Dept_Name, Emp_No, Emp_Name)

Suppose the data in the relation is as follows:

example-of-method2

Applying method 1

The above relation contains repeating groups. The attributes Emp_No and Emp_Name are being repeated for single occurrence of Dept_No. The repeating groups are removed by entering proper data in the blank fields as follows:

applying-method-1

The above relation is now in 1NF as each field contains single value. It will have a composite key that consists of Dept_No and Emp_No attributes.

Applying Method 2

Consider the following table:

applying-method-2

The relation in this table contains repeating groups. The attributes Emp_No and Emp_Name are being repeated for single occurrence of Dept_No:

The following steps are taken to remove the repeating groups:

  1. The repeating group must be removed from the relation. It can be stored in a separate relation to represent an entity. It means that the above relation will be divided into two relations.
  2. Create a relationship between the new relations.

The attribute Emp_No and Emp_Name represent employee entity. They can be stored in a separate relation. This process will convert the above relation in two relations DEPT and EMP as follows:

dept-and-emp

The relationship between the two relations is 1:M. It means that one department has many employees but one employee has only one department.

Database Anomalies

Database anomalies are the problems in relations that occur due to redundancy in the relations. These anomalies affect the process of inserting, deleting and modifying data in the relations. Some important data may be lost if a relation is updated that contains database anomalies. It is important to remove these anomalies in order to perform different processing on the relations without any problem.

Types of Anomalies

Different types of database anomalies are as follows:

Insertion Anomaly

The insertion anomaly occurs when new record is inserted in relation. In this anomaly, user cannot insert a fact about an entity until he has an additional fact about another entity.

Deletion Anomaly

The deletion anomaly occurs when a record is deleted from the relation. In this anomaly, the deletion of facts about an entity automatically deletes the fact of another entity.

Modification anomaly

The modification anomaly occurs when the record is updated in the relation. In this anomaly, the modification in the value of specific attribute requires modification in all records in which that value occurs.

Partial Dependency

A type of dependency in which one or more non-key attributes are functionally dependent on a part of primary key is called partial dependency.

Example:

STUDENT (STUD_ID, NAME, DEPT, MONFEE, CRSNO, CDTE)

The relation contains a composite primary key that consist of STUD_ID and CRSNO.

crsno

The functional dependencies in this relation are as follows:

STUD_ID     →     NAME, DEPT, MONFEE

STUD_ID, CRSNO          CRSDTE

The non-key attributes NAME, DEPT and MONFEE are functionally dependent on a part of primary key STUD_ID. They are not functionally dependent on complete key. It is called partial dependency. The partial dependency in the above relation creates redundancy. It results in certain database anomalies when the relation is updated.

  • Insertion Anomaly: Suppose a new course WEBDEV is to be inserted in the relation. The new CRSNO cannot be inserted in relation without inserting STUD_ID. The primary key consists of STUD_ID and CRSNO.
  • Deletion Anomaly: Suppose the record of STUD_ID 140 is to be deleted. The information of TAXACCT will also be deleted along with that student.
  • Modification Anomaly: Suppose the name of STUD_ID 100 is to be updated to “Sam Billings”. It requires updating all records in which STUD_ID 100 appears.




Second Normal Form

If in a relation every non key attribute is dependent on the primary key and if this relation is in 1NF then this relation will be a 2NF relation. All non-key attributes must depend on all parts of the primary key.

Conditions for second Normal Form

A relation will be in 2NF if any of the following conditions are met:

  1. The primary key consists of only one attribute
  2. No non-key attributes exist in the relation
  3. Every non-key attribute is functionally dependent on the full set of primary key attributes.

Example:

The following example STUDENT is in first normal form:

student-relation

The above relation contains different redundancies. The relation can be expressed in a shorthand notation as follows:

STUDENT (STUD_ID, NAME, DEPT, MONFEE, CRSNO, CDTE)

The above relation contains a composite primary key of STUD_ID and CRSNO. The functional dependencies in this relation are as follows:

STUD_ID          NAME, DEPT, MONFEE

STUD_ID, CRSNO     →     CRSDTE

The non-key attributes NAME, DEPT and MONFEE are functionally dependent on a part of primary STUD_ID. They are not functionally dependent on complete key. A type of dependency in which one or more non-key attributes are functionally dependent on a part of primary key is called partially dependency. The partial dependency in the above relation creates redundancy. It results in certain database anomalies when the relation is updated.

  • Insertion Anomaly: Suppose a new course WEBDEV is to be inserted in the relation. The new CRSNO cannot be inserted in relation without inserting STUD_ID. The primary key consists of STUD_ID and CRSNO.
  • Deletion Anomaly: Suppose the record of STUD_ID 140 is to be deleted. The information of TAXACCT will also be deleted along with that student.
  • Modification Anomaly: Suppose the name of STUD_ID 100 is to be updated to “Sam Billings”. It requires updating all records in which STUD_ID 100 appears.

All the partially dependent attributes are removed and placed in another relation when a relation is converted from 1NF to 2NF. The decomposition must satisfy one of the above mentioned conditions.

STUDENT relation can be split in two relations STUDENT1 and COURSE as follows:

STUDENT1 (STUD_ID, NAME, DEPT, MONFEE)

student1

COURSE (STUD_ID, CRSNO, CDTE)

course

The primary key in STUDENT relation consists of only one attribute. It satisfies the first condition. The attribute CDTE in COURSE relation fully functionally depends on whole composite key. It satisfies the third condition. It means that both relations are in second normal forms.

Transitive Dependency

The transitive dependency is a type of functional dependency between two or more non-key attributes. Transitive dependency exists if a non-key attribute depends on any other non key attribute. It means that a relation is not in 3NF if the value of a non-key attribute can be obtained by knowing the value of another non-key attribute.

Anomalies due to Transitive dependency

The following anomalies arise due to transitive dependency:

  • Insertion Anomaly
  • Deletion Anomaly
  • Modification Anomaly

Suppose we have the following relation:

we-have-the-following-relation

The above relation contains the following functional dependencies:

CUSTNO         NAME, SALESMAN

SALESMAN           REGION

The above relation is in 2NF because the primary key consists of single attribute. A transitive dependency exists in the relation. The REGION is functionally dependent on SALESMAN and SALESMAN is functionally dependent on CUSRNO. It means that REGION is transitively dependent on CUSTNO.

Third Normal Form

A 2NF relation having no transitive dependencies is called a 3NF relation. In normalization, the transitive dependency is considered very important. In a 2NF relation the transitive dependency can be removed through normalization. In a relation, if we can assign the value of non-key attribute to another non key attribute then the relation will not be in 3NF.

EXAMPLE:

Suppose the following relation SALES exists:

SALES (CUSTNO, NAME, SALESMAN, REGION)

In the above relation, we used CUSTNO as the primary key. In the relation every salesman has assigned a different REGION. This relation has the following functional dependencies.

CUSTNO         NAME, SALESMAN

SALESMAN     →    REGION

In the above relation, the primary key has one attribute or has single attribute therefore, the above relation will be a 2NF relation. A transitive dependency exists in the relation. The REGION is functionally dependent on SALESMAN. The salesman is also functionally dependent on the CUSTNO. From this we conclude that the REGION is transitively dependent on CUSTNO.

custno

The relation in the above example will have the following anomalies:

  • Insertion Anomaly: Suppose “Abraham” a new salesman has recently joined the company and let him is assigned to the North region. Then the record of the recently joined salesman cannot be entered unless the new salesman is assigned a customer. This is only because the above relation has the primary key that is CUSTNO.
  • Deletion Anomaly: In the above relation, if we delete the record in CUST 40 then the information of the salesman “Billings” will also be deleted. The salesman “Billings” was assigned to East.
  • Modification Anomaly: If we assign “Jones” to the South region then all the records that have Jones will need to be updated.




Removing Transitive Dependency

To convert 2NF to 3NF, the transitive dependency should be removed. The anomalies in the above relation are occurred because of the transitive dependencies. Therefore, transitive dependency should be removed. The transitive dependency is removed if the relation in the above example is transformed as the following relation:

removing-transitive-dependency

These relations can also be represented as below:

SALES1 (CUSTNO, NAME, SALESMAN)

SMAN (SALESMAN, REGION)

The above relations are in 3NF. The transitive dependency in these relations is removed. There are also no anomalies left in the above relations. The determinant attribute will become the primary key in transitive dependency in SMAN relation.