Normalisation in RDBMS
Why Normalisation
Normalisation of a data is done to reduce the redundant/duplicate data and making the database free from anomalies like insertion,deletion and updation anomalies.
It make the queries simple and faster.
It saves space.
It organises the data base.
It eliminated the duplicate / repeated data.
NOTE: The data that we have must make some sense.
Now what is anomaly?
Ex; suppose there is same data at multiple places, we updated the data at one place but forget to update the data at other cells. Now it will cause trouble for us as we cant conclude which data is correct. As the data is something at one place and something else at another place.
There are 3 anomalies: Insertion, deletion, and updation.
Let make it by an example.
Suppose for a new postion in a company, Mr Chintu is selected but his department is not alloted to him in that case if we want to update his info into the database, we have to set the department value as null. This is the case with only one employee.
Similarly if we ahve to set this dept. data for thousands of employees who are in same condition, then this dept data will be repeated for all those thousands of employeers. This is an ex of inssertion anomaly.
2. What if mr chintu leaves the company and no longer he is the head of marketing. In that case, all the employee record has to be updated and if we forget to update any record, this will lead to the data inconsistency. This is updation anomaly.
3. Deletion anomaly- In our employee table, there are 2 columns: Employee and department. At the end of the financial year, if employee records are deleted, the department info is also lost.
Types of normal form.
1NF - In first normal form, we deal with the problem of atomicity. Atomicity means that the values of the cell are in its most reduced form. Just like the atoms, they cannot be further divided.
In layman, a single cell must contain a single value to follow the atomicity rule.
NOTE: If a table contains composite or mutlivalued attributes, it voilates the first normal form.
Suppose in a table, we have employee id, name, phoneno, salary as columns.
Since one employee can have more than one phoneno, so phoneno is amultivalued attributed. So we need to make separed rows to solve this problem. This way we can say that the table has achieved its atomicity.
2NF- the table is said to be in 2nd normal form if it follows:
*1nf
*The table should not contain partial dependency.
Now what is partial dependency?
The proper subset of a candidate key determines a non prime attribute.
Now whats a non prime attribute?
Attributes that form a candidate key in a table are prime attributes and rest are nonprime.
For the table, that we have columns like employee id and department id and the nonprime attribute are the office location.
EX: in the table, employee id and department id makes the primary key. the office location column depends on the department id. So we have to split the table accordingly.
the first table has employment id and department id and the 2nd one has dep id and office location.
The column office location is fully dependent on the primary key of that table which is nothing but the dep id.
3.)3NF- table is said to be in 3rd normal form if it
* follows 2nf
* no non prime attribute is transitively dependent on any non prime attrubute which depends on any non prime attribute.
lets make it simple, if c depends on b and b depends on a then ,transitively c depends on a . So there should be no transitive dependency of non prime attributes on other non prime attribute. The non prime attribute should be dependent on the prime attribute.
Ex: in a table, student id determines subject id and subj id determines subjects. so student id determines subj via subj id. This shows that there is transitive functional dependency in the table and it voilates the 3nf.
What to do?
Divide the table, one table has student id, student name, subj id and address and other table has subj id and subj. All the non prime attributes are fully functionally dependent on the primary key.
In the first table columns like student name, subj id and address are completely dependent on student id. Similarly in 2nd table subj only depends on subj id.
4. BCNF- boyce codd normal form. also called 3.5 nf. It is the higher version of 3nf. Before bcnf , you must know what is a super key?
Super key is a set of one or more attributes(columns) which can uniquely identify a row in a table. Canditate key is the subset of a superkey.
a table is said to be in bcnf if it follows
* 3nf
*every functional dependency A-> B , then A has to be the super key of that particular table.
Ex: in a table , one student can enroll for multiple subjects, there can be multiple teachers for one subject and for each subject a profressor is assigned to the student. all normal forms are followed except.
This is becuase the student id and subject forms the primary key and the subject columsnis prime attribute. But there is one more dependency that the professors depends on subjects. while the subject is the prime attribute and the professor is the non prime atttri, which is not the rule for bcnf,.
So we will divide it in 2 parts. one table will contain the sstudent and the professer id. Here we created a new column named professor id. And the other table includes professor id , subj and professor.
NOTE: by creating a new column "professsor id ", we have removed the non prime attributes functional dependency. In the 2nd table professor id is the super key for that table and remaining columns of it are functionally dependent on it.
1 Comments:
Implementation to be added
Post a Comment
Subscribe to Post Comments [Atom]
<< Home