Normalisation is a set of rules that should be followed when setting up the structure of your database. The principle of the rules is to allow for easy and efficient searching, updating, editing and deleting of data. By going through the steps of normalisation your database should be more efficient and incorrect or redundant data should be eliminated. The process we discuss here takes data from it un-normalised form to it 3rd normal form, whilst there are many more levels of normalisation they are not discussed here.
If you were to start a project and simply write a list of all the data fields you want to add to your database then this initial list of fields would be in un-normalised form.
Imagine you want to create a database to keep a record for an e-sports competition. In your initial list of fields you might want to included might look something like the list below.
playerName : To store the players name gameName : To store the game name gameDeveloper : To store the name of the company that developed the game score : To store the player score Although this is a small and simple dataset there are still problems with using only these fields. What if two players have the same name? What if a player plays more than one game, then their name will be repeated in the table many times? What if the player decided to change their name, they would have to search the table for every instance of their name and change it?
1NF (First Normal Form):
Each field must contain only atomic values, meaning indivisible values
There must be a unique identifier for each record
2NF (Second Normal Form):
Must be in 1NF
No non-key attributes can depend on a portion of the primary key.
3NF (Third Normal Form):
Must be in 2NF
There must not be any transitive dependencies between non-key fields.
Every non-key field must be directly dependent on the primary key.
FURTHER EXPLINATION 'No non-key attributes can depend on a portion of the primary key.' In 2NF, a non-key field is considered to depend on a portion of the primary key if it is only partially determined by the primary key. In other words, it is not fully dependent on the entire primary key to have a unique value. This partial dependency can lead to redundancy and anomalies in the data, so it must be eliminated in order to comply with 2NF. For example, consider a table with a primary key consisting of two fields (A and B), and a non-key field (C) that depends on field A. This table would not be in 2NF because C depends on only a portion of the primary key (field A), rather than being fully dependent on the entire primary key (fields A and B). To bring the table into 2NF, the partial dependency must be removed by creating a separate table for field C, with a foreign key referencing the primary key of the original table. 'There must not be any transitive dependencies between non-key fields' Transitive dependencies occur in a database when a non-key field depends on another non-key field, rather than directly on the primary key. This creates redundancy and increases the likelihood of data inconsistencies. For example, if field B depends on field A, and field C depends on field B, then field C has a transitive dependency on field A, which violates 3NF. To resolve this issue, the dependencies should be made direct and the redundant data eliminated.
'Every non-key field must be directly dependent on the primary key.' In 3NF, every non-key field in a table must be directly dependent on the primary key. This means that the value of each non-key field must be completely determined by the values of the primary key.
A direct dependency between a non-key field and the primary key ensures that there is no redundancy or anomalies in the data, and also eliminates the risk of data inconsistencies. For example, consider a table with a primary key of field A, and two non-key fields B and C. If B depends on A and C depends on B, then C does not have a direct dependency on the primary key (field A), which violates 3NF. To bring the table into 3NF, the dependencies must be made direct by creating separate tables for fields B and C, with foreign keys referencing the primary key of the original table.
By following the principles of 3NF, a database is better organized, more consistent, and easier to maintain and update over time.
NORMALISATION OF DATA
In this section we will continue to look at the example of creating a database for and e-sports competition. Previously we noted that the field shown in this table could be used for the project, and are still in un-normalised form.
APPLYING 1st NORMALISATION RULES
Rule 1: There is no repeating groups ✓ : This criteria is already met so no changes are needed because no cell has two pieces of information in. Let's say we needed to credit two game developers then we would need to split the 'gameDeveloper' field into two, for example 'gameDeveloper1' and 'gameDeveloper2'. Rule 2: All data values are atomic – broke down into is smallest part ✗ : The playerName should be broken up into its smallest parts, 'firstName' and 'lastName' fields will need to be added and 'playerName' removed. Rule 3: Each field has a unique name ✓ : This criteria is already met, each field has a unique name. Rule 4: It has a primary key✗ : The table does not have a primary key, or any field that could be used as a primary key. A primary key will need to be added to meet this rule.
After applying 1st Normal Form : Having changed the table to meet the rules of 1st Normal form, the new table can be seen below.
Table in 1st Normal Form
APPLYING 2nd NORMALISATION RULES
Rule 1: It is first in First Normal form (1NF) ✓ : The table already meet this criteria Rule 2: All non key attributes are dependent on All parts of the primary key ✗ : The table does not meet this criteria because attributes are not dependent on the 'playID' primary key, for example 'Tetris' in 'gameName' could be used in any playID or even none. This data will need to be split into multiple tables. Rule 3: Each field has a unique name ✗ : The new tables created for rule 2 will need to have unique names. Rule 4: It has a primary key ✗: The new tables created for rule 2 will also need to meet this criteria.
After applying 2nd Normal Form : Having changed the table to meet the rules of 1st Normal form, the new tables can be seen below.
Table in 2nd Normal Form
Table in 2nd Normal Form
As can be seen the the table needed to be split into two tables to be put into 2nd Normal Form and we can now see that there is no duplicate data but some data is missing, such as the second entry of Jill.
APPLYING 3rd NORMALISATION RULES
Rule 1: It is first in Second Normal Form (2NF) ✓ : The tables already meet this criteria Rule 2: Every non-key attribute of a table is either non-transitively dependent on every key of the table or not at all.✗ : The game 'score' is not dependent on the game, it is dependent on the play of the game therefore the game score will need to move to a separate table. Rule 3: Each field has a unique name ✓ : The tables already meet this criteria Rule 4: It has a primary key ✓ : The tables already meet this criteria
Table in 3rd Normal Form
Table in 3rd Normal Form
Table in 3rd Normal Form
Using the three tables above the database is now in 3rd Normal Form and ready to add the relational links between each table. In reality there would be a few more tables created, for example the 'score' could be in it own table and tables such as 'Game_score_historic_data' or 'player_historic_data' could be added, however using this simple three table model reduces complications to aid understanding of how the relationship between tables work. The next section on Entity Relationship Diagrams (ERD), shows the relationships between the tables with the 'crows foot' notation to show the relationship type.
3NF | "Every non-prime attribute of a table is either non-transitively dependent on every key of the table or not at all."
Here's a breakdown of this rule:
Non-Prime Attributes: These are attributes that are not part of any candidate key.
Non-transitively Dependent: This means that a non-prime attribute must depend directly on the primary key and not through some other non-prime attribute. In simpler terms, there should be no transitive dependency.
Every Key of the Table: This includes both the primary key and any candidate keys.
In essence, 3NF aims to ensure that all the attributes in a table are only dependent on the primary key and not on any other non-key attributes. This is to avoid transitive dependencies which can lead to anomalies in database operations like insertions, deletions, and updates.
For instance, consider a table with attributes A, B, and C, where A is the primary key. If B is dependent on A and C is dependent on B, then this table is not in 3NF because C’s dependency on A is transitive through B. In 3NF, C should be directly dependent on A or not dependent at all.
3NF is critical for database design as it helps in reducing data redundancy and improves data integrity. By ensuring that every non-key attribute is directly dependent on the primary key, it simplifies the relationships between different attributes and makes the database more efficient for various operations.
STILL CONFUSED ? CHECKOUT THESE VIDEOS
Question: How can the above data be normalized from UNF to Third Normal Form (3NF)?
ATOMIC DATA : Data is broken down into it smallest parts. For example fullName should be - firstName, middleName, lastName. KEY ATTRIBUTES : In database terms, 'key attributes' are columns or fields in a table that uniquely identify a record (row) in the table. They serve as the main identifier for each record and are used to establish relationships with other tables in a database. Key attributes can be a primary key, which is a single field that uniquely identifies each record, or a composite key, which is a combination of two or more fields that together uniquely identify each record. DATA DEPENDENCY : FOREIGN KEY : A key in a table that is the primary key in another table, used as a link between the two tables.