A3.2.5 Explain the difference between normal forms.
• First normal form (1NF), second normal form (2NF), third normal form (3NF)
• The terms atomicity, unique identification, functional dependencies, partial-key dependencies, non-key/transitive dependencies
• Normalization issues can encompass data duplication, missing data, and a range of dependency concerns, including data dependencies, composite key dependencies, transitive dependencies, and multi-valued dependencies.
Big Idea
When designing a database, one of the most important goals is to store data in a clean and reliable structure. Poor database design often leads to problems such as:
- The same information being stored in many places
- Missing or incomplete data
- Confusing relationships between pieces of information
- Errors when updating or deleting records
Normalization is the process used to organize a relational database so that these problems are minimized.
Normalization works by applying a series of rules called normal forms. Each normal form identifies a common design flaw and restructures the database to eliminate it.
The three most important normal forms are:
- First Normal Form (1NF) – ensures each field contains only one value.
- Second Normal Form (2NF) – ensures every column depends on the entire primary key.
- Third Normal Form (3NF) – ensures columns depend only on the key and not on other non-key columns.
By progressing through these stages, a database becomes more logically structured, easier to maintain, and less prone to errors.
For IB Computer Science, you are expected to explain the differences between these normal forms and understand the types of problems each one prevents.
Key Concepts
Before studying the normal forms themselves, several important concepts must be understood.
Atomicity
A field should contain a single, indivisible value.
For example, storing this value:
Math, Physics, Chemistry
inside a single cell is problematic because the database cannot easily treat each subject separately.
Instead, each value should appear in its own row.
Unique Identification
Every row in a table must represent exactly one unique entity.
This is normally achieved using a primary key, which is a field (or combination of fields) that uniquely identifies a record.
Example:
| student_id | name |
|---|---|
| 101 | Alice |
| 102 | Bob |
Here, student_id is the primary key.
Functional Dependency
A functional dependency occurs when the value of one attribute determines another.
Example:
student_id → student_name
This means that once the student ID is known, the student name can be determined.
Partial-Key Dependency
When a table has a composite key (a key made from multiple attributes), a partial dependency occurs when a column depends on only part of that key.
This indicates poor table design.
Transitive Dependency
A transitive dependency occurs when a non-key attribute depends on another non-key attribute rather than directly on the primary key.
This creates hidden relationships and redundancy.
Multi-Valued Dependency
A multi-valued dependency occurs when a key determines multiple independent values in another column.
Example:
A student may have multiple phone numbers or multiple skills.
These values should typically be stored in separate rows or tables.
Technical Explanation of Normal Forms
Normalization systematically improves a database by applying increasingly strict structural rules.
First Normal Form (1NF)
Conceptual Explanation
First Normal Form ensures that every field contains only a single value and that each row can be uniquely identified.
A table that stores lists or repeated fields is not properly structured for relational databases.
1NF solves this by removing repeating groups and multi-valued attributes.
Example: Violation of 1NF
| student_id | name | courses |
|---|---|---|
| 101 | Alice Lee | Math, Physics |
The column courses contains multiple values in a single cell.
This violates atomicity.
Correct Structure in 1NF
| student_id | name | course |
|---|---|---|
| 101 | Alice Lee | Math |
| 101 | Alice Lee | Physics |
Now every field contains one atomic value.
Technical Definition
A relation is in First Normal Form (1NF) if:
- All attributes contain atomic values
- Each tuple (row) is uniquely identifiable
- There are no repeating groups or multi-valued attributes
Second Normal Form (2NF)
Conceptual Explanation
Second Normal Form addresses problems that occur when a table uses a composite primary key.
If some columns depend on only part of the key, then data becomes duplicated unnecessarily.
2NF eliminates these partial-key dependencies.
Example: Violation of 2NF
| student_id | course_code | student_name |
|---|---|---|
| 101 | CS101 | Alice Lee |
The primary key is:
(student_id, course_code)
However:
student_name depends only on student_id
This creates a partial dependency.
Correct Design
Split the table into two separate tables.
Students
| student_id | student_name |
|---|---|
| 101 | Alice Lee |
Enrollments
| student_id | course_code |
|---|---|
| 101 | CS101 |
Now every non-key attribute depends on the entire key.
Technical Definition
A table is in Second Normal Form (2NF) if:
- It is already in 1NF, and
- Every non-key attribute is fully functionally dependent on the entire primary key
2NF therefore removes partial dependencies that occur in tables with composite keys.
Third Normal Form (3NF)
Conceptual Explanation
Third Normal Form removes hidden dependencies between non-key columns.
Sometimes a column depends on another column rather than directly on the primary key.
This creates transitive dependencies, which lead to redundancy and update anomalies.
Example: Violation of 3NF
| student_id | advisor_name | advisor_office |
|---|---|---|
| 101 | Dr. Smith | Room 204 |
Here:
student_id → advisor_name
advisor_name → advisor_office
This means:
advisor_office depends on advisor_name
not directly on student_id.
Correct Design
Create a separate table for advisors.
Advisors
| advisor_name | advisor_office |
|---|---|
| Dr. Smith | Room 204 |
Students
| student_id | advisor_name |
|---|---|
| 101 | Dr. Smith |
Now each attribute depends only on the primary key of its table.
Technical Definition
A table is in Third Normal Form (3NF) if:
- It is already in 2NF, and
- There are no transitive dependencies
In other words:
Non-key attributes depend only on the key.
Comparison of the Normal Forms
| Normal Form | Main Rule | Problem Solved |
|---|---|---|
| 1NF | Fields must contain atomic values | Repeating groups and multi-valued fields |
| 2NF | Non-key attributes depend on the entire key | Partial-key dependencies |
| 3NF | Non-key attributes depend only on the key | Transitive dependencies |
Why Normalization Matters
Normalization helps prevent several common problems in poorly designed databases.
Data Duplication
Storing the same information in multiple places increases storage use and introduces inconsistencies.
Normalization reduces duplication.
Missing Data
When tables are poorly structured, many fields may contain NULL values.
Normalization separates optional information into different tables.
Update Anomalies
If duplicated data must be updated in many places, inconsistencies may occur.
Normalization ensures data is stored in one authoritative location.
Insert Anomalies
Poorly structured tables may require unrelated data before a new record can be added.
Normalization removes unnecessary dependencies.
Delete Anomalies
Deleting a record may accidentally remove important information.
Normalization separates entities so information is not lost unintentionally.
Conclusion
Normalization is a systematic process used to improve relational database design.
By applying the rules of First, Second, and Third Normal Forms, database designers ensure that:
- Data is stored in atomic fields
- Attributes depend on the correct primary keys
- Hidden relationships between non-key attributes are removed
The result is a database that is consistent, maintainable, and logically structured.
While additional normal forms exist (such as Boyce–Codd Normal Form and Fourth Normal Form), the progression from 1NF to 3NF provides the essential foundation for relational database design.
For IB Computer Science, understanding these forms allows students to explain how normalization reduces redundancy, eliminates dependency problems, and improves data integrity