A3.2.5 Explain the difference between normal forms.

A3.2.5 Explain the difference between normal forms. 
• First normal form (1NF), second normal form (2NF), third normal form (3NF) 
• The terms atomicity, unique identification, functional dependencies, partial-key dependencies, non-key/transitive dependencies 
• Normalization issues can encompass data duplication, missing data, and a range of dependency concerns, including data dependencies, composite key dependencies, transitive dependencies, and multi-valued dependencies.

 

Big Idea

When designing a database, one of the most important goals is to store data in a clean and reliable structure. Poor database design often leads to problems such as:

  • The same information being stored in many places
  • Missing or incomplete data
  • Confusing relationships between pieces of information
  • Errors when updating or deleting records

Normalization is the process used to organize a relational database so that these problems are minimized.

Normalization works by applying a series of rules called normal forms. Each normal form identifies a common design flaw and restructures the database to eliminate it.

The three most important normal forms are:

  • First Normal Form (1NF) – ensures each field contains only one value.
  • Second Normal Form (2NF) – ensures every column depends on the entire primary key.
  • Third Normal Form (3NF) – ensures columns depend only on the key and not on other non-key columns.

By progressing through these stages, a database becomes more logically structured, easier to maintain, and less prone to errors.

For IB Computer Science, you are expected to explain the differences between these normal forms and understand the types of problems each one prevents.


Key Concepts

Before studying the normal forms themselves, several important concepts must be understood.

Atomicity

A field should contain a single, indivisible value.

For example, storing this value:

Math, Physics, Chemistry

inside a single cell is problematic because the database cannot easily treat each subject separately.

Instead, each value should appear in its own row.


Unique Identification

Every row in a table must represent exactly one unique entity.

This is normally achieved using a primary key, which is a field (or combination of fields) that uniquely identifies a record.

Example:

student_idname
101Alice
102Bob

Here, student_id is the primary key.


Functional Dependency

A functional dependency occurs when the value of one attribute determines another.

Example:

student_id → student_name

This means that once the student ID is known, the student name can be determined.


Partial-Key Dependency

When a table has a composite key (a key made from multiple attributes), a partial dependency occurs when a column depends on only part of that key.

This indicates poor table design.


Transitive Dependency

A transitive dependency occurs when a non-key attribute depends on another non-key attribute rather than directly on the primary key.

This creates hidden relationships and redundancy.


Multi-Valued Dependency

A multi-valued dependency occurs when a key determines multiple independent values in another column.

Example:

A student may have multiple phone numbers or multiple skills.

These values should typically be stored in separate rows or tables.


Technical Explanation of Normal Forms

Normalization systematically improves a database by applying increasingly strict structural rules.


First Normal Form (1NF)

Conceptual Explanation

First Normal Form ensures that every field contains only a single value and that each row can be uniquely identified.

A table that stores lists or repeated fields is not properly structured for relational databases.

1NF solves this by removing repeating groups and multi-valued attributes.


Example: Violation of 1NF

student_idnamecourses
101Alice LeeMath, Physics

The column courses contains multiple values in a single cell.

This violates atomicity.


Correct Structure in 1NF

student_idnamecourse
101Alice LeeMath
101Alice LeePhysics

Now every field contains one atomic value.


Technical Definition

A relation is in First Normal Form (1NF) if:

  • All attributes contain atomic values
  • Each tuple (row) is uniquely identifiable
  • There are no repeating groups or multi-valued attributes

Second Normal Form (2NF)

Conceptual Explanation

Second Normal Form addresses problems that occur when a table uses a composite primary key.

If some columns depend on only part of the key, then data becomes duplicated unnecessarily.

2NF eliminates these partial-key dependencies.


Example: Violation of 2NF

student_idcourse_codestudent_name
101CS101Alice Lee

The primary key is:

(student_id, course_code)

However:

student_name depends only on student_id

This creates a partial dependency.


Correct Design

Split the table into two separate tables.

Students

student_idstudent_name
101Alice Lee

Enrollments

student_idcourse_code
101CS101

Now every non-key attribute depends on the entire key.


Technical Definition

A table is in Second Normal Form (2NF) if:

  • It is already in 1NF, and
  • Every non-key attribute is fully functionally dependent on the entire primary key

2NF therefore removes partial dependencies that occur in tables with composite keys.


Third Normal Form (3NF)

Conceptual Explanation

Third Normal Form removes hidden dependencies between non-key columns.

Sometimes a column depends on another column rather than directly on the primary key.

This creates transitive dependencies, which lead to redundancy and update anomalies.


Example: Violation of 3NF

student_idadvisor_nameadvisor_office
101Dr. SmithRoom 204

Here:

student_id → advisor_name
advisor_name → advisor_office

This means:

advisor_office depends on advisor_name

not directly on student_id.


Correct Design

Create a separate table for advisors.

Advisors

advisor_nameadvisor_office
Dr. SmithRoom 204

Students

student_idadvisor_name
101Dr. Smith

Now each attribute depends only on the primary key of its table.


Technical Definition

A table is in Third Normal Form (3NF) if:

  • It is already in 2NF, and
  • There are no transitive dependencies

In other words:

Non-key attributes depend only on the key.

Comparison of the Normal Forms

Normal FormMain RuleProblem Solved
1NFFields must contain atomic valuesRepeating groups and multi-valued fields
2NFNon-key attributes depend on the entire keyPartial-key dependencies
3NFNon-key attributes depend only on the keyTransitive dependencies

Why Normalization Matters

Normalization helps prevent several common problems in poorly designed databases.

Data Duplication

Storing the same information in multiple places increases storage use and introduces inconsistencies.

Normalization reduces duplication.


Missing Data

When tables are poorly structured, many fields may contain NULL values.

Normalization separates optional information into different tables.


Update Anomalies

If duplicated data must be updated in many places, inconsistencies may occur.

Normalization ensures data is stored in one authoritative location.


Insert Anomalies

Poorly structured tables may require unrelated data before a new record can be added.

Normalization removes unnecessary dependencies.


Delete Anomalies

Deleting a record may accidentally remove important information.

Normalization separates entities so information is not lost unintentionally.


Conclusion

Normalization is a systematic process used to improve relational database design.

By applying the rules of First, Second, and Third Normal Forms, database designers ensure that:

  • Data is stored in atomic fields
  • Attributes depend on the correct primary keys
  • Hidden relationships between non-key attributes are removed

The result is a database that is consistent, maintainable, and logically structured.

While additional normal forms exist (such as Boyce–Codd Normal Form and Fourth Normal Form), the progression from 1NF to 3NF provides the essential foundation for relational database design.

For IB Computer Science, understanding these forms allows students to explain how normalization reduces redundancy, eliminates dependency problems, and improves data integrity