Database Indexes

This article is not assessed by the IB but may be helpful to deepen your understanding. Plus, I think it's cool.

Understanding Database Indexes

The Big Idea

A database index is a data structure that improves the speed of data retrieval operations on a table at the cost of additional storage and maintenance overhead. Think of an index like the index in a book: instead of reading every page to find a topic, you jump straight to the correct page using the index.

Indexes are essential to performance optimization in relational databases. Without them, a query with a WHERE, JOIN, or ORDER BY clause may have to fall back to a full table scan, reading every row, so its cost grows in proportion to the size of the table.


1. What Is an Index in a Database?

An index is typically created on one or more columns of a table. The database builds a separate lookup structure, usually a B-tree or a hash table, that maps key values to the locations of the corresponding rows.

Analogy

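If a table is a phone book whose entries were printed in random order, then an index is a separate, alphabetically sorted list of last names, each paired with the page where that entry appears, so you can binary search instead of reading every entry.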
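To make the idea concrete, here is a minimal in-memory sketch in Python. It illustrates the principle only and is not how any particular database stores its indexes. The "table" is an unsorted list of rows, and the "index" is a sorted list of (key, row position) pairs searched with the standard `bisect` module. The sample names and the `lookup` function are hypothetical.

```python
import bisect

# A tiny "table": rows stored in insertion order, not sorted (hypothetical data).
# A row's position in the list plays the role of its physical location.
table = [
    ("Nguyen", "An"),
    ("Smith", "Jo"),
    ("Garcia", "Luis"),
    ("Okafor", "Ada"),
    ("Smith", "Sam"),
]

# The "index": (last_name, row_position) pairs kept sorted by last_name.
index = sorted((row[0], pos) for pos, row in enumerate(table))

def lookup(last_name):
    """Return all rows with this last_name, using binary search on the index."""
    # Find the first index entry >= (last_name, -1) in O(log n) comparisons.
    i = bisect.bisect_left(index, (last_name, -1))
    results = []
    # Matching keys are adjacent in sorted order, so walk forward until they stop.
    while i < len(index) and index[i][0] == last_name:
        results.append(table[index[i][1]])  # follow the stored position to the row
        i += 1
    return results

print(lookup("Smith"))   # [('Smith', 'Jo'), ('Smith', 'Sam')]
print(lookup("Taylor"))  # []
```

A real B-tree spreads these sorted entries across fixed-size pages so that a lookup reads only a few pages from disk, but the principle of searching sorted keys and then following a pointer back to the row is the same.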


2. Types of Indexes

a. Single-Column Index

Created on one column of a table.

CREATE INDEX idx_lastname ON employees(last_name);

b. Composite Index

Created on two or more columns. Useful for queries involving multiple columns in the WHERE clause.

CREATE INDEX idx_name_dept ON employees(last_name, department_id);
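The effect of column order in a composite index can be checked with SQLite, which ships with Python as the `sqlite3` module. This sketch assumes a hypothetical `employees` table with `id`, `first_name`, `last_name` and `department_id` columns, and prints SQLite's EXPLAIN QUERY PLAN description for three different filters. The exact plan wording varies between SQLite versions, and other databases may behave differently: some can use a "skip scan" on a composite index under certain conditions, and SQLite itself can do so only after ANALYZE has gathered statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    department_id INTEGER)""")
conn.execute("CREATE INDEX idx_name_dept ON employees(last_name, department_id)")

def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN descriptions joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description

# Both indexed columns filtered: the full composite key can be used.
print(plan("SELECT * FROM employees WHERE last_name = 'Smith' AND department_id = 3"))
# Leading column only: the index is still usable.
print(plan("SELECT * FROM employees WHERE last_name = 'Smith'"))
# Second column only: entries are sorted by last_name first, so no direct search.
print(plan("SELECT * FROM employees WHERE department_id = 3"))
```

Expect the first two plans to mention `idx_name_dept` and the third to report a scan of `employees`, because the index entries are ordered by `last_name` before `department_id`.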

c. Unique Index

Enforces uniqueness on a column or combination of columns.

CREATE UNIQUE INDEX idx_email ON users(email);
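A quick way to see a unique index at work, sketched with Python's built-in `sqlite3` module and an in-memory database (the table layout and sample address are made up for illustration): the second insert of the same email fails with an `IntegrityError`, and the table keeps a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_email ON users(email)")

conn.execute("INSERT INTO users (email) VALUES ('ada@example.com')")
try:
    # A second row with the same email violates the unique index.
    conn.execute("INSERT INTO users (email) VALUES ('ada@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)  # e.g. "UNIQUE constraint failed: users.email"

print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1
```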

d. Full-Text Index

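Used for searching words within textual content, as in search engines or document databases. A regular B-tree index cannot efficiently answer a query such as LIKE '%word%', so full-text indexes are typically built as inverted indexes that map each word to the rows containing it.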
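The core of an inverted index fits in a few lines of plain Python. This is a simplified, hypothetical sketch: it maps each lower-cased word to the set of row ids that contain it, then answers a multi-word query by intersecting those sets. Production full-text engines add much more (stemming, stop words, relevance ranking), so treat it as an illustration of the basic idea only.

```python
import re
from collections import defaultdict

# Hypothetical documents keyed by row id.
docs = {
    1: "Indexes speed up reads",
    2: "Writes get slower with more indexes",
    3: "A B-tree keeps keys sorted",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: word -> set of row ids whose text contains that word.
inverted = defaultdict(set)
for row_id, text in docs.items():
    for word in tokenize(text):
        inverted[word].add(row_id)

def search(query):
    """Return ids of rows containing every word in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(inverted.get(words[0], set()))
    for word in words[1:]:
        result &= inverted.get(word, set())  # keep only rows that have this word too
    return result

print(sorted(search("indexes")))         # [1, 2]
print(sorted(search("slower indexes")))  # [2]
```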

e. Hash Index (engine-dependent)

Optimized for exact match queries, not for range queries.
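Why a hash index helps exact matches but not ranges can be illustrated with a Python `dict` (a hash table) next to a sorted list searched with `bisect`. Both structures and the sample data are stand-ins chosen for illustration, not any engine's actual index format.

```python
import bisect

# Hash-style index: O(1) average lookup per key, but keys are not kept in sorted order.
hash_index = {"apple": 3, "bread": 5, "cheese": 12, "dates": 8, "eggs": 4}

# Ordered index: keys kept sorted, like the leaf level of a B-tree.
sorted_keys = sorted(hash_index)

def exact(key):
    """Exact match: a single hash lookup."""
    return hash_index.get(key)

def range_hash(lo, hi):
    """Range query on the hash index: every key must be examined."""
    return sorted(k for k in hash_index if lo <= k <= hi)

def range_sorted(lo, hi):
    """Range query on the ordered index: two binary searches find the boundaries."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]

print(exact("cheese"))         # 12
print(range_hash("b", "d"))    # ['bread', 'cheese']
print(range_sorted("b", "d"))  # ['bread', 'cheese']
```

Both range functions return the same answer, but `range_hash` has to inspect every key, while `range_sorted` only needs two binary searches plus the matching slice.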


3. How Indexes Improve Query Performance

Example Query With and Without an Index

SELECT * FROM users WHERE email = 'user@example.com';
  • Without an index: the database checks every row in the table (full table scan).
  • With an index on email: the database looks the value up in the index, which for a B-tree takes a number of steps proportional to log n rather than n, and then jumps straight to the matching row.
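One way to observe the difference is SQLite's EXPLAIN QUERY PLAN, run here through Python's `sqlite3` module on an in-memory, hypothetical `users` table filled with 1,000 generated rows. The plan text is version-dependent, but before the index exists it should describe a SCAN of `users`, and afterwards a SEARCH using `idx_email`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],  # generated sample rows
)

query = "SELECT * FROM users WHERE email = 'user42@example.com'"

def plan(sql):
    """Return the description column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # expected: a full scan of users
conn.execute("CREATE INDEX idx_email ON users(email)")
after = plan(query)   # expected: a search using idx_email

print("Without index:", before)
print("With index:   ", after)
print(conn.execute(query).fetchall())
```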

Operations that Benefit from Indexes

  • WHERE clause conditions
  • JOIN conditions
  • ORDER BY clauses
  • GROUP BY clauses
  • Searching for ranges using BETWEEN, >, <
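Ranges and sorting can be checked the same way. In this SQLite sketch (again a hypothetical `users` table, queried through Python's `sqlite3`), the query without an index needs a full scan plus a temporary B-tree to sort the results. Once `idx_email` exists, SQLite can search the index for the range and read the rows already in email order, so the separate sort step should disappear. Exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# A range filter combined with sorting on the same column.
query = "SELECT * FROM users WHERE email BETWEEN 'a' AND 'm' ORDER BY email"

def plan(sql):
    """Return the description column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # full scan, then a temporary B-tree for ORDER BY
conn.execute("CREATE INDEX idx_email ON users(email)")
after = plan(query)   # range search on idx_email; rows come out already sorted

print("Without index:", before)
print("With index:   ", after)
```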

4. How Indexes Affect Performance

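Operation               | Effect of Index
------------------------+------------------------------------------------------------
SELECT                  | Often much faster when the query can use an index, especially on large tables
INSERT                  | Slightly slower: each new row must also be added to every index
UPDATE (indexed column) | Can be significantly slower: the index entry must be moved to reflect the new value
DELETE                  | Slower: the row's entries must also be removed from every index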

Indexes trade faster reads for slower writes and more disk usage.
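To see why writes get slower, here is a toy Python model of a table with one sorted secondary index. The `IndexedTable` class is hypothetical, not a real library. Every insert, delete, or change to the indexed key has to touch the index as well as the row data.

```python
import bisect

class IndexedTable:
    """Hypothetical toy table with one sorted secondary index on a key column."""

    def __init__(self):
        self.rows = {}   # row_id -> (key, payload): the table data
        self.index = []  # sorted list of (key, row_id): the index
        self.next_id = 1

    def insert(self, key, payload):
        row_id = self.next_id
        self.next_id += 1
        self.rows[row_id] = (key, payload)        # write the data...
        bisect.insort(self.index, (key, row_id))  # ...and the index entry
        return row_id

    def delete(self, row_id):
        key, _ = self.rows.pop(row_id)            # remove the data...
        i = bisect.bisect_left(self.index, (key, row_id))
        del self.index[i]                         # ...and its index entry

    def update_key(self, row_id, new_key):
        old_key, payload = self.rows[row_id]
        # Changing an indexed value moves its entry: remove the old one, insert the new one.
        i = bisect.bisect_left(self.index, (old_key, row_id))
        del self.index[i]
        bisect.insort(self.index, (new_key, row_id))
        self.rows[row_id] = (new_key, payload)

t = IndexedTable()
a = t.insert("Smith", "row A")
b = t.insert("Garcia", "row B")
t.update_key(a, "Adams")
t.delete(b)
print(t.index)  # [('Adams', 1)]
```

In this sketch, inserting into a Python list shifts elements and costs O(n); a real B-tree keeps each write to roughly O(log n) page operations, occasionally splitting or merging pages, but it still does extra work for every index on the table.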


5. Drawbacks of Indexes

  • Storage overhead: each index consumes space.
  • Slower write performance: every INSERT, UPDATE, or DELETE must also update affected indexes.
  • Maintenance required: fragmented or heavily updated indexes may need rebuilding.
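  • Too many indexes: each extra index adds storage and write cost, and many overlapping indexes give the query planner more candidate plans to evaluate, which can occasionally lead it to choose a poor one.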
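Storage overhead is easy to measure in SQLite. This sketch, using Python's `sqlite3` module, fills a hypothetical `users` table with 10,000 generated email addresses, then compares PRAGMA page_count before and after creating an index on `email`. The exact numbers depend on the SQLite version and page size, but the count should grow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],  # generated sample data
)

def pages():
    """Number of database pages currently in use (table data plus indexes)."""
    return conn.execute("PRAGMA page_count").fetchone()[0]

before = pages()
conn.execute("CREATE INDEX idx_email ON users(email)")
after = pages()
print(f"pages before index: {before}, after: {after}")
```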

6. Best Practices for Using Indexes

  • Index columns that frequently appear in WHERE, JOIN, or ORDER BY clauses.
  • Use composite indexes for multi-column filters, and choose the column order carefully: an index on (a, b) can generally serve filters on a alone, but not filters on b alone.
  • Be cautious about indexing columns that are updated very frequently, since every change must also be applied to the index.
  • Monitor index usage and drop indexes that queries never use.
  • Use EXPLAIN (PostgreSQL, MySQL), EXPLAIN QUERY PLAN (SQLite), or your database's execution-plan viewer to check whether queries actually use your indexes.

7. Index Fragmentation and Rebuilding

Over time, frequent inserts, updates, and deletes can fragment an index: its pages end up partly empty or stored out of logical order, so reads touch more pages than necessary and performance degrades.

  • Rebuild: recreates the entire index (more expensive but more thorough).
  • Reorganize: defragments the index in-place (less costly, good for minor fragmentation).

Examples:

-- SQL Server syntax
ALTER INDEX idx_lastname ON employees REBUILD;
ALTER INDEX idx_lastname ON employees REORGANIZE;
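SQLite, for comparison, has no ALTER INDEX ... REBUILD or REORGANIZE. Its closest counterpart to a rebuild is REINDEX, which deletes and recreates an index from scratch. The sketch below runs it through Python's `sqlite3` module on a hypothetical in-memory table after a batch of deletes, then confirms the database passes PRAGMA integrity_check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("CREATE INDEX idx_lastname ON employees(last_name)")
conn.executemany(
    "INSERT INTO employees (last_name) VALUES (?)",
    [(name,) for name in ["Smith", "Garcia", "Nguyen"] * 100],  # hypothetical data
)
# Deleting a third of the rows leaves gaps in the index pages.
conn.execute("DELETE FROM employees WHERE last_name = 'Garcia'")

# SQLite's closest counterpart to REBUILD: recreate the index from scratch.
conn.execute("REINDEX idx_lastname")

# Sanity check: the database, including the rebuilt index, is consistent.
print(conn.execute("PRAGMA integrity_check").fetchone()[0])  # ok
```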

Conclusion

A database index is a critical tool for ensuring fast and efficient data retrieval in relational databases. When used correctly, indexes greatly enhance query performance, especially on large datasets. However, they come with trade-offs in write speed, storage, and maintenance.

To optimize database performance:

  • Design indexes deliberately based on query patterns,
  • Avoid over-indexing,
  • Maintain indexes with routine analysis and tuning.

Understanding how indexes work is foundational to building scalable and responsive database systems.