B-trees | Computer Science KB

Big Idea

A B-tree is a self-balancing search tree designed for systems that read and write large blocks of data — especially databases and file systems. Instead of storing one key per node like a binary search tree, a B-tree node holds many keys and many children, which keeps the tree very shallow and reduces disk access. The whole design exists because disks (and even SSDs) are slow compared to memory, so minimizing the number of node visits matters more than minimizing comparisons.

Think of it as a multi-lane search tree. Each node contains a sorted list of keys:

[10 | 20 | 35]
 /     |     \
<10  10–20 20–35  >35

Searching works by scanning the keys inside a node to decide which child pointer to follow. Because nodes are wide, the tree height stays small — often only 2–4 levels even for millions of records.

Here are the core properties that make a B-tree a B-tree:

A B-tree has an order (sometimes called degree). This defines the maximum number of children a node may have.
All leaves appear at the same depth, so the structure stays balanced automatically.
Nodes (except the root) must stay at least half full, which prevents degeneration into a skinny tree.
Insertion and deletion may cause nodes to split or merge to maintain balance.

A small conceptual example:

Start with inserting numbers into an order-3 B-tree.

Insert 10 → [10]
Insert 20 → [10 | 20]
Insert 30 → node overflows → split → middle value moves up

Result:

      [20]
     /    \
   [10]  [30]

Why databases love B-trees:

Disk pages are large blocks (often 4KB or more). A single B-tree node is sized to fit exactly into one page. That means one disk read retrieves dozens or hundreds of keys at once. Binary trees would require far more page reads, which would feel like searching a library by opening one page of a book at a time.

A few related variants you’ll see in database engines:

B+-trees — internal nodes store only keys; all actual data lives in linked leaves. Most SQL indexes use this.
B-trees* — delay splits by redistributing keys with siblings to improve space usage.
LSM trees — not technically B-trees, but often contrasted with them in modern storage engines.

One way to frame it : a binary search tree optimizes comparisons; a B-tree optimizes I/O locality. That shift in priorities is why nearly every relational database index under the hood ends up looking like some flavor of B-tree rather than a classic CS textbook tree.