Data Structures

B-Trees and B+ Trees

Difficulty: Advanced

ADVANCED

Complexity varies

B-Trees and B+ Trees

A modern hard drive reads about 100 MB per second sequentially, but a single random seek still costs around 10 milliseconds; that gap is roughly seven orders of magnitude. A binary tree of a billion keys has 30 levels, which means 30 random seeks per lookup. B-Trees and B+ Trees keep each node fat enough to fill a disk page, branch into hundreds of children, and answer the same lookup in three or four seeks. That is why every relational database you have used relies on this family for its indexes.

This lesson covers the B-Tree minimum-degree parameter t and the resulting key-and-child counts at every node, the proactive split during insert that keeps the height bounded by O(log_t n), the merge-or-redistribute logic for delete, and the B+ Tree refinement that pushes all data to the leaf level and links those leaves together so range queries become a sequential scan after a single descent.

In Binary Search Tree (BST), every node held a single key and at most two children; that geometry is fine for memory but disastrous for disk because each comparison forces a separate I/O. B-Trees flip the design: keep every node wide so a single I/O resolves many comparisons at once.

Next, Splay Tree explores a third self-balancing strategy in which the tree reshapes itself around access patterns rather than maintaining strict structural invariants.

Advanced

70 min

6 Objectives

5 Sections

Prerequisites:

Binary Search Tree (BST)

Topics:

B-Tree

B+ Tree

Trees

Data Structures

Advanced

Premium

Deep Dive

Comparison

What You'll Learn

By the end of this lesson, you will be able to:

Explain why binary trees fail for disk-based storage and how B-Trees solve the I/O bottleneck.

Define B-Tree properties (minimum degree t, key capacity, all-leaves-same-depth) and trace a search through a B-Tree.

Implement B-Tree insertion with proactive node splitting.

Describe B+ Tree structure: internal index nodes, data-only leaves, and the linked leaf chain.

Compare B-Tree vs B+ Tree vs BST trade-offs and identify when each is appropriate.

Identify real-world B-Tree applications in databases (MySQL InnoDB, PostgreSQL) and file systems (NTFS, ext4).

Why This Matters

Every relational database you have ever used relies on B-Trees or B+ Trees for indexing. Understanding them reveals why CREATE INDEX makes queries fast and why full table scans are slow.

B-Trees solve a problem that binary search trees cannot: minimizing disk I/O. A B-Tree of order 1000 can index billions of records in just 3-4 levels, meaning 3-4 disk reads per lookup.

B+ Trees power range queries (WHERE price BETWEEN 10 AND 50) by linking leaf nodes into a sorted chain, enabling sequential scans without revisiting internal nodes.

This topic bridges data structures and systems design. Interviewers at companies like Google, Amazon, and Stripe expect you to explain how database indexes work under the hood.

Key Terms

7 terms you'll encounter in this lesson

B-Tree

A self-balancing multi-way search tree where each node can hold multiple sorted keys and have multiple children. All leaves are at the same depth, and nodes are kept between t-1 and 2t-1 keys (where t is the minimum degree).

Minimum Degree (t)

The parameter that controls a B-Tree's branching factor. Every non-root node has at least t-1 keys and at most 2t-1 keys. The root may have as few as 1 key.

B+ Tree

A variant of the B-Tree where all actual data records live in the leaf nodes, internal nodes serve only as an index, and leaf nodes are linked together in a sorted chain for efficient sequential access.

Node Splitting

The operation that occurs when a B-Tree node becomes full (2t-1 keys). The median key is promoted to the parent, and the node is split into two nodes each containing t-1 keys.

Leaf Chain (Linked Leaves)

In a B+ Tree, all leaf nodes are connected via next pointers forming a doubly or singly linked list. This enables efficient range queries by scanning leaves sequentially.

I/O Cost Model

A performance model for disk-based data structures where the dominant cost is the number of disk block reads/writes, not CPU comparisons. B-Trees minimize I/O by maximizing the branching factor.

Proactive Splitting

A B-Tree insertion strategy where full nodes are split on the way down (before reaching the leaf), ensuring we never need to backtrack up the tree. This allows single-pass, top-down insertion.

Heads Up

Common misconceptions to watch out for

"B-Trees are just binary trees with a different name"

B-Trees are multi-way trees. A B-Tree node can have dozens or hundreds of children, not just two. This high branching factor is what makes them so shallow and efficient for disk-based storage.

"B+ Trees store data in every node like B-Trees"

In a B+ Tree, internal nodes store only keys (acting as a routing index). All actual data records live exclusively in the leaf nodes. This means internal nodes can fit more keys, making the tree even shallower.

"B-Trees are slower than BSTs because each node has more keys to search"

In memory, yes, searching within a node takes O(t) comparisons. But for disk-based storage, the bottleneck is I/O, not comparisons. A single disk read loads an entire node. B-Trees minimize the number of disk reads (tree height), which dwarfs the within-node search cost.

"The 'B' in B-Tree stands for 'Binary'"

The exact meaning is debated -- it may stand for Balanced, Bayer (one of the inventors), Boeing (where Bayer worked), or Broad. It definitely does not stand for Binary.