Data Structures

Hash Table (Advanced)

Difficulty: Intermediate

Python's dict uses open addressing with a Robin Hood-inspired probe. Java's HashMap switches a bucket from a linked list to a red-black tree once chain length crosses eight. C++ std::unorderedmap is required by the standard to use...

INTERMEDIATE

Complexity varies

Hash Table (Advanced)

Python's dict uses open addressing with a Robin Hood-inspired probe. Java's HashMap switches a bucket from a linked list to a red-black tree once chain length crosses eight. C++ std::unordered_map is required by the standard to use chaining. Three production hash tables, three different choices, all defending the same goal: keep expected O(1) lookups even as the table fills up. Hash Table (Advanced) unpacks the engineering decisions behind those choices.

This lesson covers hash function design (division method, multiplication method, universal hashing), the two collision-resolution families (separate chaining and open addressing with linear, quadratic, or double hashing), the load-factor threshold that triggers a rehash, and the amortized O(1) cost of dynamic resizing. You will also see how tombstones make deletion correct under open addressing, and why Robin Hood and cuckoo hashing tame worst-case probe lengths.

In Hash Map (Dictionary) Basics, you used a hash map as a black box; this lesson opens the box and replaces the hand-wave with a quantitative model. Modular Arithmetic is the workhorse behind every practical hash function: the key % m step that turns an arbitrary integer into a valid bucket index, and the (h(k) + i*step) % m formula behind double hashing.

Next, Skip List answers the same fast-lookup question with a probabilistic, pointer-based design that has no hash function at all and no rehashing step.

Intermediate

55 min

6 Objectives

6 Sections

Prerequisites:

Hash Map (Dictionary) Basics Modular Arithmetic

Topics:

Hash Table

Open Addressing

Hashing

Hash Functions

Collisions

Rehashing

Load Factor

Data Structures

Intermediate

Premium

What You'll Learn

By the end of this lesson, you will be able to:

Design hash functions using the division method, multiplication method, and universal hashing, and explain what makes a hash function 'good'.

Implement open addressing with linear probing, quadratic probing, and double hashing, and handle deletions correctly using tombstones.

Explain Robin Hood hashing and cuckoo hashing at a conceptual level and describe why they reduce worst-case probe lengths.

Analyze expected probe lengths under different load factors for both chaining and open addressing.

Implement dynamic resizing (rehashing) and explain its amortized O(1) cost.

Compare chaining vs open addressing trade-offs and identify which strategy real-world implementations use.

Why This Matters

Understanding hash table internals lets you predict when O(1) average-case performance degrades and choose the right collision strategy for your workload -- a skill tested in system design interviews at top companies.

Real-world hash tables use sophisticated open addressing with Robin Hood hashing (Python dict), red-black tree chaining (Java HashMap), and power-of-two bucket counts (C++ unordered_map). Knowing these internals helps you tune performance and debug subtle hash-related bugs.

Collision resolution analysis (expected probe lengths, load factor thresholds, amortized rehashing cost) connects directly to the amortized analysis and modular arithmetic you learned in Foundations -- this lesson applies those tools to a concrete, high-impact data structure.

Key Terms

8 terms you'll encounter in this lesson

Open Addressing

A collision resolution family where all elements are stored directly in the hash table array. When a collision occurs, the algorithm probes subsequent slots according to a probing sequence until an empty slot is found.

Linear Probing

An open addressing strategy where, on collision at index h, slots h+1, h+2, h+3, ... are checked sequentially (modulo capacity). Simple but suffers from primary clustering.

Quadratic Probing

An open addressing strategy where probe offsets grow quadratically: h+1, h+4, h+9, ... (h + i^2 mod capacity). Reduces primary clustering but can suffer from secondary clustering.

Double Hashing

An open addressing strategy that uses a second hash function to determine the probe step size: h(k) + i * h2(k). Eliminates both primary and secondary clustering.

Tombstone

A special marker placed in a deleted slot during open addressing. It signals 'skip me during search, but you can overwrite me during insert'. Without tombstones, deletions would break probe chains.

Robin Hood Hashing

A variation of linear probing where, during insertion, if the new key has traveled farther from its home slot than the existing key, they swap places. This equalizes probe distances and reduces worst-case lookups.

Cuckoo Hashing

A scheme using two (or more) hash functions and two tables. Each key has exactly two possible positions. On collision, the existing key is 'kicked out' to its alternative position, guaranteeing O(1) worst-case lookup.

Universal Hashing

A family of hash functions chosen randomly at construction time such that for any two distinct keys, the probability of collision is at most 1/m (where m is the table size). This provides worst-case expected O(1) performance regardless of input distribution.

Heads Up

Common misconceptions to watch out for

"Linear probing is always bad because of clustering"

Linear probing is actually cache-friendly because it accesses consecutive memory locations. At low load factors (below 0.5), it often outperforms chaining due to fewer cache misses. The clustering problem only becomes severe at high load factors.

"You can just delete entries in open addressing like you do with chaining"

In open addressing, simply clearing a slot breaks the probe chain for other keys that were inserted past that slot. You must use a tombstone (a sentinel marker) to indicate 'deleted but keep probing'. Tombstones are later cleaned up during rehashing.

"Quadratic probing always visits every slot"

Quadratic probing is only guaranteed to visit every slot when the table size is prime and the load factor is below 0.5. With non-prime sizes or high load factors, quadratic probing may cycle without visiting all slots, causing insertion failure even when empty slots exist.

"A bigger table always means faster lookups"

Making the table too large wastes memory and can actually hurt cache performance. The optimal approach is to maintain a load factor between 0.25 and 0.75 and resize dynamically. The amortized cost of resizing is O(1) per operation.