How dict gets constant-time lookup

Hashability, collisions, load factor, memory cost, and stable key order

Python's dict is implemented as a compact hash table (Python 3.6+) that splits indices and entries into separate arrays. The indices array is sparse, storing hash-derived positions. The entries array is dense, storing key-value pairs in insertion order. This split layout saves about 58% memory compared to the old combined-table design. Insertion order is a language guarantee since Python 3.7 (PEP 468). Keys must be hashable: they implement `__hash__` and `__eq__` consistently. Collisions are resolved through perturbed pseudo-random probing. The load factor triggers resize at 2/3 occupancy. Key-sharing dictionaries (PEP 412) optimize instance `__dict__` by sharing the keys object across instances of the same class. <a href="/dict-setdefault">Use setdefault and defaultdict for common dict patterns</a>. <a href="/sets-membership-views">Compare dict layout with set hash tables</a>.

Understand.
Visualize.
Master.

Python in Depth

An interactive engineering reference for Python internals

Quick note

Hashability is required.

:)
Python version

Targets Python 3.10–3.14. Python 3.9 and below are End-of-Life.

TABLE OF CONTENTS
3.1How dict gets constant-time lookup

Hashability, collisions, load factor, memory cost, and stable key order

dict is Python's default indexed data structure because keyed lookup is expressive and usually fast. The speed comes from a sparse hash table, not from magic syntax.

Core answer

Use dictionaries when keys are the model. A key must be hashable, and equality-relevant state must stay stable for as long as the key lives in the table.

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class InvoiceKey:
tenant: str
number: str
def index_amounts(rows: list[tuple[InvoiceKey, int]]) -> dict[InvoiceKey, int]:
amounts: dict[InvoiceKey, int] = {}
for key, cents in rows:
amounts[key] = cents
return amounts
ledger = index_amounts([(InvoiceKey("TEN-1", "INV-7"), 4200)])
print(ledger[InvoiceKey("TEN-1", "INV-7")])
Step Through a Dict Lookup

Follow a conceptual hash-table lookup from home bucket to collision handling to the final matched value.

Why this design exists

Hash tables trade memory for average constant-time lookup and update. Python uses that tradeoff for mappings because namespaces, caches, configuration, and indexes all need stable key semantics. The object model connects the table to user types through __hash__ and __eq__.

PEP 412 matters for instance dictionaries: CPython can share key layouts across instances that define the same attributes consistently.

Mechanics and CPython internals

CPython dicts use a compact ordered layout with a sparse index table and dense entries. A lookup hashes the key, probes candidate slots, and uses equality only when hashes and occupancy make a comparison relevant. Deletions leave tombstone-like states until table maintenance; resizing preserves probe performance under growth. Insertion order is a language guarantee in Python 3.7+; compact layout details remain implementation territory.

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from dataclasses import dataclass
from sys import getsizeof
@dataclass(frozen=True, slots=True)
class Region:
code: str
def report_growth(counts: tuple[int, ...]) -> None:
for count in counts:
table = {Region(str(index)): index for index in range(count)}
print(count, getsizeof(table), list(table)[:2])
report_growth((0, 1, 5, 6, 16, 32))

Complexity and tradeoffs

Lookup, insertion, and deletion are average O(1) under a healthy hash distribution. Worst-case collision behavior is worse, and a custom hash that collapses many keys destroys throughput even when correctness survives. The table spends extra memory on capacity and indexes to keep probe chains short.

Idiomatic patterns and refactoring

Prefer one lookup that states the fallback over a membership probe followed by a second keyed access.

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class EndpointConfig:
timeout_ms: int
def timeout_bad(raw: dict[str, EndpointConfig], name: str) -> int:
if name in raw:
return raw[name].timeout_ms
return 500
def timeout(raw: dict[str, EndpointConfig], name: str) -> int:
return raw.get(name, EndpointConfig(500)).timeout_ms
configs = {"billing": EndpointConfig(250)}
print(timeout_bad(configs, "billing"))
print(timeout(configs, "ledger"))

Common mistakes and edge cases

Do not mutate a hash-bearing key after insertion. Remember that values such as 1, 1.0, and True compare equal and therefore share mapping-key semantics. Do not assume insertion ordering makes a dict a sorted structure.

When to use / When NOT to use

Use a dict for keyed records, indexes, caches, counters, and namespaces. Use a list or tuple when positional order is the contract, and use a set when values are only membership keys.

Do not pick dict for "fast" by reflex when key construction, mutation discipline, or memory pressure is the actual problem.

Further reading

  • Official docs: mapping types
  • Official docs: __hash__
  • PEP 412: key-sharing dictionaries
  • CPython source: Objects/dictobject.c
MEASURED NOTEBOOKMeasured
Measured dict lookup versus linear scan

This notebook compares dictionary key lookup with scanning a list of key-value pairs. It shows which path is faster for hit and miss lookups and what each representation costs in container memory at 10k, 100k, and 1M keys.

Winnerdict lookup — 160676x faster than list scan @ 1M
RELATED GUIDE
Hit lookup time by mapping size
0.00 µs4296.4 µs8592.8 µs10k100k1M
dict lookup
list scan
CONTROLS
METRICS
Faster lookup pathdict lookup
Speed gap @ 1M hit160676x faster than list scan @ 1M
Memory cost @ 1Mdict 40960.1 KiB / list 7812.6 KiB
Largest mapping tested1,000,000 keys
NOTES

What this tests — key lookup in a dict (hash table) vs linear scan through a list of (key, value) pairs. Both data structures hold the same 1M entries. The chart shows how lookup time scales as the collection grows.

Why dict won — dict uses an open-addressed hash table. Computing the hash and probing a few slots is O(1) average case regardless of dict size. The list scan must call `__eq__` sequentially until it finds a match, making it O(n).

The surprise — the speed gap exceeds 160,676x at 1M. Most developers know “hash tables are faster,” but the magnitude is far larger than intuition suggests. For repeated lookups, the dict pays for its memory overhead within a handful of accesses.

TEST ENVIRONMENT
Python Version3.12.3
Machinex86_64
Contribute