How dict gets constant-time lookup

Hashability, collisions, load factor, memory cost, and stable key order

Python's dict is implemented as a compact hash table (Python 3.6+) that splits indices and entries into separate arrays. The indices array is sparse, storing hash-derived positions. The entries array is dense, storing key-value pairs in insertion order. This split layout saves about 58% memory compared to the old combined-table design. Insertion order is a language guarantee since Python 3.7 (PEP 468). Keys must be hashable: they implement `__hash__` and `__eq__` consistently. Collisions are resolved through perturbed pseudo-random probing. The load factor triggers resize at 2/3 occupancy. Key-sharing dictionaries (PEP 412) optimize instance `__dict__` by sharing the keys object across instances of the same class. <a href="/dict-setdefault">Use setdefault and defaultdict for common dict patterns</a>. <a href="/sets-membership-views">Compare dict layout with set hash tables</a>.

Understand.
Visualize.
Master.

Python in Depth

An interactive engineering reference for Python internals

Quick note

Hashability is required.

:)
TABLE OF CONTENTS
3.1How dict gets constant-time lookup

Hashability, collisions, load factor, memory cost, and stable key order

Dictionaries are Python's general-purpose hash table. They trade memory for fast keyed access, and they also underpin module globals, class namespaces, many instance attribute dictionaries, and many attribute lookup paths. Not every obj.attr access or function call performs a dict lookup on the hot path — descriptors, __slots__, locals, and CPython's inline caches can avoid that — but dicts are still one of the core structures behind Python's namespace model.

Think of a dict like a library card catalog. You look up a title (hash the key), find the exact shelf location, and walk straight to it. No scanning every book. The cost is the catalog itself — the hash table reserves extra space to keep lookups fast, which is why a dict with 10,000 entries uses more memory than a list with 10,000 entries.

Core answer

A dict key must be hashable: its hash must be consistent with equality, and whatever data determines that hash must not change while the object is used as a key.

# [CURRENT - 3.10-3.14] Works on Python 3.x
good = {
("region", "BR"): "active",
42: "answer",
frozenset({"read", "write"}): "role",
}
try:
bad = {["region", "BR"]: "active"}
except TypeError as exc:
print(exc)

Average-case lookup is effectively constant time because Python computes the hash and probes the hash table directly. Worst-case behavior happens when many keys collide (bad hash function), and memory overhead grows with the sparse hash table — not a problem, just a tradeoff to know.

Step Through a Dict Lookup

Follow a conceptual hash-table lookup from home bucket to collision handling to the final matched value.

Mechanism and internals

The portable rule from the Python data model is:

  • if a == b, then hash(a) == hash(b) must also hold
  • if __eq__ is defined but __hash__ is not, the class becomes unhashable (Python sets __hash__ to None)
# [CURRENT - 3.10-3.14] Works on Python 3.x
class Hashable:
def __init__(self, value):
self.value = value
def __eq__(self, other):
return isinstance(other, Hashable) and self.value == other.value
def __hash__(self):
return hash(self.value)
class Unhashable:
def __init__(self, value):
self.value = value
def __eq__(self, other):
return isinstance(other, Unhashable) and self.value == other.value
try:
{Unhashable(1): "will fail"}
except TypeError as exc:
print(exc)

A bad hash function does not break correctness, but it destroys performance. hash(1) == hash(1.0) == hash(True) all produce the same hash because Python's int, float, and bool types are all part of the numeric tower and compare equal across types. That means {1: "int", 1.0: "float"} has only one key — the second insertion overwrites the first.

# [CURRENT - 3.10-3.14] Works on Python 3.x
d = {}
d[1] = "int"
d[1.0] = "float"
print(d) # {1: 'float'}

This follows from the numeric-comparison contract in the data model: two values that compare equal must share the same dict slot.

CPython internals

CPython 3.12 implements dict using a compact, ordered hash table with two arrays per keys object, as described in Objects/dictobject.c (lines 30-60). The layout separates the hash table itself from the key-value storage:

  • dk_indices — a sparse array of signed integers. Each slot holds an index into the entries array (>= 0), or DKIX_EMPTY (-1) for unused slots, or DKIX_DUMMY (-2) for slots whose key was deleted. The slot width grows from 1 byte to 4 or 8 bytes as the table size increases.
  • dk_entries — a dense array of PyDictKeyEntry or PyDictUnicodeEntry structs, each holding a key, a value, and a hash (for non-unicode keys). Entries are stored in insertion order.

The lookup works in two steps. Python hashes the key, takes the low bits of the hash to find an index in dk_indices, then follows that index to the matching entry in dk_entries. If the index is DKIX_EMPTY or the entry does not match, a probe sequence begins.

The probing algorithm (documented at Objects/dictobject.c, lines 280-340) uses a perturbed pseudo-random probe. The initial slot is hash & mask. On collision, the next slot is computed as:

j = (5 * j) + 1 + perturb
perturb >>= PERTURB_SHIFT # PERTURB_SHIFT = 5
j = j & mask

This is neither linear probing nor quadratic probing. It is a specific design chosen because:

  • consecutive integer hash codes (0, 1, 2, 3, ...) map to consecutive slots on the first try, giving perfect hit rates for contiguous integer keys
  • the perturb variable brings the high bits of the hash into play on each successive miss, so that keys with the same home bucket spread apart quickly rather than clustering

The USABLE_FRACTION macro (Objects/dictobject.c, line 403) sets the maximum load:

#define USABLE_FRACTION(n) (((n) << 1) / 3)

That is exactly 2/3. A dict with dk_size = 8 can hold USABLE_FRACTION(8) = 5 entries before resizing. The resize doubles the table (growth rule is GROWTH_RATE(d) = (d)->ma_used * 3, which forces the next power of two above used*3).

CPython also implements key-sharing dictionaries (PEP 412, Objects/dictobject.c lines 90-130). When instances of the same class define attributes in the same order (as in a typical __init__), they share the PyDictKeysObject — the dk_indices and dk_entries arrays — and each instance stores only its values in a separate ma_values array. This saves the keys-array overhead for every instance. The requirement is stable: define the same attributes in the same order in __init__. Adding attributes dynamically later forces the instance into a combined-table layout, losing the sharing optimization.

The compact dict was introduced in CPython 3.6 and became a language guarantee in Python 3.7. The change saved roughly 58% memory for a typical dict compared with the old combined-table layout, and it also added preservation of insertion order as a side effect of the dense-entries array.

Memory cost and growth

Dicts reserve significantly more memory than a plain sequence because the table must preserve fast lookups under growth. Local CPython 3.12.3, 64-bit measurements with sys.getsizeof show the growth steps clearly:

  • dict() -> 64 bytes
  • 1 to 5 items -> 224 bytes
  • 6 to 10 items -> 352 bytes
  • 16 items -> 632 bytes
  • 32 items -> 1168 bytes
# [CURRENT - 3.10-3.14] Works on Python 3.x
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
for n in (0, 1, 5, 6, 10, 16, 32):
d = {i: i for i in range(n)}
print(n, sys.getsizeof(d))

Those jumps correspond to the underlying dk_size doubling. The minimum size is PyDict_MINSIZE = 8 (at Objects/dictobject.c, line 160). A dict with one entry occupies a 8-slot table with 5 usable slots. The next resize happens at 6 entries, where the table doubles to 16 slots (10 usable), then at 11 entries to 32 slots (21 usable), and so on.

The exact thresholds are implementation detail, not a language guarantee. The tradeoff is stable: dicts spend memory on spare capacity to protect lookup speed.

Ordering and object dictionaries

Dictionary insertion order is guaranteed by the language in Python 3.7+. CPython 3.6 preserved it as an implementation detail, but Python 3.6 is End-of-Life and should not be treated as a production baseline.

# [OLDER / 3.7-3.8, CURRENT - 3.10-3.14] Works on Python 3.7+
config = {}
config["host"] = "localhost"
config["port"] = 5432
config["debug"] = False
print(list(config))

Dict internals also matter for objects. CPython uses key-sharing dictionaries for many instance __dict__ layouts when instances of a class have the same attribute names. Defining attributes consistently in __init__ helps preserve that memory optimization.

# [CURRENT - 3.10-3.14] Works on Python 3.x
class Person:
def __init__(self, name, age):
self.name = name
self.age = age

The optimization details are CPython-specific. The production lesson is stable: define instance attributes predictably instead of attaching random new fields later unless that flexibility is intentional.

Version context

Current project guidance targets Python 3.10-3.14. Insertion order is language-guaranteed in Python 3.7+. Python 3.9 and below are End-of-Life.

The exact byte counts in this guide are CPython 3.12.3, 64-bit measurements taken locally. Different interpreters and builds may differ.

Version-sensitive notes for dict internals:

  • compact dict (two-array layout): CPython 3.6+, language guarantee 3.7+
  • key-sharing dictionaries: PEP 412, CPython 3.3+
  • variable-size indices (int8/int16/int32/int64): CPython 3.6+
  • free-threaded builds (PEP 703, Python 3.13+) add atomic operations and lock-free lookup paths to dict, but the layout and probing algorithm remain the same
Edge cases and gotchas

Never mutate an object's equality-relevant fields after inserting it into a dict. The object may become unfindable because the table placement was based on the old hash/equality state.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from dataclasses import dataclass
@dataclass(unsafe_hash=True)
class Point:
x: int
y: int
p = Point(1, 2)
points = {p: "origin-ish"}
p.x = 99
print(points.get(p))

Use immutable key objects. If the domain object is mutable, key by a stable identifier instead of the mutable object itself.

Another subtle collision: numeric tower equality means 1, 1.0, and True all map to the same dict slot. This is documented behavior (Python data model, numeric comparison rules) but often surprises new developers.

# [CURRENT - 3.10-3.14] Works on Python 3.x
d = {}
d[1] = "int"
d[1.0] = "float"
d[True] = "bool"
print(d) # one key remains
Production usage

Use dicts when keyed lookup is the real operation. Do not use them as a vague replacement for every record or every small container. For membership-only workloads, a set usually expresses intent better and can be cheaper because there are no stored values; see .

Use defaultdict or setdefault only when the "bucket per key" pattern is genuine; see .

For the common pattern of looking up a key with a fallback, dict.get(key, default) is the right tool. It avoids an explicit membership check followed by a separate lookup, which would probe the table twice:

# [CURRENT - 3.10-3.14] Works on Python 3.x
config = {"host": "localhost"}
# one probe
port = config.get("port", 5432)
# two probes — wasteful
if "port" in config:
port = config["port"]
else:
port = 5432
Further depth
  • Mapping types - dict
  • Data model: object.__hash__
  • What's New in Python 3.7
  • PEP 412: Key-Sharing Dictionary
  • CPython source: Objects/dictobject.c
  • CPython developer guide: dict implementation notes
  • sys.getsizeof
  • Mapping types: dict
MEASURED NOTEBOOKMeasured
Measured dict lookup versus linear scan

This notebook compares dictionary key lookup with scanning a list of key-value pairs. It shows which path is faster for hit and miss lookups and what each representation costs in container memory at 10k, 100k, and 1M keys.

Winnerdict lookup — 160676x faster than list scan @ 1M
RELATED GUIDE
Hit lookup time by mapping size
0.00 µs4296.4 µs8592.8 µs10k100k1M
dict lookup
list scan
CONTROLS
METRICS
Faster lookup pathdict lookup
Speed gap @ 1M hit160676x faster than list scan @ 1M
Memory cost @ 1Mdict 40960.1 KiB / list 7812.6 KiB
Largest mapping tested1,000,000 keys
NOTES

What this tests — key lookup in a dict (hash table) vs linear scan through a list of (key, value) pairs. Both data structures hold the same 1M entries. The chart shows how lookup time scales as the collection grows.

Why dict won — dict uses an open-addressed hash table. Computing the hash and probing a few slots is O(1) average case regardless of dict size. The list scan must call `__eq__` sequentially until it finds a match, making it O(n).

The surprise — the speed gap exceeds 160,676x at 1M. Most developers know “hash tables are faster,” but the magnitude is far larger than intuition suggests. For repeated lookups, the dict pays for its memory overhead within a handful of accesses.

TEST ENVIRONMENT
Python Version3.12.3
Machinex86_64
Contribute