Dictionaries are Python's general-purpose hash table. They trade memory for fast keyed access, and they also underpin module globals, class namespaces, many instance attribute dictionaries, and many attribute lookup paths. Not every obj.attr access or function call performs a dict lookup on the hot path — descriptors, __slots__, locals, and CPython's inline caches can avoid that — but dicts are still one of the core structures behind Python's namespace model.
Think of a dict like a library card catalog. You look up a title (hash the key), find the exact shelf location, and walk straight to it. No scanning every book. The cost is the catalog itself — the hash table reserves extra space to keep lookups fast, which is why a dict with 10,000 entries uses more memory than a list with 10,000 entries.
A dict key must be hashable: its hash must be consistent with equality, and whatever data determines that hash must not change while the object is used as a key.
# [CURRENT - 3.10-3.14] Works on Python 3.xgood = { ("region", "BR"): "active", 42: "answer", frozenset({"read", "write"}): "role",}try: bad = {["region", "BR"]: "active"}except TypeError as exc: print(exc)Average-case lookup is effectively constant time because Python computes the hash and probes the hash table directly. Worst-case behavior happens when many keys collide (bad hash function), and memory overhead grows with the sparse hash table — not a problem, just a tradeoff to know.
The portable rule from the Python data model is:
- if
a == b, thenhash(a) == hash(b)must also hold - if
__eq__is defined but__hash__is not, the class becomes unhashable (Python sets__hash__toNone)
# [CURRENT - 3.10-3.14] Works on Python 3.xclass Hashable: def __init__(self, value): self.value = value def __eq__(self, other): return isinstance(other, Hashable) and self.value == other.value def __hash__(self): return hash(self.value)class Unhashable: def __init__(self, value): self.value = value def __eq__(self, other): return isinstance(other, Unhashable) and self.value == other.valuetry: {Unhashable(1): "will fail"}except TypeError as exc: print(exc)A bad hash function does not break correctness, but it destroys performance. hash(1) == hash(1.0) == hash(True) all produce the same hash because Python's int, float, and bool types are all part of the numeric tower and compare equal across types. That means {1: "int", 1.0: "float"} has only one key — the second insertion overwrites the first.
# [CURRENT - 3.10-3.14] Works on Python 3.xd = {}d[1] = "int"d[1.0] = "float"print(d) # {1: 'float'}This follows from the numeric-comparison contract in the data model: two values that compare equal must share the same dict slot.
CPython 3.12 implements dict using a compact, ordered hash table with two arrays per keys object, as described in Objects/dictobject.c (lines 30-60). The layout separates the hash table itself from the key-value storage:
dk_indices— a sparse array of signed integers. Each slot holds an index into the entries array (>= 0), orDKIX_EMPTY(-1) for unused slots, orDKIX_DUMMY(-2) for slots whose key was deleted. The slot width grows from 1 byte to 4 or 8 bytes as the table size increases.dk_entries— a dense array ofPyDictKeyEntryorPyDictUnicodeEntrystructs, each holding a key, a value, and a hash (for non-unicode keys). Entries are stored in insertion order.
The lookup works in two steps. Python hashes the key, takes the low bits of the hash to find an index in dk_indices, then follows that index to the matching entry in dk_entries. If the index is DKIX_EMPTY or the entry does not match, a probe sequence begins.
The probing algorithm (documented at Objects/dictobject.c, lines 280-340) uses a perturbed pseudo-random probe. The initial slot is hash & mask. On collision, the next slot is computed as:
j = (5 * j) + 1 + perturbperturb >>= PERTURB_SHIFT # PERTURB_SHIFT = 5j = j & maskThis is neither linear probing nor quadratic probing. It is a specific design chosen because:
- consecutive integer hash codes (
0, 1, 2, 3, ...) map to consecutive slots on the first try, giving perfect hit rates for contiguous integer keys - the
perturbvariable brings the high bits of the hash into play on each successive miss, so that keys with the same home bucket spread apart quickly rather than clustering
The USABLE_FRACTION macro (Objects/dictobject.c, line 403) sets the maximum load:
#define USABLE_FRACTION(n) (((n) << 1) / 3)That is exactly 2/3. A dict with dk_size = 8 can hold USABLE_FRACTION(8) = 5 entries before resizing. The resize doubles the table (growth rule is GROWTH_RATE(d) = (d)->ma_used * 3, which forces the next power of two above used*3).
CPython also implements key-sharing dictionaries (PEP 412, Objects/dictobject.c lines 90-130). When instances of the same class define attributes in the same order (as in a typical __init__), they share the PyDictKeysObject — the dk_indices and dk_entries arrays — and each instance stores only its values in a separate ma_values array. This saves the keys-array overhead for every instance. The requirement is stable: define the same attributes in the same order in __init__. Adding attributes dynamically later forces the instance into a combined-table layout, losing the sharing optimization.
The compact dict was introduced in CPython 3.6 and became a language guarantee in Python 3.7. The change saved roughly 58% memory for a typical dict compared with the old combined-table layout, and it also added preservation of insertion order as a side effect of the dense-entries array.
Dicts reserve significantly more memory than a plain sequence because the table must preserve fast lookups under growth. Local CPython 3.12.3, 64-bit measurements with sys.getsizeof show the growth steps clearly:
dict()->64bytes1to5items ->224bytes6to10items ->352bytes16items ->632bytes32items ->1168bytes
# [CURRENT - 3.10-3.14] Works on Python 3.x# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.import sysfor n in (0, 1, 5, 6, 10, 16, 32): d = {i: i for i in range(n)} print(n, sys.getsizeof(d))Those jumps correspond to the underlying dk_size doubling. The minimum size is PyDict_MINSIZE = 8 (at Objects/dictobject.c, line 160). A dict with one entry occupies a 8-slot table with 5 usable slots. The next resize happens at 6 entries, where the table doubles to 16 slots (10 usable), then at 11 entries to 32 slots (21 usable), and so on.
The exact thresholds are implementation detail, not a language guarantee. The tradeoff is stable: dicts spend memory on spare capacity to protect lookup speed.
Dictionary insertion order is guaranteed by the language in Python 3.7+. CPython 3.6 preserved it as an implementation detail, but Python 3.6 is End-of-Life and should not be treated as a production baseline.
# [OLDER / 3.7-3.8, CURRENT - 3.10-3.14] Works on Python 3.7+config = {}config["host"] = "localhost"config["port"] = 5432config["debug"] = Falseprint(list(config))Dict internals also matter for objects. CPython uses key-sharing dictionaries for many instance __dict__ layouts when instances of a class have the same attribute names. Defining attributes consistently in __init__ helps preserve that memory optimization.
# [CURRENT - 3.10-3.14] Works on Python 3.xclass Person: def __init__(self, name, age): self.name = name self.age = ageThe optimization details are CPython-specific. The production lesson is stable: define instance attributes predictably instead of attaching random new fields later unless that flexibility is intentional.
Current project guidance targets Python 3.10-3.14. Insertion order is language-guaranteed in Python 3.7+. Python 3.9 and below are End-of-Life.
The exact byte counts in this guide are CPython 3.12.3, 64-bit measurements taken locally. Different interpreters and builds may differ.
Version-sensitive notes for dict internals:
- compact dict (two-array layout): CPython 3.6+, language guarantee 3.7+
- key-sharing dictionaries: PEP 412, CPython 3.3+
- variable-size indices (int8/int16/int32/int64): CPython 3.6+
- free-threaded builds (PEP 703, Python 3.13+) add atomic operations and lock-free lookup paths to dict, but the layout and probing algorithm remain the same
Never mutate an object's equality-relevant fields after inserting it into a dict. The object may become unfindable because the table placement was based on the old hash/equality state.
# [CURRENT - 3.10-3.14] Works on Python 3.xfrom dataclasses import dataclass@dataclass(unsafe_hash=True)class Point: x: int y: intp = Point(1, 2)points = {p: "origin-ish"}p.x = 99print(points.get(p))Use immutable key objects. If the domain object is mutable, key by a stable identifier instead of the mutable object itself.
Another subtle collision: numeric tower equality means 1, 1.0, and True all map to the same dict slot. This is documented behavior (Python data model, numeric comparison rules) but often surprises new developers.
# [CURRENT - 3.10-3.14] Works on Python 3.xd = {}d[1] = "int"d[1.0] = "float"d[True] = "bool"print(d) # one key remainsUse dicts when keyed lookup is the real operation. Do not use them as a vague replacement for every record or every small container. For membership-only workloads, a set usually expresses intent better and can be cheaper because there are no stored values; see .
Use defaultdict or setdefault only when the "bucket per key" pattern is genuine; see .
For the common pattern of looking up a key with a fallback, dict.get(key, default) is the right tool. It avoids an explicit membership check followed by a separate lookup, which would probe the table twice:
# [CURRENT - 3.10-3.14] Works on Python 3.xconfig = {"host": "localhost"}# one probeport = config.get("port", 5432)# two probes — wastefulif "port" in config: port = config["port"]else: port = 5432