Updating mutable values in dicts

Use setdefault or defaultdict to avoid repeated lookups and branch noise

The `if key not in d: d[key] = []` pattern touches the dict two or three times per update: a membership test, a store when the key is missing, and a further lookup to reach the value. `setdefault` does it in one call. But the default argument to `setdefault` is evaluated eagerly, before the method checks whether the key exists. For expensive defaults, that means a wasted allocation on every existing-key access. `collections.defaultdict` avoids this with lazy evaluation through `__missing__`, which is called only when `d[key]` finds the key absent. However, `defaultdict` has its own subtlety: reading a missing key with `d[key]` mutates the mapping, so a later iteration or membership check sees a key that was never explicitly stored. The `get()`-then-assign pattern is an anti-pattern: it still performs two lookups. `defaultdict(list)` is usually the cleanest choice for grouping patterns. <a href="/dict-hash-tables">Learn how dict lookups work under the hood</a>. <a href="/memory-container-comparison">Compare dict memory costs with other containers</a>.


Python in Depth

An interactive engineering reference for Python internals

Quick note

Auto-create only when the domain allows it.

3.2 Updating mutable values in dicts


The recurring dict pattern is: find the bucket for a key, create it if missing, then mutate that bucket. Python gives you setdefault and defaultdict for this, but they work differently — one evaluates its default eagerly and the other lazily.

Think of setdefault like visiting a tool shed for a wrench. If the wrench is hanging on the wall, you grab it. If not, you hang a new one there and use that. The catch: you have to build the new wrench before you walk in, whether or not one is already on the wall. defaultdict is like a foreman who only sends someone to build a wrench when you report that it is missing.

Core answer

Use dict.setdefault() for a local one-off "get or create" update. Use collections.defaultdict(factory) when the whole mapping has one stable missing-key policy.

# [CURRENT - 3.10-3.14] Works on Python 3.x
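index = {}
pairs = [("python", 3), ("dict", 8), ("python", 14)]
for word, location in pairs:
    # One call: return the existing list, or insert a new one and return it.
    index.setdefault(word, []).append(location)
print(index)  # {'python': [3, 14], 'dict': [8]}

Mechanism and evaluation cost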

setdefault(key, default) does two things in one dict method call:

  • if the key exists, return the existing value
  • if the key is missing, insert default and return it

The subtle part is that the default expression is still evaluated before the call, even if the key is already present.

# [CURRENT - 3.10-3.14] Works on Python 3.x
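def make_default():
    print("building default")
    return []

data = {"x": [1]}
# make_default() runs before setdefault is called, so its result is discarded here.
data.setdefault("x", make_default()).append(2)
print(data)  # {'x': [1, 2]}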

That prints building default even though "x" already exists. This is the main reason defaultdict is often better for repeated grouping: the factory is only called when a missing key is actually accessed.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from collections import defaultdict

pairs = [("python", 3), ("dict", 8), ("python", 14)]
index = defaultdict(list)
for word, location in pairs:
    # list() is called only the first time each word is seen.
    index[word].append(location)
print(dict(index))  # {'python': [3, 14], 'dict': [8]}

Why the naive pattern costs more

This common spelling spreads one idea across multiple dict operations:

# [CURRENT - 3.10-3.14] Works on Python 3.x
index = {}
pairs = [("python", 3), ("dict", 8), ("python", 14)]
for word, location in pairs:
    if word not in index:
        index[word] = []
    index[word].append(location)

It is sometimes the right choice, especially when missing-key creation needs logging, validation, or a more complex branch. But when the real operation is simply "bucket, then mutate," setdefault or defaultdict states that intent more directly.
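As a minimal sketch of a case where the explicit branch earns its place, the following assumes a hypothetical rule that only allow-listed words may create a new bucket, and it logs each creation. The names `ALLOWED_WORDS` and `add_location` are illustrative and do not come from any library.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical domain rule: only these words may create a new bucket.
ALLOWED_WORDS = {"python", "dict"}


def add_location(index, word, location):
    """Append location to the bucket for word, creating it only if allowed."""
    if word not in index:
        if word not in ALLOWED_WORDS:
            raise ValueError(f"unexpected word: {word!r}")
        logger.info("creating bucket for %r", word)
        index[word] = []
    index[word].append(location)


index = {}
add_location(index, "python", 3)
add_location(index, "python", 14)
print(index)  # {'python': [3, 14]}

try:
    add_location(index, "perl", 1)
except ValueError as exc:
    print("rejected:", exc)  # rejected: unexpected word: 'perl'
```

Neither setdefault nor defaultdict offers a natural place for the validation and logging steps, which is why the longer spelling is reasonable here.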

The older get-then-store pattern is especially noisy because it splits one update into separate read, mutate, and write steps, and the final write is redundant whenever the key already exists:

# [CURRENT - 3.10-3.14] Works on Python 3.x
index = {"python": [3]}
word = "python"
location = 14
occurrences = index.get(word, [])
occurrences.append(location)
index[word] = occurrences
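
A short sketch of why the final store in that pattern is only sometimes needed: when the key already exists, `get` returns the very list held by the dict, so assigning it back changes nothing. Only the missing-key case requires the write. The variable names here are illustrative.

```python
index = {"python": [3]}

# Existing key: get() returns the stored list itself, so storing it again is a no-op.
occurrences = index.get("python", [])
print(occurrences is index["python"])  # True
occurrences.append(14)
print(index)  # {'python': [3, 14]}

# Missing key: get() returns the fresh default, which is not in the dict
# until it is stored explicitly.
fresh = index.get("dict", [])
fresh.append(8)
print("dict" in index)  # False
index["dict"] = fresh

# setdefault covers both cases in a single call.
index.setdefault("set", []).append(21)
print(index)  # {'python': [3, 14], 'dict': [8], 'set': [21]}
```
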
Version context

dict.setdefault and collections.defaultdict are stable Python 3 APIs. Current project guidance targets Python 3.10-3.14. Python 3.9 and below are End-of-Life.

Dict insertion order is guaranteed in Python 3.7+, so grouped output retains first-seen key order in supported versions.
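
To illustrate first-seen ordering, this sketch groups a few made-up pairs and inspects the resulting key order. The sample data is hypothetical.

```python
from collections import defaultdict

pairs = [("zebra", 1), ("apple", 2), ("zebra", 3), ("mango", 4), ("apple", 5)]

groups = defaultdict(list)
for word, location in pairs:
    groups[word].append(location)

# Keys appear in the order each word was first seen, not in sorted order.
print(list(groups))  # ['zebra', 'apple', 'mango']
print(dict(groups))  # {'zebra': [1, 3], 'apple': [2, 5], 'mango': [4]}
```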

Edge cases and gotchas

Do not reuse one mutable object as the default across unrelated keys unless shared state is the goal.

# [CURRENT - 3.10-3.14] Works on Python 3.x
shared = []
data = {}
data.setdefault("a", shared).append(1)
data.setdefault("b", shared).append(2)
print(data)

Both keys now reference the same list object.
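
A minimal sketch contrasting the shared default with a fresh list per call. It uses `is` to check object identity, and the dict names are illustrative.

```python
shared = []
aliased = {}
aliased.setdefault("a", shared).append(1)
aliased.setdefault("b", shared).append(2)
print(aliased)  # {'a': [1, 2], 'b': [1, 2]}
print(aliased["a"] is aliased["b"])  # True

# Writing [] inline builds a new list on every call, so the buckets stay separate.
separate = {}
separate.setdefault("a", []).append(1)
separate.setdefault("b", []).append(2)
print(separate)  # {'a': [1], 'b': [2]}
print(separate["a"] is separate["b"])  # False
```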

defaultdict has its own semantic cost: reading a missing key mutates the mapping by creating the default. That is excellent for accumulation, but it can be surprising at boundaries where "read" should not have write effects.
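
The following sketch shows which reads trigger the factory and which do not. Subscript access inserts the default, while `in` and `get()` leave the mapping untouched. The key names are made up for the example.

```python
from collections import defaultdict

counts = defaultdict(int)
counts["seen"] += 1

# Membership tests and get() do not call the factory.
print("missing" in counts)    # False
print(counts.get("missing"))  # None
print(len(counts))            # 1

# Subscript access on a missing key inserts the default as a side effect.
value = counts["missing"]
print(value)                  # 0
print("missing" in counts)    # True
print(dict(counts))           # {'seen': 1, 'missing': 0}
```

At read-only boundaries, `get()` or an `in` check is the safer way to probe a defaultdict.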

If missing-key creation is expensive or has side effects, avoid computed defaults inside setdefault. Use an explicit branch or a lazy factory via defaultdict.
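
One way to see the difference is to count how often a costly constructor runs under each approach. In this sketch, `expensive_default` is a hypothetical stand-in for such a constructor and the `calls` dict exists only to record invocations.

```python
from collections import defaultdict

calls = {"eager": 0, "lazy": 0}


def expensive_default(label):
    """Stand-in for a costly constructor; counts how often it runs."""
    calls[label] += 1
    return []


keys = ["a", "b", "a", "a", "b"]

# setdefault: the argument expression is evaluated on every iteration.
eager = {}
for key in keys:
    eager.setdefault(key, expensive_default("eager")).append(1)

# defaultdict: the factory runs only when a key is missing.
lazy = defaultdict(lambda: expensive_default("lazy"))
for key in keys:
    lazy[key].append(1)

print(calls)                # {'eager': 5, 'lazy': 2}
print(eager == dict(lazy))  # True
```

Both mappings end up with the same contents; only the number of constructor calls differs.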

Production usage

Use:

  • setdefault for small local transforms and parsers
  • defaultdict for accumulators, grouped indexes, and "dict of lists/sets/counters" workloads
  • explicit if key not in mapping when missing-key creation has extra business logic
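
The "dict of lists/sets/counters" workloads named in the list differ only in the factory passed to defaultdict. A small sketch with made-up rows, using `set` and `int` as factories:

```python
from collections import defaultdict

rows = [("python", "alice"), ("dict", "bob"), ("python", "alice"), ("python", "carol")]

# dict of sets: unique users per word.
users_by_word = defaultdict(set)
# dict of counters: number of rows per word.
hits_by_word = defaultdict(int)

for word, user in rows:
    users_by_word[word].add(user)
    hits_by_word[word] += 1

print(dict(hits_by_word))               # {'python': 3, 'dict': 1}
print(sorted(users_by_word["python"]))  # ['alice', 'carol']
```
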
# [CURRENT - 3.10-3.14] Works on Python 3.x
from collections import defaultdict

def build_index(rows):
    index = defaultdict(list)
    for word, location in rows:
        index[word].append(location)
    return dict(index)

That dict(index) conversion is useful at API boundaries because callers then receive an ordinary mapping with no implicit missing-key mutation policy.
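
A brief sketch of what the caller observes after the conversion: a missing key raises `KeyError` instead of silently creating a bucket. The function is repeated so the block runs on its own, and the sample rows are illustrative.

```python
from collections import defaultdict


def build_index(rows):
    index = defaultdict(list)
    for word, location in rows:
        index[word].append(location)
    return dict(index)  # hand back a plain dict


result = build_index([("python", 3), ("dict", 8), ("python", 14)])
print(type(result) is dict)  # True

try:
    result["missing"]
except KeyError:
    print("KeyError: no bucket was created")

print("missing" in result)  # False
```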

Further depth
  • dict.setdefault
  • collections.defaultdict
  • Mapping types - dict
BOARD NOTES
WHY NO BENCHMARK?

This topic is better taught with structure, semantics, and cross-references than with a synthetic chart.

