Updating mutable values in dicts

The recurring dict pattern is: find the bucket for a key, create it if missing, then mutate that bucket. Python gives you setdefault and defaultdict for this, but they differ in when the default is built: the argument passed to setdefault is evaluated on every call, while defaultdict calls its factory only when a key is actually missing.

Think of setdefault like visiting a tool shed. You look for a wrench. If the wrench is there, you grab it. If not, you build one on the spot and hang it on the wall. The catch: you start building the wrench before checking whether it is already hanging on the wall. defaultdict is like a foreman who only sends someone to build when you report the tool is missing.

Core answer

Use dict.setdefault() for a local one-off "get or create" update. Use collections.defaultdict(factory) when the whole mapping has one stable missing-key policy.

# [CURRENT - 3.10-3.14] Works on Python 3.x
index = {}
pairs = [("python", 3), ("dict", 8), ("python", 14)]
for word, location in pairs:
    index.setdefault(word, []).append(location)
print(index)

Mechanism and evaluation cost

setdefault(key, default) does one of two things in a single dict method call:

if the key exists, return the existing value
if the key is missing, insert default and return it
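
As a rough illustration of those two branches, the behavior can be approximated in plain Python. The helper name setdefault_sketch is hypothetical; the real method is built in and implemented in C, so this only mirrors what a caller can observe.

```python
def setdefault_sketch(mapping, key, default=None):
    """Approximate dict.setdefault with plain dict operations (illustrative only)."""
    if key in mapping:
        return mapping[key]   # key exists: return the stored value untouched
    mapping[key] = default    # key missing: store the default...
    return default            # ...and hand back that same object


index = {"python": [3]}
existing = setdefault_sketch(index, "python", [])
created = setdefault_sketch(index, "dict", [])
created.append(8)  # mutates the list that is now stored under "dict"
print(index)  # {'python': [3], 'dict': [8]}
```

Because the returned object is the one stored in the mapping, chaining .append() onto the call updates the bucket in place.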

The subtle part is that the default expression is still evaluated before the call, even if the key is already present.

# [CURRENT - 3.10-3.14] Works on Python 3.x
def make_default():
    print("building default")
    return []
data = {"x": [1]}
data.setdefault("x", make_default()).append(2)
print(data)

That prints building default even though "x" already exists. This is the main reason defaultdict is often better for repeated grouping: the factory is only called when a missing key is actually accessed.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from collections import defaultdict
pairs = [("python", 3), ("dict", 8), ("python", 14)]
index = defaultdict(list)
for word, location in pairs:
    index[word].append(location)
print(dict(index))
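
To make the lazy factory call visible, here is a small sketch that counts invocations. The counting_list factory and the calls list are hypothetical names added for illustration.

```python
from collections import defaultdict

calls = []  # one entry is appended each time the factory runs


def counting_list():
    """Hypothetical factory that records every invocation."""
    calls.append(1)
    return []


pairs = [("python", 3), ("dict", 8), ("python", 14)]
index = defaultdict(counting_list)
for word, location in pairs:
    index[word].append(location)

print(len(calls))  # 2: one call per distinct key, not one per pair
```

The second "python" pair finds an existing bucket, so the factory is not called for it.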

Why the naive pattern costs more

This common spelling spreads one idea across up to three dict operations (a membership test, a store, and a lookup):

# [CURRENT - 3.10-3.14] Works on Python 3.x
index = {}
pairs = [("python", 3), ("dict", 8), ("python", 14)]
for word, location in pairs:
    if word not in index:
        index[word] = []
    index[word].append(location)

It is sometimes the right choice, especially when missing-key creation needs logging, validation, or a more complex branch. But when the real operation is simply "bucket, then mutate," setdefault or defaultdict states that intent more directly.
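
A minimal sketch of the case where the explicit branch earns its keep. The ALLOWED_WORDS allow-list and the add_location helper are assumptions made up for this example, not part of any library.

```python
import logging

logger = logging.getLogger(__name__)
ALLOWED_WORDS = {"python", "dict"}  # hypothetical allow-list for new keys


def add_location(index, word, location):
    """Create the bucket explicitly so creation can be validated and logged."""
    if word not in index:
        if word not in ALLOWED_WORDS:
            raise ValueError(f"unexpected key: {word!r}")
        logger.info("creating bucket for %r", word)
        index[word] = []
    index[word].append(location)


index = {}
add_location(index, "python", 3)
add_location(index, "python", 14)
print(index)  # {'python': [3, 14]}
```

Neither setdefault nor defaultdict offers a natural place for the validation step, which is why the longer spelling fits here.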

The older get-then-store pattern is especially noisy: it reads with get, mutates the result, then writes the value back unconditionally, even when the key already existed and the store is redundant:

# [CURRENT - 3.10-3.14] Works on Python 3.x
index = {"python": [3]}
word = "python"
location = 14
occurrences = index.get(word, [])
occurrences.append(location)
index[word] = occurrences

Version context

dict.setdefault and collections.defaultdict are stable Python 3 APIs. Current project guidance targets Python 3.10-3.14. Python 3.9 and earlier are end-of-life.

Dict insertion order is guaranteed in Python 3.7+, so grouped output retains first-seen key order in supported versions.
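
A short sketch of that ordering guarantee, using made-up sample pairs. defaultdict is a dict subclass, so it keeps keys in first-seen order as well.

```python
from collections import defaultdict

# Hypothetical input whose keys are deliberately not in alphabetical order.
pairs = [("zebra", 1), ("apple", 2), ("zebra", 3), ("mango", 4)]

index = defaultdict(list)
for word, location in pairs:
    index[word].append(location)

print(list(index))  # ['zebra', 'apple', 'mango']
```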

Edge cases and gotchas

Do not reuse one mutable object as the default across unrelated keys unless shared state is the goal.

# [CURRENT - 3.10-3.14] Works on Python 3.x
shared = []
data = {}
data.setdefault("a", shared).append(1)
data.setdefault("b", shared).append(2)
print(data)

Both keys now reference the same list object.
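
For contrast, a sketch of the usual fix: write the list literal inside each call so every missing key receives its own fresh object.

```python
data = {}
data.setdefault("a", []).append(1)  # a new list is created for this call
data.setdefault("b", []).append(2)  # and a different new list for this one

print(data)                    # {'a': [1], 'b': [2]}
print(data["a"] is data["b"])  # False
```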

defaultdict has its own semantic cost: reading a missing key mutates the mapping by creating the default. That is excellent for accumulation, but it can be surprising at boundaries where "read" should not have write effects.
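
The sketch below shows which kinds of read create a key. Only indexing with square brackets triggers the factory; the membership test and get() leave the mapping untouched.

```python
from collections import defaultdict

index = defaultdict(list)

print("python" in index)    # False: membership tests do not create keys
print(index.get("python"))  # None: get() bypasses the factory
print(len(index))           # 0

index["python"]             # indexing a missing key calls the factory
print(len(index))           # 1
print(dict(index))          # {'python': []}
```

Where a read must stay side-effect free, get() or an explicit membership test is the safer spelling.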

If missing-key creation is expensive or has side effects, avoid computed defaults inside setdefault. Use an explicit branch or a lazy factory via defaultdict.
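
One way to sketch the explicit branch for an expensive default. The load_bucket constructor, the build_log list, and the get_or_create helper are all hypothetical stand-ins.

```python
build_log = []  # records which keys actually triggered construction


def load_bucket(key):
    """Hypothetical expensive constructor; here it only logs and returns a list."""
    build_log.append(key)
    return []


def get_or_create(mapping, key):
    """Build the default only after the lookup has failed."""
    try:
        return mapping[key]
    except KeyError:
        value = load_bucket(key)
        mapping[key] = value
        return value


cache = {"x": [1]}
get_or_create(cache, "x").append(2)  # existing key: load_bucket is not called
get_or_create(cache, "y").append(3)  # missing key: load_bucket runs once

print(build_log)  # ['y']
print(cache)      # {'x': [1, 2], 'y': [3]}
```

Writing cache.setdefault("x", load_bucket("x")) instead would run the constructor for "x" as well, even though its result would be discarded.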

Production usage

Use:

setdefault for small local transforms and parsers
defaultdict for accumulators, grouped indexes, and "dict of lists/sets/counters" workloads
explicit if key not in mapping when missing-key creation has extra business logic

# [CURRENT - 3.10-3.14] Works on Python 3.x
from collections import defaultdict
def build_index(rows):
    index = defaultdict(list)
    for word, location in rows:
        index[word].append(location)
    return dict(index)

That dict(index) conversion is useful at API boundaries because callers then receive an ordinary mapping with no implicit missing-key mutation policy.
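
A brief sketch of what a caller sees after that conversion. build_index is repeated here so the block runs on its own, and the sample rows are made up.

```python
from collections import defaultdict


def build_index(rows):
    index = defaultdict(list)
    for word, location in rows:
        index[word].append(location)
    return dict(index)  # plain dict: no factory attached


result = build_index([("python", 3), ("dict", 8), ("python", 14)])
print(type(result) is dict)  # True

try:
    result["missing"]
except KeyError:
    print("KeyError raised, no key created")

print("missing" in result)  # False
```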
