Updating mutable values in dicts

Use setdefault or defaultdict to avoid repeated lookups and branch noise

The `if key not in d: d[key] = []` pattern looks the key up more than once: once for the membership test, again for the assignment on a miss, and again to reach the bucket. `setdefault` does the find-or-insert in a single call. Its default argument, however, is evaluated eagerly, before the method checks whether the key exists, so an expensive default is built and discarded on every call that finds an existing key. `collections.defaultdict` avoids that by calling its factory lazily through `__missing__`, only when the key is absent. `defaultdict` has its own subtlety: indexing a missing key with `d[key]` inserts it, whereas `key in d` and `d.get(key)` do not. This matters when code indexes the mapping while iterating over it, or relies on membership checks after earlier reads. The `get()`-then-assign pattern is an anti-pattern: it still performs two lookups. `defaultdict(list)` is usually the cleanest choice for grouping patterns. Learn how dict lookups work under the hood: /dict-hash-tables. Compare dict memory costs with other containers: /memory-container-comparison.

Python in Depth

An interactive engineering reference for Python internals

Quick note

Auto-create only when the domain allows it.

Python version

Targets Python 3.10–3.14. Python 3.9 and below are End-of-Life.

3.2 Updating mutable values in dicts

Grouping into mutable dictionary values is simple until every loop repeats the same branch and lookup. setdefault and defaultdict solve that shape with different API tradeoffs.

Core answer

Use setdefault for a local update where a single call site should create the bucket on first use. Use defaultdict when "missing means fresh bucket" is the mapping contract across a larger block of code.

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Delivery:
    region: str
    order_id: str

def group_by_region(rows: list[Delivery]) -> dict[str, list[Delivery]]:
    grouped: dict[str, list[Delivery]] = {}
    for row in rows:
        grouped.setdefault(row.region, []).append(row)
    return grouped

items = [Delivery("BR", "ORD-1"), Delivery("BR", "ORD-2")]
print(group_by_region(items))

Why this design exists

Mutable aggregate values create a common mapping update pattern: find or create the bucket, then mutate it. setdefault exposes that operation on plain dictionaries. collections.defaultdict moves the factory into the mapping so repeated access sites share one missing-key rule.

The distinction matters in APIs. A plain dict with one intentional setdefault update is easier to serialize and reason about. A defaultdict can create keys on read-like indexing, which is convenient inside an aggregator and surprising at a boundary.
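
The read-versus-index difference can be sketched as follows. The `stock` mapping and its keys are made up for illustration; the point is only that subscript access on a missing key inserts it, while `in` and `get()` leave the mapping untouched.

```python
from collections import defaultdict

# Hypothetical inventory mapping, used only for illustration.
stock: defaultdict[str, list[str]] = defaultdict(list)
stock["apples"].append("crate-1")

# Membership tests and get() never call the factory.
has_pears = "pears" in stock
pears_via_get = stock.get("pears")
keys_before = sorted(stock)

# Subscript access on a missing key calls the factory and stores the result.
pears_via_index = stock["pears"]
keys_after = sorted(stock)

# Copying into a plain dict at a boundary restores KeyError for missing keys.
exported = dict(stock)

print(has_pears, pears_via_get, keys_before, keys_after)
```

Copying into a plain dict before returning is one way to keep the auto-creating behavior inside the aggregator; whether that copy is worth its cost depends on the size of the mapping.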

Mechanics and CPython internals

Both patterns still rely on dict hashing and probe behavior. setdefault(key, default) returns the existing value when present or stores and returns the supplied default when missing. The default expression is evaluated before the method sees whether it is needed, so expensive default construction may need a branch or a factory-bearing mapping.
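
To make the eager evaluation concrete, here is a small sketch that counts how often a default is built. `expensive_bucket` and `call_log` are hypothetical names standing in for a costly constructor; they are not part of any library.

```python
from collections import defaultdict

call_log: list[str] = []

def expensive_bucket() -> list[int]:
    """Hypothetical stand-in for a costly default; records each call."""
    call_log.append("built")
    return []

# setdefault: the argument is built before the method runs, so both calls
# pay for it even though the second one finds the key already present.
eager: dict[str, list[int]] = {}
eager.setdefault("a", expensive_bucket()).append(1)
eager.setdefault("a", expensive_bucket()).append(2)
setdefault_builds = len(call_log)

# Explicit branch: the default is built only on a miss.
call_log.clear()
branched: dict[str, list[int]] = {}
for value in (1, 2):
    if "a" not in branched:
        branched["a"] = expensive_bucket()
    branched["a"].append(value)
branch_builds = len(call_log)

# defaultdict: the factory is called only when the key is absent.
call_log.clear()
auto: defaultdict[str, list[int]] = defaultdict(expensive_bucket)
for value in (1, 2):
    auto["a"].append(value)
factory_builds = len(call_log)

print(setdefault_builds, branch_builds, factory_builds)
```

For a cheap default such as an empty list the extra construction is rarely measurable; the difference starts to matter when the default opens a resource or allocates something large.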

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Retry:
    queue: str
    order_id: str

def build_retry_index(rows: list[Retry]) -> dict[str, list[str]]:
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for row in rows:
        grouped[row.queue].append(row.order_id)
    return dict(grouped)

retries = [Retry("slow", "ORD-1"), Retry("slow", "ORD-2"), Retry("fast", "ORD-3")]
print(build_retry_index(retries))

Complexity and tradeoffs

The underlying append remains amortized O(1) and the dict lookup is average O(1). The tradeoff is not only probe count; it is semantics. A hand-written branch makes creation timing explicit. setdefault compresses the local pattern. defaultdict reduces repeated noise but changes missing-key behavior for subscript access: d[key] on a missing key inserts a fresh value instead of raising KeyError.

Idiomatic patterns and refactoring

Refactor repeated key checks when the bucket mutation is the real operation.

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class MetricPoint:
    name: str
    value: int

def collect_branch(points: list[MetricPoint]) -> dict[str, list[int]]:
    output: dict[str, list[int]] = {}
    for point in points:
        if point.name not in output:
            output[point.name] = []
        output[point.name].append(point.value)
    return output

def collect_setdefault(points: list[MetricPoint]) -> dict[str, list[int]]:
    output: dict[str, list[int]] = {}
    for point in points:
        output.setdefault(point.name, []).append(point.value)
    return output

print(collect_branch([MetricPoint("latency", 9)]))
print(collect_setdefault([MetricPoint("latency", 9)]))

Common mistakes and edge cases

Do not reuse one mutable default object across multiple keys accidentally. Do not call setdefault(key, expensive()) and assume expensive() only runs for misses. Do not leak a defaultdict across a boundary where mere indexing should not mutate state.
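
The shared-default mistake is easy to reproduce. The sketch below shows two ways it can creep in, using made-up region codes; neither variant is taken from the examples above.

```python
regions = ["BR", "US"]  # hypothetical keys

# Pitfall 1: dict.fromkeys stores the SAME list object under every key.
shared = dict.fromkeys(regions, [])
shared["BR"].append("ORD-1")

# Pitfall 2: a default hoisted out of the loop is also one shared object.
hoisted: list[str] = []
buckets: dict[str, list[str]] = {}
for region in regions:
    buckets.setdefault(region, hoisted).append(region)

# Fix: build a fresh list per key.
separate: dict[str, list[str]] = {region: [] for region in regions}
separate["BR"].append("ORD-1")

print(shared)
print(buckets)
print(separate)
```

Writing the literal `[]` inline in each `setdefault` call, or using `defaultdict(list)`, avoids both pitfalls because a new list is created each time one is needed.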

When to use / When NOT to use

Use these patterns for bucketed aggregation, inverted indexes, adjacency lists, and grouping pipelines.

Do not use them when a missing key is an error, when default construction is expensive and rare, or when a counter is better modeled by collections.Counter.
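
For the counting case, a brief comparison may help. The event names are invented for illustration; the sketch only shows that `collections.Counter` covers the tally pattern directly and that looking up an unseen key returns 0 without inserting it.

```python
from collections import Counter, defaultdict

events = ["timeout", "retry", "timeout"]  # hypothetical event stream

# Tallying by hand with defaultdict(int) works, but only reimplements Counter.
by_hand: defaultdict[str, int] = defaultdict(int)
for event in events:
    by_hand[event] += 1

# Counter builds the same tally in one step.
counts = Counter(events)

# Unlike defaultdict, indexing a missing key on a Counter does not insert it.
unseen = counts["unknown"]
still_absent = "unknown" not in counts

print(dict(by_hand), dict(counts), unseen, still_absent)
```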

Further reading

  • Official docs: dict.setdefault
  • Official docs: collections.defaultdict
  • Official docs: collections.Counter
  • CPython source: dict implementation
BOARD NOTES: Context
WHY NO BENCHMARK?

This topic is better taught with structure, semantics, and cross-references than with a synthetic chart.
