Grouping into mutable dictionary values is simple until every loop repeats the same branch and lookup. setdefault and defaultdict solve that shape with different API tradeoffs.
Core answer
Use setdefault on a plain dict when a single local update site needs to find or create a bucket. Use defaultdict when "missing means fresh bucket" is the mapping's contract across a larger block of code.
```python
# [CURRENT - 3.10-3.14] Works on Python 3.10+
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)  # slots=True needs Python 3.10+
class Delivery:
    region: str
    order_id: str


def group_by_region(rows: list[Delivery]) -> dict[str, list[Delivery]]:
    grouped: dict[str, list[Delivery]] = {}
    for row in rows:
        # Fetch the region's list, creating an empty one the first time.
        grouped.setdefault(row.region, []).append(row)
    return grouped


items = [Delivery("BR", "ORD-1"), Delivery("BR", "ORD-2")]
print(group_by_region(items))
```

Why this design exists
Mutable aggregate values create a common mapping update pattern: find or create the bucket, then mutate it. setdefault exposes that operation on plain dictionaries. collections.defaultdict moves the factory into the mapping so repeated access sites share one missing-key rule.
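Moving the factory into the mapping works through the dict __missing__ hook. When d[key] misses on a dict subclass, dict calls __missing__(key), and defaultdict implements that hook by calling its default_factory. A minimal sketch, using a hypothetical BucketDict class, rebuilds defaultdict(list) by hand so the shared missing-key rule is visible:

```python
from collections import defaultdict


class BucketDict(dict):
    """Hypothetical dict subclass that mimics defaultdict(list)."""

    def __missing__(self, key):
        # dict.__getitem__ calls this only when key is absent.
        bucket = []          # a fresh bucket for each missing key
        self[key] = bucket   # store it so later lookups find it
        return bucket


manual = BucketDict()
manual["BR"].append("ORD-1")
manual["BR"].append("ORD-2")

builtin = defaultdict(list)
builtin["BR"].append("ORD-1")
builtin["BR"].append("ORD-2")

print(manual)                    # {'BR': ['ORD-1', 'ORD-2']}
print(dict(builtin))             # {'BR': ['ORD-1', 'ORD-2']}
print(builtin.default_factory)   # <class 'list'>
```

Every indexing site on either mapping follows the same rule, which is the point of putting the factory in the mapping rather than at each call.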
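The distinction matters in APIs. A plain dict with one intentional setdefault update is easier to serialize and reason about. A defaultdict inserts a new key whenever code indexes a missing key with d[key], even when the code only meant to read it. That is convenient inside an aggregator and surprising at a boundary. Membership tests (key in d) and d.get(key) do not call the factory, so only subscript reads carry this side effect.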
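A short sketch of the boundary surprise: a read-looking index on a missing key inserts it, while in and .get() leave the mapping unchanged. Converting with dict(...) before handing the mapping out, as build_retry_index does, restores ordinary KeyError behavior for callers.

```python
from collections import defaultdict

grouped: defaultdict[str, list[str]] = defaultdict(list)
grouped["slow"].append("ORD-1")

# Membership tests and .get() do not call the factory.
print("fast" in grouped)      # False
print(grouped.get("fast"))    # None
print(len(grouped))           # 1

# Indexing a missing key inserts an empty bucket as a side effect.
print(grouped["fast"])        # []
print(len(grouped))           # 2
print(sorted(grouped))        # ['fast', 'slow']

# A plain dict copy raises KeyError on misses again.
public = dict(grouped)
try:
    public["missing"]
except KeyError:
    print("KeyError")
```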
Mechanics and CPython internals
Both patterns still rely on dict hashing and probing. setdefault(key, default) returns the existing value when key is present, or stores default under key and returns it when key is missing. Because default is an ordinary argument, it is evaluated before the call, whether or not it ends up being stored. When building the default is expensive, use an explicit branch or a mapping that carries a factory, such as defaultdict, which calls the factory only on a miss.
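The eager evaluation is easy to observe with a call counter. In this sketch, expensive_bucket and bucket_for are hypothetical helpers: the first stands in for a costly default, the second is the explicit branch that builds the default only on a miss.

```python
calls = 0


def expensive_bucket() -> list[str]:
    """Hypothetical costly default; counts how many times it runs."""
    global calls
    calls += 1
    return []


def bucket_for(index: dict[str, list[str]], key: str) -> list[str]:
    """Build the default only when the key is actually missing."""
    bucket = index.get(key)
    if bucket is None:
        bucket = expensive_bucket()
        index[key] = bucket
    return bucket


index: dict[str, list[str]] = {"BR": ["ORD-1"]}

# setdefault: the argument is evaluated before the call, hit or miss.
index.setdefault("BR", expensive_bucket()).append("ORD-2")
print(calls)  # 1, even though "BR" already existed

# Explicit branch: a hit builds nothing, a miss builds once.
bucket_for(index, "BR").append("ORD-3")
bucket_for(index, "US").append("ORD-4")
print(calls)  # 2
print(index)  # {'BR': ['ORD-1', 'ORD-2', 'ORD-3'], 'US': ['ORD-4']}
```

The index.get(key) check assumes None is never a legitimate stored value, which holds here because every value is a list.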
```python
# [CURRENT - 3.10-3.14] Works on Python 3.10+
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class Retry:
    queue: str
    order_id: str


def build_retry_index(rows: list[Retry]) -> dict[str, list[str]]:
    # list() runs only the first time a queue name is indexed.
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for row in rows:
        grouped[row.queue].append(row.order_id)
    # Return a plain dict so callers do not inherit auto-insert on indexing.
    return dict(grouped)


retries = [Retry("slow", "ORD-1"), Retry("slow", "ORD-2"), Retry("fast", "ORD-3")]
print(build_retry_index(retries))
```

Complexity and tradeoffs
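The underlying append remains amortized O(1), and the dict lookup is O(1) on average. The styles differ slightly in lookup count (a hand-written branch looks the key up two or three times per update, setdefault once), but the larger difference is semantic. A hand-written branch makes creation timing explicit. setdefault compresses the same pattern into one call. defaultdict removes the repeated noise but changes what indexing with [] does for a missing key: it inserts a new bucket instead of raising KeyError.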
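The lookup-count part of the tradeoff can be made visible with a key type that counts its own hashing. This is a sketch built on a hypothetical CountingKey class; on CPython the branch hashes the key three times on a miss (in, store, fetch) and twice on a hit, while setdefault hashes it once per call. Other Python implementations may differ, and the absolute cost is small next to the semantic differences.

```python
from collections.abc import Callable


class CountingKey:
    """Hypothetical key type that counts __hash__ calls."""

    hashes = 0

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        CountingKey.hashes += 1
        return hash(self.name)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, CountingKey) and other.name == self.name


def branch_update(output: dict, key: CountingKey, value: int) -> None:
    if key not in output:        # lookup 1
        output[key] = []         # lookup 2 (miss only)
    output[key].append(value)    # lookup 3


def setdefault_update(output: dict, key: CountingKey, value: int) -> None:
    output.setdefault(key, []).append(value)  # one lookup


def hashes_for(update: Callable[[dict, CountingKey, int], None]) -> int:
    CountingKey.hashes = 0
    output: dict = {}
    key = CountingKey("latency")
    update(output, key, 9)   # miss: creates the bucket
    update(output, key, 12)  # hit: reuses it
    return CountingKey.hashes


print(hashes_for(branch_update))      # 5 on CPython
print(hashes_for(setdefault_update))  # 2 on CPython
```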
Idiomatic patterns and refactoring
Refactor repeated key checks when the bucket mutation is the real operation.
```python
# [CURRENT - 3.10-3.14] Works on Python 3.10+
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class MetricPoint:
    name: str
    value: int


def collect_branch(points: list[MetricPoint]) -> dict[str, list[int]]:
    # Before: a membership check, an insert, then another lookup to append.
    output: dict[str, list[int]] = {}
    for point in points:
        if point.name not in output:
            output[point.name] = []
        output[point.name].append(point.value)
    return output


def collect_setdefault(points: list[MetricPoint]) -> dict[str, list[int]]:
    # After: one call finds or creates the bucket.
    output: dict[str, list[int]] = {}
    for point in points:
        output.setdefault(point.name, []).append(point.value)
    return output


print(collect_branch([MetricPoint("latency", 9)]))
print(collect_setdefault([MetricPoint("latency", 9)]))
```

Common mistakes and edge cases
Do not reuse one mutable default object across multiple keys accidentally. Do not call setdefault(key, expensive()) and assume expensive() only runs for misses. Do not leak a defaultdict across a boundary where mere indexing should not mutate state.
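The first mistake is easy to make when a default list is hoisted out of the loop. This sketch shows the aliasing, the same trap with dict.fromkeys, and the fix of building a new list at each call; the region and order strings are illustrative only.

```python
pairs = [("BR", "ORD-1"), ("US", "ORD-2")]

# Mistake: one list object is passed as the default for every key.
shared: list[str] = []
buggy: dict[str, list[str]] = {}
for region, order_id in pairs:
    buggy.setdefault(region, shared).append(order_id)
print(buggy)  # {'BR': ['ORD-1', 'ORD-2'], 'US': ['ORD-1', 'ORD-2']}

# Same trap: dict.fromkeys stores the same list object under every key.
also_buggy = dict.fromkeys(["BR", "US"], [])
also_buggy["BR"].append("ORD-1")
print(also_buggy)  # {'BR': ['ORD-1'], 'US': ['ORD-1']}

# Fix: evaluate a new [] at each call (or use defaultdict(list)).
fixed: dict[str, list[str]] = {}
for region, order_id in pairs:
    fixed.setdefault(region, []).append(order_id)
print(fixed)  # {'BR': ['ORD-1'], 'US': ['ORD-2']}
```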
When to use / When NOT to use
Use these patterns for bucketed aggregation, inverted indexes, adjacency lists, and grouping pipelines.
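Two of those shapes in a compact sketch: an undirected adjacency list built with defaultdict(set) and returned as a plain dict, and an inverted index built with setdefault. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency list; each node gets a fresh set on first use."""
    graph: defaultdict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)  # plain dict at the boundary


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each word to the ids of documents containing it, in doc order."""
    index: dict[str, list[str]] = {}
    for doc_id, text in docs.items():
        # set() drops repeated words within one document.
        for word in sorted(set(text.lower().split())):
            index.setdefault(word, []).append(doc_id)
    return index


graph = build_adjacency([("a", "b"), ("b", "c")])
words = build_inverted_index({"d1": "late order", "d2": "order shipped"})
print(graph)
print(words)  # {'late': ['d1'], 'order': ['d1', 'd2'], 'shipped': ['d2']}
```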
Do not use them when a missing key is an error, when default construction is expensive and rare, or when a counter is better modeled by collections.Counter.
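For the counter case, a short comparison with illustrative data: collections.Counter covers what defaultdict(int) would do, adds counting helpers, and, unlike defaultdict, does not insert a key when a missing one is read.

```python
from collections import Counter, defaultdict

regions = ["BR", "US", "BR", "BR"]

# Works, but the mapping itself says nothing about counting.
by_default: defaultdict[str, int] = defaultdict(int)
for region in regions:
    by_default[region] += 1

# Counter states the intent and adds helpers such as most_common.
counts = Counter(regions)
print(counts.most_common(1))             # [('BR', 3)]
print(dict(counts) == dict(by_default))  # True

# Counter returns 0 for a missing key without inserting it...
print(counts["MX"], "MX" in counts)      # 0 False

# ...whereas defaultdict(int) inserts the key on the same read.
by_default["MX"]
print("MX" in by_default)                # True
```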