Sets and dictionary views

Fast membership, set algebra, frozenset, and zero-copy key comparisons

Set membership is O(1) average case. List membership is O(n). That difference compounds fast as the collection grows. CPython implements sets as a combined hash table using `_Py_setentry` structs, each storing a key pointer and its cached hash. Unlike dicts with their split-table layout, sets store key and hash together. Probing uses a hybrid strategy: nine steps of linear probing first, then the same perturb algorithm as dicts. Resize triggers at approximately 60% fill, higher than dicts because sets handle collisions differently in the combined layout. Dict views (`dict.keys()`, `dict.items()`) support set operations like `&`, `|`, `-`, `^` directly without materializing a standalone set. `frozenset` provides an immutable, hashable set for use as dict keys. <a href="/dict-hash-tables">Compare set layout with dict compact hash tables</a>. <a href="/memory-container-comparison">See how set memory scales versus list and tuple</a>.

Understand.
Visualize.
Master.

Python in Depth

An interactive engineering reference for Python internals

Quick note

Set algebra for overlap questions.

:)
TABLE OF CONTENTS
3.3Sets and dictionary views

Fast membership, set algebra, frozenset, and zero-copy key comparisons

You check whether a username is in a blocked list with username in blocked_list. That works — but if blocked_list is a list, Python scans every element one by one. If it is a set, Python computes the hash and checks in one step. The difference between linear scan and hash lookup grows with every element.

Think of a set like a club bouncer with a perfect memory. You say a name, and the bouncer instantly knows whether it is on the list. A list is like the same bouncer reading every name on a clipboard one by one. Dictionary views extend the same hash-powered lookup to the keys and items of a dict — no need to build a separate set first.

Core answer

Use a set when the real operation is repeated membership testing or set algebra: union, intersection, difference, subset, or superset.

# [CURRENT - 3.10-3.14] Works on Python 3.x
blocked = {"root", "admin", "system"}
def allowed(username):
return username not in blocked
print(allowed("ana"))

On this machine, local timeit measurements over a 10000-element collection showed the scale of the difference:

  • 9999 in list(range(10000)) repeated 10000 times: about 0.595 seconds
  • 9999 in set(range(10000)) repeated 10000 times: about 0.000236 seconds

Those exact times are not portable. The stable lesson is the algorithmic one: list membership is linear scan, set membership is average-case hash lookup.

Step Through Set Membership

Follow how a set membership test uses a hash table probe path instead of scanning every element one by one.

Mechanism and memory cost

Sets are hash tables without associated values. The same hashability rules that govern dict keys govern set elements.

# [CURRENT - 3.10-3.14] Works on Python 3.x
role = frozenset({"read", "write"})
permissions = {role: "editor"}
print(permissions[role])

Because sets are hash tables, they also reserve significant extra memory to protect lookup speed. Local CPython 3.12.3 measurements with sys.getsizeof show stepped growth:

  • empty set: 216 bytes
  • 1 to 4 elements: 216 bytes
  • 5 to 16 elements: 728 bytes
  • 32 to 64 elements: 2264 bytes
# [CURRENT - 3.10-3.14] Works on Python 3.x
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
for n in (0, 1, 4, 5, 16, 32):
s = set(range(n))
print(n, sys.getsizeof(s))

That overhead is the price of near-constant-time membership.

The set base object (~216 bytes for an empty set on CPython 3.12) is much larger than an empty dict (~72 bytes) because the set object includes inline hash table storage (8 entry slots) even at minimum size.

CPython internals

CPython implements sets using the same underlying hash table machinery as dicts. The set type (PySetObject) is defined in Objects/setobject.c. Unlike dicts, sets store only keys — there are no associated values. The hash table uses the same probing algorithm but a different resize policy (~60% fill trigger instead of dict's 2/3).

The set object contains:

  • a pointer to the hash table entries (an array of _Py_setentry structs, each holding a key pointer and cached hash)
  • used and fill counters that track how many slots are occupied (used) versus ever occupied (fill, which includes dummy entries from deleted elements)

The probing formula for sets uses a hybrid of linear probing and the same perturb algorithm as dicts (Objects/setobject.c):

# 9 steps of linear probing first, then fall back to perturb
j = (5 * j) + 1 + perturb
perturb >>= PERTURB_SHIFT

The minimum set size on CPython 3.12 is PySet_MINSIZE = 8 slots, with a load factor that triggers resize at ~60% fill. The trigger is based on fill (active entries plus dummy slots left by deletions), not just used active entries.

A key difference from dict: sets use a combined entry struct (_Py_setentry) that stores key and hash together, rather than the split-table layout dicts use (sparse indices + dense entries). Set resizing allocates one new combined table, copies active entries, and frees the old table — no rehashing needed because the hash is cached in each entry.

When you call set() on an iterable, CPython calls set_update_iterable in Objects/setobject.c (Python 3.12), which iterates the source and inserts each element one at a time. The frozenset constructor does the same but returns an immutable wrapper around the same hash table, with no add/discard methods.

frozenset is hashable, which means it can be used as a dict key or set element. This is its primary practical advantage over mutable set — without hashability, you could not nest sets inside sets or use them as mapping keys.

Set algebra and API edges

Sets support the expected algebra directly:

# [CURRENT - 3.10-3.14] Works on Python 3.x
requested = {"read", "write", "delete"}
granted = {"read", "write"}
print(requested - granted)
print(requested <= granted)
print(requested & granted)

Operator forms such as |, &, and - on actual set objects expect set operands. If the right-hand side is a general iterable, use methods such as .union() or .intersection().

# [CURRENT - 3.10-3.14] Works on Python 3.x
print({"a", "b"}.intersection(["b", "c"]))
Dictionary views

dict.keys(), dict.items(), and dict.values() return dynamic view objects over the underlying dictionary. They are not materialized lists, and they update when the dictionary changes.

# [CURRENT - 3.10-3.14] Works on Python 3.x
data = {"a": 1}
keys = data.keys()
data["b"] = 2
print(list(keys))

Key views are set-like because keys are unique and hashable. Item views are also set-like when their (key, value) pairs are hashable. Values views are not set-like because values are not generally unique.

# [CURRENT - 3.10-3.14] Works on Python 3.x
expected = {"id", "email", "role"}
payload = {"id": 1, "email": "a@example.com", "debug": True}
missing = expected - payload.keys()
extra = payload.keys() - expected
print(missing, extra)

One subtle but important distinction from ordinary sets: the Python docs note that set-like dictionary views accept any iterable as the other operand for set operators. That makes view algebra surprisingly ergonomic.

# [CURRENT - 3.10-3.14] Works on Python 3.x
d = {"a": 1, "b": 2}
print(d.keys() | ["b", "c", "c"])

This works because the view's __or__ method builds a set from self and then calls .union() with the other operand, which accepts any iterable. The view itself is not a set — it is a live projection of the dict's current keys — but the set operators on views produce real set objects as results.

Version context

Set types and dictionary views are stable Python 3 features. Dictionary insertion order is guaranteed in Python 3.7+. Dictionary views became reversible in Python 3.8. The .mapping attribute on views was added in Python 3.10. Python 3.9 and below are End-of-Life.

CPython-specific notes:

  • the empty-set memory footprint (216 bytes on CPython 3.12) is an implementation detail; other interpreters may differ
  • the probing algorithm and load factor may change across CPython releases
  • source for set internals: Objects/setobject.c
Edge cases and gotchas

Set iteration order is not a sorting contract. It can vary across runs and table states. If output order matters, sort explicitly.

# [CURRENT - 3.10-3.14] Works on Python 3.x
names = {"zoe", "ana", "bo"}
print(sorted(names))

frozenset is immutable and hashable, but that does not make it ordered. It is for stable membership sets, not for sequence semantics.

If you only need key comparisons, use dictionary views directly. Building a separate set of keys first often adds allocation without adding meaning.

Production usage

Use sets for:

  • blocklists and allowlists
  • schema comparison
  • permission algebra
  • deduplication when order does not matter

Use dict.keys() and dict.items() views for live schema and key-delta checks against dictionaries that may continue to evolve during processing.

When each key needs associated data, move back to dictionaries; see .

Further depth
  • Set types: set and frozenset
  • Dictionary view objects
  • Data model: object.__hash__
  • CPython source: Objects/setobject.c
  • PEP 412: Key-Sharing Dictionary
  • sys.getsizeof
  • set and frozenset documentation
MEASURED NOTEBOOKMeasured
Measured set membership versus list scan

This notebook compares set membership with list membership on the same values. It shows which path is faster for hit and miss probes and what each representation costs in container memory at 10k, 100k, and 1M values.

Winnerset membership — 64724x faster than list @ 1M
RELATED GUIDE
Hit membership time by collection size
0.00 µs1664.5 µs3329.1 µs10k100k1M
set membership
list membership
CONTROLS
METRICS
Faster membership pathset membership
Speed gap @ 1M hit64724x faster than list @ 1M
Memory cost @ 1Mset 32768.2 KiB / list 7812.6 KiB
Largest collection tested1,000,000 values
NOTES

What this tests — `value in set` vs `value in list` on the same 1M elements. The set pays a memory premium for hash-table sparsity; the list pays a time penalty for linear scanning. The question is which tradeoff wins for membership workloads.

Why set won — set is implemented as an open-addressed hash table (same core algorithm as dict). Membership means hashing the key and probing a few slots — O(1) average. The list must compare `__eq__` sequentially — O(n).

The surprise — the gap exceeds 64,724x at 1M. The set's memory overhead (~32,768 KiB vs ~7,813 KiB for the list) is recouped after a tiny number of lookups.

Takeaway — if the dominant question is 'is this value present?', start with set. The memory overhead of the hash table is dwarfed by the time savings for repeated membership checks.

TEST ENVIRONMENT
Python Version3.12.3
Machinex86_64
Contribute