List vs set vs tuple vs array.array

Memory cost, lookup speed, mutation model, and dense numeric tradeoffs

Picking the wrong container costs memory, speed, or both. `list` is a dynamic array of `PyObject*` pointers with overallocation for amortized O(1) append. `tuple` is a fixed array with exact allocation and no per-element overhead beyond the container itself. `set` is a hash table with combined entries, roughly 216 bytes empty, scaling with the number of elements plus sparseness from the load factor. `array.array` stores raw C values inline with no per-element `PyObject` headers. For 1000 doubles, a list uses about 32,000 bytes (container + 1000 float objects) while an `array('d')` uses about 8,000 bytes. Membership is O(1) for dict and set, O(n) for list and tuple. This page compares all four with real measurements. <a href="/memory-tuples-lists">Deep dive on tuple vs list memory</a>. <a href="/memory-list-alternatives">Explore array.array, deque, and memoryview alternatives</a>. <a href="/dict-hash-tables">See how dict memory compares</a>.

Understand.
Visualize.
Master.

Python in Depth

An interactive engineering reference for Python internals

Quick note

Pick by workload, not habit.

:)
Python version

Targets Python 3.10–3.14. Python 3.9 and below are End-of-Life.

TABLE OF CONTENTS
4.1List vs set vs tuple vs array.array

Memory cost, lookup speed, mutation model, and dense numeric tradeoffs

Container choice is a memory-layout choice before it is a syntax choice. list, tuple, set, and array.array spend bytes on different promises.

Core answer

Start with semantics: list for mutable ordered references, tuple for fixed ordered references, set for hash-based membership, and array.array for homogeneous C-style numeric storage.

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from array import array
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class Samples:
ordered: list[int]
fixed: tuple[int, ...]
unique: set[int]
packed: array
def load(values: list[int]) -> Samples:
return Samples(values, tuple(values), set(values), array("I", values))
print(load([7, 7, 9]))

Why this design exists

Python containers optimize for different contracts. A general list must hold arbitrary Python objects and grow. A tuple can commit to a fixed length. A set reserves spare hash slots for fast membership. array.array drops heterogeneous object storage to pack primitive values.

Mechanics and CPython internals

Lists and tuples store references to PyObject values. The objects themselves may live elsewhere, carry object headers, and participate in reference counting. Lists over-allocate to keep append amortized; tuples do not grow. Sets store hash-table metadata and spare slots. array.array stores raw C values in its buffer and converts to Python objects when values cross the API boundary.

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from array import array
from dataclasses import dataclass
from sys import getsizeof
@dataclass(frozen=True, slots=True)
class Footprint:
name: str
bytes: int
def compare(count: int) -> list[Footprint]:
values = list(range(count))
return [
Footprint("list", getsizeof(values)),
Footprint("tuple", getsizeof(tuple(values))),
Footprint("set", getsizeof(set(values))),
Footprint("array", getsizeof(array("I", values))),
]
print(compare(1000))

Complexity and tradeoffs

List append is amortized O(1), indexed access for list and tuple is O(1), set membership is average O(1), and list/tuple membership is scan-based O(n). array.array improves density but does not automatically beat every Python-level numeric loop because conversion and algorithm shape still matter.

Idiomatic patterns and refactoring

Refactor when the dominant operation disagrees with the representation.

# [CURRENT - 3.10-3.14] Works on Python 3.10+
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class Blocklist:
raw_ids: list[str]
lookup_ids: set[str]
def build(ids: list[str]) -> Blocklist:
return Blocklist(ids, set(ids))
def blocked_slow(ids: list[str], user_id: str) -> bool:
return user_id in ids
def blocked(index: Blocklist, user_id: str) -> bool:
return user_id in index.lookup_ids
index = build(["U-1", "U-2", "U-3"])
print(blocked_slow(index.raw_ids, "U-3"))
print(blocked(index, "U-3"))

Common mistakes and edge cases

A tuple containing mutable objects is only shallowly immutable. A set loses duplicates. An array enforces its typecode and platform-sized primitive representation. Size readings from sys.getsizeof are local CPython measurements, not a portable whole-object-graph bill.

When to use / When NOT to use

Use the smallest contract that matches the workload and measure when memory or throughput matters.

Do not choose a dense or hashed container only because a microbenchmark ignored the domain operations your code actually performs.

Further reading

  • Official docs: sequence types
  • Official docs: set types
  • Official docs: array
  • CPython source: list implementation
  • CPython source: tuple implementation
MEASURED NOTEBOOKMeasured
Measured container tradeoffs

This notebook compares list, tuple, set, and array.array under three concrete workloads: shell memory, sum(container) traversal, and membership. It highlights which container wins for each job at 1k, 100k, and 1M elements.

Winnerarray.array — lowest @ 1,000,000
RELATED GUIDE
Container shell size by container and scale
0 KiB16384 KiB32768 KiB1k100k1M
list
tuple
set
array.array
CONTROLS
METRICS
Lowest shell costarray @ 1,000,000
Highest shell costset @ 1,000,000
array size3906.33 KiB
set size32768.21 KiB
NOTES

What this tests — `sys.getsizeof` of the container shell (not the elements inside) for list, tuple, set, and array.array. Smaller means less overhead before you add data.

Why array won — array is a flat sequence that stores raw C values inline with no per-element PyObject headers. List and tuple must store PyObject pointers, and set additionally reserves spare hash-table buckets.

The surprise — set is 8.4x larger than array at the shell level, but the real gap widens dramatically once you factor in the per-element PyObject overhead that array avoids entirely for numeric data.

Takeaway — for dense homogeneous numeric data, array.array is the most memory-efficient container. Set is the most expensive because it optimizes for membership speed, not compactness.

TEST ENVIRONMENT
Python Version3.12.3
Machinex86_64
Contribute