List vs set vs tuple vs array.array

Memory cost, lookup speed, mutation model, and dense numeric tradeoffs

Picking the wrong container costs memory, speed, or both. `list` is a dynamic array of `PyObject*` pointers with overallocation for amortized O(1) append. `tuple` is a fixed array with exact allocation and no per-element overhead beyond the container itself. `set` is a hash table with combined entries, roughly 216 bytes empty, scaling with the number of elements plus sparseness from the load factor. `array.array` stores raw C values inline with no per-element `PyObject` headers. For 1000 doubles, a list uses about 32,000 bytes (container + 1000 float objects) while an `array('d')` uses about 8,000 bytes. Membership is O(1) for dict and set, O(n) for list and tuple. This page compares all four with real measurements. <a href="/memory-tuples-lists">Deep dive on tuple vs list memory</a>. <a href="/memory-list-alternatives">Explore array.array, deque, and memoryview alternatives</a>. <a href="/dict-hash-tables">See how dict memory compares</a>.

Understand.
Visualize.
Master.

Python in Depth

An interactive engineering reference for Python internals

Quick note

Pick by workload, not habit.

:)
TABLE OF CONTENTS
4.1List vs set vs tuple vs array.array

Memory cost, lookup speed, mutation model, and dense numeric tradeoffs

You need to store a collection of values. list works. But tuple, set, and array.array also work — and each one makes different tradeoffs under the hood. Picking by surface syntax (square brackets vs. curly braces) means you get the memory and performance characteristics by accident, not by design.

Think of containers like vehicles. A list is a pickup truck — versatile, carries anything, can add more cargo. A tuple is a courier motorcycle — fixed load, lightweight, no room for extra boxes. A set is a bus — optimized for finding whether a passenger is on board, not for seating order. An array.array is a tanker truck — carries only one type of cargo but does it with maximum density.

Core answer

Use ──────────────────────────────────────────────

  • list for general mutable ordered collections
  • tuple for fixed-shape ordered records
  • set for repeated membership, uniqueness, and algebra
  • array.array for dense homogeneous numeric storage

If you pick only by surface syntax, you will miss the real costs:

  • list and tuple are containers of references to Python objects
  • set is a hash table
  • array.array is a packed buffer of C values

That difference determines memory footprint, mutation shape, and the operations that stay cheap at scale.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array
ordered = [10, 20, 30]
record = (10, 20, 30)
unique = {10, 20, 30}
packed = array("I", [10, 20, 30])
print(type(ordered), type(record), type(unique), type(packed))
Compare the Four Containers Side by Side

Switch workload and container focus to compare storage model, measured local proof, complexity shape, and the production fit of list, tuple, set, and array.array.

Decision matrix
ContainerOrderMutationDuplicatesMembership modelBest fit
listpreserves ordermutableallowedlinear scangeneral-purpose work buffer
tuplepreserves orderfixed-sizeallowedlinear scanfixed record / immutable return value
setnot sequence-stylemutableremovedaverage-case hash lookupmembership, dedupe, algebra
array.arraypreserves ordermutableallowedlinear scandense homogeneous numeric values

Two immediate rules are reliable:

  • if the dominant question is "x in container many times", start from set
  • if the dominant problem is "store many numbers with less overhead", start from array.array

Everything else is a tradeoff between ordered mutation, fixed shape, and API expectations.

Representation and memory model

list and tuple are both sequence containers of references. They do not store Python integers or floats inline. They store pointers to separate Python objects.

tuple is fixed-size. On current CPython, the tuple object contains its header plus its reference slots in one allocation.

list is resizable. On current CPython, the list object points to a separately managed reference array, and that array usually has spare capacity so append-heavy code does not reallocate on every push.

set is a hash table specialized for uniqueness and membership. It spends memory on spare buckets because fast lookup is its primary job.

array.array is different from all three. It stores raw C values packed side by side under a chosen typecode such as "I" or "d". That is why it can be dramatically smaller for dense numeric workloads.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array
floats = [1.0, 2.0, 3.0]
packed = array("d", floats)
print(floats)
print(packed)

The typecode matters because it selects the underlying C representation. For example, "d" uses C double, and "I" uses C unsigned int. The exact width comes from the platform C type, which is why typecode details are stdlib API contracts rather than universal byte-count guarantees.

CPython internals

Each container type has a distinct C-level representation in CPython 3.12:

list (Objects/listobject.c). The PyListObject contains:

  • ob_item — a pointer to a PyObject** array allocated separately from the list object
  • allocated — the number of slots reserved in the ob_item array, which may exceed ob_size (the logical length)
  • overallocation follows new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6) (from listobject.c, the list_resize function), which gives ~12.5% spare capacity plus a constant

tuple (Objects/tupleobject.c). The PyTupleObject stores its ob_item array inline within the object allocation. There is no separate array, no spare capacity, and the size is fixed at creation.

set (Objects/setobject.c). The PySetObject contains:

  • a hash table with fill (occupied + dummy) and used (active) counters
  • minimum size of PySet_MINSIZE = 8 slots
  • a fill-based resize trigger at ~60%, based on occupied-plus-dummy slots rather than only active entries, different from dict's USABLE_FRACTION

array.array (Modules/arraymodule.c). The array object stores:

  • a contiguous C array of values determined by the typecode
  • the buffer is part of the array object's allocation (for small arrays) or a separately allocated block (for large arrays)
  • no per-element Python object overhead

The key insight from the CPython C source is the difference in allocation patterns:

  • tuple does one allocation for the whole object; a 1000-element tuple is one contiguous block
  • list does two allocations (object + reference array); growth triggers reallocation of the reference array, copying all existing pointers
  • set allocates one combined hash table on resize; resize copies active entries into the new table, which is expensive but involves one allocation per resize
  • array.array does one or two allocations; growth reallocates the raw C buffer, copying the raw bytes

The list overallocation pattern is a compromise: too little overallocation wastes append performance; too much wastes memory. The exact formula is implementation-specific, but the tradeoff is a fundamental property of dynamic arrays.

Measured memory proof

The following local measurements were taken on CPython 3.12.3, 64-bit Linux using sys.getsizeof. They are useful for cost shape, not for byte-exact invariants across interpreters or builds.

Container-only footprint for representative sizes:

Elementslist(range(n))tuple(range(n))set(range(n))array("I", range(n))
056 B40 B216 B80 B
172 B48 B216 B96 B
10136 B120 B728 B144 B
100856 B840 B8408 B488 B
10008056 B8040 B32984 B4200 B

Three things matter here:

  1. tuple is only modestly smaller than list for the same number of references.
  2. set is much larger because hash-table spare capacity is part of the design.
  3. array.array is unusually compact because the payload is stored inline rather than as separate Python objects.
# [CURRENT - 3.10-3.14] Works on Python 3.x
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
from array import array
for n in (0, 1, 10, 100, 1000):
print(
n,
sys.getsizeof(list(range(n))),
sys.getsizeof(tuple(range(n))),
sys.getsizeof(set(range(n))),
sys.getsizeof(array("I", range(n))),
)

Now compare dense numeric storage instead of container-only size. For list and tuple, the total includes the container plus one separate Python float object per element. For array('d'), the sys.getsizeof result already includes the raw numeric payload.

WorkloadLocal size
list with 1000 floats, total32056 B
tuple with 1000 floats, total32040 B
array("d") with 1000 floats, total8080 B
# [CURRENT - 3.10-3.14] Works on Python 3.x
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
from array import array
n = 1000
list_total = sys.getsizeof([1.0] * n) + n * sys.getsizeof(1.0)
tuple_total = sys.getsizeof((1.0,) * n) + n * sys.getsizeof(1.0)
array_total = sys.getsizeof(array("d", [1.0] * n))
print(list_total)
print(tuple_total)
print(array_total)

This is the key memory lesson:

  • tuple beats list by a small container margin
  • array.array beats both by eliminating one Python object header per numeric element
  • set is usually the wrong answer when compact storage is the real goal
Performance model and measured proof

There is no single winner across operations because the containers are built for different jobs.

Membership over 10000 elements, measured locally with timeit over 10000 lookups:

OperationLocal time
9999 in list(range(10000))0.607 s
9999 in tuple(range(10000))0.528 s
9999 in set(range(10000))0.000235 s
9999 in array("I", range(10000))1.644 s

The stable lesson is the algorithmic model:

  • list, tuple, and array.array answer membership by scanning
  • set answers membership by hashing and probing
# [CURRENT - 3.10-3.14] Works on Python 3.x
import timeit
from array import array
setup = """
from array import array
lst = list(range(10000))
tpl = tuple(range(10000))
st = set(range(10000))
arr = array("I", range(10000))
"""
for stmt in ("9999 in lst", "9999 in tpl", "9999 in st", "9999 in arr"):
print(stmt, timeit.timeit(stmt, setup=setup, number=10000))

That surprising array.array result matters in production: packed storage does not imply hash-table lookup speed, and it does not automatically imply faster Python-level membership than a list.

Growth and insertion shape differ just as much:

  • list.append() is the normal builder pattern
  • tuple has no in-place append
  • set.add() grows a uniqueness index, not an ordered sequence
  • array.append() supports right-edge growth but only for values compatible with the chosen typecode

Local append/add timings over 100000 operations:

OperationLocal time
list.append(1)0.00126 s
set.add(1)0.00193 s
array("I").append(1)0.00240 s

For simple iteration, local sum(...) timings over 100000 elements and 300 runs were:

OperationLocal time
sum(list(range(100000)))0.157 s
sum(tuple(range(100000)))0.158 s
sum(set(range(100000)))0.214 s
sum(array("I", range(100000)))0.438 s

The operational lesson is conservative:

  • list and tuple are strong general Python-level traversal baselines
  • set iteration is secondary to membership and uniqueness
  • array.array wins on storage density, not automatically on every Python-level loop
Container-by-container guidance

list

  • choose it when order matters and mutation is normal
  • good for builders, buffers, rows, and API results callers may modify
  • bad when repeated membership checks dominate, or when homogeneous numeric footprint matters more than flexibility

tuple

  • choose it when the shape is fixed and should stay fixed
  • good for return values, record-like coordinates, and immutable cache keys
  • bad as a work buffer or any collection that needs to grow over time

set

  • choose it when uniqueness and membership are the point of the structure
  • good for blocklists, dedupe, schema checks, and set algebra
  • bad when you need duplicates, positional meaning, or compact storage

array.array

  • choose it when every element fits one numeric typecode and dense storage matters
  • good for large numeric buffers and binary file I/O with .tofile() / .fromfile()
  • bad when you need mixed types, hash-table membership semantics, or the broad API convenience of list
# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array
def load_sensor_window(readings):
return array("d", readings)
def normalize_tags(tags):
return set(tags)
def build_pipeline():
steps = ["parse", "validate"]
steps.append("store")
return steps
def parse_header():
return ("version", 3, 14)
Version context and guarantees

Current project guidance targets Python 3.10-3.14. Python 3.9 and below are End-of-Life.

Language-level guarantees:

  • list and tuple are ordered sequences
  • tuple is immutable as a container
  • set requires hashable elements and supports set algebra
  • array.array is a standard-library packed homogeneous container defined by typecode

CPython-specific details:

  • the exact byte counts shown above
  • list overallocation shape
  • the exact cost steps where sets resize
  • the specific local timings

Do not publish CPython measurements as if they were language guarantees. Use them to understand cost shape, then measure again on the deployment that matters.

Edge cases and production gotchas

tuple immutability is shallow. A tuple containing a list is still exposed to inner mutation and is not hashable.

# [CURRENT - 3.10-3.14] Works on Python 3.x
record = ("batch-1", [])
record[1].append("row-1")
print(record)
try:
hash(record)
except TypeError as exc:
print(exc)

set order is not a sorting contract. It may differ across runs and table states. If output order matters, sort explicitly.

array.array is homogeneous. The typecode is a real constraint, not a comment.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array
values = array("I", [1, 2, 3])
try:
values.append("x")
except TypeError as exc:
print(exc)

sys.getsizeof() is not recursive. If you forget that, you will underestimate the true cost of object containers and misunderstand why packed storage wins so clearly for numeric data.

A container that is compact for storage can still be the wrong operational structure. array.array is the clearest example: excellent for dense numeric memory, poor as a substitute for set-style membership or general-purpose list ergonomics.

Production usage

Use this decision rule in order:

  1. If uniqueness or repeated membership is the core job, start with set.
  2. If the values are homogeneous numbers and memory or binary I/O matters, start with array.array.
  3. If the shape is fixed and semantically record-like, start with tuple.
  4. Otherwise, start with list.

Then refine only when measurement or API semantics justify it.

For deeper internals and adjacent structures:

  • see for the list/tuple layout tradeoff
  • see for set behavior and dictionary views
  • see for array.array, deque, and memoryview
Further depth
  • Sequence types: list, tuple, range
  • Set types: set and frozenset
  • array module
  • sys.getsizeof
  • timeit
  • Data model: object.__hash__
  • CPython source: Objects/listobject.c
  • CPython source: Objects/tupleobject.c
  • CPython source: Objects/setobject.c
  • CPython source: Modules/arraymodule.c
  • Sequence types: list, tuple, range
MEASURED NOTEBOOKMeasured
Measured container tradeoffs

This notebook compares list, tuple, set, and array.array under three concrete workloads: shell memory, sum(container) traversal, and membership. It highlights which container wins for each job at 1k, 100k, and 1M elements.

Winnerarray.array — lowest @ 1,000,000
RELATED GUIDE
Container shell size by container and scale
0 KiB16384 KiB32768 KiB1k100k1M
list
tuple
set
array.array
CONTROLS
METRICS
Lowest shell costarray @ 1,000,000
Highest shell costset @ 1,000,000
array size3906.33 KiB
set size32768.21 KiB
NOTES

What this tests — `sys.getsizeof` of the container shell (not the elements inside) for list, tuple, set, and array.array. Smaller means less overhead before you add data.

Why array won — array is a flat sequence that stores raw C values inline with no per-element PyObject headers. List and tuple must store PyObject pointers, and set additionally reserves spare hash-table buckets.

The surprise — set is 8.4x larger than array at the shell level, but the real gap widens dramatically once you factor in the per-element PyObject overhead that array avoids entirely for numeric data.

Takeaway — for dense homogeneous numeric data, array.array is the most memory-efficient container. Set is the most expensive because it optimizes for membership speed, not compactness.

TEST ENVIRONMENT
Python Version3.12.3
Machinex86_64
Contribute