List vs set vs tuple vs array.array

You need to store a collection of values. list works. But tuple, set, and array.array also work — and each one makes different tradeoffs under the hood. Picking by surface syntax (square brackets vs. curly braces) means you get the memory and performance characteristics by accident, not by design.

Think of containers like vehicles. A list is a pickup truck — versatile, carries anything, can add more cargo. A tuple is a courier motorcycle — fixed load, lightweight, no room for extra boxes. A set is a bus — optimized for finding whether a passenger is on board, not for seating order. An array.array is a tanker truck — carries only one type of cargo but does it with maximum density.

Core answer

Use ──────────────────────────────────────────────

list for general mutable ordered collections
tuple for fixed-shape ordered records
set for repeated membership, uniqueness, and algebra
array.array for dense homogeneous numeric storage

If you pick only by surface syntax, you will miss the real costs:

list and tuple are containers of references to Python objects
set is a hash table
array.array is a packed buffer of C values

That difference determines memory footprint, mutation shape, and the operations that stay cheap at scale.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array
ordered = [10, 20, 30]
record = (10, 20, 30)
unique = {10, 20, 30}
packed = array("I", [10, 20, 30])
print(type(ordered), type(record), type(unique), type(packed))

Compare the Four Containers Side by Side

Switch workload and container focus to compare storage model, measured local proof, complexity shape, and the production fit of list, tuple, set, and array.array.

Decision matrix

Container	Order	Mutation	Duplicates	Membership model	Best fit
`list`	preserves order	mutable	allowed	linear scan	general-purpose work buffer
`tuple`	preserves order	fixed-size	allowed	linear scan	fixed record / immutable return value
`set`	not sequence-style	mutable	removed	average-case hash lookup	membership, dedupe, algebra
`array.array`	preserves order	mutable	allowed	linear scan	dense homogeneous numeric values

Two immediate rules are reliable:

if the dominant question is "x in container many times", start from set
if the dominant problem is "store many numbers with less overhead", start from array.array

Everything else is a tradeoff between ordered mutation, fixed shape, and API expectations.

Representation and memory model

list and tuple are both sequence containers of references. They do not store Python integers or floats inline. They store pointers to separate Python objects.

tuple is fixed-size. On current CPython, the tuple object contains its header plus its reference slots in one allocation.

list is resizable. On current CPython, the list object points to a separately managed reference array, and that array usually has spare capacity so append-heavy code does not reallocate on every push.

set is a hash table specialized for uniqueness and membership. It spends memory on spare buckets because fast lookup is its primary job.

array.array is different from all three. It stores raw C values packed side by side under a chosen typecode such as "I" or "d". That is why it can be dramatically smaller for dense numeric workloads.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array
floats = [1.0, 2.0, 3.0]
packed = array("d", floats)
print(floats)
print(packed)

The typecode matters because it selects the underlying C representation. For example, "d" uses C double, and "I" uses C unsigned int. The exact width comes from the platform C type, which is why typecode details are stdlib API contracts rather than universal byte-count guarantees.

CPython internals

Each container type has a distinct C-level representation in CPython 3.12:

list (Objects/listobject.c). The PyListObject contains:

ob_item — a pointer to a PyObject** array allocated separately from the list object
allocated — the number of slots reserved in the ob_item array, which may exceed ob_size (the logical length)
overallocation follows new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6) (from listobject.c, the list_resize function), which gives ~12.5% spare capacity plus a constant

tuple (Objects/tupleobject.c). The PyTupleObject stores its ob_item array inline within the object allocation. There is no separate array, no spare capacity, and the size is fixed at creation.

set (Objects/setobject.c). The PySetObject contains:

a hash table with fill (occupied + dummy) and used (active) counters
minimum size of PySet_MINSIZE = 8 slots
a fill-based resize trigger at ~60%, based on occupied-plus-dummy slots rather than only active entries, different from dict's USABLE_FRACTION

array.array (Modules/arraymodule.c). The array object stores:

a contiguous C array of values determined by the typecode
the buffer is part of the array object's allocation (for small arrays) or a separately allocated block (for large arrays)
no per-element Python object overhead

The key insight from the CPython C source is the difference in allocation patterns:

tuple does one allocation for the whole object; a 1000-element tuple is one contiguous block
list does two allocations (object + reference array); growth triggers reallocation of the reference array, copying all existing pointers
set allocates one combined hash table on resize; resize copies active entries into the new table, which is expensive but involves one allocation per resize
array.array does one or two allocations; growth reallocates the raw C buffer, copying the raw bytes

The list overallocation pattern is a compromise: too little overallocation wastes append performance; too much wastes memory. The exact formula is implementation-specific, but the tradeoff is a fundamental property of dynamic arrays.

Measured memory proof

The following local measurements were taken on CPython 3.12.3, 64-bit Linux using sys.getsizeof. They are useful for cost shape, not for byte-exact invariants across interpreters or builds.

Container-only footprint for representative sizes:

Elements	`list(range(n))`	`tuple(range(n))`	`set(range(n))`	`array("I", range(n))`
0	56 B	40 B	216 B	80 B
1	72 B	48 B	216 B	96 B
10	136 B	120 B	728 B	144 B
100	856 B	840 B	8408 B	488 B
1000	8056 B	8040 B	32984 B	4200 B

Three things matter here:

tuple is only modestly smaller than list for the same number of references.
set is much larger because hash-table spare capacity is part of the design.
array.array is unusually compact because the payload is stored inline rather than as separate Python objects.

# [CURRENT - 3.10-3.14] Works on Python 3.x
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
from array import array
for n in (0, 1, 10, 100, 1000):
    print(
        n,
        sys.getsizeof(list(range(n))),
        sys.getsizeof(tuple(range(n))),
        sys.getsizeof(set(range(n))),
        sys.getsizeof(array("I", range(n))),
    )

Now compare dense numeric storage instead of container-only size. For list and tuple, the total includes the container plus one separate Python float object per element. For array('d'), the sys.getsizeof result already includes the raw numeric payload.

Workload	Local size
`list` with 1000 floats, total	32056 B
`tuple` with 1000 floats, total	32040 B
`array("d")` with 1000 floats, total	8080 B

# [CURRENT - 3.10-3.14] Works on Python 3.x
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
from array import array
n = 1000
list_total = sys.getsizeof([1.0] * n) + n * sys.getsizeof(1.0)
tuple_total = sys.getsizeof((1.0,) * n) + n * sys.getsizeof(1.0)
array_total = sys.getsizeof(array("d", [1.0] * n))
print(list_total)
print(tuple_total)
print(array_total)

This is the key memory lesson:

tuple beats list by a small container margin
array.array beats both by eliminating one Python object header per numeric element
set is usually the wrong answer when compact storage is the real goal

Performance model and measured proof

There is no single winner across operations because the containers are built for different jobs.

Membership over 10000 elements, measured locally with timeit over 10000 lookups:

Operation	Local time
`9999 in list(range(10000))`	0.607 s
`9999 in tuple(range(10000))`	0.528 s
`9999 in set(range(10000))`	0.000235 s
`9999 in array("I", range(10000))`	1.644 s

The stable lesson is the algorithmic model:

list, tuple, and array.array answer membership by scanning
set answers membership by hashing and probing

# [CURRENT - 3.10-3.14] Works on Python 3.x
import timeit
from array import array
setup = """
from array import array
lst = list(range(10000))
tpl = tuple(range(10000))
st = set(range(10000))
arr = array("I", range(10000))
"""
for stmt in ("9999 in lst", "9999 in tpl", "9999 in st", "9999 in arr"):
    print(stmt, timeit.timeit(stmt, setup=setup, number=10000))

That surprising array.array result matters in production: packed storage does not imply hash-table lookup speed, and it does not automatically imply faster Python-level membership than a list.

Growth and insertion shape differ just as much:

list.append() is the normal builder pattern
tuple has no in-place append
set.add() grows a uniqueness index, not an ordered sequence
array.append() supports right-edge growth but only for values compatible with the chosen typecode

Local append/add timings over 100000 operations:

Operation	Local time
`list.append(1)`	0.00126 s
`set.add(1)`	0.00193 s
`array("I").append(1)`	0.00240 s

For simple iteration, local sum(...) timings over 100000 elements and 300 runs were:

Operation	Local time
`sum(list(range(100000)))`	0.157 s
`sum(tuple(range(100000)))`	0.158 s
`sum(set(range(100000)))`	0.214 s
`sum(array("I", range(100000)))`	0.438 s

The operational lesson is conservative:

list and tuple are strong general Python-level traversal baselines
set iteration is secondary to membership and uniqueness
array.array wins on storage density, not automatically on every Python-level loop

Container-by-container guidance

list

choose it when order matters and mutation is normal
good for builders, buffers, rows, and API results callers may modify
bad when repeated membership checks dominate, or when homogeneous numeric footprint matters more than flexibility

tuple

choose it when the shape is fixed and should stay fixed
good for return values, record-like coordinates, and immutable cache keys
bad as a work buffer or any collection that needs to grow over time

set

choose it when uniqueness and membership are the point of the structure
good for blocklists, dedupe, schema checks, and set algebra
bad when you need duplicates, positional meaning, or compact storage

array.array

choose it when every element fits one numeric typecode and dense storage matters
good for large numeric buffers and binary file I/O with .tofile() / .fromfile()
bad when you need mixed types, hash-table membership semantics, or the broad API convenience of list

# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array
def load_sensor_window(readings):
    return array("d", readings)
def normalize_tags(tags):
    return set(tags)
def build_pipeline():
    steps = ["parse", "validate"]
    steps.append("store")
    return steps
def parse_header():
    return ("version", 3, 14)

Version context and guarantees

Current project guidance targets Python 3.10-3.14. Python 3.9 and below are End-of-Life.

Language-level guarantees:

list and tuple are ordered sequences
tuple is immutable as a container
set requires hashable elements and supports set algebra
array.array is a standard-library packed homogeneous container defined by typecode

CPython-specific details:

the exact byte counts shown above
list overallocation shape
the exact cost steps where sets resize
the specific local timings

Do not publish CPython measurements as if they were language guarantees. Use them to understand cost shape, then measure again on the deployment that matters.

Edge cases and production gotchas

tuple immutability is shallow. A tuple containing a list is still exposed to inner mutation and is not hashable.

# [CURRENT - 3.10-3.14] Works on Python 3.x
record = ("batch-1", [])
record[1].append("row-1")
print(record)
try:
    hash(record)
except TypeError as exc:
    print(exc)

set order is not a sorting contract. It may differ across runs and table states. If output order matters, sort explicitly.

array.array is homogeneous. The typecode is a real constraint, not a comment.

# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array
values = array("I", [1, 2, 3])
try:
    values.append("x")
except TypeError as exc:
    print(exc)

sys.getsizeof() is not recursive. If you forget that, you will underestimate the true cost of object containers and misunderstand why packed storage wins so clearly for numeric data.

A container that is compact for storage can still be the wrong operational structure. array.array is the clearest example: excellent for dense numeric memory, poor as a substitute for set-style membership or general-purpose list ergonomics.

Production usage

Use this decision rule in order:

If uniqueness or repeated membership is the core job, start with set.
If the values are homogeneous numbers and memory or binary I/O matters, start with array.array.
If the shape is fixed and semantically record-like, start with tuple.
Otherwise, start with list.

Then refine only when measurement or API semantics justify it.

For deeper internals and adjacent structures:

see for the list/tuple layout tradeoff
see for set behavior and dictionary views
see for array.array, deque, and memoryview

Further depth

Container

Order

Mutation

Duplicates

Membership model

Best fit

list

preserves order

mutable

allowed

linear scan

general-purpose work buffer

tuple

preserves order

fixed-size

allowed

linear scan

fixed record / immutable return value

set

not sequence-style

mutable

removed

average-case hash lookup

membership, dedupe, algebra

array.array

preserves order

mutable

allowed

linear scan

dense homogeneous numeric values

Elements

list(range(n))

tuple(range(n))

set(range(n))

array("I", range(n))

56 B

40 B

216 B

80 B

72 B

48 B

216 B

96 B

136 B

120 B

728 B

144 B

100

856 B

840 B

8408 B

488 B

1000

8056 B

8040 B

32984 B

4200 B

Workload

Local size

list with 1000 floats, total

32056 B

tuple with 1000 floats, total

32040 B

array("d") with 1000 floats, total

8080 B

Operation

Local time

9999 in list(range(10000))

0.607 s

9999 in tuple(range(10000))

0.528 s

9999 in set(range(10000))

0.000235 s

9999 in array("I", range(10000))

1.644 s

Operation

Local time

list.append(1)

0.00126 s

set.add(1)

0.00193 s

array("I").append(1)

0.00240 s

Operation

Local time

sum(list(range(100000)))

0.157 s

sum(tuple(range(100000)))

0.158 s

sum(set(range(100000)))

0.214 s

sum(array("I", range(100000)))

0.438 s

Python in Depth

List vs set vs tuple vs array.array

Python in Depth