You need to store a collection of values. list works. But tuple, set, and array.array also work — and each one makes different tradeoffs under the hood. Picking by surface syntax (square brackets vs. curly braces) means you get the memory and performance characteristics by accident, not by design.
Think of containers like vehicles. A list is a pickup truck — versatile, carries anything, can add more cargo. A tuple is a courier motorcycle — fixed load, lightweight, no room for extra boxes. A set is a bus — optimized for finding whether a passenger is on board, not for seating order. An array.array is a tanker truck — carries only one type of cargo but does it with maximum density.
Use:

- `list` for general mutable ordered collections
- `tuple` for fixed-shape ordered records
- `set` for repeated membership, uniqueness, and algebra
- `array.array` for dense homogeneous numeric storage
If you pick only by surface syntax, you will miss the real costs:
- `list` and `tuple` are containers of references to Python objects
- `set` is a hash table
- `array.array` is a packed buffer of C values
That difference determines memory footprint, mutation shape, and the operations that stay cheap at scale.
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array

ordered = [10, 20, 30]
record = (10, 20, 30)
unique = {10, 20, 30}
packed = array("I", [10, 20, 30])

print(type(ordered), type(record), type(unique), type(packed))
```

| Container | Order | Mutation | Duplicates | Membership model | Best fit |
|---|---|---|---|---|---|
| `list` | preserves order | mutable | allowed | linear scan | general-purpose work buffer |
| `tuple` | preserves order | fixed-size | allowed | linear scan | fixed record / immutable return value |
| `set` | not sequence-style | mutable | removed | average-case hash lookup | membership, dedupe, algebra |
| `array.array` | preserves order | mutable | allowed | linear scan | dense homogeneous numeric values |

Two immediate rules are reliable:
- if the dominant question is "`x in container` many times", start from `set`
- if the dominant problem is "store many numbers with less overhead", start from `array.array`
Everything else is a tradeoff between ordered mutation, fixed shape, and API expectations.
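As a minimal sketch of the first rule, assume a hypothetical list of blocked IDs and a hypothetical stream of incoming IDs (both names are illustrative and not part of any real API). Converting the blocklist to a `set` once turns every later check into an average-case hash lookup instead of a scan:

```python
# Hypothetical data: these names are illustrative only.
blocked_ids = [7, 42, 99, 1024]      # arrives as a list
incoming = [1, 42, 5, 99, 99, 3]

# Convert once, then reuse the set for every membership test.
blocked = set(blocked_ids)
allowed = [event for event in incoming if event not in blocked]

print(allowed)  # [1, 5, 3]
```

The conversion itself costs one pass over the list, so it pays off when the same collection is queried repeatedly.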
list and tuple are both sequence containers of references. They do not store Python integers or floats inline. They store pointers to separate Python objects.
tuple is fixed-size. On current CPython, the tuple object contains its header plus its reference slots in one allocation.
list is resizable. On current CPython, the list object points to a separately managed reference array, and that array usually has spare capacity so append-heavy code does not reallocate on every push.
set is a hash table specialized for uniqueness and membership. It spends memory on spare buckets because fast lookup is its primary job.
array.array is different from all three. It stores raw C values packed side by side under a chosen typecode such as "I" or "d". That is why it can be dramatically smaller for dense numeric workloads.
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array

floats = [1.0, 2.0, 3.0]
packed = array("d", floats)

print(floats)
print(packed)
```

The typecode matters because it selects the underlying C representation. For example, `"d"` uses C `double`, and `"I"` uses C `unsigned int`. The exact width comes from the platform C type, which is why typecode details are stdlib API contracts rather than universal byte-count guarantees.
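Because the width is platform-dependent, it can be safer to ask the interpreter than to assume. The sketch below reads `itemsize` for a handful of typecodes. The selection of typecodes is arbitrary, and the printed widths will vary by platform:

```python
from array import array

# itemsize reports the per-element width, in bytes, of the platform's C type.
codes = ("b", "h", "I", "q", "f", "d")
widths = {code: array(code).itemsize for code in codes}

for code, width in widths.items():
    print(code, width)
```

If a file format or wire protocol requires an exact width, checking `itemsize` at startup is one way to fail early on an unexpected platform.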
Each container type has a distinct C-level representation in CPython 3.12:
`list` (`Objects/listobject.c`). The `PyListObject` contains:

- `ob_item`: a pointer to a `PyObject **` array allocated separately from the list object
- `allocated`: the number of slots reserved in the `ob_item` array, which may exceed `ob_size` (the logical length)
- overallocation in the `list_resize` function follows `new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3`, which gives roughly 12.5% spare capacity plus a small constant, rounded down to a multiple of 4
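The spare capacity can be observed from Python without reading the C source. The following sketch infers the reserved slot count from `sys.getsizeof`. It assumes CPython, where a list's reported size is its header plus one pointer per reserved slot. The names `resize_points` and `POINTER_SIZE` are illustrative:

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")   # bytes per reference slot
EMPTY_SIZE = sys.getsizeof([])        # list header with no slot array

items = []
resize_points = []                    # (length, inferred capacity) at each growth step
last_size = EMPTY_SIZE
for value in range(1000):
    items.append(value)
    size = sys.getsizeof(items)
    if size != last_size:
        # The reported size changed, so the reference array was reallocated.
        capacity = (size - EMPTY_SIZE) // POINTER_SIZE
        resize_points.append((len(items), capacity))
        last_size = size

print(resize_points[:8])
```

The number of recorded growth steps should be far smaller than the number of appends, which is the point of overallocating.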
tuple (Objects/tupleobject.c). The PyTupleObject stores its ob_item array inline within the object allocation. There is no separate array, no spare capacity, and the size is fixed at creation.
`set` (`Objects/setobject.c`). The `PySetObject` contains:

- a hash table with `fill` (occupied + dummy) and `used` (active) counters
- a minimum size of `PySet_MINSIZE = 8` slots
- a fill-based resize trigger at ~60%, based on occupied-plus-dummy slots rather than only active entries, which differs from dict's `USABLE_FRACTION`
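Set resizes show up as jumps in the reported size. This sketch records each jump while adding integers. It assumes CPython's `sys.getsizeof` reflects the current table size, and the exact thresholds and byte counts are implementation details that may differ between versions:

```python
import sys

seen = set()
jumps = []                            # (element count, reported bytes) at each size change
last_size = sys.getsizeof(seen)
for value in range(200):
    seen.add(value)
    size = sys.getsizeof(seen)
    if size != last_size:
        jumps.append((len(seen), size))
        last_size = size

print(jumps)
```

Between jumps the reported size stays flat, which is the spare bucket capacity the table keeps for fast lookups.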
`array.array` (`Modules/arraymodule.c`). The array object stores:

- a contiguous C array of values whose element type is determined by the typecode
- the buffer behind an `ob_item` pointer, allocated separately from the array object (an empty array has no buffer)
- no per-element Python object overhead
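The raw payload can be inspected directly. This sketch uses `buffer_info()`, `itemsize`, and `memoryview`, all standard `array` and built-in APIs, with arbitrary sample values:

```python
from array import array

samples = array("d", [0.5, 1.5, 2.5, 3.5])

# buffer_info() returns (address, element count); itemsize is bytes per element.
_address, count = samples.buffer_info()
payload_bytes = count * samples.itemsize

# memoryview exposes the same raw buffer without copying it.
view = memoryview(samples)
view_nbytes = view.nbytes
view_format = view.format
view.release()                        # release so the array can be resized again

print(payload_bytes, view_nbytes, view_format)
```

The payload size is simply element count times element width, with no Python object header per value.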
The key insight from the CPython C source is the difference in allocation patterns:
- `tuple` does one allocation for the whole object; a 1000-element tuple is one contiguous block
- `list` does two allocations (object + reference array); growth triggers reallocation of the reference array, copying all existing pointers
- `set` allocates one combined hash table on resize; resize copies active entries into the new table, which is expensive but involves one allocation per resize
- `array.array` does one or two allocations; growth reallocates the raw C buffer, copying the raw bytes
The list overallocation pattern is a compromise: too little spare capacity makes appends reallocate often; too much wastes memory. The exact formula is implementation-specific, but the tradeoff is a fundamental property of dynamic arrays.
The following local measurements were taken on CPython 3.12.3, 64-bit Linux using sys.getsizeof. They are useful for cost shape, not for byte-exact invariants across interpreters or builds.
Container-only footprint for representative sizes:
| Elements | list(range(n)) | tuple(range(n)) | set(range(n)) | array("I", range(n)) |
|---|---|---|---|---|
| 0 | 56 B | 40 B | 216 B | 80 B |
| 1 | 72 B | 48 B | 216 B | 96 B |
| 10 | 136 B | 120 B | 728 B | 144 B |
| 100 | 856 B | 840 B | 8408 B | 488 B |
| 1000 | 8056 B | 8040 B | 32984 B | 4200 B |
Three things matter here:
- `tuple` is only modestly smaller than `list` for the same number of references.
- `set` is much larger because hash-table spare capacity is part of the design.
- `array.array` is unusually compact because the payload is stored inline rather than as separate Python objects.
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
from array import array

for n in (0, 1, 10, 100, 1000):
    print(
        n,
        sys.getsizeof(list(range(n))),
        sys.getsizeof(tuple(range(n))),
        sys.getsizeof(set(range(n))),
        sys.getsizeof(array("I", range(n))),
    )
```

Now compare dense numeric storage instead of container-only size. For `list` and `tuple`, the total includes the container plus one separate Python float object per element. For `array('d')`, the `sys.getsizeof` result already includes the raw numeric payload.

| Workload | Local size |
|---|---|
| `list` with 1000 floats, total | 32056 B |
| `tuple` with 1000 floats, total | 32040 B |
| `array("d")` with 1000 floats, total | 8080 B |

```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
# Example byte counts below were measured on CPython 3.12.3, 64-bit Linux.
import sys
from array import array

n = 1000
# [1.0] * n repeats one float object; n * getsizeof(1.0) models n distinct floats.
list_total = sys.getsizeof([1.0] * n) + n * sys.getsizeof(1.0)
tuple_total = sys.getsizeof((1.0,) * n) + n * sys.getsizeof(1.0)
array_total = sys.getsizeof(array("d", [1.0] * n))

print(list_total)
print(tuple_total)
print(array_total)
```

This is the key memory lesson:

- `tuple` beats `list` by a small container margin
- `array.array` beats both by eliminating one Python object header per numeric element
- `set` is usually the wrong answer when compact storage is the real goal
There is no single winner across operations because the containers are built for different jobs.
Membership over 10000 elements, measured locally with timeit over 10000 lookups:
| Operation | Local time |
|---|---|
| `9999 in list(range(10000))` | 0.607 s |
| `9999 in tuple(range(10000))` | 0.528 s |
| `9999 in set(range(10000))` | 0.000235 s |
| `9999 in array("I", range(10000))` | 1.644 s |

The stable lesson is the algorithmic model:
- `list`, `tuple`, and `array.array` answer membership by scanning
- `set` answers membership by hashing and probing
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
import timeit
from array import array

setup = """
from array import array
lst = list(range(10000))
tpl = tuple(range(10000))
st = set(range(10000))
arr = array("I", range(10000))
"""

for stmt in ("9999 in lst", "9999 in tpl", "9999 in st", "9999 in arr"):
    print(stmt, timeit.timeit(stmt, setup=setup, number=10000))
```

That surprising `array.array` result matters in production: packed storage does not imply hash-table lookup speed, and it does not automatically imply faster Python-level membership than a `list`.
Growth and insertion shape differ just as much:
- `list.append()` is the normal builder pattern
- `tuple` has no in-place append
- `set.add()` grows a uniqueness index, not an ordered sequence
- `array.append()` supports right-edge growth but only for values compatible with the chosen typecode
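A small sketch of the four growth shapes side by side. The values are arbitrary, and the `"b"` typecode (signed char) is chosen only because its narrow range makes the out-of-range case easy to trigger:

```python
from array import array

steps = ["parse"]
steps.append("validate")              # list grows in place

header = ("version", 3)
longer = header + (14,)               # tuple "growth" builds a new tuple
has_tuple_append = hasattr(header, "append")

tags = {"a"}
tags.add("a")                         # adding an existing member changes nothing
tags.add("b")

small = array("b", [1, 2])            # "b" is a signed char
small.append(3)
overflow_message = None
try:
    small.append(1000)                # out of range for the typecode
except OverflowError as exc:
    overflow_message = str(exc)

print(steps, longer, sorted(tags), list(small), overflow_message)
```

The failed append leaves the array unchanged, so the typecode acts as a guard on every write.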
Local append/add timings over 100000 operations:
| Operation | Local time |
|---|---|
| `list.append(1)` | 0.00126 s |
| `set.add(1)` | 0.00193 s |
| `array("I").append(1)` | 0.00240 s |

For simple iteration, local sum(...) timings over 100000 elements and 300 runs were:
| Operation | Local time |
|---|---|
| `sum(list(range(100000)))` | 0.157 s |
| `sum(tuple(range(100000)))` | 0.158 s |
| `sum(set(range(100000)))` | 0.214 s |
| `sum(array("I", range(100000)))` | 0.438 s |

The operational lesson is conservative:
- `list` and `tuple` are strong general Python-level traversal baselines
- `set` iteration is secondary to membership and uniqueness
- `array.array` wins on storage density, not automatically on every Python-level loop
list
- choose it when order matters and mutation is normal
- good for builders, buffers, rows, and API results callers may modify
- bad when repeated membership checks dominate, or when homogeneous numeric footprint matters more than flexibility
tuple
- choose it when the shape is fixed and should stay fixed
- good for return values, record-like coordinates, and immutable cache keys
- bad as a work buffer or any collection that needs to grow over time
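As a brief illustration of the cache-key use, the sketch below keys a dictionary by `(row, column)` tuples. The `elevation` mapping and its values are hypothetical:

```python
# Hypothetical grid cache keyed by (row, column) coordinates.
elevation = {}
elevation[(0, 0)] = 12.5
elevation[(0, 1)] = 13.0

point = (0, 1)
print(elevation[point])               # 13.0

# A list with the same contents cannot be used as a key.
key_error = None
try:
    elevation[[0, 1]] = 0.0
except TypeError as exc:
    key_error = str(exc)

print(key_error)
```

This works only while every element of the tuple is itself hashable.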
set
- choose it when uniqueness and membership are the point of the structure
- good for blocklists, dedupe, schema checks, and set algebra
- bad when you need duplicates, positional meaning, or compact storage
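A schema check is a compact example of set algebra. The field names here are hypothetical:

```python
required = {"id", "email", "created_at"}   # hypothetical schema fields
received = {"id", "email", "nickname"}

missing = required - received         # fields the payload lacks
unexpected = received - required      # fields the schema does not know
shared = required & received          # fields present in both

print(sorted(missing), sorted(unexpected), sorted(shared))
```

Sorting before printing keeps the output stable, since the sets themselves carry no ordering promise.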
array.array
- choose it when every element fits one numeric typecode and dense storage matters
- good for large numeric buffers and binary file I/O with `.tofile()` / `.fromfile()`
- bad when you need mixed types, hash-table membership semantics, or the broad API convenience of `list`
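The binary I/O path can be sketched as a round trip through a temporary file. The file name `readings.bin` and the sample values are illustrative. Note that the file holds raw machine values, so it is only portable between platforms that share the same item width and byte order:

```python
import os
import tempfile
from array import array

readings = array("d", [0.5, 1.25, 2.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "readings.bin")

    # tofile() writes the raw C values with no header or delimiter.
    with open(path, "wb") as fh:
        readings.tofile(fh)
    file_size = os.path.getsize(path)

    # fromfile() needs the element count because the file carries no metadata.
    restored = array("d")
    with open(path, "rb") as fh:
        restored.fromfile(fh, len(readings))

print(file_size, restored)
```

The file size equals the element count times `itemsize`, which is the same density the in-memory buffer has.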
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array


def load_sensor_window(readings):
    return array("d", readings)


def normalize_tags(tags):
    return set(tags)


def build_pipeline():
    steps = ["parse", "validate"]
    steps.append("store")
    return steps


def parse_header():
    return ("version", 3, 14)
```

Current project guidance targets Python 3.10-3.14. Python 3.9 and below are End-of-Life.
Language-level guarantees:
- `list` and `tuple` are ordered sequences
- `tuple` is immutable as a container
- `set` requires hashable elements and supports set algebra
- `array.array` is a standard-library packed homogeneous container defined by typecode
CPython-specific details:
- the exact byte counts shown above
- list overallocation shape
- the exact cost steps where sets resize
- the specific local timings
Do not publish CPython measurements as if they were language guarantees. Use them to understand cost shape, then measure again on the deployment that matters.
tuple immutability is shallow. A tuple containing a list is still exposed to inner mutation and is not hashable.
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
record = ("batch-1", [])
record[1].append("row-1")
print(record)

try:
    hash(record)
except TypeError as exc:
    print(exc)
```

`set` order is not a sorting contract. It may differ across runs and table states. If output order matters, sort explicitly.
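A minimal sketch of sorting explicitly, using hypothetical tag names:

```python
tags = {"beta", "alpha", "gamma", "alpha"}   # the duplicate collapses

# Iteration order of the set itself is not something to rely on.
stable = sorted(tags)
print(stable)  # ['alpha', 'beta', 'gamma']
```

`sorted()` returns a new list, so the set stays available for membership checks while the list carries the presentation order.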
array.array is homogeneous. The typecode is a real constraint, not a comment.
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
from array import array

values = array("I", [1, 2, 3])

try:
    values.append("x")
except TypeError as exc:
    print(exc)
```

`sys.getsizeof()` is not recursive. If you forget that, you will underestimate the true cost of object containers and misunderstand why packed storage wins so clearly for numeric data.
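One way to account for the elements is a one-level helper that adds the size of each distinct element object to the container size. The function name `shallow_plus_elements` is hypothetical, and the sketch assumes a flat container of simple objects (it does not recurse into nested containers):

```python
import sys
from array import array


def shallow_plus_elements(container):
    """Container size plus each distinct element object, one level deep."""
    distinct = {}
    for item in container:
        distinct[id(item)] = item     # count an object shared by many slots once
    return sys.getsizeof(container) + sum(
        sys.getsizeof(obj) for obj in distinct.values()
    )


values = [float(i) + 0.5 for i in range(1000)]   # 1000 distinct float objects
shared = [1.0] * 1000                            # one float object, 1000 references

print(sys.getsizeof(values), shallow_plus_elements(values))
print(sys.getsizeof(shared), shallow_plus_elements(shared))
print(sys.getsizeof(array("d", values)))         # payload already included
```

Counting by `id` keeps shared references from being billed repeatedly, which matters when a list holds the same object many times.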
A container that is compact for storage can still be the wrong operational structure. array.array is the clearest example: excellent for dense numeric memory, poor as a substitute for set-style membership or general-purpose list ergonomics.
Use this decision rule in order:
1. If uniqueness or repeated membership is the core job, start with `set`.
2. If the values are homogeneous numbers and memory or binary I/O matters, start with `array.array`.
3. If the shape is fixed and semantically record-like, start with `tuple`.
4. Otherwise, start with `list`.
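The ordering of the rule can be written down as a tiny function. `suggest_container` and its keyword flags are hypothetical names made up for this sketch. It only encodes the priority order and does not replace measurement:

```python
def suggest_container(*, needs_membership=False, homogeneous_numbers=False, fixed_shape=False):
    """Hypothetical helper: walk the decision rule in order and name a starting point."""
    if needs_membership:
        return "set"
    if homogeneous_numbers:
        return "array.array"
    if fixed_shape:
        return "tuple"
    return "list"


print(suggest_container(needs_membership=True, fixed_shape=True))  # set
print(suggest_container(homogeneous_numbers=True))                  # array.array
print(suggest_container(fixed_shape=True))                          # tuple
print(suggest_container())                                          # list
```

Because the checks run top to bottom, a workload that matches several conditions gets the earliest matching container.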
Then refine only when measurement or API semantics justify it.
For deeper internals and adjacent structures:
- see for the list/tuple layout tradeoff
- see for set behavior and dictionary views
- see for `array.array`, `deque`, and `memoryview`
- Sequence types: list, tuple, range
- Set types: set and frozenset
- array module
- sys.getsizeof
- timeit
- Data model: object.__hash__
- CPython source: Objects/listobject.c
- CPython source: Objects/tupleobject.c
- CPython source: Objects/setobject.c
- CPython source: Modules/arraymodule.c