Python bytecode is the instruction stream CPython executes after compiling source into a code object. Reading it is loosely like reading a C compiler's assembly output: you see the steps the interpreter is asked to perform, although those steps are interpreter instructions rather than machine code. Bytecode explains execution shape: which names are loaded as locals versus globals, whether a comprehension gets its own opcode, how Python dispatches a match statement. But it is not a substitute for measurement, algorithm analysis, or knowledge of C-level built-ins.
Use the dis module when you need to answer questions like:
- which names are loaded as locals, globals, or closure cells?
- is this loop doing repeated attribute lookup and Python-level calls?
- is there a dedicated opcode such as LIST_APPEND or MATCH_SEQUENCE?
- did CPython compile this into one straightforward instruction path or several dispatch steps?
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
import dis

def total(xs):
    return [x * 2 for x in xs]

dis.dis(total)
```
Keep the boundary clear:
- bytecode is CPython interpreter work, not machine code
- fewer or simpler bytecode steps can reduce overhead
- but real performance depends on data structure choice, object allocation, C-level built-ins, specialization, and algorithmic complexity
On CPython, source code is compiled into a code object. That code object contains:
- constants
- local-variable metadata
- names
- free-variable metadata
- an instruction stream that CPython's bytecode interpreter executes
That is why functions expose attributes such as:
- __code__.co_consts
- __code__.co_varnames
- __code__.co_names
- __code__.co_freevars
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
def make_adder(base):
    def add(x):
        return base + x
    return add

fn = make_adder(10)
print(fn.__code__.co_varnames)
print(fn.__code__.co_freevars)
print(fn.__code__.co_consts)
```
This is practical information: it tells you what the interpreter thinks your function depends on:
- fast locals
- global names
- captured outer-scope names
- embedded constants
That helps explain why similar-looking code can execute through meaningfully different paths.
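A short sketch makes those categories concrete for a closure. It prints the relevant code-object attributes for both the enclosing function and the returned inner function; a captured name shows up in the outer function's co_cellvars and in the inner function's co_freevars. SCALE and name_categories are hypothetical names chosen for illustration.

```python
SCALE = 2  # module-level global read by the inner function

def make_adder(base):
    def add(x):
        return base + x * SCALE
    return add

def name_categories(func):
    """Group the names a function's code object records, by how they are resolved."""
    code = func.__code__
    return {
        "locals": code.co_varnames,  # fast locals, including parameters
        "names": code.co_names,      # globals, builtins, attribute names
        "free": code.co_freevars,    # captured from an enclosing scope
        "cells": code.co_cellvars,   # locals that inner functions capture
    }

print(name_categories(make_adder))
print(name_categories(make_adder(10)))
```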
The simplified CPython pipeline is:
- parse source into an abstract syntax tree
- compile that tree into a code object
- execute the code object's bytecode in the evaluation loop
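The three stages can be run by hand with the standard library, as a sketch: ast.parse builds the tree, the built-in compile accepts that tree and returns a code object, and exec runs its bytecode. The source string and the "<demo>" filename are arbitrary.

```python
import ast
import dis

source = "result = 6 * 7"

tree = ast.parse(source)               # 1. source -> abstract syntax tree
code = compile(tree, "<demo>", "exec")  # 2. AST -> code object
namespace = {}
exec(code, namespace)                   # 3. evaluation loop runs the bytecode

print(ast.dump(tree))
dis.dis(code)
print(namespace["result"])  # 42
```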
In current CPython, the instruction set is version-sensitive and implementation-specific. The language guarantee is Python behavior; the exact opcode names are not portable promises.
For example, locals and closure cells are not loaded the same way:
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
import dis

def outer():
    base = 10
    def inner(x):
        return base + x
    return inner

fn = outer()
dis.dis(fn)
```
On current CPython, that closure read shows up as LOAD_DEREF, while a plain local read shows up as LOAD_FAST (or, on newer versions, one of its LOAD_FAST variants). The difference is not in what the code means but in where the value is stored: the interpreter takes a different path because the name lives in a closure cell rather than directly in a fast-local slot.
The evaluation loop (Python/ceval.c). CPython's bytecode interpreter runs its dispatch loop inside _PyEval_EvalFrameDefault; since 3.12, the individual instruction handlers are generated from a definition file (Python/bytecodes.c). The exact dispatch machinery is a CPython implementation detail, but conceptually the loop:
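- reads the next instruction at the current instruction pointer (a next_instr pointer local to the eval loop; on 3.11+, the _PyInterpreterFrame records the position so execution can be suspended and resumed)
- decodes the opcode and its argument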
- dispatches to the matching instruction handler
- advances the instruction pointer and repeats
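The fetch, decode, dispatch, advance cycle can be sketched with a toy stack machine. The instruction set here (PUSH_CONST, LOAD_LOCAL, ADD, MUL, RETURN) is hypothetical and is not CPython's; the sketch also leaves out everything that makes the real loop fast, such as computed gotos, inline caches, and specialization.

```python
from typing import Any

# Hypothetical instruction format: (opname, argument). These are NOT CPython opcodes.
Instruction = tuple[str, Any]

def run(program: list[Instruction], local_values: list[Any]) -> Any:
    """Execute a toy stack-machine program and return the value it returns."""
    stack: list[Any] = []
    ip = 0  # instruction pointer: index of the next instruction
    while True:
        opname, arg = program[ip]  # fetch and decode
        ip += 1                    # advance before running the handler
        # dispatch to the matching handler
        if opname == "PUSH_CONST":
            stack.append(arg)
        elif opname == "LOAD_LOCAL":
            stack.append(local_values[arg])  # locals addressed by index
        elif opname in ("ADD", "MUL"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opname == "ADD" else left * right)
        elif opname == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction {opname!r}")

# (x + 2) * 3 with x stored in local slot 0
program = [
    ("LOAD_LOCAL", 0),
    ("PUSH_CONST", 2),
    ("ADD", None),
    ("PUSH_CONST", 3),
    ("MUL", None),
    ("RETURN", None),
]
print(run(program, [4]))  # 18
```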
Inline caching and adaptive specialization (CPython 3.11+, PEP 659). Starting in Python 3.11, CPython has a specializing adaptive interpreter. Opcodes such as LOAD_ATTR, LOAD_GLOBAL, BINARY_OP, and CALL are adaptive: at first they run as generic versions, and after enough executions (warm-up) CPython can rewrite them in place into specialized versions based on the types it observed (for example, LOAD_ATTR can become LOAD_ATTR_SLOT for slot-based attribute access). This is why the adaptive view of a function's bytecode can differ between the first run and after warm-up.
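You can inspect specialization caches with dis.dis(func, show_caches=True, adaptive=True). Call the function enough times first; otherwise the adaptive view will still show the generic instructions:
```python
# [CURRENT - 3.11-3.14] Requires Python 3.11+ [PEP 659]
import dis

def total(xs):
    return sum(xs)

# Warm up so the adaptive interpreter has a chance to specialize.
for _ in range(1000):
    total([1, 2, 3])

dis.dis(total, show_caches=True, adaptive=True)
```
The cache entries show up as inline CACHE entries between instructions. They are not executable instructions; they are data slots the specializing interpreter uses for counters and cached facts about the types it observed (such as type versions and lookup indices).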
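To see the warm-up effect without reading the whole listing, one option is to compare the generic and adaptive views from dis.get_instructions after calling the function many times. This is a sketch: the helper name specialization_diff and the 1,000-call warm-up are arbitrary choices, and which instructions specialize (if any) depends on the CPython version and on the types the function actually saw.

```python
import dis
import sys

def total(xs):
    return sum(xs)

def specialization_diff(func, warmup_args, calls=1000):
    """Return (offset, generic_name, adaptive_name) for instructions that changed."""
    if sys.version_info < (3, 11):
        return []  # no specializing adaptive interpreter before 3.11
    for _ in range(calls):
        func(*warmup_args)  # warm-up so specialization can kick in
    generic = {ins.offset: ins.opname for ins in dis.get_instructions(func)}
    changed = []
    for ins in dis.get_instructions(func, adaptive=True):
        before = generic.get(ins.offset)
        if before is not None and before != ins.opname:
            changed.append((ins.offset, before, ins.opname))
    return changed

for offset, generic_name, adaptive_name in specialization_diff(total, ([1, 2, 3],)):
    print(offset, generic_name, "->", adaptive_name)
```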
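Opcode encoding. Since CPython 3.6, bytecode uses a fixed-width "wordcode" format: each code unit is 2 bytes, 1 byte for the opcode and 1 byte for its argument. An argument too large for one byte is built up by one or more preceding EXTENDED_ARG instructions, each supplying 8 more high bits. On 3.11+, specializable instructions are also followed by inline CACHE code units. Opcode numbers fit in 0-255; in CPython 3.12, HAVE_ARGUMENT (90) is the boundary between opcodes that ignore their argument byte and those that use it, but opcode numbering changes between versions, and dis.hasarg (new in 3.12) is the more reliable way to ask whether an opcode uses its argument.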
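A minimal decoder for that 2-byte layout can walk co_code directly, as a sketch. It folds EXTENDED_ARG prefixes into the following instruction's argument and skips CACHE units (present on 3.11+). The helper decode_wordcode is hypothetical; dis already does this work for you, so the point is only to make the encoding concrete.

```python
import dis

def decode_wordcode(code):
    """Yield (offset, opname, oparg) tuples from a code object's raw co_code."""
    raw = code.co_code
    extended = 0  # high bits accumulated from EXTENDED_ARG prefixes
    for offset in range(0, len(raw), 2):
        opcode, oparg = raw[offset], raw[offset + 1]  # 1 byte opcode, 1 byte argument
        name = dis.opname[opcode]
        if name == "CACHE":
            continue  # inline cache slot (3.11+), not an instruction
        oparg |= extended
        if name == "EXTENDED_ARG":
            extended = oparg << 8  # becomes the high bits of the next argument
            continue
        extended = 0
        yield offset, name, oparg

def add(a, b):
    return a + b

for offset, name, oparg in decode_wordcode(add.__code__):
    print(offset, name, oparg)
```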
The frame execution model. In CPython 3.11+, the interpreter executes calls with an internal _PyInterpreterFrame. That internal frame stores the code object's local slots, closure cells, and evaluation stack in a compact "locals plus" layout. A Python-level frame object (PyFrameObject, exposed as types.FrameType) is the public view used by debuggers, trace hooks, and introspection APIs.
The public frame object exposes attributes such as:
- f_locals: a mapping of local variable bindings (a dict snapshot before 3.13; a write-through proxy for function frames since 3.13, PEP 667)
- f_globals: a reference to the module's global dict
- f_builtins: a reference to the builtins dict
- f_lasti: the offset of the last attempted instruction in the bytecode (used, for example, by tracing and debugging tools)
The internal _PyInterpreterFrame and its stack layout are CPython implementation details, not stable public API.
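A small sketch of reading those public attributes from inside a running function, using inspect.currentframe() (which can return None on Python implementations without stack-frame support). The helper name describe_current_frame is hypothetical; deleting the frame reference at the end follows the inspect documentation's advice about reference cycles.

```python
import inspect

def describe_current_frame(a, b=2):
    """Report a few public frame attributes for this function's own frame."""
    frame = inspect.currentframe()  # None on implementations without frame support
    if frame is None:
        return None
    try:
        return {
            "function": frame.f_code.co_name,
            "locals": sorted(frame.f_locals),  # names bound at this point
            "same_globals": frame.f_globals is globals(),
            "has_builtins": "len" in frame.f_builtins,
            "lasti_is_int": isinstance(frame.f_lasti, int),
        }
    finally:
        del frame  # avoid keeping a frame reference cycle alive

print(describe_current_frame(1))
```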
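Because locals live in that indexed "locals plus" layout, local variable access is cheap: LOAD_FAST is a direct C array access by index, while LOAD_GLOBAL has to look the name up in the module's globals dict and then in the builtins dict (a lookup that 3.11+ specialization can shortcut with cached dict-keys versions and indices), and LOAD_DEREF adds one indirection through a cell object.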
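The three access paths can be checked on a single function with dis.get_instructions. In this sketch, SCALE, make_reader, and load_kinds are hypothetical names; the helper matches on the LOAD_FAST prefix because newer CPython versions also emit variants of LOAD_FAST.

```python
import dis

SCALE = 3  # module-level global

def make_reader(offset):
    def read(x):
        # x: fast local, SCALE: global, offset: closure cell
        return x * SCALE + offset
    return read

def load_kinds(func):
    """Map each loaded name to the load opcode CPython compiled for it."""
    kinds = {}
    for ins in dis.get_instructions(func):
        if ins.opname.startswith(("LOAD_FAST", "LOAD_GLOBAL", "LOAD_DEREF")):
            kinds[ins.argval] = ins.opname
    return kinds

print(load_kinds(make_reader(1)))
```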
The dis module is the bridge between high-level syntax and the interpreter's execution model. Use it diagnostically, to answer a specific question about execution shape, rather than as a routine optimization tool.
The two most useful entry points are:
- dis.dis(...) for human-readable disassembly
- dis.get_instructions(...) for structured instruction objects
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
import dis

def scale(x, y):
    return x * y

dis.dis(scale)
for ins in dis.get_instructions(scale):
    print(ins.opname, ins.argrepr)
```
dis.get_instructions is often the better choice if you want to:
- inspect opcodes programmatically
- filter specific instruction kinds
- build your own reports or teaching tools
For plain debugging or education, dis.dis is usually enough.
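As an example of building your own report, here is a sketch that tallies opcode names with collections.Counter. The helper opcode_histogram is hypothetical, and the counts it prints will differ between CPython versions.

```python
import dis
from collections import Counter

def opcode_histogram(func):
    """Count how often each opcode name appears in func's bytecode."""
    return Counter(ins.opname for ins in dis.get_instructions(func))

def scale(x, y):
    return x * y

hist = opcode_histogram(scale)
for name, count in hist.most_common():
    print(f"{name:<24} {count}")
```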
On CPython 3.11+, dis.dis also exposes version-sensitive options for cache and specialization visibility:
```python
# [CURRENT - 3.11-3.14] Requires Python 3.11+ [PEP 659]
import dis

def total(xs):
    return sum(xs)

dis.dis(total)  # generic view

# Warm up first; otherwise the adaptive view has nothing specialized to show.
for _ in range(1000):
    total([1, 2, 3])

dis.dis(total, show_caches=True, adaptive=True)
```
Those extra views are useful when you want to inspect modern CPython's specializing interpreter behavior. They are not portable language-level contracts.
Bytecode becomes especially informative when two snippets are semantically similar but one requires more interpreter work.
List comprehension vs manual append loop is the standard example:
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
import dis

def manual(rows):
    out = []
    for row in rows:
        out.append(row * 2)
    return out

def comp(rows):
    return [row * 2 for row in rows]

dis.dis(manual)
dis.dis(comp)
```
On CPython 3.12, the manual loop performs, on every iteration:
- LOAD_ATTR for append
- CALL to invoke the method
while the comprehension uses a dedicated LIST_APPEND instruction inside its loop. Since 3.12 that loop is inlined into the enclosing function (PEP 709); on 3.10 and 3.11 it runs in a separate comprehension code object.
That helps explain why simple comprehensions often benchmark better. The key point is not "comprehensions are magic." The key point is:
- fewer Python-level dispatch steps
- less repeated lookup/call overhead
- better interpreter-level execution shape for that narrow case
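To check the LIST_APPEND claim on your own interpreter, one approach is to collect opcode names from a function and from any code objects nested in its constants. The recursion matters because on 3.10 and 3.11 the comprehension body lives in a separate code object. The helper all_opnames is hypothetical.

```python
import dis
import types

def all_opnames(code):
    """Collect opcode names from a code object and any code objects nested in co_consts."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= all_opnames(const)
    return names

def manual(rows):
    out = []
    for row in rows:
        out.append(row * 2)
    return out

def comp(rows):
    return [row * 2 for row in rows]

print("LIST_APPEND" in all_opnames(comp.__code__))    # True
print("LIST_APPEND" in all_opnames(manual.__code__))  # False
```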
The same style of reasoning helps with:
- local vs global name access (LOAD_FAST vs LOAD_GLOBAL)
- closure access (LOAD_DEREF)
- dedicated matching opcodes in structural pattern matching
This is the production trap. Bytecode can explain overhead, but it does not fully explain runtime cost.
Reasons:
- the expensive part may be C-level work inside a builtin or extension
- allocation and object creation may dominate
- hash-table behavior may dominate
- branch predictability and cache locality may dominate
- algorithmic complexity may dwarf opcode overhead
- adaptive specialization in CPython 3.11+ can change the effective execution path after warm-up
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
import timeit

print(timeit.timeit("[x * 2 for x in range(1000)]", number=10000))
print(timeit.timeit("""out = []
for x in range(1000):
    out.append(x * 2)""", number=10000))
```
The right workflow is:
- identify a hot path
- benchmark or profile it
- inspect bytecode if the question is interpreter overhead or execution shape
- change the code only if the result still makes sense at the API/readability level
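For the "benchmark or profile it" step, the standard library's cProfile and pstats modules are one option. This is a sketch: hot_path is a hypothetical stand-in for your own code, and the profiler adds overhead of its own, so use it to find where time goes rather than to measure absolute speed.

```python
import cProfile
import io
import pstats

def hot_path(n):
    out = []
    for x in range(n):
        out.append(x * 2)
    return out

profiler = cProfile.Profile()
result = profiler.runcall(hot_path, 100_000)  # profile a single call

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)  # top 5 entries by cumulative time
print(buffer.getvalue())
```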
"This has fewer opcodes" is not a complete performance argument. If the hot cost is inside object creation, hashing, I/O, or a C extension, bytecode counts can be almost irrelevant.
Current project guidance targets Python 3.10-3.14. Python 3.9 and below are End-of-Life.
Important version facts:
- dis has existed for a long time, so basic disassembly examples are stable Python 3 material
- CPython 3.11 introduced the specializing adaptive interpreter (PEP 659)
- exact opcode names, cache layout, jump shapes, and disassembly formatting changed materially in 3.11+
- code that inspects bytecode text output should be treated as version-sensitive tooling
This means two things:
- teaching at the level of LOAD_FAST, LOAD_GLOBAL, LOAD_DEREF, LIST_APPEND, and CALL is still useful
- copying exact disassembly screenshots across versions is risky
When you compare bytecode, compare it on the Python version you actually deploy.
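One way to follow that advice is to store the interpreter identity alongside any disassembly you save. The helper disassembly_snapshot is hypothetical; it relies only on the file argument of dis.dis and on the platform module.

```python
import dis
import io
import platform

def disassembly_snapshot(func):
    """Return dis.dis output prefixed with the interpreter name and version."""
    buffer = io.StringIO()
    dis.dis(func, file=buffer)
    header = f"# {platform.python_implementation()} {platform.python_version()}\n"
    return header + buffer.getvalue()

def scale(x, y):
    return x * y

print(disassembly_snapshot(scale))
```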
Do not confuse bytecode with:
- AST structure
- machine code
- JIT output from another implementation
Also do not present CPython opcode behavior as a Python language guarantee. PyPy, MicroPython, and other implementations do not have to expose or optimize the same instruction stream the same way.
Another trap: dis output is observational, not normative. It tells you what this interpreter version emitted, not what future versions must emit.
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
import dis

def classify(x):
    return x + 1

print(classify.__code__.co_varnames)
print(classify.__code__.co_names)
dis.dis(classify)
```
That output is great for diagnosis. It is not a contract you should build hard production logic around unless you own the version lock and the maintenance cost.
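If you do decide to build a check on bytecode (for example, a test asserting that a hot function still uses a particular opcode), one hedge is to gate it on the implementation and on the versions you have actually verified. TESTED_RANGE and the helper names here are hypothetical; adjust the range to your own version lock.

```python
import dis
import sys

# Hypothetical version lock: only trust opcode-level checks on versions you have verified.
TESTED_RANGE = ((3, 10), (3, 14))

def bytecode_checks_enabled():
    low, high = TESTED_RANGE
    return sys.implementation.name == "cpython" and low <= sys.version_info[:2] <= high

def uses_opcode(func, opname):
    """Return True/False on verified CPython versions, None elsewhere (check skipped)."""
    if not bytecode_checks_enabled():
        return None
    return any(ins.opname == opname for ins in dis.get_instructions(func))

def classify(x):
    return x + 1

print(uses_opcode(classify, "RETURN_VALUE"))
```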
Use bytecode inspection when:
- a hot path seems dominated by Python-level overhead
- you need to explain a benchmark result
- you want to understand closure/global/local behavior
- you are teaching or debugging interpreter-level execution shape
Do not use bytecode inspection as a reflex for every optimization question. Reach for it when the question is specifically about:
- dispatch overhead
- repeated method lookup
- call boundaries
- specialized/dedicated interpreter paths
Good production practice:
- write the clearest correct code first
- measure
- inspect bytecode if the measurement suggests interpreter overhead matters
- keep the optimized version only if the gain is real and the code stays defensible