The GIL, or global interpreter lock, is one of the most important runtime constraints in CPython. It affects how threads behave, why some code scales across cores and some does not, and why you have heard "Python threading is slow" without ever getting the full story.
Think of the GIL like a single-lane bridge. Only one car (thread) crosses the bridge (executes Python bytecode) at a time. If a car has to wait for something (blocks on I/O), it pulls off into a rest area and frees the bridge for other cars. That is why I/O-bound threads work well but CPU-bound threads queue up. The bridge only becomes a bottleneck when every car needs to spend most of its time on it.
In a regular CPython build, the GIL allows only one thread at a time to execute Python bytecode in a given interpreter.
That has three immediate consequences:
- pure-Python CPU-bound threads usually do not scale across cores the way people first expect
- I/O-bound threads can still help a lot because the GIL is released while waiting on I/O
- true multi-core Python execution usually requires another model:
  - processes
  - multiple interpreters with separate GILs
  - native code that releases the GIL
  - a free-threaded CPython build
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
from concurrent.futures import ThreadPoolExecutor

def cpu_task(n):
    total = 0
    for i in range(n):
        total += i
    return total
```

The official Python glossary describes the GIL as the mechanism used by CPython to ensure that only one thread executes Python bytecode at a time. The historical tradeoff is explicit:
- CPython's object model and runtime become much simpler to implement correctly
- critical built-in structures are protected from many forms of concurrent corruption
- but Python bytecode execution loses much of the parallelism available on multi-core machines
This matters because CPython objects are deeply shared runtime structures:
- reference counts change constantly
- containers such as dict, list, and set have mutable internal state
- many operations can allocate, deallocate, resize, and invoke arbitrary Python code
The GIL centralizes a large part of that safety story in the regular build.
Without it, CPython needs more fine-grained thread-safety machinery around:
- reference counting
- object memory management
- container access
- specialization caches
That is exactly why free-threaded CPython is a major interpreter project rather than a tiny switch.
In CPython 3.12+, the GIL implementation lives across the evaluation loop, thread-state code, and Python/ceval_gil.c. In the standard CPython build (Py_GIL_DISABLED not defined), a thread must hold the GIL while executing Python bytecode, and the runtime releases it around blocking operations.
Acquisition model. CPython uses a periodic-check approach. The GIL is not released after every bytecode instruction. Instead, the eval loop and GIL scheduler cooperate through internal eval-breaker state. The public tuning knob is sys.setswitchinterval(), which sets the ideal duration of a thread's timeslice (default: 5 milliseconds on CPython 3.12). The exact handoff timing is an implementation detail and can also be affected by blocking calls and operating-system scheduling.
Release points. The GIL is explicitly released in these situations (for example in CPython's eval, GIL, and thread support code):
- around blocking I/O calls (file read/write, socket operations, time.sleep())
- during computationally intensive native code in some extension modules (e.g., hashing, compression)
- when Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros are used in C extensions
This is why the right mental model is:
- threads are still concurrent
- but Python bytecode execution in one interpreter is serialized by the GIL in the regular build
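A minimal sketch of that mental model, assuming time.sleep() as a stand-in for any blocking call that releases the GIL. The helper names (timed, sleep_serial, sleep_threaded) are hypothetical and exist only for this illustration.

```python
import threading
import time


def timed(fn):
    """Return the wall-clock seconds taken by calling fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sleep_serial(delay, count):
    """Wait `count` times, one after another, in the calling thread."""
    for _ in range(count):
        time.sleep(delay)


def sleep_threaded(delay, count):
    """Wait `count` times in separate threads; the waits can overlap
    because a sleeping thread does not hold the GIL."""
    threads = [threading.Thread(target=time.sleep, args=(delay,))
               for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    print("serial:  ", timed(lambda: sleep_serial(0.25, 2)))
    print("threaded:", timed(lambda: sleep_threaded(0.25, 2)))
```

The threaded variant should finish in roughly one delay rather than two, although exact timings depend on the machine and the operating-system scheduler.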
The switch interval:
```python
# [CURRENT - 3.10-3.14] Works on Python 3.x
import sys

print(sys.getswitchinterval())
```

The switch interval is a runtime tuning knob for how often Python threads are given a chance to switch. Actual scheduling depends on the operating system and on whether the current thread reaches a point where switching can happen. It is a best-effort interval, not a deterministic guarantee.
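If you want to experiment with the knob, one cautious pattern is to change it temporarily and restore the previous value afterwards. The sketch below assumes a hypothetical helper named switch_interval; only sys.getswitchinterval() and sys.setswitchinterval() are real APIs here.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily set the thread switch interval, then restore the old one."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield previous
    finally:
        # Always restore, even if the body raised.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("default: ", sys.getswitchinterval())
    with switch_interval(0.001):
        print("inside:  ", sys.getswitchinterval())
    print("restored:", sys.getswitchinterval())
```

Changing the interval affects the whole process, so a scoped helper like this is mainly useful for experiments and benchmarks rather than as a general tuning strategy.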
Per-interpreter GIL (PEP 684, Python 3.12+). Py_NewInterpreter() has existed since Python 1.5, but before 3.12 all sub-interpreters shared the same GIL. Starting in Python 3.12, an interpreter created with Py_NewInterpreterFromConfig() and a configuration that requests PyInterpreterConfig_OWN_GIL gets its own GIL; Py_NewInterpreter() itself still creates sub-interpreters that share the main interpreter's GIL. Interpreter state is managed in Python/pystate.c, and the GIL is reached through each interpreter's PyInterpreterState rather than through a single process-wide lock, so threads running in different own-GIL interpreters do not contend for the same lock. InterpreterPoolExecutor (Python 3.14) provides a high-level API for this.
Distinguish "concurrency" (dealing with many things at once) from "parallelism" (doing many things at once). The GIL makes this distinction concrete: threads provide concurrency for I/O-bound work but not parallelism for CPU-bound Python bytecode. Use processes for CPU-bound parallelism in the regular CPython build.
The GIL protects CPython runtime internals. What it guarantees:
- individual bytecode-level operations are internally consistent
- the interpreter itself avoids corruption from concurrent access
- most Py_DECREF calls and allocation paths are safe without additional locking
What it does not replace:
- x += 1 spans multiple bytecode steps and is not atomic
- compound operations on shared data still need explicit synchronization
- your threaded application code still needs locks for shared mutable state
Python only guarantees atomicity where it is explicitly documented. For everything else, use threading.Lock.
The GIL protects the interpreter from corruption. It does not protect your application's shared-state logic. Use explicit synchronization for shared mutable data.
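As a small illustration of explicit synchronization, the sketch below guards a shared counter with threading.Lock. The Counter class and run_workers helper are hypothetical names invented for this example.

```python
import threading


class Counter:
    """A shared counter whose read-modify-write step is guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self.value += 1` is a load, an add, and a store; the lock makes
        # the three steps behave as one unit with respect to other threads.
        with self._lock:
            self.value += 1


def run_workers(counter, workers, increments):
    """Start `workers` threads that each call increment() `increments` times."""
    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers(Counter(), workers=4, increments=10_000))
```

Without the lock, the final value may come up short on some runs and interpreter versions; with it, the result is the full number of increments.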
The main performance effect is simple:
- CPU-bound pure-Python threads compete for one interpreter lock
- I/O-bound threads often overlap effectively because the waiting thread releases the GIL
Representative local CPython 3.12.3 measurements on this machine:
| Workload | Shape | Elapsed time | Main reason |
|---|---|---|---|
| CPU loop twice | serial | 1.088 s | No thread coordination cost; just one thread running Python bytecode at a time anyway |
| CPU loop twice | 2 threads | 1.057 s | Threads time-slice behind one GIL, so there is little or no multi-core gain |
| CPU loop twice | 2 processes | 0.591 s | Separate runtimes can execute on separate cores |
| sleep(0.25) twice | serial | 0.500 s | Waiting happens one after another |
| sleep(0.25) twice | 2 threads | 0.251 s | Waiting overlaps because blocked threads release the GIL |
These are local measurements, not language guarantees. The point is the shape:
- CPU-bound threads: usually no real Python-bytecode parallelism
- waiting-heavy threads: often very effective
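To check the CPU-bound rows on your own machine, a rough harness along these lines may help. It is a sketch, not the script used for the table: the helper names (run_serial, run_threaded, measure) and the loop size are assumptions, and the process-based row is left out to keep the example short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_task(n):
    """Pure-Python CPU-bound loop: sum of 0..n-1."""
    total = 0
    for i in range(n):
        total += i
    return total


def run_serial(n, repeats):
    """Run the task `repeats` times in the calling thread."""
    return [cpu_task(n) for _ in range(repeats)]


def run_threaded(n, repeats):
    """Run the task `repeats` times, one worker thread per task."""
    with ThreadPoolExecutor(max_workers=repeats) as pool:
        return list(pool.map(cpu_task, [n] * repeats))


def measure(fn, *args):
    """Return (result, elapsed_seconds) for fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000  # arbitrary size; adjust for your machine
    _, serial_s = measure(run_serial, n, 2)
    _, threaded_s = measure(run_threaded, n, 2)
    print(f"serial:    {serial_s:.3f} s")
    print(f"2 threads: {threaded_s:.3f} s")
```

On a regular CPython build the two timings are usually close to each other; on a free-threaded build the threaded run may be noticeably faster.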
The glossary states two important facts:
- the GIL is always released during I/O
- some extension modules release it during computationally intensive native work such as compression or hashing
That means the real question is not just "am I using threads?" The real question is:
- where is the time spent?
- in Python bytecode?
- in blocking waits?
- in native code that releases the GIL?
This is where the dis module becomes useful. If the hot path is dominated by Python bytecode, dis can help explain the execution shape. If the hot path is mostly C-level work or I/O, bytecode may matter much less than the native runtime boundary.
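For instance, disassembling a one-line increment shows that it compiles to several instructions. The function and helper below are hypothetical; the exact opcode names differ between CPython versions, so the sketch prints them instead of assuming a fixed listing.

```python
import dis

counter = 0


def increment():
    """Increment a module-level counter; looks like one step, but is not."""
    global counter
    counter += 1


def opcode_names(func):
    """Return the opcode names of func's bytecode, in order."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    # Expect a load of the global, an add, and a store as separate
    # instructions; a thread switch can happen between them.
    for name in opcode_names(increment):
        print(name)
```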
If the workload is CPU-bound Python code, your main alternatives are:
- ProcessPoolExecutor or multiprocessing
- multiple interpreters with separate GILs
- native extensions or vectorized libraries that release the GIL
- free-threaded CPython builds
Python 3.14 adds InterpreterPoolExecutor, which runs tasks in separate interpreters. The docs explicitly describe its main benefit: each interpreter has its own GIL, so code in one interpreter can run on one CPU core while code in another interpreter runs unblocked on a different core.
That is an important distinction:
- threads in one interpreter share one GIL
- separate interpreters can each have their own GIL
The tradeoff is stronger isolation and more deliberate data movement.
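A hedged sketch of what using it might look like. InterpreterPoolExecutor only exists on Python 3.14+, so the example falls back to ThreadPoolExecutor on older versions purely so that it still runs; that fallback, the total_per_chunk helper, and the choice of the built-in sum as the task are all assumptions made for illustration.

```python
import concurrent.futures

# Use InterpreterPoolExecutor where available (Python 3.14+). The fallback
# to threads is an illustration-only assumption and gives no extra
# parallelism on a regular build.
Executor = getattr(
    concurrent.futures,
    "InterpreterPoolExecutor",
    concurrent.futures.ThreadPoolExecutor,
)


def total_per_chunk(chunks, max_workers=2):
    """Sum each chunk in a worker; arguments and results must be shareable
    between interpreters (plain lists of ints pickle cleanly)."""
    with Executor(max_workers=max_workers) as pool:
        return list(pool.map(sum, chunks))


if __name__ == "__main__":
    print(Executor.__name__, total_per_chunk([[1, 2], [3, 4]]))
```

Passing a built-in function and simple data keeps the example clear of the data-movement restrictions that come with interpreter isolation.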
As of Python 3.13, CPython supports a free-threaded build based on PEP 703. The standard docs describe it as a separate configuration where the GIL is disabled.
Important caveats:
- this is not the default build
- compatibility and performance tradeoffs still matter
- some extensions may re-enable the GIL at runtime or may not support the free-threaded build yet
The free-threaded story is therefore not:
- "Python finally has no GIL everywhere"
The real story is:
- CPython now has an evolving opt-in build configuration where the GIL can be disabled
- making that safe requires substantial internal runtime changes, including per-object locking, biased reference counting, and quiescent-state-based reclamation (QSBR), with part of the reference-counting work visible in Objects/object.c
- deployment, extension compatibility, and single-thread costs still matter
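To find out which world a given interpreter is in, a check along these lines can help. It is a sketch: gil_status is a hypothetical helper, sys._is_gil_enabled() is only available on Python 3.13+, and the code assumes the GIL is enabled whenever that function is missing.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this is a free-threaded build and whether the GIL
    is currently enabled at runtime."""
    # Py_GIL_DISABLED is set to 1 in free-threaded builds; it is 0 or
    # missing otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() exists on 3.13+. On older versions we assume
    # the GIL is always on.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(probe()) if probe is not None else True

    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


if __name__ == "__main__":
    print(gil_status())
```

The two values are reported separately because a free-threaded build can still end up running with the GIL enabled, for example after importing an extension that does not declare support for free threading.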
Current project guidance targets Python 3.10-3.14. Python 3.9 and below are End-of-Life.
Important version markers:
- regular CPython in the supported range still has the classic GIL behavior by default
- PEP 684 introduced the per-interpreter GIL groundwork in CPython 3.12
- InterpreterPoolExecutor is new in Python 3.14
- PEP 703 introduced free-threaded CPython support starting in Python 3.13
When teaching or optimizing, always say which world you are talking about:
- default CPython build
- per-interpreter parallelism
- free-threaded build
Those are not the same runtime story.
The GIL does not make async obsolete, and async does not remove the GIL. They solve different problems.
- the GIL constrains threaded Python bytecode execution
- asyncio is cooperative concurrency in one thread unless you explicitly offload work
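One common way to offload explicitly is asyncio.to_thread(), which runs a blocking callable in a worker thread while the event loop stays responsive. In the sketch below, blocking_io and fetch_all are hypothetical names, and time.sleep() stands in for a real blocking call.

```python
import asyncio
import time


def blocking_io(delay):
    """Stand-in for a blocking call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


async def fetch_all(delays):
    """Run each blocking call in a worker thread and gather the results
    in the same order as the inputs."""
    tasks = [asyncio.to_thread(blocking_io, d) for d in delays]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fetch_all([0.2, 0.2, 0.2]))
    print(results, f"{time.perf_counter() - start:.2f} s")
```

This helps for blocking or GIL-releasing work. Offloading a pure-Python CPU loop to a thread this way still contends for the GIL on a regular build.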
Another trap is assuming that "thread-safe built-ins" means business-logic safety. A dict not corrupting itself is not the same as your threaded update protocol being correct.
Finally, do not generalize from one benchmark blindly. Threads can still win when:
- the program waits on I/O
- the work happens in native code that releases the GIL
- the dominant cost is not Python bytecode execution
Use this rule set:
- choose threads for I/O-bound work and shared-memory coordination
- choose processes for CPU-bound Python code when you need reliable multi-core scaling
- consider multiple interpreters when interpreter isolation is acceptable and you want true parallelism without full process separation
- evaluate free-threaded CPython deliberately, as a deployment/runtime choice rather than a default assumption
When performance matters:
- classify the workload as CPU-bound Python, native-code-heavy, or waiting-heavy
- measure the actual bottleneck
- choose the concurrency model that matches the bottleneck
- inspect bytecode only when interpreter overhead is plausibly relevant
- Glossary: global interpreter lock
- threading — Thread-based parallelism
- sys.getswitchinterval
- concurrent.futures
- Python support for free threading
- PEP 684: A Per-Interpreter GIL
- PEP 703: Making the Global Interpreter Lock Optional in CPython
- CPython source: Python/ceval.c
- CPython source: Python/ceval_gil.c
- CPython source: Python/pystate.c