gh-119786: move interpreter doc from devguide to InternalDocs (#125715)

2024-11-21 12:59:38 +01:00 · 2024-10-21 18:54:24 +01:00 · 2024-10-21 18:54:24 +01:00 · 695814c6e9
commit 695814c6e9
parent 9b0bfba2a2
4 changed files with 400 additions and 9 deletions
--- a/InternalDocs/README.md
+++ b/InternalDocs/README.md
@ -11,19 +11,32 @@ The core dev team attempts to keep this documentation up to date. If
 it is not, please report that through the
 [issue tracker](https://github.com/python/cpython/issues).

-Index:
-----

-[Guide to the parser](parser.md)
+Compiling Python Source Code
+---

-[Compiler Design](compiler.md)
+- [Guide to the parser](parser.md)

-[Frames](frames.md)
+- [Compiler Design](compiler.md)

-[Adaptive Instruction Families](adaptive.md)
+Runtime Objects
+---

-[The Source Code Locations Table](locations.md)
+- [Code Objects (coming soon)](code_objects.md)

-[Garbage collector design](garbage_collector.md)
+- [The Source Code Locations Table](locations.md)

-[Exception Handling](exception_handling.md)
+- [Generators (coming soon)](generators.md)
+
+- [Frames](frames.md)
+
+Program Execution
+---
+
+- [The Interpreter](interpreter.md)
+
+- [Adaptive Instruction Families](adaptive.md)
+
+- [Garbage Collector Design](garbage_collector.md)
+
+- [Exception Handling](exception_handling.md)
--- a/InternalDocs/code_objects.md
+++ b/InternalDocs/code_objects.md
@ -0,0 +1,5 @@
+
+Code objects
+============ 
+
+Coming soon.
--- a/InternalDocs/generators.md
+++ b/InternalDocs/generators.md
@ -0,0 +1,9 @@
+
+Generators
+========== 
+
+Coming soon.
+
+<!--
+- Generators, async functions, async generators, and ``yield from`` (next, send, throw, close; and await; and how this code breaks the interpreter abstraction)
+-->
--- a/InternalDocs/interpreter.md
+++ b/InternalDocs/interpreter.md
@ -0,0 +1,364 @@
+
+The bytecode interpreter
+========================
+
+Overview
+--------
+
+This document describes the workings and implementation of the bytecode
+interpreter, the part of CPython that executes compiled Python code. Its
+entry point is in [Python/ceval.c](../Python/ceval.c).
+
+At a high level, the interpreter consists of a loop that iterates over the
+bytecode instructions, executing each of them via a switch statement that
+has a case implementing each opcode. This switch statement is generated
+from the instruction definitions in [Python/bytecodes.c](../Python/bytecodes.c)
+which are written in [a DSL](../Tools/cases_generator/interpreter_definition.md)
+developed for this purpose.
+
+Recall that the [Python Compiler](compiler.md) produces a [`CodeObject`](code_objects.md),
+which contains the bytecode instructions along with static data that is required to execute them,
+such as the consts list, variable names,
+[exception table](exception_handling.md#format-of-the-exception-table), and so on.
+
+When the interpreter's
+[`PyEval_EvalCode()`](https://docs.python.org/3.14/c-api/veryhigh.html#c.PyEval_EvalCode)
+function is called to execute a `CodeObject`, it constructs a [`Frame`](frames.md) and calls
+`_PyEval_EvalFrame()`
+to execute the code object in this frame. The frame holds the dynamic state of the
+`CodeObject`'s execution, including the instruction pointer, the globals and builtins.
+It also has a reference to the `CodeObject` itself.
+
+In addition to the frame, `_PyEval_EvalFrame()` also receives a
+[`Thread State`](https://docs.python.org/3/c-api/init.html#c.PyThreadState)
+object, `tstate`, which includes things like the exception state and the
+recursion depth.  The thread state also provides access to the per-interpreter
+state (`tstate->interp`), which has a pointer to the per-runtime (that is,
+truly global) state (`tstate->interp->runtime`).
+
+Finally, `_PyEval_EvalFrame()` receives an integer argument `throwflag`
+which, when nonzero, indicates that the interpreter should just raise the current exception
+(this is used in the implementation of
+[`gen.throw`](https://docs.python.org/3.14/reference/expressions.html#generator.throw)).
+
+By default, `_PyEval_EvalFrame()`
+simply calls `_PyEval_EvalFrameDefault()` to execute the frame. However, as per
+[`PEP 523`](https://peps.python.org/pep-0523/) this is configurable by setting
+`interp->eval_frame`. In the following, we describe the default function,
+`_PyEval_EvalFrameDefault()`.
+
+
+Instruction decoding
+--------------------
+
+The first task of the interpreter is to decode the bytecode instructions.
+Bytecode is stored as an array of 16-bit code units (`_Py_CODEUNIT`).
+Each code unit contains an 8-bit `opcode` and an 8-bit argument (`oparg`), both unsigned.
+In order to make the bytecode format independent of the machine byte order when stored on disk,
+`opcode` is always the first byte and `oparg` is always the second byte.
+Macros are used to extract the `opcode` and `oparg` from a code unit
+(`_Py_OPCODE(word)` and `_Py_OPARG(word)`).
+Some instructions (for example, `NOP` or `POP_TOP`) have no argument -- in this case
+we ignore `oparg`.
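+
+A minimal sketch of this layout follows. It uses a hypothetical `CodeUnit` struct of two bytes rather than CPython's actual `_Py_CODEUNIT` definition. Reading the opcode and the argument as separate bytes gives the same result on little-endian and big-endian machines:
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Hypothetical stand-in for _Py_CODEUNIT: two separate bytes, so the
+   layout does not depend on the machine's byte order. */
+typedef struct {
+    uint8_t opcode;  /* always the first byte */
+    uint8_t oparg;   /* always the second byte */
+} CodeUnit;
+
+/* Hypothetical equivalents of _Py_OPCODE() and _Py_OPARG(). */
+#define OPCODE(word) ((word).opcode)
+#define OPARG(word)  ((word).oparg)
+
+/* Decode code unit i of a raw byte buffer as it would be read from disk:
+   byte 2*i holds the opcode and byte 2*i+1 holds the argument. */
+static CodeUnit decode_unit(const uint8_t *bytes, int i)
+{
+    CodeUnit word = { bytes[2 * i], bytes[2 * i + 1] };
+    assert(sizeof(CodeUnit) == 2);
+    return word;
+}
+```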
+
+A simplified version of the interpreter's main loop looks like this:
+
+```c
+    _Py_CODEUNIT *first_instr = code->co_code_adaptive;
+    _Py_CODEUNIT *next_instr = first_instr;
+    while (1) {
+        _Py_CODEUNIT word = *next_instr++;
+        unsigned char opcode = _Py_OPCODE(word);
+        unsigned int oparg = _Py_OPARG(word);
+        switch (opcode) {
+        // ... A case for each opcode ...
+        }
+    }
+```
+
+This loop iterates over the instructions, decoding each into its `opcode`
+and `oparg`, and then executes the switch case that implements this `opcode`.
+
+The instruction format supports 256 different opcodes, which is sufficient.
+However, it also limits `oparg` to 8-bit values, which is too restrictive.
+To overcome this, the `EXTENDED_ARG` opcode allows us to prefix any instruction
+with one or more additional data bytes, which combine into a larger oparg.
+For example, this sequence of code units:
+
+    EXTENDED_ARG  1
+    EXTENDED_ARG  0
+    LOAD_CONST    2
+
+would set `opcode` to `LOAD_CONST` and `oparg` to `65538` (that is, `0x1_00_02`).
+The compiler should limit itself to at most three `EXTENDED_ARG` prefixes, to allow the
+resulting `oparg` to fit in 32 bits, but the interpreter does not check this.
+
+In the following, a `code unit` is always two bytes, while an `instruction` is a
+sequence of code units consisting of zero to three `EXTENDED_ARG` opcodes followed by
+a primary opcode.
+
+The following loop, to be inserted just above the `switch` statement, will make the above
+snippet decode a complete instruction:
+
+```c
+    while (opcode == EXTENDED_ARG) {
+        word = *next_instr++;
+        opcode = _Py_OPCODE(word);
+        oparg = (oparg << 8) | _Py_OPARG(word);
+    }
+```
+
+For various reasons we'll get to later (mostly efficiency, given that `EXTENDED_ARG`
+is rare), the actual code is different.
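+
+The decoding rule can be checked in isolation with a small sketch. The opcode numbers below are arbitrary, because CPython's real values are generated into `Include/opcode_ids.h`. The bytecode is a plain byte array, with the opcode stored in the first byte of each code unit:
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Arbitrary opcode numbers for this sketch only. */
+enum { TOY_LOAD_CONST = 1, TOY_EXTENDED_ARG = 2 };
+
+/* Decode one complete instruction starting at code unit *pos. Leading
+   EXTENDED_ARG units are folded into the final oparg, and *pos is
+   advanced past every unit that was consumed. */
+static int decode_instruction(const uint8_t *code, int *pos, uint32_t *oparg)
+{
+    int opcode = code[2 * *pos];
+    *oparg = code[2 * *pos + 1];
+    (*pos)++;
+    while (opcode == TOY_EXTENDED_ARG) {
+        opcode = code[2 * *pos];
+        *oparg = (*oparg << 8) | code[2 * *pos + 1];
+        (*pos)++;
+    }
+    return opcode;
+}
+```
+
+Fed the three code units from the example above, this returns `TOY_LOAD_CONST` with an `oparg` of 65538 and leaves `*pos` at 3.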
+
+Jumps
+-----
+
+Note that when the `switch` statement is reached, `next_instr` (the "instruction offset")
+already points to the next instruction.
+Thus, jump instructions can be implemented by manipulating `next_instr`:
+
+- A jump forward (`JUMP_FORWARD`) sets `next_instr += oparg`.
+- A jump backward (`JUMP_BACKWARD`) sets `next_instr -= oparg`.
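+
+Because `next_instr` has already advanced, jump offsets are relative to the instruction that follows the jump. The following minimal sketch models `next_instr` as an index into the code-unit array and uses a hypothetical helper, `apply_jump`:
+
+```c
+#include <assert.h>
+
+/* Hypothetical helper: compute the new next_instr (an index into the
+   code-unit array) for a relative jump. On entry, next_instr already
+   points just past the jump instruction. */
+static int apply_jump(int next_instr, int oparg, int backward)
+{
+    int target = backward ? next_instr - oparg : next_instr + oparg;
+    assert(target >= 0);  /* a well-formed jump never leaves the code */
+    return target;
+}
+```
+
+For example, a backward jump with `oparg` 2 at index 1 sees `next_instr == 2` and lands on index 0.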
+
+Inline cache entries
+--------------------
+
+Some (specialized or specializable) instructions have an associated "inline cache".
+The inline cache consists of one or more two-byte entries included in the bytecode
+array as additional words following the `opcode`/`oparg` pair.
+The size of the inline cache for a particular instruction is fixed by its `opcode`.
+Moreover, the inline cache size for all instructions in a
+[family of specialized/specializable instructions](adaptive.md)
+(for example, `LOAD_ATTR`, `LOAD_ATTR_SLOT`, `LOAD_ATTR_MODULE`) must be
+the same.  Cache entries are reserved by the compiler and initialized with zeros.
+Although they are represented by code units, cache entries do not conform to the
+`opcode` / `oparg` format.
+
+If an instruction has an inline cache, the layout of its cache is described by
+a `struct` definition in [`pycore_code.h`](../Include/internal/pycore_code.h).
+This allows us to access the cache by casting `next_instr` to a pointer to this `struct`.
+The size of such a `struct` must be independent of the machine architecture, word size
+and alignment requirements.  For a 32-bit field, the `struct` should use `_Py_CODEUNIT field[2]`.
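+
+As an illustration, here is a hypothetical cache layout. It is not one of the real structs in `pycore_code.h`. It holds a 16-bit counter and a 32-bit version number, and the version number is stored in two code units as described above. The helpers `cache_at`, `read_u32` and `write_u32` are illustrative names. Copying the 32-bit value with `memcpy` avoids relying on any alignment stricter than that of a code unit:
+
+```c
+#include <assert.h>
+#include <stdint.h>
+#include <string.h>
+
+typedef uint16_t codeunit;  /* stand-in for _Py_CODEUNIT */
+
+/* Hypothetical inline cache: every field is built from whole code units,
+   so the struct has the same size on every platform. */
+typedef struct {
+    codeunit counter;
+    codeunit version[2];  /* a 32-bit value split over two code units */
+} ToyAttrCache;
+
+/* View the cache that starts at next_instr (just past the opcode/oparg
+   unit) through the struct, by casting the pointer. */
+static ToyAttrCache *cache_at(codeunit *next_instr)
+{
+    assert(sizeof(ToyAttrCache) % sizeof(codeunit) == 0);
+    return (ToyAttrCache *)next_instr;
+}
+
+/* memcpy avoids assuming 4-byte alignment for the 32-bit field. */
+static uint32_t read_u32(const codeunit *p)
+{
+    uint32_t value;
+    memcpy(&value, p, sizeof(value));
+    return value;
+}
+
+static void write_u32(codeunit *p, uint32_t value)
+{
+    memcpy(p, &value, sizeof(value));
+}
+```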
+
+The instruction implementation is responsible for advancing `next_instr` past the inline cache.
+For example, if an instruction's inline cache is four bytes (that is, two code units) in size,
+the code for the instruction must contain `next_instr += 2;`.
+This is equivalent to a relative forward jump by that many code units.
+(In the interpreter definition DSL, this is coded as `JUMPBY(n)`, where `n` is the number
+of code units to jump, typically given as a named constant.)
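+
+The following sketch walks a bytecode array the way a disassembler might, stepping over each instruction's inline cache. The opcodes and the cache sizes in `cache_entries` are invented for the example:
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+typedef struct { uint8_t opcode, oparg; } Unit;   /* one code unit */
+
+/* Invented opcodes and their inline cache sizes, in code units. */
+enum { OP_NOP = 0, OP_LOAD_ATTR = 1, OP_RETURN = 2 };
+static const int cache_entries[] = { 0, 4, 0 };
+
+/* Count the instructions in `code` (n code units). After reading each
+   opcode/oparg unit, skip the instruction's inline cache, much like
+   JUMPBY(n) does in the interpreter. */
+static int count_instructions(const Unit *code, int n)
+{
+    int count = 0;
+    int next_instr = 0;               /* index of the next code unit */
+    while (next_instr < n) {
+        uint8_t opcode = code[next_instr].opcode;
+        next_instr++;                 /* past the opcode/oparg unit */
+        assert((size_t)opcode < sizeof(cache_entries) / sizeof(cache_entries[0]));
+        next_instr += cache_entries[opcode];   /* past the inline cache */
+        count++;
+    }
+    return count;
+}
+```
+
+For a `NOP`, a `LOAD_ATTR` with four zeroed cache entries, and a `RETURN` (seven code units in all), this counts three instructions. Without the skip, the zeroed entries would be misread as four `OP_NOP` instructions and the count would be seven.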
+
+Serializing non-zero cache entries would present a problem because the serialization
+([`marshal`](https://docs.python.org/3/library/marshal.html)) format must be independent of the machine byte order.
+
+More information about the use of inline caches can be found in
+[PEP 659](https://peps.python.org/pep-0659/#ancillary-data).
+
+The evaluation stack
+--------------------
+
+Most instructions read or write some data in the form of object references (`PyObject *`).
+The CPython bytecode interpreter is a stack machine, meaning that its instructions operate
+by pushing data onto and popping it off the stack.
+The stack forms part of the frame for the code object. Its maximum depth is calculated
+by the compiler and stored in the `co_stacksize` field of the code object, so that the
+stack can be pre-allocated as a contiguous array of `PyObject*` pointers when the frame
+is created.
+
+The stack effects of each instruction are also exposed through the
+[opcode metadata](../Include/internal/pycore_opcode_metadata.h) via two
+functions that report how many stack elements the instruction consumes
+and how many it produces (`_PyOpcode_num_popped` and `_PyOpcode_num_pushed`).
+For example, the `BINARY_OP` instruction pops two objects from the stack and pushes the
+result back onto the stack.
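+
+The following sketch shows how per-instruction stack effects determine `co_stacksize` for straight-line code without jumps. The opcodes and the `num_popped`/`num_pushed` tables are invented stand-ins for `_PyOpcode_num_popped` and `_PyOpcode_num_pushed`. In CPython those are functions of the opcode and its `oparg`, not plain tables:
+
+```c
+#include <assert.h>
+
+enum { TOY_LOAD = 0, TOY_BINARY_OP = 1, TOY_POP = 2 };   /* invented */
+static const int num_popped[] = { 0, 2, 1 };
+static const int num_pushed[] = { 1, 1, 0 };
+
+/* Return the maximum stack depth reached by a straight-line sequence of
+   opcodes, i.e. the value a compiler would store in co_stacksize. */
+static int max_stack_depth(const int *ops, int n)
+{
+    int depth = 0, max_depth = 0;
+    for (int i = 0; i < n; i++) {
+        depth -= num_popped[ops[i]];
+        assert(depth >= 0);                  /* would be a stack underflow */
+        depth += num_pushed[ops[i]];
+        if (depth > max_depth) {
+            max_depth = depth;
+        }
+    }
+    return max_depth;
+}
+```
+
+For `LOAD, LOAD, BINARY_OP, POP`, the depth goes 1, 2, 1, 0, so the maximum is 2.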
+
+The stack grows up in memory; the operation `PUSH(x)` is equivalent to `*stack_pointer++ = x`,
+whereas `x = POP()` means `x = *--stack_pointer`.
+Overflow and underflow checks are active in debug mode, but are otherwise optimized away.
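+
+The two macros can be sketched with a fixed-size array standing in for the frame's pre-allocated stack. The `TOY_` names are hypothetical. The bounds checks use `assert()`, which is compiled out when `NDEBUG` is defined, mirroring the debug-only checks described above:
+
+```c
+#include <assert.h>
+
+typedef struct { int value; } Object;    /* stand-in for PyObject */
+
+#define STACK_SIZE 4                     /* plays the role of co_stacksize */
+static Object *stack[STACK_SIZE];
+static Object **stack_pointer = stack;   /* first free slot; grows upward */
+
+/* PUSH(x) is *stack_pointer++ = x and POP() is *--stack_pointer; the
+   asserts are the debug-mode overflow and underflow checks. */
+#define TOY_PUSH(x) (assert(stack_pointer < stack + STACK_SIZE), \
+                     *stack_pointer++ = (x))
+#define TOY_POP()   (assert(stack_pointer > stack), *--stack_pointer)
+#define TOY_STACK_LEVEL() ((int)(stack_pointer - stack))
+```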
+
+At any point during execution, the stack level is knowable based on the instruction pointer
+alone, and some properties of each item on the stack are also known.
+In particular, only a few instructions may push a `NULL` onto the stack, and the positions
+that may be `NULL` are known.
+A few other instructions (`GET_ITER`, `FOR_ITER`) push or pop an object that is known to
+be an iterator.
+
+Instruction sequences that do not allow statically knowing the stack depth are deemed illegal;
+the bytecode compiler never generates such sequences.
+For example, the following sequence is illegal, because it keeps pushing items on the stack:
+
+    LOAD_FAST 0
+    JUMP_BACKWARD 2
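+
+One way to check this property is a small abstract interpretation. It records the stack depth at which each instruction is first reached and rejects any path that arrives with a different depth. The sketch below uses invented opcodes, a simple worklist, and the jump convention described above, with targets relative to the next instruction. It is an illustration, not the compiler's actual algorithm:
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+
+enum { TOY_LOAD_FAST, TOY_POP_TOP, TOY_JUMP_BACKWARD, TOY_RETURN };
+
+typedef struct { int opcode, oparg; } Instr;
+
+/* Return true if every reachable instruction has a single, statically
+   known stack depth. */
+static bool stack_depth_is_static(const Instr *code, int n)
+{
+    int depth_at[64], worklist[64], top = 0;
+    assert(n <= 64);
+    for (int i = 0; i < n; i++) {
+        depth_at[i] = -1;                        /* -1: not reached yet */
+    }
+    depth_at[0] = 0;
+    worklist[top++] = 0;
+    while (top > 0) {
+        int i = worklist[--top];
+        int depth = depth_at[i];
+        int succ = -1;
+        switch (code[i].opcode) {
+        case TOY_LOAD_FAST:     depth += 1; succ = i + 1; break;
+        case TOY_POP_TOP:       depth -= 1; succ = i + 1; break;
+        case TOY_JUMP_BACKWARD: succ = i + 1 - code[i].oparg; break;
+        default:                continue;        /* TOY_RETURN: no successor */
+        }
+        if (depth < 0 || succ < 0 || succ >= n) {
+            return false;
+        }
+        if (depth_at[succ] == -1) {
+            depth_at[succ] = depth;
+            worklist[top++] = succ;              /* each index pushed once */
+        } else if (depth_at[succ] != depth) {
+            return false;                        /* two different depths */
+        }
+    }
+    return true;
+}
+```
+
+The illegal sequence above fails this check. Index 0 is first reached at depth 0, but the backward jump returns to it at depth 1.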
+
+> [!NOTE]
+> Do not confuse the evaluation stack with the call stack, which is used to implement calling
+> and returning from functions.
+
+Error handling
+--------------
+
+When the implementation of an opcode raises an exception, it jumps to the
+`exception_unwind` label in [Python/ceval.c](../Python/ceval.c).
+The exception is then handled as described in the
+[`exception handling documentation`](exception_handling.md#handling-exceptions).
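+
+The control flow uses the familiar C idiom of jumping to a shared label. In the toy loop below, the opcodes are invented, and the real label and unwinding logic live in `Python/ceval.c`. A division by zero "raises" by jumping to `exception_unwind`. The label then simply reports failure instead of searching the exception table:
+
+```c
+#include <assert.h>
+
+enum { TOY_PUSH_INT, TOY_DIVIDE, TOY_RETURN };
+
+typedef struct { int opcode, oparg; } Instr;
+
+/* Run a tiny stack program. Return 0 and store the result in *out, or
+   return -1 if an "exception" was raised. */
+static int toy_eval(const Instr *code, int *out)
+{
+    int stack[8], *sp = stack;
+    const Instr *next_instr = code;
+    for (;;) {
+        Instr instr = *next_instr++;
+        switch (instr.opcode) {
+        case TOY_PUSH_INT:
+            *sp++ = instr.oparg;
+            break;
+        case TOY_DIVIDE: {
+            int right = *--sp, left = *--sp;
+            if (right == 0) {
+                goto exception_unwind;   /* "raise" ZeroDivisionError */
+            }
+            *sp++ = left / right;
+            break;
+        }
+        case TOY_RETURN:
+            *out = *--sp;
+            return 0;
+        default:
+            assert(0 && "unknown opcode");
+            return -1;
+        }
+    }
+exception_unwind:
+    /* The real interpreter consults the exception table here to find a
+       handler; this sketch just propagates the error to its caller. */
+    return -1;
+}
+```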
+
+Python-to-Python calls
+----------------------
+
+The `_PyEval_EvalFrameDefault()` function is recursive, because sometimes
+the interpreter calls some C function that calls back into the interpreter.
+In 3.10 and before, this was the case even when a Python function called
+another Python function:
+The `CALL` opcode would call the `tp_call` dispatch function of the
+callee, which would extract the code object, create a new frame for the call
+stack, and then call back into the interpreter.  This approach is very general
+but consumes several C stack frames for each nested Python call, thereby
+increasing the risk of an (unrecoverable) C stack overflow.
+
+Since 3.11, the `CALL` instruction special-cases function objects to "inline"
+the call.  When a call gets inlined, a new frame gets pushed onto the call
+stack and the interpreter "jumps" to the start of the callee's bytecode.
+When an inlined callee executes a `RETURN_VALUE` instruction, its frame is
+popped off the call stack and the interpreter returns to the caller
+by "jumping" to the return address.
+There is a flag in the frame (`frame->is_entry`) that indicates whether
+the frame was inlined (set if it wasn't).
+If `RETURN_VALUE` finds this flag set, it performs the usual cleanup and
+returns from `_PyEval_EvalFrameDefault()` altogether, to a C caller.
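+
+The following toy interpreter sketches this scheme. Its opcodes, `Frame` layout and `functions` table are invented. It shows two things. An inlined `OP_CALL` pushes a frame and keeps looping in the same C call. `OP_RETURN` leaves the C function only when it finds `is_entry` set:
+
+```c
+#include <assert.h>
+
+enum { OP_CONST, OP_ADD, OP_CALL, OP_RETURN };
+typedef struct { int opcode, oparg; } Instr;
+
+typedef struct {
+    const Instr *next_instr;   /* doubles as the return address */
+    int stack[8];
+    int *sp;
+    int is_entry;              /* set if this frame was NOT inlined */
+} Frame;
+
+/* Hypothetical function table: OP_CALL's oparg indexes it. */
+static const Instr *functions[4];
+
+static int eval(const Instr *entry_code)
+{
+    Frame frames[16];
+    Frame *frame = &frames[0];
+    frame->next_instr = entry_code;
+    frame->sp = frame->stack;
+    frame->is_entry = 1;
+    for (;;) {
+        Instr instr = *frame->next_instr++;
+        switch (instr.opcode) {
+        case OP_CONST:
+            *frame->sp++ = instr.oparg;
+            break;
+        case OP_ADD: {
+            int right = *--frame->sp;
+            int left = *--frame->sp;
+            *frame->sp++ = left + right;
+            break;
+        }
+        case OP_CALL: {
+            /* Inline the call: push a frame and "jump" to the callee. */
+            Frame *callee = frame + 1;
+            assert(callee < frames + 16);
+            callee->next_instr = functions[instr.oparg];
+            callee->sp = callee->stack;
+            callee->is_entry = 0;
+            frame = callee;
+            break;
+        }
+        case OP_RETURN: {
+            int value = *--frame->sp;
+            if (frame->is_entry) {
+                return value;          /* back to the C caller */
+            }
+            frame--;                   /* pop; caller resumes at next_instr */
+            *frame->sp++ = value;
+            break;
+        }
+        }
+    }
+}
+```
+
+However deeply the toy functions call each other, only one C-level `eval` call is active.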
+
+A similar check is performed when an unhandled exception occurs.
+
+The call stack
+--------------
+
+Up through 3.10, the call stack was implemented as a singly-linked list of
+[frame objects](frames.md). This was expensive because each call would require a
+heap allocation for the stack frame.
+
+Since 3.11, frames are no longer fully-fledged objects. Instead, a leaner internal
+`_PyInterpreterFrame` structure is used. It is allocated and initialized by a custom
+allocator function (`_PyThreadState_BumpFramePointer()`). Usually a frame allocation
+is just a pointer bump, which improves
+memory locality.
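+
+A bump allocator of this kind can be sketched as follows. The names `DataStack`, `push_frame` and `pop_frame` and the fixed chunk size are hypothetical. A real allocator also needs to handle running out of space, for example by linking in a new chunk, and this sketch omits that:
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+/* Hypothetical per-thread data stack: frames are carved out of one
+   contiguous chunk by moving `top`, and freed in LIFO order. */
+typedef struct {
+    void *slots[256];
+    void **top;
+} DataStack;
+
+/* Reserve `size` pointer-sized slots for a frame, or return NULL if the
+   chunk is exhausted. */
+static void **push_frame(DataStack *ds, size_t size)
+{
+    if ((size_t)(ds->slots + 256 - ds->top) < size) {
+        return NULL;
+    }
+    void **frame = ds->top;
+    ds->top += size;            /* the whole allocation is a pointer bump */
+    return frame;
+}
+
+/* Frames are popped in the reverse order in which they were pushed. */
+static void pop_frame(DataStack *ds, void **frame)
+{
+    assert(frame >= ds->slots && frame <= ds->top);
+    ds->top = frame;
+}
+```
+
+Consecutive frames end up adjacent in memory, which is where the locality benefit comes from.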
+
+Sometimes an actual `PyFrameObject` is needed, such as when Python code calls
+`sys._getframe()` or an extension module calls
+[`PyEval_GetFrame()`](https://docs.python.org/3/c-api/reflection.html#c.PyEval_GetFrame).
+In this case we allocate a proper `PyFrameObject` and initialize it from the
+`_PyInterpreterFrame`.
+
+Things get more complicated when generators are involved, since those do not
+follow the push/pop model. This includes async functions, which are based on
+the same mechanism.  A generator object has space for a `_PyInterpreterFrame`
+structure, including the variable-size part (used for locals and the eval stack).
+When a generator (or async) function is first called, a special opcode
+`RETURN_GENERATOR` is executed, which is responsible for creating the
+generator object.  The generator object's `_PyInterpreterFrame` is initialized
+with a copy of the current stack frame.  The current stack frame is then popped
+off the frame stack and the generator object is returned.
+(Details differ depending on the `is_entry` flag.)
+When the generator is resumed, the interpreter pushes its `_PyInterpreterFrame`
+onto the frame stack and resumes execution.
+See also the [generators](generators.md) section.
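+
+The following is a rough sketch of the copy made by `RETURN_GENERATOR`. It assumes a hypothetical fixed-size `ToyFrame` and `ToyGenerator`. The real `_PyInterpreterFrame` has a variable-size tail for locals and the evaluation stack, so a real copy must account for that:
+
+```c
+#include <assert.h>
+#include <string.h>
+
+typedef struct {
+    int instr_offset;      /* where execution resumes */
+    int locals[4];         /* stand-in for locals plus evaluation stack */
+} ToyFrame;
+
+typedef struct {
+    ToyFrame frame;        /* frame storage owned by the generator object */
+    int running;
+} ToyGenerator;
+
+/* RETURN_GENERATOR-like step: copy the current stack frame into the
+   generator's embedded frame; the caller then pops the stack frame. */
+static void make_generator(ToyGenerator *gen, const ToyFrame *current)
+{
+    memcpy(&gen->frame, current, sizeof(ToyFrame));
+    gen->running = 0;
+}
+
+/* Resuming uses the generator's own frame; returning its address stands
+   in for pushing it onto the frame stack. */
+static ToyFrame *resume(ToyGenerator *gen)
+{
+    assert(!gen->running);
+    gen->running = 1;
+    return &gen->frame;
+}
+```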
+
+<!--
+
+All sorts of variables
+----------------------
+
+The bytecode compiler determines the scope in which each variable name is defined,
+and generates instructions accordingly.  For example, loading a local variable
+onto the stack is done using `LOAD_FAST`, while loading a global is done using
+`LOAD_GLOBAL`.
+The key types of variables are:
+
+- fast locals: used in functions
+- (slow or regular) locals: used in classes and at the top level
+- globals and builtins: the compiler cannot distinguish between globals and
+builtins (though at runtime, the specializing interpreter can)
+- cells: used for nonlocal references
+
+(TODO: Write the rest of this section. Alas, the author got distracted and won't have time to continue this for a while.)
+
+-->
+
+<!--
+
+Other topics
+------------
+
+(TODO: Each of the following probably deserves its own section.)
+
+- co_consts, co_names, co_varnames, and their ilk
+- How calls work (how args are transferred, return, exceptions)
+- Eval breaker (interrupts, GIL)
+- Tracing
+- Setting the current lineno (debugger-induced jumps)
+- Specialization, inline caches etc.
+
+-->
+
+Introducing a new bytecode instruction
+--------------------------------------
+
+It is occasionally necessary to add a new opcode in order to implement
+a new feature or change the way that existing features are compiled.
+This section describes the changes required to do this.
+
+First, you must choose a name for the bytecode, implement it in
+[`Python/bytecodes.c`](../Python/bytecodes.c) and add a documentation
+entry in [`Doc/library/dis.rst`](../Doc/library/dis.rst).
+Then run `make regen-cases` to assign a number for it (see
+[`Include/opcode_ids.h`](../Include/opcode_ids.h)) and regenerate a
+number of files with the actual implementation of the bytecode in
+[`Python/generated_cases.c.h`](../Python/generated_cases.c.h) and
+metadata about it in additional files.
+
+With a new bytecode you must also change what is called the "magic number" for
+.pyc files: bump the value of the variable `MAGIC_NUMBER` in
+[`Lib/importlib/_bootstrap_external.py`](../Lib/importlib/_bootstrap_external.py).
+Changing this number will cause all .pyc files with the old `MAGIC_NUMBER`
+to be recompiled by the interpreter on import.  Whenever `MAGIC_NUMBER` is
+changed, the ranges in the `magic_values` array in
+[`PC/launcher.c`](../PC/launcher.c) may also need to be updated.  Changes to
+[`Lib/importlib/_bootstrap_external.py`](../Lib/importlib/_bootstrap_external.py)
+will take effect only after running `make regen-importlib`.
+
+> [!NOTE]
+> Running `make regen-importlib` before adding the new bytecode target to
+> [`Python/bytecodes.c`](../Python/bytecodes.c)
+> (followed by `make regen-cases`) will result in an error. You should only run
+> `make regen-importlib` after the new bytecode target has been added.
+
+> [!NOTE]
+> On Windows, running the `./build.bat` script will automatically
+> regenerate the required files without requiring additional arguments.
+
+Finally, you need to introduce the use of the new bytecode.  Update
+[`Python/codegen.c`](../Python/codegen.c) to emit code with this bytecode.
+Optimizations in [`Python/flowgraph.c`](../Python/flowgraph.c) may also
+need to be updated.  If the new opcode affects control flow or the block
+stack, you may have to update the `frame_setlineno()` function in
+[`Objects/frameobject.c`](../Objects/frameobject.c).  It may also be necessary
+to update [`Lib/dis.py`](../Lib/dis.py) if the new opcode interprets its
+argument in a special way (like `FORMAT_VALUE` or `MAKE_FUNCTION`).
+
+If you make a change here that can affect the output of bytecode that
+is already in existence and you do not change the magic number, make
+sure to delete your old .py(c|o) files!  Even though you will end up changing
+the magic number if you change the bytecode, while you are debugging your work
+you may be changing the bytecode output without constantly bumping up the
+magic number.  This can leave you with stale .pyc files that will not be
+recreated.
+Running `find . -name '*.py[co]' -exec rm -f '{}' +` should delete all .pyc
+files you have, forcing new ones to be created and thus allowing you to test out your
+new bytecode properly.  Run `make regen-importlib` to update the
+bytecode of frozen importlib files.  You have to run `make` again after this
+to recompile the generated C files.
+
+Additional resources
+--------------------
+
+* Brandt Bucher's talk about the specializing interpreter at PyCon US 2023.
+  [Slides](https://github.com/brandtbucher/brandtbucher/blob/master/2023/04/21/inside_cpython_311s_new_specializing_adaptive_interpreter.pdf)
+  [Video](https://www.youtube.com/watch?v=PGZPSWZSkJI&t=1470s)