From 695814c6e97aad0ae2b116cedca3e77d25d5b968 Mon Sep 17 00:00:00 2001 From: Irit Katriel <1055913+iritkatriel@users.noreply.github.com> Date: Mon, 21 Oct 2024 18:54:24 +0100 Subject: [PATCH] gh-119786: move interpreter doc from devguide to InternalDocs (#125715) --- InternalDocs/README.md | 31 ++- InternalDocs/code_objects.md | 5 + InternalDocs/generators.md | 9 + InternalDocs/interpreter.md | 364 +++++++++++++++++++++++++++++++++++ 4 files changed, 400 insertions(+), 9 deletions(-) create mode 100644 InternalDocs/code_objects.md create mode 100644 InternalDocs/generators.md create mode 100644 InternalDocs/interpreter.md diff --git a/InternalDocs/README.md b/InternalDocs/README.md index 0a6ecf89945..48c893bde2a 100644 --- a/InternalDocs/README.md +++ b/InternalDocs/README.md @@ -11,19 +11,32 @@ The core dev team attempts to keep this documentation up to date. If it is not, please report that through the [issue tracker](https://github.com/python/cpython/issues). -Index: ------ -[Guide to the parser](parser.md) +Compiling Python Source Code +--- -[Compiler Design](compiler.md) +- [Guide to the parser](parser.md) -[Frames](frames.md) +- [Compiler Design](compiler.md) -[Adaptive Instruction Families](adaptive.md) +Runtime Objects +--- -[The Source Code Locations Table](locations.md) +- [Code Objects (coming soon)](code_objects.md) -[Garbage collector design](garbage_collector.md) +- [The Source Code Locations Table](locations.md) -[Exception Handling](exception_handling.md) +- [Generators (coming soon)](generators.md) + +- [Frames](frames.md) + +Program Execution +--- + +- [The Interpreter](interpreter.md) + +- [Adaptive Instruction Families](adaptive.md) + +- [Garbage Collector Design](garbage_collector.md) + +- [Exception Handling](exception_handling.md) diff --git a/InternalDocs/code_objects.md b/InternalDocs/code_objects.md new file mode 100644 index 00000000000..284a8b7aee5 --- /dev/null +++ b/InternalDocs/code_objects.md @@ -0,0 +1,5 @@ + +Code objects 
+============ + +Coming soon. diff --git a/InternalDocs/generators.md b/InternalDocs/generators.md new file mode 100644 index 00000000000..d53f0f9bdff --- /dev/null +++ b/InternalDocs/generators.md @@ -0,0 +1,9 @@ + +Generators +========== + +Coming soon. + + diff --git a/InternalDocs/interpreter.md b/InternalDocs/interpreter.md new file mode 100644 index 00000000000..dcfddc99370 --- /dev/null +++ b/InternalDocs/interpreter.md @@ -0,0 +1,364 @@ + +The bytecode interpreter +======================== + +Overview +-------- + +This document describes the workings and implementation of the bytecode +interpreter, the part of Python that executes compiled Python code. Its +entry point is in [Python/ceval.c](../Python/ceval.c). + +At a high level, the interpreter consists of a loop that iterates over the +bytecode instructions, executing each of them via a switch statement that +has a case implementing each opcode. This switch statement is generated +from the instruction definitions in [Python/bytecodes.c](../Python/bytecodes.c) +which are written in [a DSL](../Tools/cases_generator/interpreter_definition.md) +developed for this purpose. + +Recall that the [Python Compiler](compiler.md) produces a [`CodeObject`](code_objects.md), +which contains the bytecode instructions along with static data that is required to execute them, +such as the consts list, variable names, +[exception table](exception_handling.md#format-of-the-exception-table), and so on. + +When the interpreter's +[`PyEval_EvalCode()`](https://docs.python.org/3.14/c-api/veryhigh.html#c.PyEval_EvalCode) +function is called to execute a `CodeObject`, it constructs a [`Frame`](frames.md) and calls +[`_PyEval_EvalFrame()`](https://docs.python.org/3.14/c-api/veryhigh.html#c.PyEval_EvalCode) +to execute the code object in this frame. The frame holds the dynamic state of the +`CodeObject`'s execution, including the instruction pointer, the globals and builtins. +It also has a reference to the `CodeObject` itself.
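
The split between static data (on the code object) and dynamic state (on the frame) can be observed from Python itself. The following is a minimal sketch, not part of the interpreter source: the source string, the `<demo>` filename, the `namespace` dict and the `current_code` helper are made up for illustration, and `sys._getframe()` is a CPython implementation detail.

```python
import sys

# Compile a tiny module-level snippet into a code object.
code = compile("x = y + 1", "<demo>", "exec")

# Static data: the raw bytecode and the names it refers to.
print(len(code.co_code), "bytes of bytecode")
print("names:", code.co_names)

# Executing the code object runs it in a frame whose globals are `namespace`.
namespace = {"y": 41}
exec(code, namespace)
print("x =", namespace["x"])


def current_code():
    # The running frame keeps a reference to the code object it executes.
    return sys._getframe().f_code


assert current_code() is current_code.__code__
```

The same code object could be executed again with a different `namespace`; only the frame-level state changes between runs.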
+ +In addition to the frame, `_PyEval_EvalFrame()` also receives a +[`Thread State`](https://docs.python.org/3/c-api/init.html#c.PyThreadState) +object, `tstate`, which includes things like the exception state and the +recursion depth. The thread state also provides access to the per-interpreter +state (`tstate->interp`), which has a pointer to the per-runtime (that is, +truly global) state (`tstate->interp->runtime`). + +Finally, `_PyEval_EvalFrame()` receives an integer argument `throwflag` +which, when nonzero, indicates that the interpreter should just raise the current exception +(this is used in the implementation of +[`gen.throw`](https://docs.python.org/3.14/reference/expressions.html#generator.throw)). + +By default, [`_PyEval_EvalFrame()`](https://docs.python.org/3.14/c-api/veryhigh.html#c.PyEval_EvalCode) +simply calls `_PyEval_EvalFrameDefault()` to execute the frame. However, as per +[`PEP 523`](https://peps.python.org/pep-0523/) this is configurable by setting +`interp->eval_frame`. In the following, we describe the default function, +`_PyEval_EvalFrameDefault()`. + + +Instruction decoding +-------------------- + +The first task of the interpreter is to decode the bytecode instructions. +Bytecode is stored as an array of 16-bit code units (`_Py_CODEUNIT`). +Each code unit contains an 8-bit `opcode` and an 8-bit argument (`oparg`), both unsigned. +In order to make the bytecode format independent of the machine byte order when stored on disk, +`opcode` is always the first byte and `oparg` is always the second byte. +Macros are used to extract the `opcode` and `oparg` from a code unit +(`_Py_OPCODE(word)` and `_Py_OPARG(word)`). +Some instructions (for example, `NOP` or `POP_TOP`) have no argument -- in this case +we ignore `oparg`.
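
The byte layout described above can be sketched in Python. This is a toy model rather than the real `_Py_OPCODE` / `_Py_OPARG` macros: the opcode numbers in `raw` are arbitrary placeholders, and `split_code_units` is a hypothetical helper name.

```python
def split_code_units(raw: bytes) -> list[tuple[int, int]]:
    """Split a byte string into (opcode, oparg) pairs, one per code unit."""
    if len(raw) % 2:
        raise ValueError("bytecode must consist of whole 2-byte code units")
    # The opcode is always byte 0 and the oparg byte 1 of each unit, so the
    # result does not depend on the byte order of the host machine.
    return [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]


# Three made-up code units; the opcode values are placeholders.
raw = bytes([100, 2, 23, 0, 83, 255])
units = split_code_units(raw)
print(units)
```

Because each field is read by byte position rather than by loading a 16-bit integer, the same bytes decode identically on little-endian and big-endian machines.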
+ +A simplified version of the interpreter's main loop looks like this: + +```c + _Py_CODEUNIT *first_instr = code->co_code_adaptive; + _Py_CODEUNIT *next_instr = first_instr; + while (1) { + _Py_CODEUNIT word = *next_instr++; + unsigned char opcode = _Py_OPCODE(word); + unsigned int oparg = _Py_OPARG(word); + switch (opcode) { + // ... A case for each opcode ... + } + } +``` + +This loop iterates over the instructions, decoding each into its `opcode` +and `oparg`, and then executes the switch case that implements this `opcode`. + +The instruction format supports 256 different opcodes, which is sufficient. +However, it also limits `oparg` to 8-bit values, which is too restrictive. +To overcome this, the `EXTENDED_ARG` opcode allows us to prefix any instruction +with one or more additional data bytes, which combine into a larger oparg. +For example, this sequence of code units: + + EXTENDED_ARG 1 + EXTENDED_ARG 0 + LOAD_CONST 2 + +would set `opcode` to `LOAD_CONST` and `oparg` to `65538` (that is, `0x1_00_02`). +The compiler should limit itself to at most three `EXTENDED_ARG` prefixes, to allow the +resulting `oparg` to fit in 32 bits, but the interpreter does not check this. + +In the following, a `code unit` is always two bytes, while an `instruction` is a +sequence of code units consisting of zero to three `EXTENDED_ARG` opcodes followed by +a primary opcode. + +The following loop, to be inserted just above the `switch` statement, will make the above +snippet decode a complete instruction: + +```c + while (opcode == EXTENDED_ARG) { + word = *next_instr++; + opcode = _Py_OPCODE(word); + oparg = (oparg << 8) | _Py_OPARG(word); + } +``` + +For various reasons we'll get to later (mostly efficiency, given that `EXTENDED_ARG` +is rare) the actual code is different. + +Jumps +===== + +Note that when the `switch` statement is reached, `next_instr` (the "instruction offset") +already points to the next instruction. 
+Thus, jump instructions can be implemented by manipulating `next_instr`: + +- A jump forward (`JUMP_FORWARD`) sets `next_instr += oparg`. +- A jump backward sets `next_instr -= oparg`. + +Inline cache entries +==================== + +Some (specialized or specializable) instructions have an associated "inline cache". +The inline cache consists of one or more two-byte entries included in the bytecode +array as additional words following the `opcode`/`oparg` pair. +The size of the inline cache for a particular instruction is fixed by its `opcode`. +Moreover, the inline cache size for all instructions in a +[family of specialized/specializable instructions](adaptive.md) +(for example, `LOAD_ATTR`, `LOAD_ATTR_SLOT`, `LOAD_ATTR_MODULE`) must all be +the same. Cache entries are reserved by the compiler and initialized with zeros. +Although they are represented by code units, cache entries do not conform to the +`opcode` / `oparg` format. + +If an instruction has an inline cache, the layout of its cache is described by +a `struct` definition in [`pycore_code.h`](../Include/internal/pycore_code.h). +This allows us to access the cache by casting `next_instr` to a pointer to this `struct`. +The size of such a `struct` must be independent of the machine architecture, word size +and alignment requirements. For a 32-bit field, the `struct` should use `_Py_CODEUNIT field[2]`. + +The instruction implementation is responsible for advancing `next_instr` past the inline cache. +For example, if an instruction's inline cache is four bytes (that is, two code units) in size, +the code for the instruction must contain `next_instr += 2;`. +This is equivalent to a relative forward jump by that many code units. +(In the interpreter definition DSL, this is coded as `JUMPBY(n)`, where `n` is the number +of code units to jump, typically given as a named constant.)
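
The pieces described so far (`EXTENDED_ARG` prefixes, relative jumps and inline caches) can be combined in a small Python model of the decode loop. It is a sketch under stated assumptions, not the generated interpreter: the opcode numbers, the `CACHE_SIZE` table and the instruction named `LOAD_ATTR_TOY` are invented for the example, and instructions are only recorded in a trace rather than executed.

```python
# Invented opcode numbers; the real values live in Include/opcode_ids.h.
EXTENDED_ARG, LOAD_CONST, JUMP_FORWARD, LOAD_ATTR_TOY, STOP = 1, 2, 3, 4, 5
# Number of inline cache code units that follow each opcode (assumed).
CACHE_SIZE = {LOAD_ATTR_TOY: 2}


def run(units):
    """Decode (opcode, oparg) code units and return the decoded instructions."""
    trace = []
    next_instr = 0
    while True:
        opcode, oparg = units[next_instr]
        next_instr += 1
        # Fold any EXTENDED_ARG prefixes into one larger oparg.
        while opcode == EXTENDED_ARG:
            opcode, low = units[next_instr]
            next_instr += 1
            oparg = (oparg << 8) | low
        trace.append((opcode, oparg))
        if opcode == STOP:
            return trace
        if opcode == JUMP_FORWARD:
            # next_instr already points past the jump instruction itself.
            next_instr += oparg
        # Skip the inline cache entries, which are data, not instructions.
        next_instr += CACHE_SIZE.get(opcode, 0)


program = [
    (EXTENDED_ARG, 1), (EXTENDED_ARG, 0), (LOAD_CONST, 2),  # oparg 0x1_00_02
    (LOAD_ATTR_TOY, 7), (0, 0), (0, 0),                      # two cache entries
    (JUMP_FORWARD, 1),
    (LOAD_CONST, 99),                                        # skipped by the jump
    (STOP, 0),
]
trace = run(program)
print(trace)
```

Note that the two zeroed cache entries and the `LOAD_CONST 99` unit never appear in the trace: the first are stepped over by the instruction that owns them, the second by the forward jump.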
+ +Serializing non-zero cache entries would present a problem because the serialization +([`marshal`](https://docs.python.org/3/library/marshal.html)) format must be independent of the machine byte order. + +More information about the use of inline caches can be found in +[PEP 659](https://peps.python.org/pep-0659/#ancillary-data). + +The evaluation stack +-------------------- + +Most instructions read or write some data in the form of object references (`PyObject *`). +The CPython bytecode interpreter is a stack machine, meaning that its instructions operate +by pushing data onto and popping it off the stack. +The stack forms part of the frame for the code object. Its maximum depth is calculated +by the compiler and stored in the `co_stacksize` field of the code object, so that the +stack can be pre-allocated as a contiguous array of `PyObject*` pointers, when the frame +is created. + +The stack effects of each instruction are also exposed through the +[opcode metadata](../Include/internal/pycore_opcode_metadata.h) through two +functions that report how many stack elements the instruction consumes, +and how many it produces (`_PyOpcode_num_popped` and `_PyOpcode_num_pushed`). +For example, the `BINARY_OP` instruction pops two objects from the stack and pushes the +result back onto the stack. + +The stack grows up in memory; the operation `PUSH(x)` is equivalent to `*stack_pointer++ = x`, +whereas `x = POP()` means `x = *--stack_pointer`. +Overflow and underflow checks are active in debug mode, but are otherwise optimized away. + +At any point during execution, the stack level is knowable based on the instruction pointer +alone, and some properties of each item on the stack are also known. +In particular, only a few instructions may push a `NULL` onto the stack, and the positions +that may be `NULL` are known. +A few other instructions (`GET_ITER`, `FOR_ITER`) push or pop an object that is known to +be an iterator.
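
The push and pop operations can be modelled with a pre-allocated list and an index that plays the role of `stack_pointer`. This is an illustrative Python sketch only: `EvalStack` and `binary_add` are hypothetical names, and the `assert` statements merely stand in for the debug-mode overflow and underflow checks.

```python
class EvalStack:
    """A fixed-size stack modelled on a pre-allocated array plus a stack pointer."""

    def __init__(self, stacksize):
        self.slots = [None] * stacksize  # allocated once, like co_stacksize slots
        self.stack_pointer = 0           # index of the next free slot

    def push(self, value):
        # Mirrors `*stack_pointer++ = x`, with an overflow check.
        assert self.stack_pointer < len(self.slots), "stack overflow"
        self.slots[self.stack_pointer] = value
        self.stack_pointer += 1

    def pop(self):
        # Mirrors `x = *--stack_pointer`, with an underflow check.
        assert self.stack_pointer > 0, "stack underflow"
        self.stack_pointer -= 1
        return self.slots[self.stack_pointer]


def binary_add(stack):
    """Toy stand-in for BINARY_OP: pops two operands, pushes one result."""
    right = stack.pop()
    left = stack.pop()
    stack.push(left + right)


stack = EvalStack(stacksize=2)
stack.push(40)
stack.push(2)
binary_add(stack)
result = stack.pop()
print(result, stack.stack_pointer)
```

A stack size of two is enough here because the depth never exceeds two at any point in the sequence, which is the kind of bound the compiler records in `co_stacksize`.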
+ +Instruction sequences that do not allow statically knowing the stack depth are deemed illegal; +the bytecode compiler never generates such sequences. +For example, the following sequence is illegal, because it keeps pushing items on the stack: + + LOAD_FAST 0 + JUMP_BACKWARD 2 + +> [!NOTE] +> Do not confuse the evaluation stack with the call stack, which is used to implement calling +> and returning from functions. + +Error handling +-------------- + +When the implementation of an opcode raises an exception, it jumps to the +`exception_unwind` label in [Python/ceval.c](../Python/ceval.c). +The exception is then handled as described in the +[`exception handling documentation`](exception_handling.md#handling-exceptions). + +Python-to-Python calls +---------------------- + +The `_PyEval_EvalFrameDefault()` function is recursive, because sometimes +the interpreter calls some C function that calls back into the interpreter. +In 3.10 and before, this was the case even when a Python function called +another Python function: +The `CALL` opcode would call the `tp_call` dispatch function of the +callee, which would extract the code object, create a new frame for the call +stack, and then call back into the interpreter. This approach is very general +but consumes several C stack frames for each nested Python call, thereby +increasing the risk of an (unrecoverable) C stack overflow. + +Since 3.11, the `CALL` instruction special-cases function objects to "inline" +the call. When a call gets inlined, a new frame gets pushed onto the call +stack and the interpreter "jumps" to the start of the callee's bytecode. +When an inlined callee executes a `RETURN_VALUE` instruction, the frame is +popped off the call stack and the interpreter returns to its caller, +by popping a frame off the call stack and "jumping" to the return address. +There is a flag in the frame (`frame->is_entry`) that indicates whether +the frame was inlined (set if it wasn't). 
+If `RETURN_VALUE` finds this flag set, it performs the usual cleanup and +returns from `_PyEval_EvalFrameDefault()` altogether, to a C caller. + +A similar check is performed when an unhandled exception occurs. + +The call stack +-------------- + +Up through 3.10, the call stack was implemented as a singly-linked list of +[frame objects](frames.md). This was expensive because each call would require a +heap allocation for the stack frame. + +Since 3.11, frames are no longer fully-fledged objects. Instead, a leaner internal +`_PyInterpreterFrame` structure is used, which is allocated using a custom allocator +function (`_PyThreadState_BumpFramePointer()`), which allocates and initializes a +frame structure. Usually a frame allocation is just a pointer bump, which improves +memory locality. + +Sometimes an actual `PyFrameObject` is needed, such as when Python code calls +`sys._getframe()` or an extension module calls +[`PyEval_GetFrame()`](https://docs.python.org/3/c-api/reflection.html#c.PyEval_GetFrame). +In this case we allocate a proper `PyFrameObject` and initialize it from the +`_PyInterpreterFrame`. + +Things get more complicated when generators are involved, since those do not +follow the push/pop model. This includes async functions, which are based on +the same mechanism. A generator object has space for a `_PyInterpreterFrame` +structure, including the variable-size part (used for locals and the eval stack). +When a generator (or async) function is first called, a special opcode +`RETURN_GENERATOR` is executed, which is responsible for creating the +generator object. The generator object's `_PyInterpreterFrame` is initialized +with a copy of the current stack frame. The current stack frame is then popped +off the frame stack and the generator object is returned. +(Details differ depending on the `is_entry` flag.) +When the generator is resumed, the interpreter pushes its `_PyInterpreterFrame` +onto the frame stack and resumes execution. 
+See also the [generators](generators.md) section. + + + + + +Introducing a new bytecode instruction +-------------------------------------- + +It is occasionally necessary to add a new opcode in order to implement +a new feature or change the way that existing features are compiled. +This section describes the changes required to do this. + +First, you must choose a name for the bytecode, implement it in +[`Python/bytecodes.c`](../Python/bytecodes.c) and add a documentation +entry in [`Doc/library/dis.rst`](../Doc/library/dis.rst). +Then run `make regen-cases` to assign a number for it (see +[`Include/opcode_ids.h`](../Include/opcode_ids.h)) and regenerate a +number of files with the actual implementation of the bytecode in +[`Python/generated_cases.c.h`](../Python/generated_cases.c.h) and +metadata about it in additional files. + +With a new bytecode you must also change what is called the "magic number" for +.pyc files: bump the value of the variable `MAGIC_NUMBER` in +[`Lib/importlib/_bootstrap_external.py`](../Lib/importlib/_bootstrap_external.py). +Changing this number will cause all .pyc files with the old `MAGIC_NUMBER` +to be recompiled by the interpreter on import. Whenever `MAGIC_NUMBER` is +changed, the ranges in the `magic_values` array in +[`PC/launcher.c`](../PC/launcher.c) may also need to be updated. Changes to +[`Lib/importlib/_bootstrap_external.py`](../Lib/importlib/_bootstrap_external.py) +will take effect only after running `make regen-importlib`. + +> [!NOTE] +> Running `make regen-importlib` before adding the new bytecode target to +> [`Python/bytecodes.c`](../Python/bytecodes.c) +> (followed by `make regen-cases`) will result in an error. You should only run +> `make regen-importlib` after the new bytecode target has been added. + +> [!NOTE] +> On Windows, running the `./build.bat` script will automatically +> regenerate the required files without requiring additional arguments.
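
To see where the magic number ends up, the following Python sketch compiles a throwaway module and compares the start of the resulting .pyc file with the running interpreter's magic number. The file name `demo.py` and its contents are made up for the example; `py_compile.compile()` and `importlib.util.MAGIC_NUMBER` are the standard library names used to do the work.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "demo.py"
    source.write_text("x = 1\n")
    # Write the compiled file to an explicit path inside the temp directory.
    pyc_path = py_compile.compile(str(source), cfile=str(source) + "c", doraise=True)
    header = pathlib.Path(pyc_path).read_bytes()[:4]

# The first four bytes of a .pyc file are the interpreter's magic number.
print(header == importlib.util.MAGIC_NUMBER)
```

A .pyc file whose leading bytes do not match is treated as stale by the import system, which is why bumping `MAGIC_NUMBER` forces recompilation.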
+ +Finally, you need to introduce the use of the new bytecode. Update +[`Python/codegen.c`](../Python/codegen.c) to emit code with this bytecode. +Optimizations in [`Python/flowgraph.c`](../Python/flowgraph.c) may also +need to be updated. If the new opcode affects control flow or the block +stack, you may have to update the `frame_setlineno()` function in +[`Objects/frameobject.c`](../Objects/frameobject.c). It may also be necessary +to update [`Lib/dis.py`](../Lib/dis.py) if the new opcode interprets its +argument in a special way (like `FORMAT_VALUE` or `MAKE_FUNCTION`). + +If you make a change here that can affect the output of bytecode that +is already in existence and you do not change the magic number, make +sure to delete your old .py(c|o) files! Even though you will end up changing +the magic number if you change the bytecode, while you are debugging your work +you may be changing the bytecode output without constantly bumping up the +magic number. This can leave you with stale .pyc files that will not be +recreated. +Running `find . -name '*.py[co]' -exec rm -f '{}' +` should delete all .pyc +files you have, forcing new ones to be created and thus allowing you to test out your +new bytecode properly. Run `make regen-importlib` for updating the +bytecode of frozen importlib files. You have to run `make` again after this +to recompile the generated C files. + +Additional resources +-------------------- + +* Brandt Bucher's talk about the specializing interpreter at PyCon US 2023. + [Slides](https://github.com/brandtbucher/brandtbucher/blob/master/2023/04/21/inside_cpython_311s_new_specializing_adaptive_interpreter.pdf) + [Video](https://www.youtube.com/watch?v=PGZPSWZSkJI&t=1470s)