If SRE(match) function terminates abruptly, either because of a signal
or because memory allocation fails, allocated SRE_REPEAT blocks might
be never released.
(cherry picked from commit 7538e7f569)
gh-120155: Add assertion to sre.c match_getindex() (GH-120402)
Add an assertion to help static analyzers to detect that i*2 cannot
overflow.
(cherry picked from commit 42b25dd61f)
Co-authored-by: Victor Stinner <vstinner@python.org>
Now re.error is raised instead of OverflowError or RuntimeError for
too large width of look-behind pattern.
The limit is increased to 2**32-1 (was 2**31-1).
(cherry picked from commit e2b3d831fd)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
TypeError would be overwritten by OverflowError
if 'code' param contained non-ints.
(cherry picked from commit 344d3a222a)
Co-authored-by: Nikita Sobolev <mail@sobolevn.me>
gh-109631: Allow interruption of short repeated regex matches (GH-109867)
Counting for signal checking now continues in new match from the point where
it ended in the previous match instead of starting from 0.
(cherry picked from commit 8ac2085b80)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Restore the global Input Stream pointer after trying to match a sub-pattern.
.
(cherry picked from commit abd9cc52d9)
Co-authored-by: SKO <41810398+uyw4687@users.noreply.github.com>
Some items remained uninitialized if _sre.template() was called with invalid
indices. Then attempt to clear them in the destructor led to dereferencing
of uninitialized pointer.
(cherry picked from commit 2ef1dc37f0)
Co-authored-by: Radislav Chugunov <52372310+chgnrdv@users.noreply.github.com>
Here we are doing no more than adding the value for Py_mod_multiple_interpreters and using it for stdlib modules. We will start checking for it in gh-104206 (once PyInterpreterState.ceval.own_gil is added in gh-104204).
In very rare circumstances the JUMP opcode could be confused with the
argument of the opcode in the "then" part which doesn't end with the
JUMP opcode. This led to incorrect detection of the final JUMP opcode
and incorrect calculation of the size of the subexpression.
NOTE: Changed return value of functions _validate_inner() and
_validate_charset() in Modules/_sre/sre.c. Now they return 0 on success,
-1 on failure, and 1 if the last op is JUMP (which usually is a failure).
Previously they returned 1 on success and 0 on failure.
Functions re.sub() and re.subn() and corresponding re.Pattern methods
are now 2-3 times faster for replacement strings containing group references.
Closes #91524
Primarily authored by serhiy-storchaka Serhiy Storchaka
Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>
We only statically initialize for core code and builtin modules. Extension modules still create
the tuple at runtime. We'll solve that part of interpreter isolation separately.
This change includes generated code. The non-generated changes are in:
* Tools/clinic/clinic.py
* Python/getargs.c
* Include/cpython/modsupport.h
* Makefile.pre.in (re-generate global strings after running clinic)
* very minor tweaks to Modules/_codecsmodule.c and Python/Python-tokenize.c
All other changes are generated code (clinic, global strings).
Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)"
This reverts commit 6e3eee5c11.
Manual fixups to increase the MAGIC number and to handle conflicts with
a couple of changes that landed after that.
Thanks for reviews by Ma Lin and Serhiy Storchaka.
It combines PyImport_ImportModule() and PyObject_GetAttrString()
and saves 4-6 lines of code on every use.
Add also _PyImport_GetModuleAttr() which takes Python strings as arguments.
It was initially added to support atomic groups, but that
support was never fully implemented, and CALL was only left
in the compiler, but not interpreter and parser.
ATOMIC_GROUP is now used to support atomic groups.
Limit the maximum capturing group to 2**30-1 on 64-bit platforms
(it was 2**31-1). No change on 32-bit platforms (2**28-1).
It allows to reduce the size of SRE(match_context):
- On 32 bit platform: 36 bytes, no change. (msvc2022)
- On 64 bit platform: 72 bytes -> 56 bytes. (msvc2022/gcc9.4)
which leads to increasing the depth of backtracking.
* Move the code for generating Modules/_sre/sre_constants.h from
Lib/re/_constants.py into a separate script
Tools/scripts/generate_sre_constants.py.
* Add target `regen-sre` in the makefile.
* Make target `regen-all` depending on `regen-sre`.