in gc_refs, even at the cost of putting back a test+branch in
visit_decref.
The good news: since gc_refs became utterly tame then, it became
clear that another special value could be useful. The move_roots() and
move_root_reachable() passes have now been replaced by a single
move_unreachable() pass. Besides saving a pass over the generation, this
has a better effect: most of the time everything turns out to be
reachable, so we were breaking the generation list apart and moving it
into the reachable list, one element at a time. Now the reachable
stuff stays in the generation list, and the unreachable stuff is moved
instead. This isn't quite as good as it sounds, since sometimes we
guess wrongly that a thing is unreachable, and have to move it back again.
Still, overall, it yields a significant (but not dramatic) boost in
collection speed.
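A toy model of the single-pass idea, written in Python rather than the
collector's C. It is only a sketch: the Obj class, the two marker values
and the separate "reachable" list are assumptions made for illustration
(in the scheme described above the reachable objects simply stay in the
generation list). It shows the wrong guess being corrected: an object is
first judged unreachable, then moved back once a reachable referrer is
scanned.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical marker values, chosen for this sketch only.
GC_REACHABLE = -1
GC_TENTATIVELY_UNREACHABLE = -2


@dataclass(eq=False)
class Obj:
    name: str
    refcount: int                      # all references, internal and external
    referents: list = field(default_factory=list)
    gc_refs: int = 0


def move_unreachable(generation):
    """Split a generation into (reachable, unreachable) lists."""
    members = {id(o) for o in generation}

    # Copy refcounts, then subtract references coming from inside the
    # generation; what remains counts references from outside it.
    for o in generation:
        o.gc_refs = o.refcount
    for o in generation:
        for r in o.referents:
            if id(r) in members:
                r.gc_refs -= 1

    pending = deque(generation)
    reachable, unreachable = [], []
    while pending:
        o = pending.popleft()
        if o.gc_refs > 0:
            # Known reachable, so everything it refers to is reachable too.
            o.gc_refs = GC_REACHABLE
            reachable.append(o)
            for r in o.referents:
                if id(r) not in members:
                    continue
                if r.gc_refs == 0:
                    r.gc_refs = 1      # not scanned yet; will be kept
                elif r.gc_refs == GC_TENTATIVELY_UNREACHABLE:
                    # We guessed wrong earlier: move it back and rescan it.
                    unreachable.remove(r)
                    r.gc_refs = 1
                    pending.append(r)
        else:
            o.gc_refs = GC_TENTATIVELY_UNREACHABLE
            unreachable.append(o)
    return reachable, unreachable


# a <-> b is an isolated cycle; c is referenced from outside and refers to d.
a, b, c, d = Obj("a", 1), Obj("b", 1), Obj("c", 1), Obj("d", 1)
a.referents.append(b)
b.referents.append(a)
c.referents.append(d)
live, dead = move_unreachable([d, c, a, b])
```

Here d is scanned before c, so it is moved to the unreachable list first
and pulled back when c is found to be reachable.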
1. You're not supposed to call this with a NULL argument, although the
docs could be clearer about that. The other visit_XYZ() functions
don't bother to check. This one no longer checks either, although it
does assert non-NULL-ness.
2. It doesn't matter whether the object is currently tracked, so don't
bother checking that either (if it isn't currently tracked, it may
have some nonsense value in gc_refs, but it doesn't hurt to
decrement gibberish, and it's cheaper to do so than to make everyone
test for trackedness).
It would be nice to get rid of the other tests on IS_TRACKED. Perhaps
trackedness should not be a matter of not being in any gc list, but
should be a matter of being in a new "untracked" gc list. This list
simply wouldn't be involved in the collection mechanism. A newly
created object would be put in the untracked list. Tracking would
simply unlink it and move it into the gen0 list. Untracking would do
the reverse. No test+branch needed then. visit_move() may be vulnerable
then, though, and I don't know how this would work with the trashcan.
"The regression" isn't a regression at all: 2.2.1 had a bug that kept it
from showing up. What looks like a regression is actually a glitch in
cyclic gc that's been there forever.
As the generation being collected is analyzed, objects that can't be
collected (because, e.g., we find they're externally referenced, or
are in an unreachable cycle but have a __del__ method) are moved out
of the list of candidates. A tricksy scheme uses negative values of
gc_refs to mark such objects as being moved. However, the exact
negative value set at the start may become "more negative" over time
for objects not in the generation being collected, and the scheme was
checking for an exact match on the negative value originally assigned.
As a result, objects in generations older than the one being collected
could get scanned too, and yanked back into a younger generation. Doing
so doesn't lead to an error, but doesn't do any good, and can burn an
unbounded amount of time doing useless work.
A test case is simple (thanks to Kevin Jacobs for finding it!):
    x = []
    for i in xrange(200000):
        x.append((1,))
Without the patch, this ends up scanning all of x on every gen0 collection
and scanning all of x twice on every gen1 collection, and x gets yanked back
into gen1 on every gen0 collection. With the patch, once x gets to gen2, it's
never scanned again until another gen2 collection, and stays in gen2.
Bugfix candidate, although the code has changed enough that I think I'll
need to port it by hand. 2.2.1 also has a different bug that causes
bound method objects not to get tracked at all (so the test case doesn't
burn absurd amounts of time in 2.2.1, but *should* <wink>).
Setting the buffer_text attribute to true causes the parser to collect
character data, waiting as long as possible to report it to the Python
callback. This can save an enormous number of callbacks from C to
Python, which can be a substantial performance improvement.
buffer_text defaults to false.
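A minimal sketch of the effect, written for Python 3 and using only the
xml.parsers.expat module. The helper name collect_chunks and the sample
document are assumptions made for illustration. Entity references
normally split character data into several callbacks; with buffer_text
set, the pieces arrive as one string.

```python
import xml.parsers.expat


def collect_chunks(document, buffer_text):
    """Return the strings passed to CharacterDataHandler, in order."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffer_text
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)       # True: this is the final chunk of input
    return chunks


DOC = "<a>one&amp;two&amp;three</a>"
unbuffered = collect_chunks(DOC, False)
buffered = collect_chunks(DOC, True)
print(len(unbuffered), "callbacks without buffering,", len(buffered), "with")
```

Both runs deliver the same text overall; only the number of calls into
Python differs.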
The handlers array on each parser now has the invariant that None will
never be set as a handler; it will always be NULL or a Python-level
value passed in for the specific handler.
have_handler(): Return true if there is a Python handler for a
particular event.
get_handler_name(): Return a string object giving the name of a
particular handler. This caches the string object so it doesn't
need to be created more than once.
get_parse_result(): Helper to allow the Parse() and ParseFile()
methods to share the same logic for determining the return value
or exception state.
PyUnknownEncodingHandler(), PyModule_AddIntConstant():
Made these helpers static. (The latter is only defined for older
versions of Python.)
pyxml_UpdatePairedHandlers(), pyxml_SetStartElementHandler(),
pyxml_SetEndElementHandler(), pyxml_SetStartNamespaceDeclHandler(),
pyxml_SetEndNamespaceDeclHandler(), pyxml_SetStartCdataSection(),
pyxml_SetEndCdataSection(), pyxml_SetStartDoctypeDeclHandler(),
pyxml_SetEndDoctypeDeclHandler():
Removed. These are no longer needed with Expat 1.95.x.
handler_info:
Use the setter functions provided by Expat 1.95.x instead of the
pyxml_Set*Handler() functions which have been removed.
Minor code formatting changes for consistency.
Trailing whitespace removed.
- Include a blank line between the signature line and the description
(Guido sez).
- Don't include "-> None" for API functions that always return None
because they don't have a meaningful return value.
These built-in functions are replaced by their (now callable) type:
slice()
buffer()
and these types can also be called (but have no built-in function
named after them):
classobj (type name used to be "class")
code
function
instance
instancemethod (type name used to be "instance method")
The module "new" has been replaced with a small backward compatibility
placeholder in Python.
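A small sketch of what calling a type directly looks like. It assumes
Python 3, where the function and instancemethod types are reachable as
types.FunctionType and types.MethodType, and where classobj, instance
and buffer no longer exist. The names template, clone and bound are made
up for the example.

```python
import types


def template(x):
    return x + 1


# Build a new function object from an existing code object by calling
# the function type, which is what new.function() used to be for.
clone = types.FunctionType(template.__code__, globals(), "clone")

# Bind that function to an arbitrary object by calling the method type.
bound = types.MethodType(clone, 41)

# slice is itself the type, so calling it creates an instance.
window = slice(1, 3)

print(clone(1), bound(), "abcd"[window])
```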
A large portion of the patch simply removes the new module from
various platform-specific build recipes. The following binary Mac
project files still have references to it:
Mac/Build/PythonCore.mcp
Mac/Build/PythonStandSmall.mcp
Mac/Build/PythonStandalone.mcp
[I've tweaked the code layout and the doc strings here and there, and
added a comment to types.py about StringTypes vs. basestring. --Guido]
library. Since multiple versions can be installed simultaneously, it's
crucial that you only select libraries and header files which are compatible
with each other. Version checking is done from highest version to lowest.
Building using version 1 of Berkeley DB is disabled by default because of
the hash file bugs people keep rediscovering. It can be enabled by
uncommenting a few lines in setup.py. Closes patch 553108.
that retries the connect() call in timeout mode so it can be shared
between connect() and connect_ex(), and needs only a single #ifdef.
The test for this was doing funky stuff I don't approve of,
so I removed it in favor of a simpler test. This allowed me
to implement a simpler, "purer" form of the timeout retry code.
Hopefully that's enough (if you want to be fancy, use non-blocking
mode and decode the errors yourself, like before).
- setblocking(0) and settimeout(0) are now equivalent, and ditto for
setblocking(1) and settimeout(None).
- Don't raise an exception from internal_select(); let the final call
report the error (this means you will get an EAGAIN error instead of
an ETIMEDOUT error -- I don't care).
- Move the select to inside the Py_{BEGIN,END}_ALLOW_THREADS brackets,
so other threads can run (this was a bug in the original code).
- Redid the retry logic in connect() and connect_ex() to avoid masking
errors. This probably doesn't work for Windows yet; I'll fix that
next. It may also fail on other platforms, depending on what
retrying a connect does; I need help with this.
- Get rid of the retry logic in accept(). I don't think it was needed
at all. But I may be wrong.
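The equivalences in the first item above can be observed from Python
through gettimeout(). This is a sketch for Python 3; the helper name
timeout_after is an assumption, and no connection is ever made.

```python
import socket


def timeout_after(configure):
    """Create a TCP socket, apply configure(sock), report its timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        configure(sock)
        return sock.gettimeout()


results = {
    "setblocking(0)": timeout_after(lambda s: s.setblocking(False)),
    "settimeout(0)": timeout_after(lambda s: s.settimeout(0)),
    "setblocking(1)": timeout_after(lambda s: s.setblocking(True)),
    "settimeout(None)": timeout_after(lambda s: s.settimeout(None)),
}


def timeout_then_blocking(sock):
    sock.settimeout(5.0)
    sock.setblocking(True)             # cancels the 5 second timeout


cancelled = timeout_after(timeout_then_blocking)
print(results, cancelled)
```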
settimeout(). Already, settimeout() canceled non-blocking mode; now,
setblocking() also cancels the timeout. This is easier to document.
(XXX should settimeout(0) be an alias for setblocking(0)? They seem
to have roughly the same effect. Also, I'm not sure that the code in
connect() and accept() is correct in all cases. We'll sort this out
soon enough.)
not testing it -- apparently test_timeout.py doesn't test anything
useful):
In internal_select():
- The tv_usec part of the timeout for select() was calculated wrong.
- The first argument to select() was one too low.
- The sense of the direction argument to internal_select() was
inverted.
In PySocketSock_settimeout():
- The calls to internal_setblocking() were swapped.
Also, repaired some comments and fixed the test for the return value
of internal_select() in sendall -- this was in the original patch.
I've made considerable changes to Michael's code, specifically to use
the select() system call directly and to store the timeout as a C
double instead of a Python object; internally, -1.0 (or anything
negative) represents the None from the API.
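A sketch of that representation in Python, under the assumption that the
helper names to_internal and to_timeval are made up for illustration and
are not functions in the socket module. It also shows the usual split of
a floating-point timeout into the whole seconds and microseconds that
select() expects.

```python
def to_internal(timeout):
    """Map the API-level timeout (None or a number) to a double."""
    if timeout is None:
        return -1.0                    # negative stands for "no timeout"
    timeout = float(timeout)
    if timeout < 0.0:
        raise ValueError("timeout must be non-negative")
    return timeout


def to_timeval(internal):
    """Split an internal timeout into (seconds, microseconds)."""
    if internal < 0.0:
        return None                    # block indefinitely: no timeval at all
    seconds = int(internal)
    microseconds = int((internal - seconds) * 1e6)
    return seconds, microseconds


print(to_timeval(to_internal(2.25)), to_timeval(to_internal(None)))
```

Keeping the sentinel negative means a single comparison distinguishes
"no timeout" from every valid value, including zero.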
I'm not 100% sure that all corner cases are covered correctly, so
please keep an eye on this. Next I'm going to try it on Windows before
Tim complains.
No way is this a bugfix candidate. :-)
readline in all Python versions is configured
to append a 'space' character for a successful
completion. But for almost all Python expressions
'space' is not wanted (see coding conventions PEP 8).
For example if you have a function 'longfunction'
and you type 'longf<TAB>' you get 'longfunction '
as a completion. Note the unwanted space at the
end.
The patch fixes this behaviour by setting readline's
append_character to '\0' which means don't append
anything. This doesn't work with readline < 2.1
(AFAIK nowadays readline2.2 is in good use).
An alternative approach would be to make the
append_character
accessible from Python so that modules like
rlcompleter.py can set it to '\0'.
[Ed.: I think expecting readline >= 2.2 is fine. If a completer wants
another character they can append that to the keyword in the list.]
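A sketch of the approach in the note above: a completer that appends its
own trailing character where one is wanted. The factory make_completer
and the word list are assumptions made for illustration. The returned
function follows the (text, state) protocol that readline.set_completer()
expects, but the sketch does not import readline, so it runs anywhere.

```python
import keyword


def make_completer(names):
    """Return a readline-style completer over a fixed list of names."""
    def complete(text, state):
        matches = []
        for name in sorted(names):
            if name.startswith(text):
                # Keywords are followed by a space; other names are not.
                suffix = " " if keyword.iskeyword(name) else ""
                matches.append(name + suffix)
        return matches[state] if state < len(matches) else None
    return complete


complete = make_completer(["longfunction", "lambda", "len"])
print(repr(complete("longf", 0)), repr(complete("la", 0)))
```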
[ 559250 ] more POSIX signal stuff
Adds support (and docs and tests and autoconfery) for posix signal
mask handling -- sigpending, sigprocmask and sigsuspend.
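A sketch of signal mask handling from Python. It does not use the
functions added by this patch; it assumes Python 3.8 or later on a Unix
platform, where the signal module offers pthread_sigmask(), sigpending(),
sigwait() and raise_signal(). sigwait() stands in for sigsuspend() here,
and the helper name block_raise_and_collect is made up for the example.

```python
import signal


def block_raise_and_collect(signum=signal.SIGUSR1):
    """Block a signal, raise it, and show that it stays pending."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        signal.raise_signal(signum)    # delivery is deferred by the mask
        was_pending = signum in signal.sigpending()
        received = signal.sigwait({signum})   # consume the pending signal
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return was_pending, received


was_pending, received = block_raise_and_collect()
print(was_pending, received)
```

Because the pending signal is consumed before the old mask is restored,
no handler has to be installed and the default action never runs.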