* Remove backtracking when parsing tarfile headers
* Rewrite PAX header parsing to be stricter
* Optimize parsing of GNU extended sparse headers v0.0
Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Tomas R <tomas.roun8@gmail.com>
Co-authored-by: Scott Odle <scott@sjodle.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
* Add name and mode attributes for compressed and archived file-like objects
in modules bz2, lzma, tarfile and zipfile.
* Change the value of the mode attribute of GzipFile from integer (1 or 2)
to string ('rb' or 'wb').
* Change the value of the mode attribute of ZipExtFile from 'r' to 'rb'.
Tarfile.addfile now throws an ValueError when the user passes
in a non-zero size tarinfo but does not provide a fileobj,
instead of writing an incomplete entry.
* Increase coverage for compressed file-like objects initialized with a
file name, an open file object, a file object opened by file
descriptor, and a file-like object without name and mode attributes
(io.BytesIO)
* Increase coverage for name, fileno(), mode, readable(), writable(),
seekable() in different modes and states
* No longer skip tests with bytes names
* Test objects implementing the path protocol, not just pathlib.Path.
In the stack call of: _init_read_gz()
```
_read, tarfile.py:548
read, tarfile.py:526
_init_read_gz, tarfile.py:491
```
a try;except exists that uses `self.exception`, so it needs to be set before
calling _init_read_gz().
That causes the test to fail when run using a high UID as that ancient format
cannot represent it. The current default (PAX) and the old default (GNU) both
support high UIDs.
`tarfile` already accepts a compressionlevel argument for creating
files. This patch adds the same for stream-based tarfile usage.
The default is 9, the value that was previously hard-coded.
* ``sys.executable`` is not set
* WASI does not support subprocess
* ``pwd`` module is not available
* WASI checks ``open`` syscall flags more strict, needs r, w, rw flag.
* ``umask`` is not available
* ``/dev/null`` may not be accessible
- fd inheritance can't be modified because Emscripten doesn't support subprocesses anyway.
- setpriority always fails
- geteuid no longer causes problems with latest emsdk
- umask is a stub
- geteuid / getuid always return 0, but process cannot chown to random uid.
Numeric fields of type float, notably mtime, can't be represented
exactly in the ustar header, so the pax header is used. But it is
helpful to set them to the nearest int (i.e. second rather than
nanosecond precision mtimes) in the ustar header as well, for the
benefit of unarchivers that don't understand the pax header.
Add test for tarfile.TarInfo.create_pax_header to confirm correct
behaviour.
* during tarfile parsing, a zlib error indicates invalid data
* tarfile.open now raises a descriptive exception from the zlib error
* this makes it clear to the user that they may be trying to open a
corrupted tar file
tarfile writes full path to FNAME field of GZIP format instead of just basename if user specified absolute path. Some archive viewers may process file incorrectly. Also it creates security issue because anyone can know structure of directories on system and know username or other personal information.
RFC1952 says about FNAME:
This is the original name of the file being compressed, with any directory components removed.
So tarfile must remove directory names from FNAME and write only basename of file.
Automerge-Triggered-By: @jaraco
Make the the following imports lazy in test.support:
* bz2
* gzip
* lzma
* resource
* zlib
The following test.support decorators now need to be called
with parenthesis:
* @support.requires_bz2
* @support.requires_gzip
* @support.requires_lzma
* @support.requires_zlib
For example, "@requires_zlib" becomes "@requires_zlib()".
The GNU docs describe the `devmajor` and `devminor` fields of the tar
header struct only in the context of character and block special files,
suggesting that in other cases they are not populated. Typical utilities
behave accordingly; this patch teaches `tarfile` to do the same.
Make it easier to run and test Python on systems with restrict crypto policies:
* add requires_hashdigest to test.support to check if a hash digest algorithm is available and working
* avoid MD5 in test_hmac
* replace MD5 with SHA256 in test_tarfile
* mark network tests that require MD5 for MD5-based digest auth or CRAM-MD5
https://bugs.python.org/issue38270