mirror of https://github.com/python/cpython.git synced 2024-11-29 00:56:12 +01:00

10 Commits

Author SHA1 Message Date
Guido van Rossum
e5605ba3c2 Many misc changes.
- Faster HTML parser derived from SGMLparser (Fred Gansevles).

- All manipulations of todo, done, ext, bad are done via methods, so a
derived class can override.  Also moved the 'done' marking to
dopage(), so run() is much simpler.

- Added a method status() which returns a string containing the
summary counts; added a "total" count.

- Drop the guessing of the file type before opening the document -- we
still need to check those links for validity!

- Added a subroutine to close a connection which first slurps up the
remaining data when it's an ftp URL -- apparently closing an ftp
connection without reading till the end makes it hang.

- Added -n option to skip running (only useful with -R).

- The Checker object now has an instance variable which is set to 1
when it is changed.  This is not pickled.
1997-01-31 14:43:15 +00:00
Guido van Rossum
c59a5d449f Set proper User-agent header (Python-webchecker/<version>).
When -x is combined with -q, still do the checking, but don't print
the errors in this phase -- they are reported by report_errors().
1997-01-30 06:04:00 +00:00
Guido van Rossum
2739cd74b3 Some refinements of the external-link checking code: insert the errors
in the 'bad' dictionary (sanitize them so they are picklable; the
sanitization code is now a subroutine); don't check mailto: URLs; omit
colon in Error message.
1997-01-30 04:26:57 +00:00
Guido van Rossum
de66268588 Added -x option to check external links. Slooooow! 1997-01-30 03:58:21 +00:00
Guido van Rossum
325a64f207 Catch I/O errors when parsing robots.txt file.
Add version number, printed at startup in non-quiet mode.
1997-01-30 03:30:20 +00:00
Guido van Rossum
df47bafa1c Basic README file 1997-01-30 03:24:00 +00:00
Guido van Rossum
3edbb35023 Added robots.txt support, using Skip Montanaro's parser.
Fixed occasional inclusion of unpicklable objects (Message in errors).
Changed indent of a few messages.
1997-01-30 03:19:41 +00:00
Guido van Rossum
bbf8c2fafd Skip Montanaro's robots.txt parser. 1997-01-30 03:18:23 +00:00
Guido van Rossum
272b37d686 web tree checker 1997-01-30 02:44:48 +00:00
Guido van Rossum
d7e4705d8f mime types guesser 1997-01-30 02:44:20 +00:00