0
0
mirror of https://github.com/python/cpython.git synced 2024-11-28 16:45:42 +01:00
Commit Graph

22 Commits

Author SHA1 Message Date
Guido van Rossum
f326134e5c Map .shtml to text/html. 1997-10-07 14:56:42 +00:00
Guido van Rossum
d57548023f A variant on webchecker that creates a mirror copy of a remote site. 1997-10-06 18:54:25 +00:00
Guido van Rossum
2237b73baf Several changes:
- Change the code that looks for robots.txt to always look in /, even
if the "root" path is somewhere deep down below.

- Add link processing in <AREA> tags.

- Change safeclose() to avoid crashing when the file has no geturl()
method.
1997-10-06 18:54:01 +00:00
Guido van Rossum
68bdad1015 Tiny script to play with it on a Mac. 1997-05-28 16:09:02 +00:00
Guido van Rossum
29f6533c7f Scroll to top of info window when done. 1997-05-09 03:19:29 +00:00
Guido van Rossum
89efda363f Avoid the fancy handler for error 401 (request authentication). 1997-05-07 15:00:56 +00:00
Guido van Rossum
af310c1d00 Restructured Checker class to get rid of 'ext' table.
Links are now either in 'todo' or 'done', and ext links
are hadled more like local links except that no further
links are gathered (and sometimes they aren't checked,
e.g. for mailto and news URLs).  The -x option reverses
its meaning: it disables checking of ext links (they are
moved to 'done' without checking).  A new 'errors' table
collects pages with bad links as we go -- redundant,
but useful for the GUI version which needs to report
this as we go.  Some new methods, including reset().
New checkpoint format.

Adapted the GUI to the changes in the Checker class.
Added Quit and "Start over" buttons, and a checkbox
to disable checking external links.  The details
window now also shows bad links emanating from the
selected page.  Miscellaneous small chages.
1997-02-02 23:30:32 +00:00
Guido van Rossum
4f6ecdaacf Add root URL entry box, separate start/stop/step buttons.
If the users selects an item in 'To check', start checking there.
1997-02-01 05:17:29 +00:00
Guido van Rossum
6133ec656e Process <img> and <frame> tags. Don't bother skipping second href. 1997-02-01 05:16:08 +00:00
Guido van Rossum
de99d310cc Check in another copy of tktools.py... 1997-01-31 18:58:53 +00:00
Guido van Rossum
06981c328d Tk interface to webchecker. Not fully featured yet, but usable. 1997-01-31 18:58:12 +00:00
Guido van Rossum
0b0b5f0279 Spin off checking of external page in a subroutine.
Increase MAXPAGE to 150K.
Add back printing of __doc__ for usage message.
1997-01-31 18:57:23 +00:00
Guido van Rossum
e5605ba3c2 Many misc changes.
- Faster HTML parser derivede from SGMLparser (Fred Gansevles).

- All manipulations of todo, done, ext, bad are done via methods, so a
derived class can override.  Also moved the 'done' marking to
dopage(), so run() is much simpler.

- Added a method status() which returns a string containing the
summary counts; added a "total" count.

- Drop the guessing of the file type before opening the document -- we
still need to check those links for validity!

- Added a subroutine to close a connection which first slurps up the
remaining data when it's an ftp URL -- apparently closing an ftp
connection without reading till the end makes it hang.

- Added -n option to skip running (only useful with -R).

- The Checker object now has an instance variable which is set to 1
when it is changed.  This is not pickled.
1997-01-31 14:43:15 +00:00
Guido van Rossum
c59a5d449f Set proper User-agent header (Python-webchecker/<version>).
When -x is combined with -q, still do the checking, but don't print
the error in this phase -- they are reported by report_errors().
1997-01-30 06:04:00 +00:00
Guido van Rossum
2739cd74b3 Some refinements of the external-link checking code: insert the errors
in the 'bad' dictionary (sanitize them so they are picklable; the
sanitation code is now a subroutine); don't check mailto: URLs; omit
colon in Error message.
1997-01-30 04:26:57 +00:00
Guido van Rossum
de66268588 Added -x option to check external links. Slooooow! 1997-01-30 03:58:21 +00:00
Guido van Rossum
325a64f207 Catch I/O errors when parsing robots.txt file.
Add version number, printed at startup in non-quited mode.
1997-01-30 03:30:20 +00:00
Guido van Rossum
df47bafa1c Basic README file 1997-01-30 03:24:00 +00:00
Guido van Rossum
3edbb35023 Added robots.txt support, using Skip Montanaro's parser.
Fixed occasional inclusion of unpicklable objects (Message in errors).
Changed indent of a few messages.
1997-01-30 03:19:41 +00:00
Guido van Rossum
bbf8c2fafd Skip Montanaro's robots.txt parser. 1997-01-30 03:18:23 +00:00
Guido van Rossum
272b37d686 web tree checker 1997-01-30 02:44:48 +00:00
Guido van Rossum
d7e4705d8f mime types guesser 1997-01-30 02:44:20 +00:00