urlchange, a non-interactive approach

Marcus Geiger

26th July, 2000

Name

urlchange - check if a list of URLs has changed.

Synopsis

urlchange [ -eqdtscvh ] [ -b DB-File ] [ -f URL-File ] [ -a ADDRESS ]
          [ -l LANG ] [ -n HOST ]

Description

urlchange is a small script written in the Python programming language that allows you to check whether a list of URLs has changed. Although the name suggests generic URL change detection, currently only the HTTP protocol is supported.

urlchange reads a file that contains one fully qualified URL per line and tests whether each URL has changed.

A main design goal of urlchange is that it allows you to do this in a non-interactive manner; this means that it can notify someone by email if an error or a change has been detected.

urlchange first tries to get a `Last-Modified' HTTP header by doing a HEAD request. If the remote server does not reply with such a header, urlchange fetches the document the URL points to in order to determine the length of its content. The idea behind this scheme is that if a document has changed, its size has most often changed as well.
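The probing scheme described above can be sketched in present-day Python as follows. This is only an illustration, not the code urlchange itself uses: the function name `url_state`, the `opener` parameter and the returned tuples are assumptions made for this example.

```python
import urllib.error
import urllib.request


def url_state(url, opener=urllib.request.urlopen):
    """Return one of three states for a URL (hypothetical helper):

    ("modification-date", <Last-Modified value>),
    ("document-length", <number of bytes>) or
    ("error", <message>).
    """
    try:
        # Step 1: a HEAD request, looking for a Last-Modified header.
        head = urllib.request.Request(url, method="HEAD")
        with opener(head) as response:
            last_modified = response.headers.get("Last-Modified")
        if last_modified:
            return ("modification-date", last_modified)

        # Step 2: no such header, so fetch the document and measure its size.
        with opener(urllib.request.Request(url)) as response:
            return ("document-length", len(response.read()))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Network failures, HTTP errors and malformed URLs all end up here.
        return ("error", str(exc))
```

Passing the opener as a parameter is a design choice of this sketch; it lets the logic be exercised without network access.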

For a given URL, urlchange ends up with one of the following three states: modification-date, document-length or error-state. urlchange collects this information and compares it against the data collected in a previous run (except for errors). To allow this comparison, urlchange stores the modification-date or document-length attribute for each URL in a DBM file. For this it uses any DBM-like database installed on the target system that is supported by the Python standard library module anydbm.
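A minimal sketch of the compare-and-store step is shown below. It uses the `dbm` module, which is the Python 3 name of `anydbm`. The names `open_store` and `check_and_update`, the `kind:value` record layout and the "new" result for a URL seen for the first time are assumptions of this example, not urlchange's actual database format. Error states are assumed to be filtered out by the caller, since they are not stored.

```python
import dbm


def open_store(path):
    """Open (or create) whichever DBM flavour the system provides."""
    return dbm.open(path, "c")


def check_and_update(db, url, kind, value):
    """Compare the freshly determined attribute with the stored one.

    `db` is any mapping with bytes keys and values, such as the object
    returned by open_store(). Returns "new", "changed" or "unchanged";
    the stored entry is updated whenever it differs.
    """
    key = url.encode("utf-8")
    current = f"{kind}:{value}".encode("utf-8")
    previous = db.get(key)
    if previous == current:
        return "unchanged"
    db[key] = current
    return "new" if previous is None else "changed"
```

Because the function only relies on the mapping interface, a plain dictionary can stand in for the database when experimenting.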

When running in interactive mode, urlchange can be told to print verbose output that is readable by a user. Most often, however, you will want to check whether a fixed set of sites has changed (maybe every time a dialup link comes up). To support this kind of batch processing, urlchange can notify a configurable recipient by email.

Command Line Options

urlchange's command options are as follows:

-b
Specify the DBM database to use. The database is used as the reference for comparisons. If a difference between an existing entry and the currently determined value is detected, the database entry is updated to the current value.

Default: ~/.urlchange.db

-f
Specify the file from which urlchange reads the URLs that it should check. The file contains one fully qualified URL per line. Currently only the HTTP protocol is supported.

Default: ~/.urlchanges

-e
Enable email notification. This setting defaults to off. If any errors or changed URLs are detected, urlchange sends a mail via SMTP to the user under whose UID it executes. The target address can be changed with the -a option. The source email address is the user under which urlchange executes. The mail body contains a list of the URLs that changed and a list of the URLs that caused an error.
-a
Specify the destination email address for change and error notifications. For the default address, the user name is determined by looking up the environment variable `LOGNAME'. Note that this option implies email notification (-e, see above).

Default: user@hostname

-q
Enable verbose email notification. With this setting, a user receives the URL change statistics even if nothing changed. In that case the body of the notification email contains only a summary reporting zero changed URLs and zero errors. This may be useful when debugging non-interactive mode from within scripts.
-l
Specify the language that the urlchange user agent accepts. This may be necessary for multi-language web sites that refuse to send a page if no language is specified by the user agent. The language may be any value that HTTP/1.0 allows for the `Accept-Language' header.
-n
Specify the originating hostname for HTTP.
-d
Dump the database contents to stdout.
-t
Dump to stdout all database entries that use the last-modification attribute.
-s
Dump to stdout all database entries that use the file-size attribute.
-c
Empty the database.
-v
Enable verbose processing. If verbose operation is enabled while dumping the database, the attributes are listed as well.
-h
Display a command line summary and program info.
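The email notification controlled by -e, -a and -q can be sketched roughly as follows. This is an illustration using the current Python standard library, not urlchange's own code: the function names, the subject line, the body layout and the SMTP host `localhost` are assumptions made for this example.

```python
import os
import smtplib
import socket
from email.message import EmailMessage


def default_address():
    """user@hostname, with the user name taken from LOGNAME."""
    user = os.environ.get("LOGNAME", "unknown")  # fallback is an assumption
    return f"{user}@{socket.gethostname()}"


def should_notify(changed, errors, verbose_mail=False):
    """Mail only if something happened, unless -q style reporting is on."""
    return verbose_mail or bool(changed) or bool(errors)


def build_report(changed, errors, sender, recipient):
    """Build a message listing changed URLs and URLs that caused errors."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"urlchange: {len(changed)} changed, {len(errors)} errors"
    lines = ["Changed URLs:"] + [f"  {url}" for url in changed]
    lines += ["", "URLs with errors:"] + [f"  {url}" for url in errors]
    msg.set_content("\n".join(lines))
    return msg


def send_report(msg, smtp_host="localhost"):
    """Hand the message to an SMTP server (the host name is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

A caller would typically test `should_notify(...)` first and then pass the result of `build_report(...)` to `send_report(...)`, using `default_address()` for both sender and recipient unless another address was given.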

Return codes

urlchange returns 1 if an error occurs, and 0 otherwise.

Examples

The following example checks the GNU project and XEmacs sites for changes in verbose mode (-v), with a verbose mail report (-q) and mail notification to the user foo@bar. The URLs to check are read from stdin (-f -), and the fetched change information is stored in the file urls.db (-b).
 
    echo -e "http://www.gnu.org/\nhttp://www.xemacs.org/" | \
    urlchange.py -vq -afoo@bar -f - -b urls.db

To check whether any of the URLs in your bookmark file has changed, one could type the following (of course, this is also a very clumsy way to check whether any of your bookmarks is unreachable).

    perl -ne 'print "$1\n" if m|HREF="(http://[a-zA-Z0-9_/\-.~]+)"|i' \
    ~/.netscape/bookmarks.html | \
    urlchange.py -vq -afred@localhost -f - -b urls.db
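For readers who prefer to stay within Python, the extraction step of the pipeline above could be written roughly like this. It is a sketch that mirrors the Perl one-liner (same pattern, at most one URL per input line); the function name `extract_http_urls` is made up for this example.

```python
import re

# Same pattern as the Perl one-liner, matched case-insensitively.
HREF_RE = re.compile(r'HREF="(http://[a-zA-Z0-9_/\-.~]+)"', re.IGNORECASE)


def extract_http_urls(lines):
    """Yield the first matching HTTP URL found on each line, if any."""
    for line in lines:
        match = HREF_RE.search(line)
        if match:
            yield match.group(1)
```

Iterating over `extract_http_urls(open(path))` and printing each result, one per line, produces a list in the format that `-f -` expects on stdin.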

Limitations

Reporting Bugs

Feel free to report any bugs. If you like, fix them and send me the patches. Drop me an email: marcus.geiger@bigfoot.com

Author

This piece of software is brought to you by Marcus Geiger.

Copyright

Copyright © 2000, Marcus Geiger. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



Marcus Geiger 2000-07-27