urlchange, a non-interactive approach
Marcus Geiger
26th July, 2000
urlchange - check if a list of URLs has changed.
urlchange [ -eqdtscvh ] [ -b DB-File ] [ -f URL-File ] [ -a ADDRESS ]
[ -l LANG ] [ -n HOST ]
urlchange is a small script, written in the Python programming language,
that allows you to check if a list of URLs has changed. Although the
name suggests generic URL change-detection, currently only the HTTP
protocol is supported.
urlchange reads a file that contains one fully qualified URL per line
and tests if the URL has changed.
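The input format can be illustrated with a short sketch. This is not the author's code: it assumes Python 3, and the function name read_urls, the skipping of blank lines and the rejection of non-HTTP URLs with an exception are all assumptions made for illustration.

```python
import io
from urllib.parse import urlsplit


def read_urls(stream):
    """Return the HTTP URLs found in `stream`, one URL per line."""
    urls = []
    for line in stream:
        url = line.strip()
        if not url:
            continue  # assumption: blank lines are ignored
        parts = urlsplit(url)
        # "Fully qualified" is taken here to mean: has a scheme and a host.
        if parts.scheme.lower() != "http" or not parts.netloc:
            raise ValueError("unsupported or incomplete URL: %r" % url)
        urls.append(url)
    return urls


sample = io.StringIO("http://www.gnu.org/\n\nhttp://www.xemacs.org/\n")
print(read_urls(sample))
```

Passing sys.stdin as the stream would correspond to reading the URL list from standard input.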
A main design goal of urlchange is to let you do this in
a non-interactive manner: it can notify someone by email
if an error or a change has been detected.
urlchange first tries to get a `Last-Modified' HTTP header by doing a
HEAD request. If the remote server does not reply with such a header,
urlchange fetches the document the URL points to and determines the
length of its content. The idea behind this scheme is that when a
document changes, its size usually changes too.
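The detection scheme described above might be sketched as follows. This is a minimal illustration in Python 3 under stated assumptions, not the script's actual code: the names classify and probe, the timeout and the way errors are reported are all hypothetical.

```python
import http.client
import urllib.request


def classify(headers, fetch_body):
    """Prefer the Last-Modified header; fall back to the document length."""
    stamp = headers.get("Last-Modified")
    if stamp:
        return ("modification-date", stamp)
    # No Last-Modified header: fetch the document and measure it.
    return ("document-length", str(len(fetch_body())))


def probe(url, timeout=30):
    """Return a (state, value) pair for `url`."""

    def fetch_body():
        with urllib.request.urlopen(url, timeout=timeout) as reply:
            return reply.read()

    try:
        head = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(head, timeout=timeout) as reply:
            headers = reply.headers
        return classify(headers, fetch_body)
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # Network failures, HTTP errors and malformed URLs all end up here.
        return ("error-state", str(exc))
```

The body is only downloaded when the HEAD reply carries no `Last-Modified' header, so the cheaper check is always tried first.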
For a given URL, urlchange ends up with one of the following three
states: modification-date, document-length or
error-state. urlchange collects this information and compares
it against the data collected in a previous run (except for errors). To
allow this comparison, urlchange stores the modification-date or
document-length attribute for each URL in a DBM file. It uses any
DBM-like database installed on the target system that is supported by
the Python standard library module anydbm.
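The compare-and-store step could look roughly like the sketch below. It assumes Python 3, where the generic DBM module is called dbm; the function name compare_and_store, the "attribute=value" record layout and the separate "new" result for URLs seen for the first time are assumptions for illustration and are not taken from the script.

```python
import dbm
import os
import tempfile


def compare_and_store(db_path, url, attribute, value):
    """Compare one URL's attribute with the stored one, then store it.

    Returns "new", "changed" or "unchanged".
    """
    key = url.encode("utf-8")
    # Hypothetical record layout: "<attribute>=<value>".
    record = ("%s=%s" % (attribute, value)).encode("utf-8")
    with dbm.open(db_path, "c") as db:  # "c": create the file if missing
        if key not in db:
            result = "new"
        elif db[key] != record:
            result = "changed"
        else:
            result = "unchanged"
        db[key] = record  # the stored entry always reflects the latest run
    return result


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "urls")
    for length in ("1024", "1024", "2048"):
        print(compare_and_store(path, "http://www.gnu.org/", "document-length", length))
```

Storing the attribute name together with the value means that a URL which switches from one attribute to the other is reported as changed as well.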
When running in interactive mode, urlchange can be told to print
verbose output that is readable by a user. Most often, however, you
will want to check whether a fixed set of sites has changed (maybe
every time a dialup link comes up). To support this kind of batch
processing, urlchange can notify a configurable recipient by email.
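A notification of this kind might be assembled as in the following sketch. It assumes Python 3 and an SMTP server on localhost; the function names, the subject line, the body layout and the "nobody" fallback user are hypothetical, and only the use of `LOGNAME', the user@hostname default and the two URL lists come from this document.

```python
import os
import smtplib
import socket
from email.message import EmailMessage


def default_address():
    """Build user@hostname, taking the user name from LOGNAME."""
    user = os.environ.get("LOGNAME", "nobody")  # fallback is an assumption
    return "%s@%s" % (user, socket.gethostname())


def build_report(changed, errors, sender, recipient):
    """Create a mail listing changed URLs and URLs that caused errors."""
    msg = EmailMessage()
    msg["Subject"] = "urlchange: %d changed, %d errors" % (len(changed), len(errors))
    msg["From"] = sender
    msg["To"] = recipient
    lines = ["Changed URLs:"] + ["  " + url for url in changed]
    lines += ["", "URLs with errors:"] + ["  " + url for url in errors]
    msg.set_content("\n".join(lines) + "\n")
    return msg


def send_report(msg, smtp_host="localhost"):
    """Hand the mail to the local SMTP server (not called in this sketch)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


report = build_report(["http://www.gnu.org/"], [], "foo@bar", "foo@bar")
print(report["Subject"])
```

Keeping the message construction separate from the sending step makes it possible to inspect the report without a running mail server.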
urlchange's command options are as follows:
- -b
- Specify the DBM database to use. The database is used
as the reference for comparisons. If an existing entry differs
from the property currently determined for its URL, the entry
is updated to the current value.
Default: /.urlchange.db
- -f
- Specify the file from which urlchange reads the URLs that it
should check. The file contains one fully qualified URL per
line. Currently only the HTTP protocol is supported.
Default: /.urlchanges
- -e
- Enable email notification. This setting defaults to off. If
any errors or changed URLs have been detected, urlchange sends a
mail by SMTP to the user under which it executes. The target address
can be changed with the -a option. The source email address is the
user under which urlchange executes. The mail body contains a
list of URLs that changed and a list of URLs that caused an
error.
- -a
- Specify the destination email address for change and error
notifications. The user name of the default address is determined
by looking up the environment variable `LOGNAME'. Note that this
option implies email notification (-e, see above).
Default: user@hostname
- -q
- Enable verbose email notification. With this setting, a user
receives URL change statistics even if nothing changed. In that
case the body of the notification email contains only a summary
reporting zero changed URLs and zero errors (if none occurred). This
may be useful when debugging non-interactive mode from within
scripts.
- -l
- Specify the language that the urlchange user agent
accepts. This may be necessary for multi-language web sites that
refuse to send a page if the user agent specifies no language.
The value may be anything that HTTP/1.0 allows for the
`Accept-Language' header.
- -n
- Specify the originating hostname for HTTP.
- -d
- Dump the database contents to stdout.
- -t
- Dump to stdout all database entries that use
the last modification attribute.
- -s
- Dump to stdout all database entries that use
the file size attribute.
- -c
- Empty the database.
- -v
- Enable verbose processing. If verbose operation is
enabled while dumping the database, the attributes are listed as well.
- -h
- Display a command line summary and program info.
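To illustrate what the -l option described above amounts to at the HTTP level, here is a small Python 3 sketch that attaches an `Accept-Language' header to a HEAD request. The function name build_head_request is hypothetical, and how the script itself builds its requests is not shown in this document.

```python
import urllib.request


def build_head_request(url, language=None):
    """Create a HEAD request, optionally announcing an accepted language."""
    headers = {}
    if language:
        headers["Accept-Language"] = language
    return urllib.request.Request(url, headers=headers, method="HEAD")


req = build_head_request("http://www.gnu.org/", language="en")
print(req.get_method(), req.header_items())
```

When no language is given, the header is simply left out, which matches a user agent that specifies no language at all.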
urlchange returns 1 if an error occurs, otherwise 0.
The following example checks the GNU project and XEmacs sites
for changes in verbose mode (-v), with a verbose mail report (-q) and
mail notification to user foo@bar. The URLs to check are read from
stdin (-f -), and the fetched change information is stored in the
file urls.db (-b).
echo -e "http://www.gnu.org/\nhttp://www.xemacs.org/" | \
urlchange.py -vq -afoo@bar -f - -b urls.db
To check whether any of the URLs in your bookmark file has changed,
one could type the following (of course, this is also a very clumsy
method to check whether any of your bookmarks is unreachable).
perl -ne 'print "$1\n" if m|HREF="(http://[a-zA-Z0-9_/\-.~]+)"|i' \
~/.netscape/bookmarks.html | \
urlchange.py -vq -afred@localhost -f - -b urls.db
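For readers without Perl at hand, the filter stage of the pipeline above can be approximated in Python 3. This is only a sketch: the function name extract_urls is an assumption, and like the Perl one-liner it reports at most one link per input line.

```python
import re

# Same pattern as in the Perl one-liner, matched case-insensitively.
HREF = re.compile(r'HREF="(http://[a-zA-Z0-9_/\-.~]+)"', re.IGNORECASE)


def extract_urls(lines):
    """Yield the first matching http:// link of each line."""
    for line in lines:
        match = HREF.search(line)
        if match:
            yield match.group(1)


bookmarks = [
    '<DT><A HREF="http://www.gnu.org/">GNU</A>',
    '<DT><A HREF="ftp://ftp.gnu.org/">GNU FTP</A>',
    "<DT>no link here",
]
for url in extract_urls(bookmarks):
    print(url)
```

Feeding the lines of the bookmark file to extract_urls and printing each result yields a URL list that urlchange can read with -f -.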
- Currently only the HTTP protocol is supported.
- To allow email notification, the host that executes urlchange
needs to run an SMTP server.
Feel free to report any bugs. If you like, fix them and send me the
patches. Drop me an email: marcus.geiger@bigfoot.com
This piece of software is brought to you by Marcus Geiger.
Copyright © 2000, Marcus Geiger
This is free software; see the source for copying conditions. There
is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Marcus Geiger
2000-07-27