ECN No Name Newsletter: May, 1995

The ECN No Name Newsletter is no longer being published. This is an archived issue.


Pagecount: Access Statistics For WWW Documents


David A. Curry


Over the past few months, the ECN has received numerous requests from the user community to provide the ability to determine how many times a specific World-Wide Web document (usually a home page) has been accessed. In response to these requests, we have developed the pagecount program.

Each time a properly configured document is accessed via a World-Wide Web browser, pagecount records the access in a document-specific database. This information can then be printed in either summary or detailed form. The individual statistics maintained by pagecount are:

Server-Side Includes
For pagecount to monitor access to a document, "server-side includes" must be enabled in the directory that contains that document. In general, this is done by creating a file called .htaccess in that directory and placing the following lines in it:

Options FollowSymlinks Indexes Includes
AddType text/x-server-parsed-html .html

Before turning on server-side includes, be aware of the performance cost of doing so (in general, they make document retrieval much slower) and of some of the ways in which this cost can be avoided. These issues, and their solutions, are described in http://ecn.www.ecn.purdue.edu/ecn/FAQ/httpd/ssincl/
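
As a minimal sketch of the .htaccess step described in this section, the following Python function writes the two directives into a directory. The function name enable_server_side_includes and its refusal to overwrite an existing file are assumptions made for illustration; neither is part of pagecount or of the Web server.

```python
from pathlib import Path

# The two directives given in this article for enabling server-side includes.
HTACCESS_LINES = (
    "Options FollowSymlinks Indexes Includes",
    "AddType text/x-server-parsed-html .html",
)


def enable_server_side_includes(directory):
    """Create .htaccess in `directory`; hypothetical helper, not part of pagecount."""
    path = Path(directory) / ".htaccess"
    if path.exists():
        # Assumption: never clobber an existing file; merge it by hand instead.
        raise FileExistsError(f"{path} already exists")
    path.write_text("\n".join(HTACCESS_LINES) + "\n")
    return path
```

If a directory already has a .htaccess file, the two lines would have to be merged into it by hand, since the sketch deliberately stops rather than guess how existing settings should be combined.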

Initialization
In order to record access statistics for a document, pagecount must first be called from the shell to initialize the statistics database for that document. To do this, pagecount is called with the document file name as an argument:

% /usr/local/etc/httpd/cgi-bin/pagecount \
[options] documentname.html

The file name may be given as either an absolute or a relative path name. For example, to enable pagecount on a personal home page, either of the following methods may be used:

% /usr/local/etc/httpd/cgi-bin/pagecount \
~/public-web/Index.html

% cd ~/public-web
% /usr/local/etc/httpd/cgi-bin/pagecount \
Index.html

If the pagecount databases have already been created for a document (through a previous call to pagecount), the initialization process will destroy those databases and their contents. In this situation, pagecount will prompt for confirmation before proceeding.

When initializing the pagecount databases for a document, several options can be given to control pagecount's behavior:

-d datefmt
Change the format used by pagecount to print dates. See the manual page for details on how to specify the format.
-h hostname
Tell pagecount to use hostname instead of the server's real host name when generating URLs. This can be used to include a host name alias (e.g., www.ecn.purdue.edu) in the URL.
-m mesgfile
When pagecount is invoked from a server-side include, it prints a default message that summarizes the access statistics (see below for an example). This option allows the contents of that message to be changed. See the manual page for details on the format of this file, and the method for including the contents of the statistics database in the message.
-p progpath
Tell pagecount to use progpath as its path name, instead of the path name it was invoked with, when generating URLs. This allows a shell script (or other program) that calls pagecount to be specified, so that additional functions can be performed each time it is invoked.
-t timefmt
Change the format used by pagecount to print times. See the manual page for details on how to specify the format.
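
The initialization options can be combined on one command line. The following Python sketch only assembles the argument list; it does not run pagecount. The function name build_init_command is hypothetical, and it assumes that each option letter and its value are passed as separate arguments, followed by the document name.

```python
PAGECOUNT = "/usr/local/etc/httpd/cgi-bin/pagecount"


def build_init_command(document, datefmt=None, hostname=None,
                       mesgfile=None, progpath=None, timefmt=None):
    """Return the argument list for initializing a document's database."""
    argv = [PAGECOUNT]
    # Option letters as documented in this article: -d, -h, -m, -p, -t.
    for flag, value in (("-d", datefmt), ("-h", hostname), ("-m", mesgfile),
                        ("-p", progpath), ("-t", timefmt)):
        if value is not None:
            argv += [flag, value]
    argv.append(document)  # the document name follows the options
    return argv


if __name__ == "__main__":
    print(build_init_command("Index.html", hostname="www.ecn.purdue.edu"))
```

The resulting list could be handed to subprocess.run from an interactive session. Because re-initializing an existing database prompts for confirmation, running it unattended is probably not a good idea.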

Invocation

To invoke pagecount from a document, use a server-side include:

<!--#exec cgi="/cgi-bin/pagecount" -->

Now, each time the document is accessed by a Web browser, the server-side include will cause pagecount to be invoked. Note that the above statement must occur somewhere in the document in order for access statistics for the document to be recorded.
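
Since no statistics are recorded unless the include statement is present, it can be worth checking a document for it before publishing. The following Python sketch does this with a regular expression; the function name has_pagecount_include is hypothetical, and the pattern assumes the include is written as shown above, apart from extra whitespace.

```python
import re

# Matches the include statement shown above, tolerating extra whitespace.
PAGECOUNT_INCLUDE = re.compile(r'<!--#exec\s+cgi="/cgi-bin/pagecount"\s*-->')


def has_pagecount_include(html_text):
    """Return True if the document text contains the pagecount include."""
    return PAGECOUNT_INCLUDE.search(html_text) is not None


if __name__ == "__main__":
    sample = '<HTML>\n<!--#exec cgi="/cgi-bin/pagecount" -->\n</HTML>'
    print(has_pagecount_include(sample))
```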

When invoked, pagecount will update the statistics database, and then output a short summary of the statistics. This summary looks like:


This document, Index.html, has been accessed 20 times since 05-Apr-95 13:18:40 EST. This is the 7th time it has been accessed today.

A total of 3 different hosts have accessed this document in the last 2 days; your host, foo.bar.com, has accessed it 17 times.

If you're interested, complete statistics for this document are also available, including breakdowns by top-level domain, host name and date.


The contents of the above message can be changed by using the -m option when initializing the database.
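
To make the numbers in the summary concrete, the following Python sketch keeps the kind of counters such a message needs: a running total, a per-day count, and a per-host count. It is an illustration only. The class name AccessStats and the in-memory storage are assumptions, and nothing here describes pagecount's actual database format or implementation.

```python
from collections import Counter
from datetime import date


class AccessStats:
    """Hypothetical in-memory stand-in for a per-document statistics database."""

    def __init__(self):
        self.total = 0
        self.by_day = Counter()   # date -> accesses on that day
        self.by_host = Counter()  # host name -> accesses from that host

    def record(self, host, day):
        """Record one access from `host` on `day` (a datetime.date)."""
        self.total += 1
        self.by_day[day] += 1
        self.by_host[host] += 1

    def summary(self, host, day):
        """Return the counts that a summary message would report."""
        return {
            "total": self.total,
            "today": self.by_day[day],
            "distinct_hosts": len(self.by_host),
            "from_this_host": self.by_host[host],
        }


if __name__ == "__main__":
    stats = AccessStats()
    stats.record("foo.bar.com", date(1995, 4, 5))
    stats.record("foo.bar.com", date(1995, 4, 6))
    print(stats.summary("foo.bar.com", date(1995, 4, 6)))
```

Each access updates all three counters once, so the summary can be produced without rescanning a log of individual accesses.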

The second method of invoking pagecount is via an HTML anchor (hypertext link). In this form, the anchor should specify the Universal Resource Identifier (URI) of the document as a query argument. For example, if the document's URI is /~bob/foobar.html, the anchor reference should look like this:

<A HREF="/cgi-bin/pagecount?/~bob/foobar.html">click here</A>

The message file can include this anchor automatically. The same anchor may also be used in other documents, hotlists and document menus.
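
When the anchor is needed in several documents or menus, it can be generated rather than typed. The following Python sketch builds it from a document URI. The function name pagecount_anchor is hypothetical, and the sketch assumes the URI is placed in the query string unchanged, as in the example above, with only HTML escaping applied.

```python
from html import escape


def pagecount_anchor(document_uri, label="click here",
                     program="/cgi-bin/pagecount"):
    """Return an HTML anchor that requests full statistics for a document."""
    # Assumption: the URI goes into the query string as-is; only the
    # characters that are special in HTML are escaped.
    href = escape(f"{program}?{document_uri}", quote=True)
    return f'<A HREF="{href}">{escape(label)}</A>'


if __name__ == "__main__":
    print(pagecount_anchor("/~bob/foobar.html"))
```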

When invoked in this manner, pagecount will produce a new "document" (no file is created, but the browser will see it as following a hypertext link) that contains a complete printout of the statistics database for the document named in the query argument. The printout will be divided into five sections:

NOTE: pagecount does not update independent statistics programs.