Telling what pages are pointing to your server

Do you wonder how people keep coming to your web pages, even though you made no effort to attract them? Does your server not tell you where they come from? How could you serve that information? Well, it is easy...

Experience 1: recording referrers to our pages

We (very recently) wrote a small CGI script that records the referring page whenever someone reaches our node:

#!/bin/sh
######################################################################
# file: GET, to log the places that CITE us
# Date: November 1996.  Author: Alejandro Rivero
######################################################################
#
# We use this script to log access to HTML documents, so
# the appropriate Content-Type is:
echo 'Content-Type: text/html'
echo ''
#
# Our document tree starts at /home/http/htdocs, so we look there and
# send the document to the client
cat "/home/http/htdocs/$SCRIPT_NAME"
# Now we check for referers from external pages (our server being "dftuz")
# and store them in a local file for further analysis. Requests without
# a referer, or coming from our own pages, are not logged.
case "$HTTP_REFERER" in
  ''|*dftuz*) ;;
  *) echo "$HTTP_REFERER $REMOTE_HOST $SCRIPT_NAME" >> /home/http/logs/cites ;;
esac
# and that was all!
exit 0
To activate it, we simply add a rule to the configuration file (httpd.conf) of our CERN httpd server:
 
Exec    /*html                  /home/http/cgi-bin/logfrom.sh
Any request for a path ending in "html" is thus handed to the script, while accesses to GIFs and other files are not logged.
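The script only appends raw lines to the cites file and leaves the "further analysis" open. As one possible starting point, the hypothetical function below (not part of the original scripts) counts how often each external page has cited each of our pages. It assumes the log lines look like those written above: referer first, requested page last, fields separated by spaces and no spaces inside the URLs.

```shell
# Hypothetical helper (not part of the original scripts): summarize the
# "cites" log. Assumes each line is "REFERER REMOTE_HOST SCRIPT_NAME" with
# no spaces inside the fields; REMOTE_HOST may be empty.
summarize_cites() {
  # $1 = log file. Keep the referer (first field) and the requested page
  # (last field), count identical pairs, and list the most frequent first.
  awk 'NF >= 2 { print $1, $NF }' "$1" | sort | uniq -c | sort -rn
}

# Example: summarize_cites /home/http/logs/cites
```

Taking the page from the last field rather than the third keeps the count right for lines where REMOTE_HOST was empty.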

Experience 2: Dynamically serving backlink data

What to do with the REFERER information?

Well, we have thought about storing the accumulated link information for each file and dynamically adding it to the HEAD section, with a tag like:

<LINK HREF="$HTTP_REFERER" REV="untyped">
(where "untyped" honors Tim Berners-Lee's 1990 design.)

Regrettably, I am not aware of any useful REL/REV value that current browsers actually support. A pity, as such links could give browsers a nice "Up" menu.
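A minimal sketch of the "add it to the HEAD" step, assuming the referers for a page are already stored one per line in a plain file and that the opening <HEAD> tag sits on a line that does not also hold </HEAD>. The function name and the file layout are hypothetical; this is not the authors' backlink script.

```shell
# Hypothetical helper: print an HTML document with one
# <LINK HREF="..." REV="untyped"> tag per stored referer, inserted right
# after the line holding the opening <HEAD> tag (matched in any case).
# $1 = HTML document, $2 = file with one referer URL per line.
add_backlinks() {
  awk -v reffile="$2" '
    # Copy every line of the document unchanged.
    { print }
    # After the first <HEAD> line, emit the LINK tags once.
    !done && tolower($0) ~ /<head>/ {
      while ((getline ref < reffile) > 0) {
        if (ref == "") continue
        gsub(/"/, "%22", ref)   # keep the HREF attribute well-formed
        printf "<LINK HREF=\"%s\" REV=\"untyped\">\n", ref
      }
      close(reffile)
      done = 1
    }
  ' "$1"
}
```

If the referer file is missing, getline fails at once and the document is sent unchanged.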

Anyway, we have implemented the idea with another, somewhat more complex CGI script, which stores a separate REFERER file for each HTML file and merges its contents into the response to every HTTP request. This is what is sometimes called a "backlink" mechanism.
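The storage half of such a mechanism could look roughly like the hypothetical function below: one referer file per HTML page, with each referring URL recorded only once. The directory, the file-naming rule and the function name are our assumptions; the authors' own script may organize its files differently.

```shell
# Hypothetical helper: store each referer of a page once, in a per-page file.
# $1 = directory for the referer files (hypothetical, e.g. /home/http/backlinks)
# $2 = requested page, e.g. "$SCRIPT_NAME"
# $3 = referring URL, e.g. "$HTTP_REFERER"
# No locking is done, so simultaneous requests may occasionally race.
record_backlink() {
  dir=$1 page=$2 ref=$3
  # Requests without a referer have nothing to store.
  [ -n "$ref" ] || return 0
  # Flatten "/docs/a.html" into "docs_a.html.refs": one file per page.
  # (Simplification: "/a_b.html" and "/a/b.html" would share a file.)
  file="$dir/$(printf '%s' "$page" | sed 's|^/*||; s|/|_|g').refs"
  touch "$file"
  # Append the URL only if that exact line is not recorded yet.
  grep -qxF -- "$ref" "$file" || printf '%s\n' "$ref" >> "$file"
}
```

Keeping one small file per page means serving a document only has to read the referers of that page, not the whole log.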

Take a look at

  • the backlink script, together with a suggested exclusion list, or
  • the paper submitted to WWW6

Note that it works with CERN httpd 3.0, and that if you want to test it you must provide a "host excluded" file containing at least one line.
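The format of the "host excluded" file is not described here. Assuming one host name (or other URL fragment) per line, matched anywhere in the referer like the "dftuz" test of the first script, a check could be sketched as follows. The function name and the example path are hypothetical.

```shell
# Hypothetical helper: succeed (status 0) if the referer contains any entry
# of the exclusion list. Assumes one host name or URL fragment per line,
# compared as a plain substring like the "dftuz" test in the first script.
# $1 = exclusion list file, $2 = referring URL.
is_excluded() {
  while IFS= read -r host || [ -n "$host" ]; do
    [ -n "$host" ] || continue   # an empty entry would match every URL
    case "$2" in
      *"$host"*) return 0 ;;     # quoted, so the entry is matched literally
    esac
  done < "$1"
  return 1
}

# Example (hypothetical path): log only referers that are not excluded.
# is_excluded /home/http/excluded "$HTTP_REFERER" || echo "$HTTP_REFERER" >> cites
```

With this sketch an empty list simply excludes nothing; blank lines are skipped rather than treated as entries.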

Experience 3: Sending only the head of our HTML documents

Now that the head of our HTML documents is becoming rich (TITLE, METAs, LINKs...), we should devise some way to send only this part to indexing engines and robots that are not really interested in the content.

Our current idea is to send a new MIME type, text/htmlhead, but we must think it over carefully... We will come back to it after Xmas. In the meantime, if you have any suggestions, don't hesitate to write.
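To make the idea concrete, here is a rough sketch of a script that returns only the HEAD part when a client asks for text/htmlhead. That type is the proposal above, not a registered MIME type, and having robots request it through the Accept header (HTTP_ACCEPT in CGI) is our own assumption; the function name is hypothetical.

```shell
# Hypothetical sketch: serve only the HEAD part of a document to clients
# that ask for the experimental text/htmlhead type. Signalling it through
# the Accept header (HTTP_ACCEPT in CGI) is an assumption.
# $1 = path of the HTML document.
serve_document() {
  case "$HTTP_ACCEPT" in
    *text/htmlhead*)
      echo 'Content-Type: text/htmlhead'
      echo ''
      # Copy lines up to and including the first one holding </HEAD>.
      awk '
        { print }
        tolower($0) ~ /<\/head>/ { exit }
      ' "$1"
      ;;
    *)
      echo 'Content-Type: text/html'
      echo ''
      cat "$1"
      ;;
  esac
}
```

The cut is line-based, so a document that puts </HEAD> and the start of the BODY on the same line would leak that one line of content.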


Please note that these scripts are experimental and not foolproof. Use them at your own risk.

rivero@sol.unizar.es
December 1996