Web sites are unique in that they not only provide
more information about your product than radio, television,
or print ads, but can also give you some information
about the number of people who visit your site.
Web designers are often asked exactly what statistics
they can provide about a web site. The response usually
includes cryptic techno-speak such as “user sessions”
and “number of hits.” Although we would like to provide simple,
intuitive measures of web site success or failure, we
are severely hampered by limitations inherent to the
Internet. This brief explanation should give you a better
idea of what can be gathered about visitors to your
web site and help you decipher the pages of statistics
we can provide. In order to outline our limitations,
we must explain how these statistics are gathered.
Web pages are hosted by web servers. Web servers are
special computers that are tailored to do one thing
as well as possible: hand out the pages of your
web site to anyone who asks to view them. For every
file they hand out (serve), they record a single line
in a log file that details all the information they know
about the person who asked for that file. “What
constitutes a file?” you might ask. When a user
requests a web page, they do so by clicking on a link,
or a bookmark stored in their computer, or by typing
the address of your site in the address box at the top
of a web browser. The web page file they request consists
of simple text without pictures, but that simple page
contains links to the graphics files that make up the rest
of the page. One web page often includes several dozen small
graphics files. The user's web browser combines the text
portion of the web page with the individual graphics,
resulting in a complete ‘page.’ The server
will record an entry in the log file for each of these
files whether they are text or graphics. Each entry
in the log file is a ‘hit.’ For example,
let us assume I make a web page called ‘test.html.’
This web page has one background image and three pictures.
When someone requests to view test.html, they will generate
five entries in the log file (one for the text file,
one for the background and one for each of the three
pictures) and thus five ‘hits’.
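
The test.html example above can be worked through in a few lines of
Python. This is only an illustrative sketch: the image file names are
made up, and the list stands in for the entries a server would write
to its log.

```python
from collections import Counter

# Hypothetical files served for one view of test.html: the text file,
# one background image, and three pictures (file names are invented).
requested_files = [
    "/test.html",
    "/images/background.gif",
    "/images/picture1.jpg",
    "/images/picture2.jpg",
    "/images/picture3.jpg",
]


def classify(filename):
    """Label a served file as the text page or as a graphic."""
    return "text" if filename.endswith((".html", ".htm")) else "graphic"


# Every served file is one log entry, and every log entry is one hit.
hits_by_type = Counter(classify(name) for name in requested_files)
total_hits = sum(hits_by_type.values())

print(total_hits, dict(hits_by_type))
```

One person viewing one page produces five hits, which is why a raw hit
count overstates how many pages were actually viewed.
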
It is important to understand what information is recorded
in the log file. The server is able to record the following:
- the file that was requested
- the address of the computer system that received the
file
- the time of the request
- whether the request was successful
- the browser the user was using
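
To make the list above concrete, here is a rough sketch of pulling
those five items out of a single log line. It assumes the widely used
Apache "combined" log format, which may not be the format your server
writes. The sample line, the parse_line helper, and the field names
are invented for illustration, and treating 2xx and 3xx status codes
as "successful" is also an assumption.

```python
import re
from datetime import datetime

# Assumed layout: Apache "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<address>\S+) \S+ \S+ '                  # requesting address
    r'\[(?P<time>[^\]]+)\] '                      # time of the request
    r'"(?P<method>\S+) (?P<file>\S+) [^"]*" '     # file that was requested
    r'(?P<status>\d{3}) \S+ '                     # status code and size
    r'"[^"]*" '                                   # referring page (unused here)
    r'"(?P<browser>[^"]*)"'                       # browser identification
)


def parse_line(line):
    """Return the five logged items as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    status = int(match.group("status"))
    return {
        "file": match.group("file"),
        "address": match.group("address"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "successful": 200 <= status < 400,  # assumption: 2xx/3xx count as success
        "browser": match.group("browser"),
    }


# Invented sample line, for illustration only.
sample = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /test.html HTTP/1.0" 200 2326 '
          '"-" "Mozilla/4.08 [en] (Win98; I)"')
entry = parse_line(sample)
print(entry)
```
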
At present, this is all we can log about visitors
to your web site. We must use this data and make some
small assumptions to try to determine vital information
about your web site. The log files are simply tens or
hundreds of thousands of lines of the above information
stored as simple text files. It is impractical for a person
to glean useful information by reading through the files
manually; we must use statistical software to chew through
these files and attempt to interpret the information.
There are a variety of programs available to create
statistics from log files. One of the more popular packages
is WebTrends, which has been rated many times
as the best statistics generator available on the market.
Other packages include Sawmill, Analog, and 123 Log
Analyzer.
One of the most important pieces of information is
how much traffic your website is receiving. One measure
of traffic is user sessions. A user session is defined
as a series of requests from a single computer over
a period of time. Because the log files record the address
of the computer that receives each file from your server,
we assume that if the same computer requests
several different files within a short period of time,
only one person is requesting them. That time
period is often set at 30 minutes. Therefore, if my
computer requests 100 files from a server in Santa Rosa,
I will generate 100 hits and only one user session.
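
The session rule described above can be sketched as follows. This is
one possible reading, not how any particular statistics package works:
it assumes the 30-minute period is an inactivity timeout, so a new
session starts when the same address has been silent for more than 30
minutes. The count_sessions function and the sample address are
hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(requests, timeout=SESSION_TIMEOUT):
    """Count user sessions from (address, time) pairs, one pair per hit.

    Assumption: a session ends once an address has made no request
    for longer than `timeout`.
    """
    last_seen = {}
    sessions = 0
    for address, when in sorted(requests, key=lambda request: request[1]):
        previous = last_seen.get(address)
        if previous is None or when - previous > timeout:
            sessions += 1
        last_seen[address] = when
    return sessions


# One computer requesting 100 files, one second apart.
start = datetime(2000, 10, 10, 13, 0, 0)
requests = [("192.0.2.10", start + timedelta(seconds=i)) for i in range(100)]

hits = len(requests)
sessions = count_sessions(requests)
print(hits, sessions)
```

Here the 100 requests count as 100 hits but a single user session,
matching the example in the text.
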
Making this assumption causes the first problem with
log files. Many users, and in the case of AOL millions
of users, reach the Internet through a shared group of
computers in one location. For example, all AOL users in the US
dial in from their home computers to AOL and travel
through an AOL network to a computer facility located
in Virginia. From there, each request passes through one of
several hundred computers connected to the Internet. This
causes two problems. First, since all traffic is funneled
through Virginia, you will see a tremendous amount of
traffic all from that state. The traffic-by-state statistic
becomes practically worthless. The second problem is
caused by the use of several computers connected to
the Internet. To return to the example above, if my
computer here in Healdsburg uses AOL, my
requests for the 100 files from a server in Santa Rosa
will first travel to Virginia, where each request
could be handed to a separate computer to fetch
from the Internet. As a result, it is theoretically
possible for my computer to generate 100 hits and 100
user sessions, since the log files record each
request as coming from a different computer.
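
The worst case just described can be shown with a small sketch. It
assumes all 100 requests fall inside a single 30-minute period, so
each distinct address in the log is counted as one session. The
addresses and the sessions_in_one_window helper are invented for
illustration.

```python
def sessions_in_one_window(addresses):
    """Count sessions when every request falls inside one session period.

    Under that assumption, each distinct address seen in the log
    is counted as exactly one user session.
    """
    return len(set(addresses))


# One visitor, 100 requests, all logged under the visitor's own address.
direct = ["192.0.2.10"] * 100

# The same visitor, but each request is fetched by a different
# intermediate computer, so the log shows 100 different addresses.
proxied = [f"198.51.100.{n}" for n in range(1, 101)]

direct_sessions = sessions_in_one_window(direct)
proxied_sessions = sessions_in_one_window(proxied)
print(len(direct), direct_sessions)
print(len(proxied), proxied_sessions)
```

Both logs contain 100 hits from one person, yet the first is counted
as one session and the second as 100.
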
You may now be wondering whether we are left with any
worthwhile data to study. The answer is that we
have information that is valuable for comparisons, but
we simply cannot provide absolute numbers for any
statistic derived from log files. Hopefully, in the near future,
new servers will record better information. The downside
is that people enjoy the relative anonymity of
the Internet and may resist visiting sites that collect
detailed information about them without their consent.
Server log files remain one of the important tools we use
to actively monitor traffic on your site; however,
we have other tools that can help us complete the picture.
JavaScript-based page trackers also have their place
when we want to know more precisely how many people
are visiting the site.