Kreck Design Solutions - Web Design and Web Application in Santa Rosa, Sonoma County California
How server log files work

Web sites are unique among advertising media: not only can they provide more information about your product than radio, television, or print ads, they can also tell you something about the number of people who visit your site.

Web designers are often asked exactly what statistics they can provide about a web site. The answer usually involves cryptic techno-speak such as “user sessions” and “number of hits.” Although we would like to provide simple, intuitive measures of a web site's success or failure, we are severely hampered by limitations inherent to the Internet. To outline those limitations, we first need to explain how the statistics are gathered. This brief explanation should give you a better idea of what can be learned about visitors to your web site and help you decipher the pages of statistics we provide.

Web pages are hosted by web servers. Web servers are computers tailored to do one thing as well as possible: hand out the pages of your web site to anyone who asks to view them. For every file a server hands out (serves), it records a single line in a log file detailing everything it knows about the person who asked for that file.

“What constitutes a file?” you might ask. A user requests a web page by clicking a link, choosing a bookmark stored on their computer, or typing your site's address into the address box at the top of the web browser. The web page file itself is plain text without pictures, but it contains references to the separate graphics files that make up your page; a single web page often includes several dozen small graphics. The browser requests each of those files and assembles them with the text to display a complete ‘page.’

The server records an entry in the log file for every file it serves, whether text or graphics, and each entry is called a ‘hit.’ For example, suppose I make a web page called ‘test.html’ with one background image and three pictures. When someone views test.html, the request generates five entries in the log file (one for the text file, one for the background, and one for each of the three pictures), and therefore five ‘hits.’
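To make the hit arithmetic concrete, here is a minimal Python sketch that models a server writing one log entry per file it serves. The file names (`background.gif`, `photo1.jpg`, and so on) and the `view_page` helper are hypothetical stand-ins for the test.html example; a real page's graphics would be whatever its HTML references.

```python
# Hypothetical file names for the test.html example: one HTML file that
# references one background image and three pictures.
PAGE_FILES = {
    "/test.html": ["/background.gif", "/photo1.jpg", "/photo2.jpg", "/photo3.jpg"],
}


def view_page(page, log):
    """Simulate one visitor viewing a page.

    The browser requests the page's text file and then each graphic it
    references; the server appends one log entry (one "hit") per file served.
    """
    log.append(page)                      # the text portion of the page
    for graphic in PAGE_FILES.get(page, []):
        log.append(graphic)               # each graphic is a separate request


log = []
view_page("/test.html", log)
print(len(log), "hits")  # 5 hits
view_page("/test.html", log)
print(len(log), "hits")  # 10 hits: a second viewing doubles the count
```

Note that hits grow with the number of graphics on a page, which is why a hit count says little on its own about how many people visited.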

It is important to understand what information is recorded in the log file. The server is able to record the following:

  • the file that was requested
  • the address of the computer system to receive the file
  • the time of the request
  • whether the request was successful
  • the browser the user was using
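For illustration, many servers (Apache among them) can write these fields in the widely used Combined Log Format, where “the browser” is the user-agent string the browser reports about itself. The Python sketch below assumes that format and pulls out the five items listed above. The sample line is modeled on the example in Apache's documentation, and treating any status code below 400 as “successful” is our own simplifying assumption; analysis tools differ.

```python
import re
from datetime import datetime

# Combined Log Format (assumed):
# host ident authuser [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Extract the fields a server log records about one request."""
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    parts = m.group("request").split()       # e.g. ["GET", "/page.html", "HTTP/1.0"]
    status = int(m.group("status"))
    return {
        "file": parts[1] if len(parts) >= 2 else "-",   # the file that was requested
        "address": m.group("host"),                     # computer that received the file
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "successful": status < 400,                     # assumption: below 400 = success
        "browser": m.group("agent"),                    # browser's self-description
    }


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
record = parse_line(sample)
print(record["file"], record["address"], record["successful"])
# /apache_pb.gif 127.0.0.1 True
```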

At present, this is all we can log about visitors to your web site, so we must use this data, plus a few small assumptions, to work out vital information about your site. The log files are simply tens or hundreds of thousands of lines of the information above, stored as plain text files. Because it is impractical for a person to glean useful information by reading through them manually, we use statistical software to chew through the files and interpret the information.
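Log analyzers do far more than this, but the core of “chewing through” a log is tallying. The minimal Python sketch below assumes the log has already been split into (address, file, status code) records; the records, the `summarize` helper, and the choice of statistics are all made up for illustration.

```python
from collections import Counter

# Made-up, already-parsed log records: (address, file requested, status code).
RECORDS = [
    ("192.0.2.10", "/index.html", 200),
    ("192.0.2.10", "/logo.gif", 200),
    ("198.51.100.7", "/index.html", 200),
    ("198.51.100.7", "/old-page.html", 404),
    ("203.0.113.5", "/index.html", 304),
]


def summarize(records):
    """Tally a few simple statistics from parsed log records."""
    hits_per_file = Counter(path for _addr, path, _status in records)
    return {
        "hits": len(records),
        # Status codes of 400 and above (e.g. 404 Not Found) are errors.
        "failed": sum(1 for _addr, _path, status in records if status >= 400),
        "distinct_addresses": len({addr for addr, _path, _status in records}),
        "top_file": hits_per_file.most_common(1)[0] if records else None,
    }


print(summarize(RECORDS))
# {'hits': 5, 'failed': 1, 'distinct_addresses': 3, 'top_file': ('/index.html', 3)}
```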

There are a variety of programs available to create statistics from log files. One of the more popular packages is called WebTrends, which has been rated many times as the best statistical generator available on the market. Other packages include Sawmill, Analog, and 123 Log Analyzer.

One of the most important pieces of information is how much traffic your web site is receiving, and one measure of traffic is the user session. A user session is defined as a series of requests from a single computer over a period of time. Because the log files record the address of the computer receiving each file, we assume that if the same computer requests several different files within a short period of time, a single person is requesting them. That period is often set at 30 minutes. Therefore, if my computer requests 100 files from a server in Santa Rosa within that window, I will generate 100 hits but only one user session.
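The session rule above can be sketched in a few lines of Python. This sketch assumes the common convention that a new session starts whenever the same address has been idle for more than 30 minutes; real analyzers differ in the details, and the `count_sessions` helper, addresses, and timestamps are invented for illustration.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity window


def count_sessions(requests):
    """Count user sessions in a list of (address, time) pairs.

    Requests from the same address belong to one session as long as no more
    than SESSION_TIMEOUT passes between one request and the next.
    """
    last_seen = {}
    sessions = 0
    for address, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(address)
        if previous is None or when - previous > SESSION_TIMEOUT:
            sessions += 1  # first request, or the address was idle too long
        last_seen[address] = when
    return sessions


start = datetime(2024, 1, 1, 9, 0)
# One computer requesting 100 files, one every 10 seconds (about 16.5 minutes).
visit = [("192.0.2.10", start + timedelta(seconds=10 * i)) for i in range(100)]
print(len(visit), "hits,", count_sessions(visit), "session")  # 100 hits, 1 session

# The same computer comes back two hours later for one more page.
later = visit + [("192.0.2.10", start + timedelta(hours=2))]
print(count_sessions(later), "sessions")  # 2 sessions
```

Measuring the gap between consecutive requests, rather than a fixed 30-minute block from the first request, keeps one long, steady visit counted as a single session.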

Making this assumption causes the first problem with log files. Many users (in the case of AOL, millions of users) reach the Internet through a bank of computers in one location. For example, all AOL users in the US dial in from their home computers to AOL, and their traffic travels through AOL's network to a computer facility located in Virginia. From there, requests pass through several hundred computers, each connected to the Internet. This causes two problems. First, because all of that traffic is funneled through Virginia, your statistics will show a tremendous amount of traffic from that one state, which makes the traffic-by-state statistics practically worthless. Second, requests can leave AOL through any of those several hundred computers. Returning to the example above, suppose my computer here in Healdsburg uses AOL. My request for 100 files from a server in Santa Rosa first travels to Virginia, and then each individual file request could be handed to a different computer to fetch from the Internet. As a result, it is theoretically possible for my computer to generate 100 hits and 100 user sessions, because the log file records each request as coming from a different computer.
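To see how this inflates the count, the hypothetical Python sketch below replays the same 100 requests through a 30-minute session rule (repeated here so the example stands alone): once from a single address, and once as if each request had been forwarded by a different proxy computer. The proxy addresses are invented documentation addresses, not real AOL servers, and one proxy per request is the worst case described above.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity window


def count_sessions(requests):
    """Count sessions: same address, no gap longer than SESSION_TIMEOUT."""
    last_seen = {}
    sessions = 0
    for address, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(address)
        if previous is None or when - previous > SESSION_TIMEOUT:
            sessions += 1
        last_seen[address] = when
    return sessions


start = datetime(2024, 1, 1, 9, 0)
times = [start + timedelta(seconds=10 * i) for i in range(100)]

# Without a proxy farm: every request arrives from the visitor's own address.
direct = [("192.0.2.10", t) for t in times]

# Through a proxy farm: each request leaves from a different (hypothetical)
# proxy address, 198.51.100.0 through 198.51.100.99.
proxied = [(f"198.51.100.{i}", t) for i, t in enumerate(times)]

print(count_sessions(direct))   # 1
print(count_sessions(proxied))  # 100
```

Both lists contain exactly the same 100 hits from one person; only the recorded addresses differ, and that alone changes the session count from 1 to 100.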

You may now be wondering whether any worthwhile data is left to study. The answer is that the information is valuable for comparison, such as seeing whether traffic rises or falls from one month to the next, but we simply cannot provide absolute numbers for any statistic from log files. Hopefully, in the near future, new servers will record better information. The downside is that people enjoy the relative anonymity of the Internet and may resist visiting sites that collect detailed information about them without their consent.

Server log files remain one of the important tools we use to actively monitor traffic on your site, but we have other tools that help complete the picture. JavaScript-based page trackers also have their place when we want to know more precisely how many people are visiting the site.