May 15, 2013

Web Analytics Collection and Tracking Methods

This blog post is an excerpt from the book chapter:
Web Analytics Overview. In Encyclopedia of Information Science and Technology, 3rd Edition.

There are two major methods to collect usage data: web server logging and page tagging.

Web server logging is the traditional method of usage data collection. A web server generates a log file that records server activities and HTTP headers in a textual format. Log files come in various formats. The data most commonly logged in the NCSA Common Log Format (http://www.w3.org/Daemon/User/Config/Logging.html) are the client IP address (remote host), date/time, HTTP request line, response status, and response size. The figure below shows an example of the Common Log Format implemented in Apache Web Server 2.2. Additional data, such as HTTP headers, process id, scripts, and request rewrites, can be logged in proprietary formats or in the Extended Log File Format (http://www.w3.org/TR/WD-logfile.html). Log analysis software can be used to parse and analyze log files. Popular tools are Analog (http://www.analog.cx), Deep Log Analyzer (http://www.deep-software.com), Webalizer (http://www.webalizer.org), and AWStats (http://awstats.sourceforge.net).
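
To make the field layout concrete, here is a minimal Python sketch that parses a single Common Log Format line. The sample line is an illustrative entry of the kind shown in the Apache documentation, not data from this article, and the names `CLF_PATTERN` and `parse_clf_line` are made up for this example. Real log analyzers handle many more edge cases.

```python
import re
from datetime import datetime

# Field order of the NCSA Common Log Format:
# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the size field means no response body was sent
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


# Illustrative sample line (an assumption, not taken from a real site)
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = parse_clf_line(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Once each line is a dictionary, typical reports such as hits per page or error counts per status code reduce to simple grouping over the parsed entries.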


The second and more recent method, page tagging, uses client-side programs such as embedded scripts, browser add-ons, and plug-ins. For example, in a typical JavaScript tracking method, a piece of JavaScript code included in a page tracks user activity and stores information in a cookie. The information is sent to a processing server (not necessarily the same server that hosts the website) using web beacons or web services. This method is commonly used by third-party service providers such as Google Analytics and Open Web Analytics. For many organizations, it has become a major type of web usage data collection.
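
The web beacon step can be sketched as follows. A beacon is typically a request for a tiny resource whose query string carries the tracked values, so the sketch below shows, in Python, how such a URL could be assembled and how a processing server could decode it. The collector address, the parameter names (`vid`, `page`, `ev`), and both function names are hypothetical; actual services define their own protocols, and in practice the URL is built by the JavaScript in the page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical collection endpoint; real services use their own URLs
COLLECTOR_URL = "https://collector.example.com/beacon.gif"


def build_beacon_url(visitor_id, page, event):
    """Encode the tracked values into the query string of a beacon request."""
    params = {"vid": visitor_id, "page": page, "ev": event}
    return COLLECTOR_URL + "?" + urlencode(params)


def decode_beacon(url):
    """On the processing server: recover the tracked values from the request URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; this sketch keeps the first value only
    return {key: values[0] for key, values in query.items()}


# The visitor id would normally come from the cookie set by the tracking script
url = build_beacon_url("abc123", "/products?id=7", "pageview")
hit = decode_beacon(url)
print(url)
print(hit)
```

Because the data travels in an ordinary HTTP request, the processing server can sit on a different domain from the website, which is what makes third-party hosted analytics possible.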

Web server logging is less invasive because it does not require page modifications. Compared with web server logging, however, page tagging has a number of advantages. First, client scripts can access additional information about the client, such as screen size and color depth. Second, JavaScript can track client-side user actions or events such as key presses and mouse clicks. This is particularly useful in today’s context of rich internet applications (RIAs), which support many client-side user interactions that never communicate with the server and therefore cannot be tracked by server-side logging. Last but not least, data management and reporting become simpler because many of these services are provided through a Software-as-a-Service (SaaS) model that requires no local maintenance. For these reasons, page tagging is often the preferred method for small and medium websites.

A third method of data collection, application-level logging, has been on the rise lately. Application-level logging is tightly coupled with an application; it is a functional feature of the application itself. It expands on traditional web analytics, which focuses on generic HTTP requests and user actions. An application can be a shopping site, a web portal, a blog service, a learning management system, a forum, or a social networking service. Each of these applications has its own unique usage data, collected beyond generic web requests or user actions. The usage data is processed by the application itself or by a functional module tightly coupled with the application, not by independent logging or analytics services. For example, SharePoint 2010 provides framework-specific analytics data, such as usage of templates and web parts.
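
As a rough illustration of the idea, the Python sketch below shows a shopping application recording its own domain-level events instead of relying on generic HTTP requests. The `UsageLog` class, the event names, and the sample data are all invented for this example and do not reflect any particular product.

```python
from collections import Counter
from datetime import datetime, timezone


class UsageLog:
    """Hypothetical in-application usage log kept by the application itself."""

    def __init__(self):
        self.events = []

    def record(self, user, action, **details):
        # Store an application-specific event with a UTC timestamp
        self.events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "details": details,
        })

    def count_by_action(self):
        # Simple report: how often each application action occurred
        return Counter(event["action"] for event in self.events)


# Invented sample events for a shopping site
log = UsageLog()
log.record("alice", "add_to_cart", sku="B-1001", quantity=2)
log.record("bob", "add_to_cart", sku="B-2040", quantity=1)
log.record("alice", "checkout", total=59.90)
counts = log.count_by_action()
print(counts)
```

Events such as `add_to_cart` carry meaning (a product, a quantity) that a server log or a generic page tag would see only as an anonymous request or click, which is the extra value this method provides.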
