Sunday, August 6th, 2006

Global Access Tracker

It’s been a while since my last post, and I promised to write more often, so it’s about time I posted an update on my current projects.

Frequent readers of my blog might remember the Maps project, which is on hold for a while. It’s a very demanding project, particularly since Google keeps adding features to the API that I obviously want to try out, such as geocoding and zoom. I will most probably pick it up again a few weeks before my trip to NY (more about this in a future post) to start planning the best places to visit.

With the Maps project paused, I decided to revisit a personal project that involves the analysis of my site’s visitors. This project, codenamed GATracker (GA stands for Global Access), is fairly old: I started it in February 2003 and have only made minor updates since then. Back then my site was hosted entirely on mac.com and I didn’t have access to the HTTP server logs, so I created a Perl script (hosted on another site) that was called each time a page loaded, sort of like a counter. The script recorded log-style information into a flat file, which I periodically downloaded and loaded into a MySQL database for some mining. Now that my site is hosted at gbtopia.com, I have a servlet that logs every visit directly into a MySQL database.
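
To make the flat-file idea a bit more concrete, here is a rough sketch in Java (not the original Perl). The Visit class, the tab-separated field layout and the visits table with its columns are all made up for illustration; they are not GATracker's actual format or schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** One page view. The five fields are an assumed layout, chosen for illustration. */
final class Visit {
    final String timestamp;
    final String ip;
    final String page;
    final String referer;
    final String userAgent;

    Visit(String timestamp, String ip, String page, String referer, String userAgent) {
        this.timestamp = timestamp;
        this.ip = ip;
        this.page = page;
        this.referer = referer;
        this.userAgent = userAgent;
    }

    /** Serializes the visit as one tab-separated line for the flat file. */
    String toLogLine() {
        return String.join("\t", clean(timestamp), clean(ip), clean(page),
                clean(referer), clean(userAgent));
    }

    /** Parses a line written by toLogLine(); rejects lines with the wrong field count. */
    static Visit fromLogLine(String line) {
        String[] f = line.split("\t", -1); // -1 keeps empty trailing fields
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields, got " + f.length);
        }
        return new Visit(f[0], f[1], f[2], f[3], f[4]);
    }

    /** Inserts the visit into a hypothetical "visits" table over plain JDBC. */
    void insertInto(Connection db) throws SQLException {
        String sql = "INSERT INTO visits (visited_at, ip, page, referer, user_agent) "
                + "VALUES (?, ?, ?, ?, ?)";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setString(1, timestamp);
            ps.setString(2, ip);
            ps.setString(3, page);
            ps.setString(4, referer);
            ps.setString(5, userAgent);
            ps.executeUpdate();
        }
    }

    /** Replaces tabs and line breaks with spaces so a field cannot break the line format. */
    private static String clean(String s) {
        return s == null ? "" : s.replaceAll("[\\t\\r\\n]", " ");
    }
}
```

The periodic import then amounts to reading the downloaded file line by line, calling Visit.fromLogLine on each line and insertInto on the result.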

The GATracker project consists of a bunch of JSPs and custom tags that extract information from the MySQL database and generate different reports: Sections visited, Browser and Operating System, Country, Referers, Date Analysis and Search Engines. I’ve found that several people are amazed by how much information I’m able to retrieve with these simple tracking tactics, which shows that most people aren’t aware of everything that flows through the Internet when they visit a site. The tracker currently runs on Tomcat 5.5 since unfortunately there’s no WebSphere Application Server for Mac OS X.
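
As an example of how much a single field can reveal, here is a minimal sketch of the kind of logic a Search Engines report could use: pulling the search terms out of a referer URL. It assumes the engine passes the terms in a "q" query parameter, as Google does; other engines use other names (Yahoo uses "p"), so a real report would need a per-engine table. The RefererAnalyzer class is hypothetical and not part of GATracker.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

final class RefererAnalyzer {

    /**
     * Returns the decoded value of the "q" query parameter of a referer URL,
     * or null if the URL is empty, unparseable, or has no such parameter.
     */
    static String searchTerms(String referer) {
        if (referer == null || referer.isEmpty()) {
            return null;
        }
        String query;
        try {
            query = new URI(referer).getRawQuery();
        } catch (URISyntaxException e) {
            return null;
        }
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("q")) {
                try {
                    // URLDecoder turns '+' into a space and resolves %XX escapes.
                    return URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                    return null;
                }
            }
        }
        return null;
    }
}
```

Given the referer http://www.google.com/search?q=global+access+tracker&hl=en, searchTerms returns "global access tracker".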

I am working on the following improvements:

  • Ability to import Apache access logs. This will give me more accurate statistics, but the logs carry less information about each visitor than the counter does, so I’m keeping the counter and using the logs simply as an additional data source.
  • Performance improvements. The current code is quite old and has some scalability issues, which I’m fixing by introducing parallel processing. I will use Message-Driven Beans (MDBs) to analyze logs in parallel, which should reduce the time it takes to process large logs.
  • Since Tomcat is just a servlet container and MDBs need an EJB container, I switched to Geronimo.
  • Beautify reports. The current ones are just plain ugly: all function, practically no style.
  • Make it universal. I’m considering making this a project available for download, but this requires some documentation and minor alterations to support any website.
  • I’m also evaluating Derby (previously Cloudscape) to replace MySQL as a database, but this move will largely depend on performance.
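
The first two items can be sketched together. The example below parses Apache access log lines (Common or Combined Log Format) and counts hits per request path, splitting the work across threads. To keep it self-contained it uses a plain thread pool as a stand-in for MDBs, which need an application server and a JMS queue to run; the split-then-merge idea is the same. The AccessLogAnalyzer class and its method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AccessLogAnalyzer {

    // host ident user [time] "request" status bytes, optionally "referer" "user-agent"
    private static final Pattern LOG_LINE = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)"
            + "(?: \"([^\"]*)\" \"([^\"]*)\")?$");

    /** Returns the request path of one log line, or null if the line does not parse. */
    static String requestPath(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] request = m.group(5).split(" "); // e.g. "GET /weblog/ HTTP/1.1"
        return request.length >= 2 ? request[1] : null;
    }

    /** Counts hits per path sequentially; unparseable lines are skipped. */
    static Map<String, Integer> countPaths(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String path = requestPath(line);
            if (path != null) {
                counts.merge(path, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Splits the lines into one chunk per worker, counts each chunk on its own thread, then merges. */
    static Map<String, Integer> countPathsInParallel(List<String> lines, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = Math.max(1, (lines.size() + workers - 1) / workers);
            List<Future<Map<String, Integer>>> parts = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunk) {
                final List<String> slice =
                        lines.subList(start, Math.min(lines.size(), start + chunk));
                parts.add(pool.submit(() -> countPaths(slice)));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> part : parts) {
                for (Map.Entry<String, Integer> e : part.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while analyzing log", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("log analysis failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each worker fills its own map and the results are merged at the end, so no locking is needed while counting. With MDBs, each chunk would presumably travel as a message and the merge would happen in the database instead.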
Tags: blog