essuu at ourshack.com
Wed Mar 8 11:57:37 GMT 2006
On Wed, 8 Mar 2006, Aaron Crane wrote:
> We looked at Awstats, to the extent of actually running it for a while.
> Then we stopped; we'd found plenty of reasons to avoid it:
> - The known vulnerabilities in its CGI mode may have been fixed, but
> spaghetti code like that is just too hard and/or unpleasant to
> audit. I can't even say with confidence that letting Awstats parse
> your log files off-line is definitely safe.
Agreed. Every time I look at the code I want to scream. It's just crying
out to be refactored into decent modules.
> - It can't actually parse Apache logs. Since 1.3.25, Apache has used
> a backslash escaping scheme for things like user-agents and
> referrers, so that you can actually parse log lines where the client
> sent a double-quote in one of those. Awstats doesn't care about
> that, so it misparses those lines.
I've not found this to be a problem with 6.4, but perhaps I'm not looking in
the right place. Which version did you try?
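For illustration, here is a minimal Python sketch of parsing combined-format lines while honouring Apache's backslash escaping of quoted fields (the regex and the unescaping step are my own assumptions, not Awstats or Apache code):

```python
import re

# A quoted field that allows \" and \\ escapes, as written by Apache 1.3.25+.
# A naive '"[^"]*"' pattern misparses lines where the client sent a literal
# double-quote in the Referer or User-Agent.
QUOTED = r'"((?:[^"\\]|\\.)*)"'
LINE_RE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] '
    + QUOTED + r' (\d{3}) (\S+) '
    + QUOTED + ' ' + QUOTED + r'$'
)

def parse_line(line):
    """Return a dict of fields from a combined-format line, or None."""
    m = LINE_RE.match(line.rstrip('\n'))
    if m is None:
        return None
    host, ident, user, ts, request, status, size, referer, agent = m.groups()
    # Undo the backslash escaping, left to right.
    unescape = lambda s: re.sub(r'\\(.)', r'\1', s)
    return {
        'host': host, 'timestamp': ts,
        'request': unescape(request),
        'status': int(status), 'size': size,
        'referer': unescape(referer),
        'agent': unescape(agent),
    }
```

An analyser that uses a pattern like this copes with escaped quotes instead of silently misparsing those lines.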
> - You can't just point it at a batch of log files; instead, you have
> to configure it to know where you store your log files, and the
> pattern used for the filenames. That means you can't prime it with
> the last month's (or year's) worth of logs -- you just have to run
> it for a month before it can give you any real history.
A simple shell script lets you iterate over as many logs as you want. We
rotate logs weekly and have had to rerun a whole year's worth before now.
Wildcards would be nice though.
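A sketch of that backlog-replay idea in Python (the filename pattern, the embedded date stamp, and the analyser command are all placeholders for whatever your rotation scheme and tool actually use):

```python
import glob
import re
import subprocess

def chronological(paths):
    """Sort rotated log filenames by the YYYYMMDD stamp embedded in them,
    so lexical quirks in the rotation scheme can't break the order."""
    def key(path):
        m = re.search(r'(\d{8})', path)
        return m.group(1) if m else path
    return sorted(paths, key=key)

def replay(pattern, analyser):
    """Feed each rotated log, oldest first, to an analyser command.
    Example (hypothetical): replay('access_log-*.gz', ['zcat'])."""
    for path in chronological(glob.glob(pattern)):
        subprocess.run(list(analyser) + [path], check=True)
```

Running the files oldest-first matters because most analysers keep incremental state and refuse to step backwards in time.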
> - It really really wants each vhost analysed to have exactly one log
> file. In each time period, we have one log file per public-facing
> server, each containing results for several vhosts. It wants us to
> split log files up by vhost, but then merge them by public-facing
> server, before we even have it look at them.
Kinda. You do need to merge the logs into timestamp order, but you can look
for specific vhosts with the %v modifier in the log format.
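Both halves of that preprocessing can be sketched briefly. This assumes each per-server file is already in timestamp order and that LogFormat begins with `%v ` so the vhost leads each line; everything else here is illustrative:

```python
import heapq
import re
from datetime import datetime

TS_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]')

def timestamp(line):
    """Extract the request time from a CLF-style line, for ordering."""
    m = TS_RE.search(line)
    return datetime.strptime(m.group(1), '%d/%b/%Y:%H:%M:%S %z')

def merge_by_time(*streams):
    """Lazily merge several already-ordered per-server log streams."""
    return heapq.merge(*streams, key=timestamp)

def split_by_vhost(lines):
    """Bucket lines by the leading vhost field (LogFormat '%v ...')."""
    buckets = {}
    for line in lines:
        vhost, rest = line.split(' ', 1)
        buckets.setdefault(vhost, []).append(rest)
    return buckets
```

`heapq.merge` never holds whole files in memory, which matters at gigabytes of logs per day.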
> - It doesn't seem particularly fast. Admittedly, we generate about 4
> GiB of uncompressed logs in a day, but our home-grown stuff (which
> does actually parse, you know, Apache log files) seems rather faster
> at the basic work of parsing logs, throwing away robotic traffic,
> and aggregating data from the rest.
It's not very fast (and its docs admit as much), but it's fast enough on our
logs, which run to about 450 MB/week.
> It's possible it's not as bad for other people. In particular, to
> handle the vhost/server issue, we were effectively making Awstats
> run through our logs once per vhost. But I became convinced that
> the time complexity of Awstats is supra-linear in the number of
> requests anyway. As it gathered more data over the course of a
> month, it became apparent that it was soon going to need more CPU
> time than we had available. That's when we turned it off.
> In general, Awstats seems to be a tool that's intended for relatively
> small sites, hosted by low-end providers, with limited or no shell
> access, and exactly one log file per customer. If you don't fall into
> that category, I don't think Awstats is going to be particularly useful.
I would agree with that. It's definitely not up to the job of managing
logs at our scale.
> > Is this really the best option, or can anyone suggest an alternative
> > which can parse Apache logfiles and successfully separate out robots
> > and spiders (about 80-90% of our hits) from real users?
> We wrote our own, sad to say. We use the ABCE robot list; I've looked
> at CPANning our code, but most of it's the data file, and I think ABCE
> own the copyright on the list.
We're tending towards doing this too. I just looked at webtrends and it's
almost $10,000 for the licence we need.
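The robot-separation step itself is straightforward to home-grow; a minimal sketch, with hypothetical stand-in patterns (a real deployment would plug in a maintained list such as the ABCE one, which is not reproduced here):

```python
import re

# Hypothetical stand-in patterns -- substitute a maintained robot list.
ROBOT_PATTERNS = [r'googlebot', r'slurp', r'spider', r'crawl']
ROBOT_RE = re.compile('|'.join(ROBOT_PATTERNS), re.IGNORECASE)

def is_robot(user_agent):
    """True if the User-Agent matches any known robot pattern."""
    return ROBOT_RE.search(user_agent) is not None

def human_lines(lines):
    """Yield only lines from real users, assuming combined format
    where the user-agent is the last quoted field."""
    for line in lines:
        fields = re.findall(r'"((?:[^"\\]|\\.)*)"', line)
        if fields and not is_robot(fields[-1]):
            yield line
```

With 80-90% of hits being robotic, dropping those lines up front also shrinks every later aggregation step by the same factor.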
> Note also that having home-grown log analysis stuff does mean that we
> can do things that a general-purpose tool couldn't. For example, our
> software can examine popularity of site sections, rather than just of
> individual pages.
This is the problem we're now experiencing with awstats. We need
granularity that awstats doesn't have.
"You've really gotta know where your towel is."