awstats
Simon Wilcox
essuu at ourshack.com
Wed Mar 8 11:57:37 GMT 2006
On Wed, 8 Mar 2006, Aaron Crane wrote:
> We looked at Awstats, to the extent of actually running it for a while.
> Then we stopped; we'd found plenty of reasons to avoid it:
>
> - The known vulnerabilities in its CGI mode may have been fixed, but
> spaghetti code like that is just too hard and/or unpleasant to
> audit. I can't even say with confidence that letting Awstats parse
> your log files off-line is definitely safe.
Agreed. Every time I look at the code I want to scream. It's just crying
out to be refactored into decent modules.
> - It can't actually parse Apache logs. Since 1.3.25, Apache has used
> a backslash escaping scheme for things like user-agents and
> referrers, so that you can actually parse log lines where the client
> sent a double-quote in one of those. Awstats doesn't care about
> that, so it misparses those lines.
I've not found this to be a problem with 6.4, but perhaps I'm not looking
in the right place. Which version did you try?
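
For reference, the escaping Aaron describes can be handled by treating a
backslash plus the character after it as a single unit inside a quoted
field. A minimal Python sketch, assuming the standard "combined" log
format; the names (QUOTED, LINE_RE, parse_line) are made up for
illustration and have nothing to do with how Awstats does it:

```python
import re

# One quoted field: any run of characters that are neither a quote nor a
# backslash, or a backslash followed by any single character.
QUOTED = r'"((?:[^"\\]|\\.)*)"'

# Assumption: Apache "combined" format, i.e.
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
LINE_RE = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] '
    + QUOTED + r' (\d{3}) (\S+) ' + QUOTED + ' ' + QUOTED
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    keys = ("host", "ident", "user", "time", "request",
            "status", "size", "referrer", "agent")
    # Quoted fields are returned as logged, escapes still in place.
    return dict(zip(keys, m.groups()))
```

A user-agent logged as "Evil \"Bot\" 1.0" then comes back as one field
rather than being cut off at the first embedded quote.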
> - You can't just point it at a batch of log files; instead, you have
> to configure it to know where you store your log files, and the
> pattern used for the filenames. That means you can't prime it with
> the last month's (or year's) worth of logs -- you just have to run
> it for a month before it can give you any real history.
A simple shell script lets you iterate over as many logs as you want. We
rotate logs weekly and have had to rerun a whole year's worth before now.
Wildcards would be nice though.
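
The loop is roughly the following, sketched here in Python rather than
shell. It assumes the rotated files' modification times reflect their
age, and that the tool accepts the log path as a command-line override;
build_commands and its parameters are illustrative names, and the
-LogFile= option should be checked against the documentation for your
Awstats version:

```python
import glob
import os

def build_commands(pattern, command_prefix, log_option="-LogFile="):
    """Return one argv list per matching log file, oldest file first.

    Files are ordered by modification time (an assumption about how the
    logs were rotated), so the records are fed in timestamp order.
    """
    files = sorted(glob.glob(pattern), key=os.path.getmtime)
    return [command_prefix + [log_option + path] for path in files]

# Hypothetical usage (paths and config name are placeholders):
#
#   import subprocess
#   prefix = ["awstats.pl", "-config=mysite", "-update"]
#   for cmd in build_commands("/var/log/apache/access.log.*", prefix):
#       subprocess.run(cmd, check=True)
```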
> - It really really wants each vhost analysed to have exactly one log
> file. In each time period, we have one log file per public-facing
> server, each containing results for several vhosts. It wants us to
> split log files up by vhost, but then merge them by public-facing
> server, before we even have it look at them.
Kinda. You do need to merge the logs into timestamp order, but you can
look for specific vhosts with the %v modifier in the log format.
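
The merge step itself is small if each server's log is already in time
order. A rough Python sketch, assuming every line carries the usual
[dd/Mon/yyyy:hh:mm:ss +zzzz] stamp, English month names, and (for the
filter) that %v is the first field of the log format; all the function
names are illustrative:

```python
import heapq
import re
from datetime import datetime

STAMP_RE = re.compile(r'\[([^\]]+)\]')

def timestamp(line):
    """Parse the first [...] field of a log line as an aware datetime."""
    m = STAMP_RE.search(line)
    if m is None:
        raise ValueError("no timestamp in line: %r" % line)
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")

def merge_logs(*streams):
    """Lazily merge already-sorted line streams into timestamp order."""
    return heapq.merge(*streams, key=timestamp)

def only_vhost(lines, vhost):
    """Keep lines whose first field (assumed to be %v) equals vhost."""
    for line in lines:
        if line.split(" ", 1)[0] == vhost:
            yield line
```

Passing open file objects to merge_logs keeps memory use flat, since
only one pending line per input is held at a time.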
> - It doesn't seem particularly fast. Admittedly, we generate about 4
> GiB of uncompressed logs in a day, but our home-grown stuff (which
> does actually parse, you know, Apache log files) seems rather faster
> at the basic work of parsing logs, throwing away robotic traffic,
> and aggregating data from the rest.
It's not very fast, and admits as much, but it's fast enough on our logs,
which are about 450MB a week.
> It's possible it's not as bad for other people. In particular, to
> handle the vhost/server issue, we were effectively making Awstats
> run through our logs once per vhost. But I became convinced that
> the time complexity of Awstats is supra-linear in the number of
> requests anyway. As it gathered more data over the course of a
> month, it became apparent that it was soon going to need more CPU
> time than we had available. That's when we turned it off.
>
> In general, Awstats seems to be a tool that's intended for relatively
> small sites, hosted by low-end providers, with limited or no shell
> access, and exactly one log file per customer. If you don't fall into
> that category, I don't think Awstats is going to be particularly
> convenient.
I would agree with that. It's definitely not up to the job of managing
large sites.
> > Is this really the best option, or can anyone suggest an alternative
> > which can parse Apache logfiles and successfully separate out robots
> > and spiders (about 80-90% of our hits) from real users?
>
> We wrote our own, sad to say. We use the ABCE robot list; I've looked
> at CPANning our code, but most of it's the data file, and I think ABCE
> own the copyright on the list.
We're tending towards doing this too. I just looked at webtrends and it's
almost $10,000 for the licence we need.
> Note also that having home-grown log analysis stuff does mean that we
> can do things that a general-purpose tool couldn't. For example, our
> software can examine popularity of site sections, rather than just of
> URLs.
This is the problem we're now experiencing with awstats. We need
granularity that awstats doesn't have.
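
As a rough illustration of the kind of granularity meant here, this is
a Python sketch that counts requests per site section rather than per
URL. It assumes a "section" is simply the first directory of the request
path, which is a guess at a definition and not how either of our sites
actually does it; section_of and count_sections are made-up names:

```python
from collections import Counter
from urllib.parse import urlsplit

def section_of(url):
    """Map a request URL to its top-level section, e.g. /news/x -> /news.

    Anything directly under the root (including "/") counts as "/".
    """
    path = urlsplit(url).path          # drops any query string
    segments = path.split("/")[1:]     # "/news/x" -> ["news", "x"]
    if len(segments) >= 2:
        return "/" + segments[0]
    return "/"

def count_sections(urls):
    """Count requests per section for an iterable of request URLs."""
    return Counter(section_of(u) for u in urls)
```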
Simon.
--
"You've really gotta know where your towel is."