Brown trousers time :~
djk at tobit.co.uk
Mon Oct 8 12:58:36 BST 2007
Lyle - CosmicPerl.com wrote:
> Hi All,
> Soon I'll be embarking on the largest project I've ever undertaken. It's
> going to be 10's of thousands of lines of code. It needs to be perfect
> (or damn close :))
Best of luck. You'll need it. If you can't write / test / debug high
kwalitee perl code quickly yourself then hire someone who can (or can
teach you to do it). You will save 1000's and a lot of heartache.
> Part's of the software need to be able to withstand large volumes of
> traffic... I'm talking about 100's, 1000's or even 10,000's of clicks
> per second!
You need to understand, intimately, how to speed up webserving perl (or
other scripting languages du jour). Personally: I would avoid using
mod_perl (or even apache) like the plague (except, possibly, as front
end cache machines). I much prefer any of the small threaded/select-based
webservers (eg lighttpd, litespeed, thttpd etc) with a FastCGI backend.
The mod_perl site does have a number of extremely important articles
about how to go about planning something like this. Don't do *anything*
until you have read and *understood* what is there.
You will find that one of the tradeoffs you will need to make is RAM v
webserver processes/threads. The articles above explain that. Once you
have understood that thoroughly, then go back and look at something like
lighttpd/litespeed/thttpd + FastCGI and you will (at least) understand
where I am coming from (even if you don't end up agreeing).
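To make that concrete, here's a minimal sketch of the FastCGI side using the
FCGI module from CPAN. The socket path is made up for illustration; your
lighttpd fastcgi.server stanza would point at whatever path you actually pick:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use FCGI;

# One persistent process: perl compiles your code once, then sits in an
# accept loop. No fork, no recompile, per request.
my $socket  = FCGI::OpenSocket('/tmp/app.sock', 100);   # illustrative path
my $request = FCGI::Request(\*STDIN, \*STDOUT, \*STDERR, \%ENV, $socket);

my $hits = 0;
while ($request->Accept() >= 0) {
    $hits++;
    print "Content-Type: text/plain\r\n\r\n";
    print "request $hits served by PID $$\n";   # same PID every time
}
```

Spawn a handful of these (one per core, say) and let the webserver multiplex
onto them; that's the RAM v processes tradeoff in miniature.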
> This all has me thoroughly bricking it :~
Welcome to the club :-)
> From what I've learned it'll have to be mod_perl handling the heavily
> traffic parts of the software. Basically CGI scripts that open a
> database connection, read data, then write data and redirect the browser.
One of the things that you really, really should try to achieve is to make
as much static as possible. Even if that means using (and reusing) acres
of disc space - just for html cache. Most so-called "dynamic" sites
aren't at all. Take a shopping site: the only things that change on a
product page are the price (and possibly things like stock levels), and
these don't change that often. Generate the page on demand, cache it, and
have a system that invalidates the page when something on it changes. It
is rather web 1.0 but it works and is as quick as you can serve that html
page.
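As a sketch of that generate-then-cache loop (the cache directory and sub
names are invented for illustration; a real system would want locking and a
smarter key scheme):

```perl
use strict;
use warnings;
use File::Path qw(mkpath);

my $cache_dir = '/tmp/html_cache';   # illustrative location

# Serve the cached copy if we have one; otherwise build once and keep it.
sub cached_page {
    my ($key, $build) = @_;
    my $file = "$cache_dir/$key.html";
    if (open my $fh, '<', $file) {    # cache hit: as fast as static html
        local $/;
        return scalar <$fh>;
    }
    my $html = $build->();            # cache miss: do the expensive bit once
    mkpath($cache_dir);
    open my $fh, '>', $file or die "can't cache $file: $!";
    print {$fh} $html;
    close $fh;
    return $html;
}

# Invalidation is just deleting the file when price/stock changes.
sub invalidate_page { unlink "$cache_dir/$_[0].html" }
```

The expensive builder runs once per change, not once per click.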
Remember that the overriding usage on even the most "interactive"
website is GET not PUT (or GET equiv).
> From all my searching I have a few questions yet unanswered, I'm hoping
> you guys can help...
> I'm concerned that I'll have to quickly write some C libraries for the
> heavy traffic parts, the book I've found referenced most is "Embedding
> and Extending Perl", is this the best book to get? Or do you guys
> recommend others? Or do you recommend other books to get along with this
Don't do this. Here be many dragons. I doubt that there is anything
perlish that will require this. However you may find yourself (as I did)
writing plugins for your webserver du jour to manage the cache
invalidation stuff. But you do that once you understand what it is that
you are trying to do and have a working webserving cloud to do it with.
> What's the mod_perl equivalent in Win32? I'm guessing PerlScript in ASP,
> but is that faster? I can't find any benchmarks.
Windows? Do you really want to do this in perl? (sorry to be
controversial). You would be much better off taking the M$ 10 cents and
doing it in whatever is the "M$ way" this week.
> Would it be best to have separate databases (all in MySQL) for different
> parts of the program? So that the database tables that are heavily
> accessed are totally separate from those that aren't.
Design it first so it does not matter. Benchmark it. Then decide.
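The core Benchmark module makes the "benchmark it" step cheap. The two subs
below are stand-ins: swap in your real DBI queries against the one-database
and split-database schemas before believing any numbers:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Placeholder workloads; replace with real single-db v split-db queries.
cmpthese(10_000, {
    one_db   => sub { my @rows = map  { $_ * 2 }  1 .. 100 },
    split_db => sub { my @rows = grep { $_ % 2 }  1 .. 200 },
});
```

cmpthese prints a rates table with relative percentages, which is usually
all you need to make the call.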
> Anybody got some spare underpants? (preferably not white ones)
I recommend Marks & Spencer myself.
> I want everything to be as realtime as possible. But this would mean
> updating several tables for each of those hits, I get the nasty feeling
> that will be too slow. So would it probably be better to have a cron job
> updating some tables, every 10 minutes or so, and keep the heavily
> updating to a single table?
Realtime + many requests/sec = Big Bucks for Big Iron. Avoid it: you almost
certainly don't need Real realtime.
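If you do go the cron route, the roll-up can be a single cheap statement.
This is only a sketch: hit_log and hit_totals are invented table names, the
DSN is a placeholder, and the ON DUPLICATE KEY UPDATE form is MySQL-specific:

```perl
use strict;
use warnings;
use DBI;

# Every click is one cheap INSERT into hit_log; cron runs this every 10
# minutes to fold those rows into the hit_totals summary table.
my $dbh = DBI->connect('dbi:mysql:database=stats', 'user', 'pass',
                       { RaiseError => 1 });

$dbh->begin_work;
$dbh->do(q{
    INSERT INTO hit_totals (page, hits)
        SELECT page, COUNT(*) FROM hit_log GROUP BY page
    ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)
});
$dbh->do('DELETE FROM hit_log');   # same transaction, so no clicks are lost
$dbh->commit;
```

The hot path stays a single-table INSERT; only the cron job ever touches
the summary tables.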
Prepare yourself for a sore head, once with the learning and again with
the banging on the office wall.
More information about the london.pm mailing list