Brown trousers time :~

Dirk Koopman djk at tobit.co.uk
Mon Oct 8 12:58:36 BST 2007


Lyle - CosmicPerl.com wrote:
> Hi All,
> Soon I'll be embarking on the largest project I've ever undertaken. It's 
> going to be 10's of thousands of lines of code. It needs to be perfect 
> (or damn close :))

Best of luck. You'll need it. If you can't write / test / debug high 
kwalitee perl code quickly yourself then hire someone who can (or can 
teach you to do it). You will save 1000's and a lot of heartache.

> 
> Parts of the software need to be able to withstand large volumes of 
> traffic... I'm talking about 100's, 1000's or even 10,000's of clicks 
> per second!

You need to understand, intimately, how to speed up webserving perl (or 
other scripting languages du jour). Personally, I would avoid using 
mod_perl (or even apache) like the plague (except, possibly, as front-end 
cache machines). I much prefer any of the small threaded/select-based 
webservers (e.g. lighttpd, litespeed, thttpd) with a FastCGI back end.
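
For flavour, a FastCGI backend is nothing more exotic than this (untested 
sketch; under lighttpd you would point fastcgi.server at it, or start it 
with spawn-fcgi):

#!/usr/bin/perl
use strict;
use warnings;
use FCGI;

# The point of the exercise: the interpreter, and anything expensive you
# set up here (database handles, templates), persists between requests.
my $request = FCGI::Request();
my $count   = 0;

while ($request->Accept() >= 0) {
    # %ENV carries the usual CGI variables for this request
    $count++;
    print "Content-Type: text/plain\r\n\r\n";
    print "Request $count served by pid $$\n";
}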

The mod_perl site does have a number of extremely important articles 
about how to go about planning something like this. Don't do *anything* 
until you have read and *understood* what is there.

http://perl.apache.org/docs/tutorials/index.html

You will find that one of the tradeoffs you will need to make is RAM v 
webserver processes/threads. The articles above explain that. Once you 
have understood that thoroughly, then go back and look at something like 
lighttpd/litespeed/thttpd + FastCGI and you will (at least) understand 
where I am coming from (even if you don't end up agreeing).
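
To put crude numbers on it (every figure here is plucked from the air, 
substitute your own measurements):

#!/usr/bin/perl
# Back-of-envelope only - all of these numbers are guesses.
use strict;
use warnings;

my $child_rss_mb = 40;    # resident size of one fat mod_perl child
my $children     = 100;   # enough children to ride out slow clients at peak
printf "apache + mod_perl: ~%d MB RAM\n", $child_rss_mb * $children;

my $backend_rss_mb = 40;  # a FastCGI backend is a similar-sized perl process
my $backends       = 10;  # but the light front end absorbs the slow clients
printf "lighttpd + FastCGI: ~%d MB RAM\n", $backend_rss_mb * $backends;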

> 
> This all has me thoroughly bricking it :~

Welcome to the club :-)

> 
>  From what I've learned it'll have to be mod_perl handling the heavy 
> traffic parts of the software. Basically CGI scripts that open a 
> database connection, read data, then write data and redirect the browser.
> 

One of the things that you really, really should try to achieve is to make 
as much as possible static. Even if that means using (and reusing) acres 
of disc space - just for html cache. Most so-called "dynamic" sites 
aren't at all. Take a shopping site: the only things that change on a 
product page are the price (and possibly things like stock levels), and 
these don't change that often. Generate the page on demand, cache it, and 
have a system that invalidates the page when something on it changes. It 
is rather web 1.0 but it works and is as quick as you can serve that html 
page.
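
A sketch of the shape I mean (untested, and render_product_page() is a 
made-up name for whatever builds your html):

use strict;
use warnings;
use File::Path qw(mkpath);
use File::Basename qw(dirname);

my $cache_dir = '/var/cache/html';    # acres of cheap disc

sub cached_product_page {
    my ($id) = @_;
    my $file = "$cache_dir/product/$id.html";

    # Serve the cached copy if we have one, otherwise build it once and keep it.
    unless (-e $file) {
        my $html = render_product_page($id);    # your real page builder
        mkpath(dirname($file));
        open my $fh, '>', $file or die "can't write $file: $!";
        print {$fh} $html;
        close $fh or die "can't close $file: $!";
    }
    return $file;    # hand this path back to the webserver to send
}

# Invalidation is just deleting the file when the price or stock level
# changes; the next request regenerates it.
sub invalidate_product {
    my ($id) = @_;
    unlink "$cache_dir/product/$id.html";
}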

Remember that the overriding usage on even the most "interactive" 
website is GET (or its equivalent), not PUT: reads vastly outnumber writes.

>  From all my searching I have a few questions yet unanswered, I'm hoping 
> you guys can help...
> 
> I'm concerned that I'll have to quickly write some C libraries for the 
> heavy traffic parts, the book I've found referenced most is "Embedding 
> and Extending Perl", is this the best book to get? Or do you guys 
> recommend others? Or do you recommend other books to get along with this 
> one?

Don't do this. Here be many dragons. I doubt that there is anything 
perlish that will require this. However you may find yourself (as I did) 
writing plugins for your webserver du jour to manage the cache 
invalidation stuff. But you do that once you understand what it is that 
you are trying to do and have a working webserving cloud to do it with.

> 
> What's the mod_perl equivalent in Win32? I'm guessing PerlScript in ASP, 
> but is that faster? I can't find any benchmarks.

Windows? Do you really want to do this in perl? (sorry to be 
controversial). You would be much better off taking the M$ 10 cents and 
doing it in whatever is the "M$ way" this week.

> 
> Would it be best to have separate databases (all in MySQL) for different 
> parts of the program? So that the database tables that are heavily 
> accessed are totally separate from those that aren't.

Design it first so it does not matter. Benchmark it. Then decide.
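
In practice that means hiding "which database is this in" behind one 
function, so the answer can change later without touching the callers. 
Something like (names all made up):

use strict;
use warnings;
use DBI;

# Logical name -> DSN. If the click tables later need their own database
# (or their own box), only this table changes.
my %dsn = (
    clicks  => 'dbi:mysql:database=clicks;host=localhost',
    catalog => 'dbi:mysql:database=catalog;host=localhost',
);

my %dbh;    # one cached handle per logical database

sub dbh_for {
    my ($name) = @_;
    $dbh{$name} ||= DBI->connect($dsn{$name}, 'user', 'password',
                                 { RaiseError => 1, AutoCommit => 1 });
    return $dbh{$name};
}

# Callers say what they mean, not where it lives:
# dbh_for('clicks')->do('INSERT INTO raw_clicks (url) VALUES (?)', undef, $url);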

> 
> Anybody got some spare underpants? (preferably not white ones)
> 

I recommend Marks & Spencer myself.

> I want everything to be as realtime as possible. But this would mean 
> updating several tables for each of those hits, I get the nasty feeling 
> that will be too slow. So would it probably be better to have a cron job 
> updating some tables, every 10 minutes or so, and keep the heavy 
> updating to a single table?

Realtime + many requests/sec = Big Bucks for Big Iron. Avoid it: you 
almost certainly don't need real realtime.
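
If you do end up with one hot table plus a cron job, the roll-up is only a 
few lines (sketch; table and column names are invented, and click_totals 
wants a unique key on (url, day)):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=clicks', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 1 });

# Only touch rows that existed when we started; hits arriving while we
# work are left for the next run.
my ($max_id) = $dbh->selectrow_array('SELECT MAX(id) FROM raw_clicks');
exit 0 unless defined $max_id;

$dbh->begin_work;
$dbh->do(q{
    INSERT INTO click_totals (url, day, hits)
    SELECT url, DATE(clicked_at), COUNT(*)
      FROM raw_clicks
     WHERE id <= ?
     GROUP BY url, DATE(clicked_at)
        ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)
}, undef, $max_id);
$dbh->do('DELETE FROM raw_clicks WHERE id <= ?', undef, $max_id);
$dbh->commit;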

Prepare yourself for a sore head, once with the learning and again with 
the banging on the office wall.

Dirk


