Hardware Reliability
Simon Wilcox
essuu at ourshack.com
Mon Jun 8 11:17:29 BST 2009
On 8/6/09 10:40, duncan.garland at ntlworld.com wrote:
> ---- Raphael Mankin <raph at mankin.org.uk> wrote:
>> On Sun, 2009-06-07 at 12:13 +0100, Duncan Garland wrote:
>>
>>> I wonder if the problem can be approached from the other end. I wonder if
>>> there is a design standard (ISO or such like) which states that a
>>> manufacturer should aim for an MTBF of whatever.
>>>
>>> I'll let you know if I find anything.
>> MTBF, when quoted, is largely meaningless. The figures are computed,
>> purely theoretical. No-one actually runs a sufficiently large number of
>> items for long enough to get meaningful statistics. If they did, they
>> would miss the market.
>>
>> Imagine having to run, say, 10000 disk drives for five years in order to
>> get meaningful MTBFs before you could put them on sale.
>>
>> Only people like Google, Microsoft or Yahoo actually have sufficient
>> data, and all they can tell you that is *useful* is that some
>> manufacturers are, in the long term, better than others. Nothing about
>> models that are not obsolete.
> Calculated MTBF figures are not meaningless because they show what
> the manufacturer expected. The manufacturers base their warranty
> programmes and even whether or not to go into production on them. Do you
> know where I can get some?
They're mostly meaningless though. An MTBF is only an average: we know
that some drives fail within days of installation, so other drives must
last years past the date the MTBF might suggest.
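
As a rough illustration of how little a single average says, here is a
minimal Python sketch. It assumes failures arrive at a constant rate (an
exponential lifetime model), which is a simplifying assumption and not
something any manufacturer promises. The 1,000,000 hour MTBF is a
hypothetical figure, not one taken from a real data sheet.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def survival_fraction(hours: float, mtbf_hours: float) -> float:
    """Fraction of units still running after `hours`.

    Assumes a constant failure rate (exponential lifetimes).
    """
    return math.exp(-hours / mtbf_hours)


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Fraction of units expected to fail within one year of running 24x7."""
    return 1.0 - survival_fraction(HOURS_PER_YEAR, mtbf_hours)


mtbf = 1_000_000  # hypothetical quoted MTBF, in hours
print(f"fail within a year:        {annualized_failure_rate(mtbf):.2%}")
print(f"still running at the MTBF: {survival_fraction(mtbf, mtbf):.1%}")
```

Under that assumption a bit under 1% of drives fail each year, and only
about 37% are still running when the MTBF itself is reached. Real drives
do not follow this model closely, since early failures are more common
than a constant rate predicts.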
Also, hardware itself is rarely the only factor these days. Software
faults in firmware are just as likely to cause downtime (in my
experience) and, as far as I know, that's not allowed for in any MTBF
calculations.
My rule of thumb is that most kit installed in a datacentre will last 3
years if it lasts a week, but once you start seeing disk errors you
should plan to replace those disks. In my experience, replacing
fault-free kit just because it's a certain age usually ends up causing
more problems, not fewer, as a percentage of new kit will die in the
first week of operation.
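
The arithmetic behind that can be sketched in a few lines of Python. All
the numbers here (200 units, 2% of new kit dying in its first week, 0.1%
of settled kit dying in any given week) are hypothetical and chosen only
to show the shape of the comparison, and failures are treated as
independent.

```python
def expected_failures(units: int, failure_probability: float) -> float:
    """Expected number of failed units, treating failures as independent."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    return units * failure_probability


# Hypothetical figures, for illustration only.
UNITS = 200
P_NEW_DIES_IN_FIRST_WEEK = 0.02     # assumed early-failure rate of new kit
P_SETTLED_DIES_IN_ANY_WEEK = 0.001  # assumed rate for kit running fault free

keep = expected_failures(UNITS, P_SETTLED_DIES_IN_ANY_WEEK)
swap = expected_failures(UNITS, P_NEW_DIES_IN_FIRST_WEEK)
print(f"keep existing kit: {keep:.1f} expected failures this week")
print(f"swap for new kit:  {swap:.1f} expected failures this week")
```

With these made-up rates the swap costs roughly twenty times as many
failures in the first week as leaving the working kit alone. Plug in
your own figures if you have them.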
If your kit isn't in a datacentre (you didn't say how much kit or what
sort of location you're interested in) then you're more likely to see
fan faults or motherboard issues from sucking in half a pound of dead
skin cells than a hard drive failing.
S.