[AccessD] Needs analysis

jwcolby jwcolby at colbyconsulting.com
Sun Jun 20 19:13:51 CDT 2010


Jim,

 >For example, how much time does it cost you to rebuild the array when a drive drops vs restoring 
from a backup?

That's the nice thing about modern RAID controllers - it takes "zero time" to rebuild the array.

When I am notified that a drive has dropped out, I simply replace the drive.  I then tell the 
controller that the new drive is available, and the controller starts rebuilding the array.  It does 
so "in the background", so I can literally keep using the system while the rebuild is happening.  I 
am told it slows things down by 5-10% until the array is rebuilt, though generally the slowdown is 
not really noticeable.  The rebuild takes about an hour or so for a terabyte drive.

So from my perspective, it takes a few minutes to remove and replace the drive and tell the 
controller to rebuild, and then I go back to work.

It is actually not about "zero down time"; it is about zero maintenance.  I have backups, but in 5 
years I have never used them, and under normal circumstances I never will.  Really, my backups are 
more about restoring a database that I screwed up than about hardware malfunction.

I really didn't back into this either.  Up front I studied what RAID is, the various types, how fast 
they are, and the streaming read and write performance of the various RAID configs.  I then tried 
"software RAID" (motherboard controllers) and discovered that it just isn't up to the job.  After 
that I studied the hardware co-processor controllers and went that route.

My situation really is pretty different in that my databases are mostly "read-only", though 
certainly not entirely.  They are NOT transactional, however.  So I have to concentrate on streaming 
reads and IOPS.  I am getting my job done just fine as things stand today, but I could be more 
efficient if I could speed up the tasks where I am stuck "waiting".

For example, one of my tasks is to validate all of the addresses in my databases.  I have a bunch of 
databases - 12 million, 22 million, 23 million, and two around 65 million addresses.  More on the way.

This address validation takes a LONG time - about 50 minutes per two million addresses.  However, it 
is now almost entirely automated, using software that I wrote.  To do one of my 65 million record 
tables, I start it and let 33 files of 2 million records each process.  Yes, it takes about a day 
and a half, but I just start it and forget it.
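
Just to sanity-check that figure, the math is simply batch count times per-batch time.  A throwaway 
sketch (Python purely for illustration - this is NOT the actual validation software) looks like this:

records_per_file = 2000000        # each file holds 2 million addresses
minutes_per_file = 50             # observed validation time per file
table_size = 65000000             # one of the 65 million record tables

# ceiling division: 65 million records -> 33 files of 2 million each
files = -(-table_size // records_per_file)

total_minutes = files * minutes_per_file    # 33 * 50 = 1650 minutes
total_hours = total_minutes / 60.0          # about 27.5 hours of wall clock time

print("%d files, roughly %.1f hours" % (files, total_hours))

So the day and a half is essentially 33 files back to back; the only way to shrink it is to shrink 
the 50 minutes per file.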

OTOH, the client sends me an order that "has to be out by tomorrow evening".  I have automated these 
as well, but they are still pretty labor intensive.  These orders I work on from the time I start 
until the time I finish.  If I am waiting on a query, it is not possible (AFAIK) to tell in advance 
whether it is going to take 2 minutes or 30.  So I start it and have to stay tuned for it to finish 
so I can do the next piece.  It is precisely there that I am trying to figure out where the 
bottlenecks are.

One of my 65 million record databases and a 50 million record database are at the center of these 
orders.  Placing those two on SSD and having TONS of RAM and TONS of cores would make things much 
snappier.  The 65 million record database compresses very well (45% or so).  The 50 million record 
database (the database from hell) doesn't compress well at all, perhaps 15%.  But if both were on 
independent SSDs and I had the memory and processors to work them properly, they would almost 
certainly fly.  ATM they are both on a single RAID volume, so they are battling for I/O, and I have 
"only" 12 gigs of RAM (available) and three cores (available) for processing them.

My intention is to just shoot for the moon.  In January of this year AMD made cores cheap!  It is 
time to make the next leap.  With 16 cores, 64 gigs of memory, and an SSD for each of those 
databases, I am willing to bet long odds that I would get startling performance.

This whole job started small, with just the db from hell.  Over 5 years it has grown rapidly, and it 
has done so because I pay a lot of attention to the details.  My client knows that we need to step 
things up to handle his existing as well as his future needs.  This job has always been fun, but 
with the hardware that I am looking at it could be a blast.  Manipulating this volume of data in 
near real time is not easy, but with luck, planning and a little money I may accomplish it.

John W. Colby
www.ColbyConsulting.com


Jim Dettman wrote:
> John,
> 
> <<I have lost the occasional drive, and with a dedicated raid controller you
> just replace the drive 
> and the controller starts to rebuild the raid.  Raid 6 allows you to lose
> two drives and still 
> rebuild.  Raid 5 causes you to lose the shooting match if you lose 2 drives.
> I have massive amounts 
> of stuff here, and it does not belong to me.  I cannot afford to be cavalier
> about the system.>>
> 
>   I wasn't suggesting that you be cavalier about it.  Certainly no one here
> knows your exact situation or how you have everything configured.  I was
> just suggesting you weigh the risk vs the benefit.  Certainly if your
> requirements are such that you cannot afford any down time, then RAID is
> called for.
> 
>   But often one backs into a situation slowly without re-evaluating why they
> are there in the first place.  For example, how much time does it cost you
> to rebuild the array when a drive drops vs restoring from a backup?  And
> keep in mind that often while a rebuild is in progress, throughput is
> severely degraded.  You might be ahead of the game if you simply restored
> backups.  And of course if you had physical volumes to play with, then you
> could control where databases were being placed and gain even more
> performance.
> 
>   RAID is great for zero down time, but you do pay for that benefit in many
> ways.   And since performance seems to be the overriding goal here, RAID
> may be costing you more than it's worth.
> 
> Jim. 



