jwcolby
jwcolby at colbyconsulting.com
Sat Mar 6 21:10:20 CST 2010
I went to SQL Saturday 33 down in Charlotte today. As a non-DBA it was mostly over my head, but two sessions were particularly useful to me.

One was on "the state of (solid state) disks", covering where SSDs are today and what they can and cannot do for a database. The consensus is pretty much that they are not yet ready for prime time, though the enterprise versions (read: BIG BUCKS) are being used today. For read-only applications, however, they shine. Mine is a read-only application, so I think I will proceed carefully on that front.

The other session that was very interesting to me was on SQL Server 2008 compression. This is currently only in Enterprise and (lucky for me) Developer edition, which is essentially Enterprise. I knew about backup compression and started using it a couple of months ago, but I was not previously aware of the data compression side.

So when I got home I started testing data compression on my databases. Interesting, but probably not overly useful for me. The main table compressed about 50%, IF I rearranged my CLUSTERED index to get all of the rows in Zip5/Zip4 order. I had previously used just my int32 PKID as the key of the CLUSTERED index, and the compression was very slight with that; reindexing on Zip5/Zip4/PKID bumped it up to 50%... for the main table. Sadly the indexes stayed well below 20%, more like 18% or so. (A rough sketch of the kind of T-SQL involved is tacked on below my sig for anyone who wants to play along.)

SQL Server can compress at the row level or the page level, and in order to get anything truly useful I had to go to page compression. Even then the indexes just didn't compress spectacularly well. The catch is that the data stays compressed even in memory. That sounds good, and would be if you got a great compression ratio, but it means decompression overhead every single time a value is used. 50% would probably be worthwhile; 18%... I am guessing not. I am not really knowledgeable enough to do rigorous testing, but my gut tells me that 18% on indexes will be about the break-even point, or worse, in terms of performance gain.

The idea, of course, is that compression allows more data to move on and off the disk in a given unit of time, and likewise allows more data to stay in memory at any given moment. However, that comes at the expense of CPU overhead to decompress EACH DATA ITEM as it is used. Every time it is used. Indexes in particular would take a hit precisely because they are used so intensively for joins and WHERE clauses. I had high hopes, but I now doubt the usefulness in my application. If anyone out there is using this data compression and can chime in from experience, please do.

One thing that MIGHT save the day is that the granularity of the compression is apparently fairly fine, i.e. you can specify whether any given index is or is not compressed (second snippet below). It really sounds like a tuning nightmare to me.

OTOH I do use the compression for backups, and am loving it. Very significant file size reduction AND speed improvements (last snippet below).

Any opinions out there?

--
John W. Colby
www.ColbyConsulting.com
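
P.S. For anyone who wants to play with this, here is roughly the kind of T-SQL involved. The table and column names (dbo.tblMain, Zip5, Zip4) are made up for illustration; only PKID matches what I actually have, so substitute your own objects.

    -- Estimate the savings before committing (this proc ships with SQL Server 2008)
    EXEC sp_estimate_data_compression_savings
         @schema_name      = 'dbo',
         @object_name      = 'tblMain',
         @index_id         = NULL,    -- NULL = estimate every index on the table
         @partition_number = NULL,
         @data_compression = 'PAGE';

    -- Assuming the old clustered index (keyed on PKID alone) has been dropped,
    -- recreate it keyed on Zip5/Zip4/PKID with page compression. That ordering
    -- is what got my main table down to roughly half its size.
    CREATE CLUSTERED INDEX IX_tblMain_Zip5Zip4
        ON dbo.tblMain (Zip5, Zip4, PKID)
        WITH (DATA_COMPRESSION = PAGE);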
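
The per-index granularity I mentioned looks roughly like this: you can leave the table (the clustered index) page compressed and turn compression off, or drop back to ROW, on the nonclustered indexes that don't gain enough to justify the CPU. The index names here are hypothetical.

    -- Page compress one nonclustered index...
    ALTER INDEX IX_tblMain_SomeColumn ON dbo.tblMain
        REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- ...and leave another uncompressed where the ratio isn't worth it
    ALTER INDEX IX_tblMain_OtherColumn ON dbo.tblMain
        REBUILD WITH (DATA_COMPRESSION = NONE);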
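
And the backup compression I am loving is just a WITH COMPRESSION option on the backup statement (database name and path made up here; INIT and STATS are only my usual habits, COMPRESSION is the part that matters):

    BACKUP DATABASE MyBigDb
        TO DISK = N'E:\Backups\MyBigDb.bak'
        WITH COMPRESSION, INIT, STATS = 10;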