[dba-SQLServer] Re: Do I need a cover index?

jwcolby jwcolby at colbyconsulting.com
Thu Dec 3 19:57:12 CST 2009


Thanks Asger.  I will take #5 to heart, though I will have to figure out how to determine what a 
"good size" actually is.

;)


As I mentioned, at the moment I don't really have a lot of "new databases".  I do occasionally get 
new lists, but only occasionally.  Where I do much more work is in monthly maintenance.  Every 
month I update the database, which (currently) creates new temp tables inside of the existing 
database.  As an example, I create an output table sorted on zip5/zip4 and copy every record into 
that table.  I then take 2 million records at a whack and place them in a temp table created on the 
fly, BCP out that "chunk", drop the table, create it all over again, and copy the next 2 million 
records in.  BCP out.  Rinse and repeat until all 50 million records have been BCPed out.
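
The chunking piece comes out to something like this in T-SQL (just a sketch - the table names and 
the RowNum column on the sorted table are placeholders, and the real thing will be driven from my 
C# controller):

DECLARE @BatchSize int;
DECLARE @Chunk int;
DECLARE @Rows int;

SET @BatchSize = 2000000;   -- 2 million records per chunk
SET @Chunk = 0;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    -- recreate the scratch table for this chunk
    IF OBJECT_ID('dbo.ExportChunk') IS NOT NULL
        DROP TABLE dbo.ExportChunk;

    SELECT *
    INTO dbo.ExportChunk
    FROM dbo.SortedOutput                 -- already sorted/numbered on zip5/zip4
    WHERE RowNum >  @Chunk * @BatchSize
      AND RowNum <= (@Chunk + 1) * @BatchSize;

    SET @Rows = @@ROWCOUNT;
    SET @Chunk = @Chunk + 1;

    -- then BCP the chunk out, e.g. from the command line:
    -- bcp MyDb.dbo.ExportChunk out chunk.dat -n -T -S MYSERVER
END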

As you might imagine, that process causes enormous file growth, particularly in my log file but also 
in my main database file.

These databases are largish, at least coming from Access, where the db file is never more than a few 
hundred million bytes.  I might start with a 30 gig file and end up with a 100 gig file after 
processing.  The more I think about it, the more I think I want to go with a separate temp/work 
database to perform this export, as well as a matching import.  Again though, how big is big 
enough?  I do an 11 million record file, then a 21 million record file, then a 50 million record 
file, etc.  Just building an empty 50 gig file takes a long time in SQL Server.
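
If I go that route, pre-sizing the work database up front seems to be the way to avoid the repeated 
auto-grow hits.  Something along these lines, where the names, paths and sizes are just placeholders:

CREATE DATABASE ExportWork
ON PRIMARY
(
    NAME = ExportWork_data,
    FILENAME = 'E:\SQLData\ExportWork.mdf',
    SIZE = 60GB,          -- roughly the largest list I expect to process
    FILEGROWTH = 5GB
)
LOG ON
(
    NAME = ExportWork_log,
    FILENAME = 'F:\SQLLogs\ExportWork.ldf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
);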

I don't have any answers yet on how to figure this out.  As I get my C# control program going, I 
will be logging the start / stop time for each step (stored procedure) of this process.  I could 
also log the starting and ending file sizes for the database files.

That way I could intentionally try different strategies and check my log files to see the times 
required for each strategy.
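
The logging itself doesn't need to be fancy - roughly something like this (all names hypothetical), 
with the file sizes pulled from sys.database_files:

CREATE TABLE dbo.StepLog
(
    StepLogID   int IDENTITY(1,1) PRIMARY KEY,
    StepName    sysname  NOT NULL,
    StartTime   datetime NOT NULL,
    EndTime     datetime NULL,
    DataFileMB  int      NULL,
    LogFileMB   int      NULL
);

-- sizes are reported in 8 KB pages, so size / 128 gives MB
SELECT name, size / 128 AS SizeMB
FROM sys.database_files;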

As of now I don't have that in place, though I am actively working on it.

The other thing I want to do is get my C# controller able to detach and attach these databases.  If 
I can detach them programmatically then I can have a defrag running at night.  I own PerfectDisk 
server defrag, which does a very good job of consolidating free space, but the database files have 
to be detached first.
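
The T-SQL side of that is simple enough - roughly the following, with the database name and file 
paths as placeholders.  The C# controller would just run these against master before and after the 
overnight defrag:

EXEC sp_detach_db @dbname = N'MyListDb', @skipchecks = 'true';

-- ... let PerfectDisk work on the detached files ...

CREATE DATABASE MyListDb
ON (FILENAME = 'E:\SQLData\MyListDb.mdf'),
   (FILENAME = 'F:\SQLLogs\MyListDb.ldf')
FOR ATTACH;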

BTW I moved my log files out to a separate "spindle", that being a RAID array on its own set of 
disks.  I have been looking at the log files, which I have been using a lot lately, and while they 
are large, the actual number of fragments seems (to me) to be pretty reasonable:

50 gig - 4 Fragments
25 gig - 2 Fragments
30 gig - 6 Fragments

Most of them are in the 2-6 fragment range.  ONE has 16 fragments, and it is "small" at 1.5 gig.

That is log file data.  And understand that I often shrink the log files, which usually takes them 
back down to 1 meg, and the disk was defragged a while back to get all of the free space together. 
But these log files were all created (expanded) since that defrag.
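
For the record, the shrink is just a DBCC SHRINKFILE, where the logical log file name below is a 
placeholder (sp_helpfile shows the real one):

USE MyListDb;
DBCC SHRINKFILE (MyListDb_log, 1);   -- target size in MB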

I think it is useful to remember that my databases are not transactional databases, and activity is 
concentrated in a single database while I perform this maintenance; then I move on to the next 
database.  So it is not as if there are 20 highly varied databases all mixing it up on a 
minute-by-minute basis.


John W. Colby
www.ColbyConsulting.com


Asger Blond wrote:
> Some points to consider:
> 1. A clustered index on the PK will be useful for the join operation.
> 2. A non-clustered index on the filtering field will be useful for the
> search operation.
> 3. In your scenario the best choice IMO is a clustered index on the PK and a
> non-clustered index on the filtering field. As mentioned in my previous
> posting, this index will automatically include the values of the PK at the
> bottom of its index tree (the "leaf level"). Perhaps your searches don't
> specify the PK value (I guess not, since you use surrogate keys, which of
> course are not candidates for searches). But it's nice to know anyhow - it
> certainly should keep you from creating a composite index on the PK plus
> your filtering field: that kind of index would be an enormous waste of
> space and performance.
> 4. If you don't have a clustered index, then your table is a "heap". This
> means that the records can go anywhere in the physical file. A heap can be
> very efficient for inserts, because SQL Server then doesn't have to bother
> about where to drop the records; it just scatters the records wherever room
> is available. But it is very, very inefficient for searches and joins,
> because then SQL Server has to consult a special internal structure called
> the IAM (Index Allocation Map) to find out where it happened to drop your
> damned records. That's why it's normally bad. But you could imagine
> situations where this would work very well, for example an auditing table
> populated by a trigger.
> 5. If you want your inserts to create huge amounts of disk fragmentation,
> then just go ahead using a small-sized db and rely on auto-grow. If you
> don't want that, then follow Mark's advice and size your db initially to
> the anticipated size.
> Asger
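
FWIW, the index layout described in points 1-3 comes out to something like this (table and column 
names made up):

CREATE TABLE dbo.BigList
(
    PK        int IDENTITY(1,1) NOT NULL,
    FilterCol varchar(50)       NULL
    -- ... plus the several hundred other columns ...
);

-- clustered index on the surrogate PK (helps the join)
ALTER TABLE dbo.BigList
    ADD CONSTRAINT PK_BigList PRIMARY KEY CLUSTERED (PK);

-- non-clustered index on the filtering column (helps the search);
-- the clustering key (PK) rides along in its leaf level automatically,
-- so a composite index on the PK plus the filtering field is unnecessary
CREATE NONCLUSTERED INDEX IX_BigList_FilterCol
    ON dbo.BigList (FilterCol);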



