jwcolby
jwcolby at colbyconsulting.com
Thu Dec 3 19:57:12 CST 2009
Thanks Asger. I will take #5 to heart, though I will have to figure out how to figure out what a "good size" is. ;)

As I mentioned, at the moment I don't really have a lot of "new databases". I do occasionally get new lists, but it is just occasionally. Where I do much more is in monthly maintenance. Every month I update the database, which creates new temp (currently) tables inside of the existing database. As an example, I create a sorted output table (on zip5/zip4) and copy every record into that table. I then take 2 million records at a whack and place them in a temp table, created on the fly, and BCP out that "chunk". Drop the table, create it all over again, copy the next 2 million records in, BCP out. Rinse, repeat until all 50 million records are BCPd out.

As you might imagine, that process causes enormous file growth, particularly in my log file but also in my main database file. These databases are largish, at least coming from Access, where the db file is never more than a few hundred million bytes. I might start with a 30 gig file and end up with a 100 gig file after processing. The more I think about it, the more I think I want to go with a temp db to perform this export, as well as a matching import. Again though, how big is big enough? I do an 11 million record file, then a 21 million record file, then a 50 million record file, etc. Just building an empty 50 gig file takes a long time for SQL Server.

I don't have any answers on how to figure this out. As I get my C# control program happening I will be logging the start / stop time for each step (stored procedure) of this process. I could also log start and stop file sizes for the database files. Thus I could intentionally select different strategies and view my log files to see the times required for each strategy. As of now I don't have that in place, though I am actively working on it.

The other thing I want to do is get my C# controller able to detach and attach these databases. If I can detach them programmatically then I can have defrag running at night. I own PerfectDisk server defrag, which does a very good job at consolidating free space, but the database files have to be detached.

BTW, I moved my log files out to a separate "spindle", that being a RAID array based on its own set of disks. I am looking at the log files, which I have been using a lot lately, and while they are large, the actual number of fragments seems (to me) to be pretty reasonable:

50 gig - 4 fragments
25 gig - 2 fragments
30 gig - 6 fragments

Most of them are in the 2-6 fragment range. ONE is 16 fragments, and it is "small" at 1.5 gig. That is log file data. And understand that I often shrink the log files, which usually shrinks them back down to 1 meg, and the disk was defragged a while back to get all of the free space together. But these log files were all created (expanded) since that defrag.

I think it is useful to remember that my databases are not transactional databases, and activity is concentrated in a single database as I perform this maintenance; then I move on to the next database. Thus it is not like there are 20 highly varied databases all mixing it up on a minute-by-minute basis.

John W. Colby
www.ColbyConsulting.com

Asger Blond wrote:
> Some points to consider:
> 1. A clustered index on the PK will be useful for the join operation.
> 2. A non clustered index on the filtering field will be useful for the
> search operation.
> 3. In your scenario the best choice IMO is a clustered index on the PK and a
> non clustered index on the filtering field.
> As mentioned in my previous
> posting this index will automatically include the values of the PK in the
> bottom of its index tree (the "leaf level"). Perhaps your searches don't
> specify the PK value (I guess not, since you use surrogate keys, which of
> course are not candidates for searches). But nice to know anyhow - it
> certainly should prevent you from creating a composite index on the PK plus
> your filtering field: this kind of index would be an enormous waste of
> space and performance.
> 4. If you don't have a clustered index, then your table is a "heap". This
> means that the records can go anywhere in the physical file. A heap can be
> very efficient for inserts, because SQL Server then doesn't have to bother
> about where to drop the records, it just scatters the records wherever room
> is available. But it is very, very inefficient for searches and joins,
> because then SQL Server has to consult a special internal table named IAM
> to find where it happened to drop your damned records. That's why it's
> normally bad. But you could imagine situations where this would work very
> well, for example an auditing table populated by a trigger.
> 5. If you want to make your inserts create huge amounts of disk
> fragmentation then just go ahead using a small-sized db relying on auto
> grow of your db. If you don't want that then follow Mark's advice and size
> your db initially to the anticipated size.
> Asger
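
A minimal T-SQL sketch of the indexing Asger describes in points 1-3: a clustered index on the surrogate PK plus a non clustered index on the filtering field. The table and column names (dbo.BigList, PK, Zip5) are placeholders, not the actual schema.

CREATE TABLE dbo.BigList
(   PK   INT IDENTITY(1,1) NOT NULL,
    Zip5 CHAR(5) NULL,
    Zip4 CHAR(4) NULL,
    CONSTRAINT PK_BigList PRIMARY KEY CLUSTERED (PK)
);

-- The non clustered index carries the clustering key (PK) in its leaf level,
-- which is why a composite index on (Zip5, PK) would be redundant.
CREATE NONCLUSTERED INDEX IX_BigList_Zip5 ON dbo.BigList (Zip5);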
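
For point 5 and the "how big is big enough" question, the idea is to create the work database with an explicit initial SIZE and a large FILEGROWTH increment, so growth happens in a few big steps rather than thousands of small ones. A rough sketch, with made-up names, paths, and sizes:

CREATE DATABASE ExportWork
ON PRIMARY
(   NAME = ExportWork_data,
    FILENAME = 'E:\Data\ExportWork.mdf',
    SIZE = 50GB,
    FILEGROWTH = 5GB )
LOG ON
(   NAME = ExportWork_log,
    FILENAME = 'F:\Logs\ExportWork.ldf',
    SIZE = 10GB,
    FILEGROWTH = 1GB );

-- Bulk-logged (or simple) recovery keeps bulk loads and SELECT INTO from
-- bloating the log as badly as the full recovery model would,
ALTER DATABASE ExportWork SET RECOVERY BULK_LOGGED;

-- and the log can still be shrunk back down after a run (target size in MB):
USE ExportWork;
DBCC SHRINKFILE (ExportWork_log, 1);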
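
A rough sketch of the 2-million-record chunked BCP export described above, assuming a sequential RowID-style key on the sorted output table and xp_cmdshell enabled; the table names, output path, and batch size are placeholders:

DECLARE @BatchSize INT;
DECLARE @FirstRow  BIGINT;
DECLARE @LastRow   BIGINT;
DECLARE @Cmd       VARCHAR(1000);

SET @BatchSize = 2000000;
SET @FirstRow  = 1;
SELECT @LastRow = MAX(RowID) FROM dbo.SortedOutput;

WHILE @FirstRow <= @LastRow
BEGIN
    -- Recreate the staging table for this chunk
    IF OBJECT_ID('dbo.ExportChunk') IS NOT NULL
        DROP TABLE dbo.ExportChunk;

    SELECT *
    INTO dbo.ExportChunk
    FROM dbo.SortedOutput
    WHERE RowID BETWEEN @FirstRow AND @FirstRow + @BatchSize - 1;

    -- Shell out to bcp for this chunk
    SET @Cmd = 'bcp MyDb.dbo.ExportChunk out "D:\Export\chunk_'
             + CAST(@FirstRow AS VARCHAR(20)) + '.txt" -c -T';
    EXEC master.dbo.xp_cmdshell @Cmd;

    SET @FirstRow = @FirstRow + @BatchSize;
END;

The C# controller could just as easily run the bcp step itself and only use SQL Server for the chunk staging.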
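
For the step timing described above, one way to do it is a small log table that the controller (or each stored procedure) brackets with an insert and an update; everything here is a placeholder sketch:

CREATE TABLE dbo.ProcessLog
(   LogID     INT IDENTITY(1,1) PRIMARY KEY,
    DbName    SYSNAME,
    StepName  VARCHAR(128),
    StartTime DATETIME,
    EndTime   DATETIME NULL
);
GO

DECLARE @LogID INT;

INSERT dbo.ProcessLog (DbName, StepName, StartTime)
VALUES (DB_NAME(), 'BCP export', GETDATE());
SET @LogID = SCOPE_IDENTITY();

EXEC dbo.usp_ExportChunks;   -- the step being timed (placeholder name)

UPDATE dbo.ProcessLog SET EndTime = GETDATE() WHERE LogID = @LogID;

Start and stop file sizes could be captured the same way from sys.master_files on SQL Server 2005 and up (its size column is in 8 KB pages).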
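
And for the detach/attach piece, the C# controller would only need to execute something along these lines against the master database (the database name and file paths here are made up):

USE master;
EXEC sp_detach_db @dbname = 'MyBigDb', @skipchecks = 'true';

-- ... run PerfectDisk (or any defragmenter) against the .mdf/.ldf files here ...

CREATE DATABASE MyBigDb
ON ( FILENAME = 'E:\Data\MyBigDb.mdf' ),
   ( FILENAME = 'F:\Logs\MyBigDb.ldf' )
FOR ATTACH;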