Asger Blond
ab-mi at post3.tele.dk
Thu Dec 3 19:13:00 CST 2009
Some points to consider: 1. A clustered index on the PK will be useful for the join operation. 2. A non clustered index on the filtering field will be useful for the search operation. 3. In your scenario the best choice IMO is a clustered index on the PK and a non clustered index on the filtering field. As mentioned in my previous posting this index will automatically include the values of the PK in the bottom of its index tree (the "leaf level"). Perhaps your searches don't specify the PK value (I guess not, since you use surrogate keys, which of cause are not candidates for searches). But nice to know anyhow - it certainly should prevent you from creating a composite index on the PK plus your filtering field: this kind of index would be an enormously waste of space and performance. 4. If you don't have a clustered index, then your table is a "heap". This means that the records can go anywhere in the physical file. A heap can be very efficient for inserts, because SQL Server then don't have to bother where to drop the records, it just scatters the records wherever room is available. But it is very-very inefficient for searches and joins, because then SQL Server has to consult a special internal table named IAM to find where it happened to drop your damned records. That's why it's normally bad. But you could imagine situations where this would be very well, for example an auditing table populated by a trigger. 5. If you want to make your inserts create huge amounts of disk fragmentation then just go ahead using a small-sizes db relying on auto grow of your db. If you don't want that then follow Mark's advice and size your db initially to the anticipated size. Asger -----Oprindelig meddelelse----- Fra: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] På vegne af jwcolby Sendt: 4. december 2009 00:24 Til: Discussion concerning MS SQL Server Emne: Re: [dba-SQLServer] SPAM-LOW: Re: Do I need a cover index? >However, if you are joining on the PKID, but are filtering "Where Field47 = 'Ice Cream and Jelly', then you should have an index on Field 47 also. As it happens I do perform a where on the data field so a cover is required. > Anyway, in your case, it seems all irrelevant, as you do a mass import and the immediately create your indexes. Correct, after which no records are ever deleted or added. I do (or may) modify fields but not the PK of course. > One last question, having just re-read your email, I see that you are talking about / hoping that a clustered index may keep the *fields *together. Well, I am reading that by creating a clustered index, every single data element (field) in that row is stored together, AND the rows are physically sorted on the index key - the PKID in this case. OTOH if you do not create a clustered index, then the data elements just go "in the heap" with pointers to the data in the heap maintained in the index. Again though, what does that mean? I know what a heap means in memory for a program, but it is a little difficult for me to equate that to a db file. I have no concept of what the structure of a db file looks like, where this "heap" might be etc. But it is always spoken of negatively so it must be bad. > BTW, one last question, when you create new databases, do you create the db as 1 mb and allow it to grow... I do. Yes it would be faster to create it initially as something bigger but it is difficult to know how big is big enough and in the end this is not enough of a problem to worry about. In the end I do the month to month processing in the same file, over and over again. I am actually considering (eventually) creating a temp database to use to get the data exported / imported etc. The nice thing about SQL Server is that you can simply specify the db name that you want to create a table in, append data in etc. So I could do a temp database, temp tables for the export and then just delete the db when the export is finished. Likewise for the import. Temp db, temp table(s) then append into the "real" table, or update records in the "real" table. But that is down the road. >If so, do you keep a handy, ready to go, empty 47 GB db lying around? Well, empty or not, 47 gigs is 47 gigs and copying that is slooooowwwww. You would lose some or all of the hoped for efficiency in the copy. John W. Colby www.ColbyConsulting.com Mark Breen wrote: > Hello John, > > I would have thought the the most important thing to consider is what > columns you will join on and what columns you will filter on. > > So, if you are only retrieving based on the PKID then I see no need to have > any additional index. However, if you are joining on the PKID, but are > filtering "Where Field47 = 'Ice Cream and Jelly', then you should have an > index on Field 47 also. > > Regarding Clustered vs non Clustered, I believed those to be highly in a > highly trafficed database where records are coming and going all the time. > In those cases, the indexes can become fragmented in a similar way that > hard disks get fragmented. More importantly, in a high volume, data entry > system, I understand that it can dramatically increase performance if you do > NOT cluster on the PK. (the following may be ten years out of date). The > reason to Cluster on the Non-PK fields as that multiple records for Invoices > 99, 100, 101, 102 would all be written to the same page within the db, and > if that page is locked for invoice 103, then another operator cannot raise > invoice 104 until 103 is completed. This was the logic I was thought in > 1997 in clustering on another column such as CustomerId instead of > InvoiceId. I really do not know if that is still relevant nowadays. > > Anyway, in your case, it seems all irrelevant, as you do a mass import and > the immediately create your indexes. IOW, your indexes are in perfect > condition. You probably only use them two or three times before you abandon > that db for the next one. > > One last question, having just re-read your email, I see that you are > talking about / hoping that a clustered index may keep the *fields *together. > I would have assumed 99% confidently that the fields must always be kept > together (as you say, what ever that might mean), but the clustering of > indexes only relates to keeping *records *together, not columns together. > So, if that is the case, you do not require a clustered index to keep > columns together, ie, the must always travel together. I have no idea how > to measure that. > > Am I totally off beam here, is the problem that I do not know what a cover > index is? > > BTW, one last question, when you create new databases, do you create the db > as 1 mb and allow it to grow, or do you initially create it as 47 gb, and > then just populate it with what ever arrives each month. Is is faster to do > your imports to a db that is already expanded up. If so, do you keep a > handy, ready to go, empty 47 GB db lying around? > > thanks > > Mark _______________________________________________ dba-SQLServer mailing list dba-SQLServer at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-sqlserver http://www.databaseadvisors.com