jwcolby
jwcolby at colbyconsulting.com
Thu Dec 3 17:24:14 CST 2009
>However, if you are joining on the PKID, but are filtering "Where Field47 =
>'Ice Cream and Jelly'", then you should have an index on Field 47 also.

As it happens I do perform a where on the data field, so a cover is required.

> Anyway, in your case, it seems all irrelevant, as you do a mass import and
> then immediately create your indexes.

Correct, after which no records are ever deleted or added. I do (or may) modify fields, but not the PK of course.

> One last question, having just re-read your email, I see that you are
> talking about / hoping that a clustered index may keep the *fields* together.

Well, I am reading that by creating a clustered index, every single data element (field) in that row is stored together, AND the rows are physically sorted on the index key - the PKID in this case. OTOH if you do not create a clustered index, then the data elements just go "in the heap", with pointers to the data in the heap maintained in the index.

Again though, what does that mean? I know what a heap means in memory for a program, but it is a little difficult for me to equate that to a db file. I have no concept of what the structure of a db file looks like, where this "heap" might be, etc. But it is always spoken of negatively, so it must be bad.

> BTW, one last question, when you create new databases, do you create the db
> as 1 mb and allow it to grow...

I do. Yes, it would be faster to create it initially as something bigger, but it is difficult to know how big is big enough, and in the end this is not enough of a problem to worry about. In the end I do the month to month processing in the same file, over and over again.

I am actually considering (eventually) creating a temp database to use to get the data exported / imported etc. The nice thing about SQL Server is that you can simply specify the db name that you want to create a table in, append data in, etc. So I could do a temp database, temp tables for the export, and then just delete the db when the export is finished.
Likewise for the import. Temp db, temp table(s), then append into the "real" table, or update records in the "real" table. But that is down the road.

>If so, do you keep a handy, ready to go, empty 47 GB db lying around?

Well, empty or not, 47 gigs is 47 gigs and copying that is slooooowwwww. You would lose some or all of the hoped for efficiency in the copy.

John W. Colby
www.ColbyConsulting.com

Mark Breen wrote:
> Hello John,
>
> I would have thought that the most important thing to consider is what
> columns you will join on and what columns you will filter on.
>
> So, if you are only retrieving based on the PKID then I see no need to have
> any additional index. However, if you are joining on the PKID, but are
> filtering "Where Field47 = 'Ice Cream and Jelly'", then you should have an
> index on Field 47 also.
>
> Regarding Clustered vs non Clustered, I believed those to be highly relevant
> in a highly trafficked database where records are coming and going all the
> time. In those cases, the indexes can become fragmented in a similar way that
> hard disks get fragmented. More importantly, in a high volume, data entry
> system, I understand that it can dramatically increase performance if you do
> NOT cluster on the PK (the following may be ten years out of date). The
> reason to cluster on the non-PK fields is that otherwise the records for
> invoices 99, 100, 101, 102 would all be written to the same page within the
> db, and if that page is locked for invoice 103, then another operator cannot
> raise invoice 104 until 103 is completed. This was the logic I was taught in
> 1997 for clustering on another column such as CustomerId instead of
> InvoiceId. I really do not know if that is still relevant nowadays.
>
> Anyway, in your case, it seems all irrelevant, as you do a mass import and
> then immediately create your indexes. IOW, your indexes are in perfect
> condition. You probably only use them two or three times before you abandon
> that db for the next one.
>
> One last question, having just re-read your email, I see that you are
> talking about / hoping that a clustered index may keep the *fields* together.
> I would have assumed 99% confidently that the fields must always be kept
> together (as you say, whatever that might mean), but the clustering of
> indexes only relates to keeping *records* together, not columns together.
> So, if that is the case, you do not require a clustered index to keep
> columns together, i.e., they must always travel together. I have no idea how
> to measure that.
>
> Am I totally off beam here, is the problem that I do not know what a cover
> index is?
>
> BTW, one last question, when you create new databases, do you create the db
> as 1 mb and allow it to grow, or do you initially create it as 47 gb, and
> then just populate it with whatever arrives each month? Is it faster to do
> your imports to a db that is already expanded? If so, do you keep a
> handy, ready to go, empty 47 GB db lying around?
>
> thanks
>
> Mark
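
On the "cover index" question above, a small illustration may help. This is only a sketch, and it uses SQLite through Python's standard sqlite3 module as a stand-in, NOT SQL Server, so the storage details and plan wording differ from what the thread is discussing. PKID and Field47 come from the thread; big_table, Field48 and ix_field47 are made-up names. The idea it tries to show: when every column a query needs is already present in the index, the engine can answer from the index alone and never touch the table rows.

```python
import sqlite3


def explain(conn, sql, params):
    """Return the human-readable detail strings of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan text


def covering_index_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: PKID is the key, Field47 is the filtered column.
    conn.execute(
        "CREATE TABLE big_table ("
        "PKID INTEGER PRIMARY KEY, Field47 TEXT, Field48 TEXT)"
    )
    conn.executemany(
        "INSERT INTO big_table (Field47, Field48) VALUES (?, ?)",
        [("Ice Cream and Jelly", "a"), ("Custard", "b"),
         ("Ice Cream and Jelly", "c")],
    )
    # Index on the filtered column; in SQLite each index entry also
    # carries the row's key (PKID here).
    conn.execute("CREATE INDEX ix_field47 ON big_table (Field47)")

    value = ("Ice Cream and Jelly",)
    # Needs only Field47 and PKID: both are in the index.
    covered = explain(
        conn, "SELECT PKID FROM big_table WHERE Field47 = ?", value)
    # Needs Field48 too, which is not in the index: table lookup required.
    not_covered = explain(
        conn, "SELECT Field48 FROM big_table WHERE Field47 = ?", value)
    ids = [r[0] for r in conn.execute(
        "SELECT PKID FROM big_table WHERE Field47 = ? ORDER BY PKID", value)]
    conn.close()
    return covered, not_covered, ids


if __name__ == "__main__":
    covered, not_covered, ids = covering_index_demo()
    print(covered)      # plan text for the query the index covers
    print(not_covered)  # plan text for the query that must visit the table
    print(ids)
```

In SQL Server the equivalent would be a nonclustered index on Field47 (which implicitly carries the clustered key), with any further columns the query returns added to the index so that it still covers the query.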