jwcolby
jwcolby at colbyconsulting.com
Sat Dec 8 15:25:17 CST 2007
I am up to field 63 and now have 954 million data points, in 20 gig of data space.

BTW, I broke this destination (vertical) table out into its own db, but on the same drive (RAID) as the source (wide) table.  I doubt that is the reason, but I am only averaging about 200 seconds per field at this point, and the insert time isn't climbing as I had expected.

I have to wonder what the effect would be of running Windows 2003 x64 with SQL Server x64 and 8 gigs of RAM (the max I can put in the machine) instead of 4 gigs (the max that Windows 2003 x32 can handle).  BTW, I tried to do the install, but it failed and I didn't have time to pursue it.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Saturday, December 08, 2007 4:09 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Just an FYI

>Thanks for promoting me to your Good Friends list!

We have our differences, but "I love ya Arthur" (in that sobbing tone).

>Perhaps I forgot to suggest that the only rows/columns of interest

Yes, that is what I am doing.  Any field with no data is '', and the transform only pulls WHERE <> ''.  So I only have valid data points.  You can see that because the average is 15.93 million data points per field, not 51 million (the number of records).

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Arthur Fuller
Sent: Saturday, December 08, 2007 3:38 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Just an FYI

Thanks for promoting me to your Good Friends list!

Perhaps I forgot to suggest that the only rows/columns of interest are those with values that are NOT NULL or equivalent to "".  The entire point of this exercise is that we record only meaningful values.  In the case of your YesNo columns, if the column contains neither Yes nor No then it ought not be written to the child table.  All we care about are the Yes and No answers.  The empty values are of no interest, since they contain no information.

A.

P.S. We have had our wars, which is one of several reasons why I deem you a Good Friend too, John.

On 12/8/07, jwcolby <jwcolby at colbyconsulting.com> wrote:
>
> In converting the database from hell to a "vertical" table, I have
> processed 36 fields out of 584 fields from a table with 51.157 million
> records.
>
> It has found 573,397,612 data elements in those 36 fields - 15.93
> million data points per field average.
>
> It has consumed 12.246 GBytes of data space (no indexes yet) - 340
> megs / field average.
>
> It is taking roughly 300 seconds per field to process at this point in
> time, though that bounces around a lot depending on how many data
> elements it finds in a field.
>
> Assuming that the statistics so far hold true over the rest of the
> data (not a really good assumption), and assuming that the insert time
> per element remains the same (a horrid assumption), it will take
> (684-36) * 5 minutes to finish the conversion - 3240 minutes, 54
> hours.  It will also take 340 * 684 megs of data space - 232 gigs.
> The table when finished would hold 684 * 15.93 million data points -
> 10896.12 million data points - 10.896 gig data points.
>
> The point of this exercise is to build a table with three fields: the
> FieldID from a field lookup table, the PKID of the record that the
> data point is found in, and the data itself.  I will have converted a
> table containing 50 million records but 684 fields into a table with
> 11 gig records but 3 fields.  Obviously a table scan is being used on
> the wide table most of the time; I cannot realistically index 684
> fields, and even if I did, many (most) of the fields are a single
> character consisting of 'Y'.
>
> Once the table is built I will build a single index on the table, a
> clustered index with the fields arranged FieldID, Data and PKID, on
> the theory that most operations will be a "where" on a field (thus the
> FieldID first), where the data is a certain value (thus the data
> next), returning only the PKID.
>
> If I understand things (a HUGE assumption), this clustered index would
> be optimized for that kind of search.
>
> So in the case of the "vertical" table, an "index scan" will be
> performed, BUT all the records for a given field will be contiguous,
> BUT it will have to scan (worst case) 11 gig records, BUT it will only
> have to read three fields consisting of two integers and a
> varchar(100).
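In T-SQL, the setup described above works out to roughly the following.  This is only a sketch - the object names (FieldList, VerticalData, WideTable, SomeField) are placeholders for illustration, not the actual tables and columns in this database:

-- Lookup table: one row per column of the original wide table.
CREATE TABLE dbo.FieldList (
    FieldID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FieldName VARCHAR(128) NOT NULL
);

-- The "vertical" table: one row per non-empty data point.
CREATE TABLE dbo.VerticalData (
    FieldID INT NOT NULL,        -- which original field this value came from
    PKID INT NOT NULL,           -- PK of the record in the wide table
    Data VARCHAR(100) NOT NULL   -- the value itself
);

-- One pass of the transform, generated and run once per field
-- (684 passes in all); only non-empty values are carried across.
INSERT INTO dbo.VerticalData (FieldID, PKID, Data)
SELECT 17,                       -- the FieldID assigned to SomeField in FieldList
       w.PKID,
       w.SomeField
FROM dbo.WideTable w
WHERE w.SomeField <> '';

-- After the load, the single clustered index, arranged FieldID, Data, PKID.
CREATE CLUSTERED INDEX CIX_VerticalData
    ON dbo.VerticalData (FieldID, Data, PKID);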
> What would be interesting, but probably not feasible, would be to
> analyze which fields are used the most by my client and place them
> first in the table.  That would guarantee that the most used fields
> occur earliest in the search.  THAT would make an enormous difference,
> I think.  Then if there are only 5 million data points in a field, and
> they are contiguous, and they are at the front of the table...
>
> My good friend Arthur assures me that the new table will on average
> win the race.  We shall see.
>
> Does anyone have a gut feeling for the results?
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
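For what it's worth, the kind of lookup that clustered index is arranged to serve looks roughly like this - again with placeholder names (FieldList, VerticalData, SomeField), not the real ones:

-- Typical search: return the PKIDs of every record where a given field
-- holds a given value.  With the clustered index on (FieldID, Data, PKID)
-- this is a seek into one contiguous range, reading only the three
-- narrow columns.
SELECT v.PKID
FROM dbo.VerticalData v
JOIN dbo.FieldList f ON f.FieldID = v.FieldID
WHERE f.FieldName = 'SomeField'
  AND v.Data = 'Y';

-- Putting the most-used fields "first in the table" would amount to giving
-- them the lowest FieldID values when FieldList is populated, since the
-- clustered index orders the rows by FieldID first.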