[dba-SQLServer] Just an FYI

Arthur Fuller fuller.artful at gmail.com
Sat Dec 8 14:37:50 CST 2007


Thanks for promoting me to your Good Friends list!

Perhaps I forgot to suggest that the only rows/columns of interest are those
with values that are neither NULL nor equivalent to "". The entire point of this
exercise is that we record only meaningful values. In the case of your YesNo
columns, if the column contains neither Yes nor No then it ought not be
written to the child table. All we care about are the Yes and No answers.
The empty values are of no interest, since they contain no information.
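Something along these lines is what I have in mind. This is a minimal sketch
only: the table and column names (tblWide, tblVertical, SampleYesNo) and the
FieldID value are invented for illustration, not taken from your schema.

INSERT INTO tblVertical (FieldID, PKID, Data)
SELECT 42, w.PKID, w.SampleYesNo
FROM tblWide w
WHERE w.SampleYesNo IS NOT NULL
  AND LTRIM(RTRIM(w.SampleYesNo)) <> '';

The WHERE clause is the whole point: NULLs and empty strings never reach the
child table.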

A.

P.S.
We have had our wars, which is one of several reasons why I deem you a Good
Friend too, John.

On 12/8/07, jwcolby <jwcolby at colbyconsulting.com> wrote:
>
> In converting the database from hell to a "vertical" table, I have
> processed
> 36 fields out of the 584 fields in a table with 51.157 million records.
>
> It has found 573,397,612 data elements in those 36 fields.  15.93 million
> data points per field average.
>
> It has consumed 12.246 GBytes of data space (no indexes yet). 340 megs /
> field average.
>
> It is taking roughly 300 seconds per field to process at this point in
> time,
> though that bounces around a lot depending on how many data elements it
> finds in a field.
>
> Assuming that the statistics so far hold true over the rest of the data
> (not
> a real good assumption) and assuming that the insert time / element
> remains
> the same (a horrid assumption) it will take (684-36) * 5 minutes to finish
> the conversion - 3240 minutes, 54 hours.  It will also take 340 * 684 megs
> of data space - 232 gigs.  The table when finished would hold 684 * 15.93
> million data points - 10896.12 million data points - 10.896 Gig data
> points.
>
> The point of this exercise is to build a table with three fields: the
> FieldID from a field lookup table, the PKID of the record that the data
> point is found in, and the data itself.  I will have converted a table
> containing 50 million records but 684 fields into a table with 11 gig
> records but 3 fields.  Obviously a table scan is being used on the wide
> table most of the time; I cannot realistically index 684 fields, and even
> if I did, many (most) of the fields are a single character consisting of 'Y'.
>
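A minimal sketch of that three-field layout, with every name assumed rather
than taken from the real schema, might be:

CREATE TABLE tblVertical (
    FieldID int NOT NULL,          -- key into the field lookup table
    PKID    int NOT NULL,          -- PK of the source row in the wide table
    Data    varchar(100) NOT NULL  -- the value itself
);

One INSERT ... SELECT per source column then fills it, skipping the NULL and
empty values as discussed above.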
> Once the table is built I will build a single index on it: a clustered
> index with the fields arranged FieldID, Data and PKID, on the theory that
> most operations will be a "where" on a field (thus the FieldID first) where
> the data is a certain value (thus the Data next), returning only the PKID.
>
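As a sketch (index and table names assumed), that index would look like:

CREATE CLUSTERED INDEX IX_tblVertical_FieldID_Data_PKID
    ON tblVertical (FieldID, Data, PKID);

With that key order the rows are physically sorted by FieldID first and Data
second, so all the values for one field sit together on disk.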
> If I understand things (a HUGE assumption) this clustered index would be
> optimized for that kind of search.
>
> So in the case of the "vertical" table, an "index scan" will be performed
> BUT all the records for a given field will be contiguous, BUT it will have
> to scan (worst case) 11 gig records, BUT it will only have to read three
> fields consisting of two integers and a varchar(100).
>
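A typical lookup against that layout, with the FieldID and the value shown
here as placeholders, would be something like:

SELECT PKID
FROM tblVertical
WHERE FieldID = 42
  AND Data = 'Y';

Because both predicates match the leading columns of the clustered key, the
engine only has to touch the contiguous range of rows for that FieldID/Data
combination, and the PKIDs come straight out of the index.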
> What would be interesting but probably not feasible would be to analyze
> which fields are used the most by my client and place them first in the
> table.  That would guarantee that the most used fields occur earliest in
> the
> search.  THAT would make an enormous difference I think.  Then if there
> are
> only 5 million data points in a field and they are contiguous, and they
> are
> at the front of the table...
>
> My good friend Arthur assures me that the new table will on average win
> the
> race.  We shall see.
>
> Does anyone have a gut feeling for the results?
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
>
>


