Tina Norris Fields
tinanfields at torchlake.com
Mon Dec 10 10:21:13 CST 2007
I'm interested. So after the crunch is done, please do come back and
tell us how it all worked out. BTW I just love watching how you think!
Tina
jwcolby wrote:
> OK, so here I am going to run some timings on various things, using all
> three databases. My client gave me an actual order to provide counts of the
> HSID data against the smoker data (a different table). From the HSID data I
> need to do selections:
>
> NarrowIncomeBand codes 1-5
> codes 6-8
> codes 9, A-D
> codes E-I
> codes J-S
> codes T
>
> Gender - Male Code 1
> Gender - Female Code 2
>
> YearModelOfBoat Code <> ''
>
> Pool Code 'Y'
>
> Etc Etc. The client is trying to understand the data in two different
> completely unrelated tables, the HSID file and the TFF smokers, as well as
> the HSID file and the SMK smokers.
>
> The order actually has me join PKID result sets to another pair of tables,
> the TFF smoker file and the SMK smoker file. IOW count:
>
> TFF smokers WHERE narrow income band 1-5
> SMK smokers WHERE narrow income band 1-5
>
> TFF Smokers WHERE Gender 1
> SMK Smokers WHERE Gender 1
>
> TFF Smokers WHERE Gender 2
> SMK Smokers WHERE Gender 2
>
> etc etc.
>
> The problem is that TFF and SMK smokers are unrelated to each other,
> and also unrelated to HSID. In order to get counts, I need to somehow get
> something in common, which will be the name / address data. That part of
> the equation is really too complex to go into here ("beyond the scope of
> this discussion") and so I am going to punt and ignore the join between HSID
> and any other file (smokers in this case) and instead simply discuss how I
> "prep" the HSID file to allow me to compare it later on, and then compare
> timings of those "prepped" queries.
>
> Create order structures:
>
> I typically create a directory structure - PSM (the client) \ Orders \
> OrderNum. In that directory on the disk I place the spreadsheet containing
> the order as well as any queries that I create that I want to save out as
> actual query files, any documents I get or create, and any CSVs or fixed
> width output files in the case of orders for names or counts. Counts
> generally go into spreadsheets directly.
>
> I also create a new database container to hold the views and temp tables
> required to fill that specific order, with the name of the order in the
> database name. Once I have a database container I...
>
> Create the Indexes:
>
> The next thing I have to do in the case of tblHSIDClean and tblHSID is to go
> in and make sure that I have indexes on the fields that I am going to do
> WHERE comparisons on. Sometimes they already exist from a previous order,
> sometimes I have to create the indexes because there isn't one already. In
> the case of tblHSIDClean I will create indexes on each field
> individually. In tblHSID I will create "cover indexes" of either the
> individual field AND the PKID, or if there are several RELATED fields in the
> WHERE clause I will create a cover index including all of the related fields
> AND the PKID field. Creating these indexes typically takes a few minutes
> per index, with the exact time varying by how many actual data points are in
> a specific field, as well as how many fields are in the cover index (in the
> case of tblHSID).
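>
> Just to make that concrete, here is the sort of thing I run (the column
> names are only illustrative of my naming, not necessarily the exact
> schema):
>
>     -- tblHSIDClean: plain single-field index on the WHERE column
>     CREATE INDEX IX_HSIDClean_NarrowIncomeBand
>         ON dbo.tblHSIDClean (NarrowIncomeBand);
>
>     -- tblHSID: "cover index" holding the WHERE field(s) plus the PKID,
>     -- so the selection never has to touch the base table at all
>     CREATE INDEX IX_HSID_NarrowIncomeBand_PKID
>         ON dbo.tblHSID (NarrowIncomeBand, PKID);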
>
> In the case of tblHSIDVert, I already have a single clustered index and that
> is all I need.
>
> Create views:
>
> Next I create views to pull each field with the PKID for each WHERE column.
> Strictly speaking individual queries are not required, i.e. I could create
> one mongo query (and I used to do so), but in the past anyway these mongo
> queries would take anywhere from 20 minutes to an hour to run. The problem
> was simply that if anything was not right I would have to make changes and
> rerun, leading to LOOOOONG times to get the thing done, though not
> necessarily lots of billable time. So I learned to create "field specific"
> views that would pull just the PKIDs required for one piece of the final
> result, and later join the pieces into a whole once the pieces pulled the
> right things.
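>
> A "field specific" view is nothing fancy - just the PKID plus the WHERE
> test for one piece of the order, along these lines (a sketch; the real
> field names and codes vary from order to order):
>
>     -- All PKIDs falling in the low narrow income bands
>     CREATE VIEW dbo.vHSID_IncomeBand_1_5 AS
>     SELECT PKID
>       FROM dbo.tblHSID
>      WHERE NarrowIncomeBand IN ('1', '2', '3', '4', '5');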
>
> With a query for each piece of the bigger WHERE clause, I would then start
> the process of combining the queries to AND and OR them into the larger
> "whole WHERE clause". Joins between the PKIDs form the ANDS. So if I need
> (for example) WHERE Income in('D', 'E', 'F') AND CasinoGambler = 'Y', I
> would create two queries, one to get all the PKIDs matching the income
> criteria, the other with all the PKIDs that match the CasinoGambler, and
> then join the PKIDs in a higher level query to AND the result set.
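>
> The AND falls out of an inner join on PKID between the two views,
> roughly like this (the view names are made up to match this example):
>
>     -- AND: only PKIDs present in BOTH pieces survive the inner join.
>     -- An OR of two pieces would be a UNION of the PKID sets instead.
>     SELECT inc.PKID
>       FROM dbo.vHSID_Income_D_F AS inc
>      INNER JOIN dbo.vHSID_CasinoGambler_Y AS cg
>         ON cg.PKID = inc.PKID;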
>
> Oftentimes I need to get COUNTS, so I would build a query on top of the
> queries that AND / OR the smaller queries, to COUNT() the PKIDs that make it
> through the subqueries.
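>
> The count is then just one more thin layer over that join, e.g.:
>
>     -- COUNT the PKIDs that make it through the ANDed pieces
>     SELECT COUNT(*) AS RecordCount
>       FROM dbo.vHSID_Income_D_F AS inc
>      INNER JOIN dbo.vHSID_CasinoGambler_Y AS cg
>         ON cg.PKID = inc.PKID;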
>
> If you have gotten this far then you understand how I go about constructing
> the stuff required to fill the order. The order might just be "counts" of
> how many people in an income range gamble, or how many people have dogs AND
> have cats, or how many people have children, are in a specific income range,
> and have cats. My client then knows how many "addresses" he can deliver for
> those criteria, and can go to his client with a proposal for supplying
> addresses. He might get the order, in which case he comes back to me and
> says, "now deliver 100K name / Address records in zip codes X, Y and Z".
> And of course this assumes he asked for counts in those zip codes, and that
> at least 100K records made it through the filters.
>
> That's the idea, and now I will present some timing results for some of the
> sub queries used in an actual order; this order is just for counts as
> discussed at the top of this email. I will construct the sub queries for
> all three different databases and time them. I will NOT take it to the
> point where I tie it into the smoker databases because that "is beyond the
> scope of this discussion".
>
> I am also going to bring AccessD into this because there might be folks on
> that list interested in this subject.
>
> OTOH maybe NOBODY except Arthur is interested! ;-)
>
> Next email, the results. It may be a day or so. I have to actually get
> this out TONIGHT so I will be doing whatever I have to do to get results. I
> will do timings tomorrow sometime.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
>