[dba-VB] SPAM-LOW: Re: What to do, what to do?

jwcolby jwcolby at colbyconsulting.com
Sat Nov 14 06:30:35 CST 2009


Read the discussion carefully Robert.  I am discussing doing chunks of thousands.

John W. Colby
www.ColbyConsulting.com


Robert Stewart wrote:
> John,
> 
> I don't think your machine, or any for that matter, have the memory 
> to load 50 million records, extract 40 million, transform them and 
> then load them back to the 50 million and update the database from 
> the recordset.  Datasets are for working with a small number of 
> records in memory, not millions.
> 
> Robert
> 
> 
> At 12:00 PM 11/13/2009, you wrote:
>> Date: Fri, 13 Nov 2009 11:39:14 -0500
>> From: jwcolby <jwcolby at colbyconsulting.com>
>> Subject: Re: [dba-VB] What to do, what to do?
>> To: "Discussion concerning Visual Basic and related programming
>>         issues."        <dba-vb at databaseadvisors.com>
>> Message-ID: <4AFD8BB2.5030203 at colbyconsulting.com>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Shamil,
>>
>> Given what you now know about what I want to do...
>>
>> Why does each table get created in it's own DataSet.  I thought a 
>> data set could have multiple
>> tables.  I envisioned the following structure out in SQL Server:
>>
>> tblHSID - contains 6.4 million records.  Source of records for tblMergeSrc.
>> tblMergeSrc - Contains X thousand records for Y hundred 
>> people.  Working table in memory.
>> tblMergedRecs - the single record merged into will be created here 
>> (or copied here).  Working table
>> in memory.
>> tblPurgedRecs - The multiple records merged FROM will be archived 
>> here.  Working table in memory.
>>
>> tblHashPK - Hash for a person / address, PKs for person / 
>> address.  There is one hash for each
>> person / hash, there are multiple PKs for each person / 
>> address.  One record for each record in
>> tblHSID.
>>
>> tblHashCnt - Group by Hash, Cnt PK.  Server two purposes, provides a 
>> "DISTINCT" hash for each person
>> being processed as well as a count of how many records will be merged down.
>>
>> So... tblHSID is the source of the records to be 
>> processed.  Contains ONLY people who actually have
>> "duplicates", most people don't.
>>
>> The Top(N) records from tblHashCnt will select the Top(N) PEOPLE by 
>> number of records to be merged.
>>   Merge the ones with the most records first.
>>
>> That will be joined to tblHashPK to pull all of the PKs for the 
>> Top(N) people about to be processed.
>>
>> The PKs from that will be joined to tblHSID to pull a set of records 
>> to be merged.  Copied into
>> tblMergeSrc.  This is the first time HSID records (records to be 
>> merged) will be moved into a table
>> in a dataset.
>>
>> tblMergedRecs is empty but exists in the dataset.
>> tblPurgedRecs is empty but exists in the dataset.
>>
>> The merge will most likely happen in tblMergeSrc although a new 
>> record could be created in
>> tblMergedRecs to work in.  Not decided yet.
>>
>> In the end, the merge happens in the dataset in memory.  The actual 
>> ONE merged record for each
>> person ends up in tblMergedRecs.
>>
>> The actual (multiple) records merged into other records are moved 
>> into tblPurgedRecs.  This is an
>> archive.
>>
>> The records in tblPurgedRecs are then deleted from tblMergedSrc, and 
>> eventually deleted out of
>> tblHSID back in the database.
>>
>> The records for the next person are processed.  Merged records moved 
>> to tblMergedRecs.  "Purged"
>> records moved to tblPurgedRecs.
>>
>> Deleted from tblMergeSrc.
>>
>> Wash, rinse repeat.
>>
>> Now... I don't care how many data sets I have but I just question 
>> why I would create a new dataset
>> for each table above.  It seems that having them all in a single 
>> dataset would minimize confusion
>> and stuff.  Logically it seems that working with multiple tables 
>> inside of a single DataSet object
>> MIGHT be faster than moving data back and forth between tables in 
>> different DataSet objects, though
>> maybe not.
>>
>> And of course... my book talks all about connections, then all about 
>> commands, and then all about
>> parameters, then...
>>
>> How to work with a data reader.  WTF over?  What happened to my 
>> dataset and table objects?  Not a
>> WORD about those two objects.
>>
>> Sigh.
>>
>> John W. Colby
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
> 
> 



More information about the dba-VB mailing list