jwcolby
jwcolby at colbyconsulting.com
Sat Nov 14 06:30:35 CST 2009
Read the discussion carefully Robert. I am discussing doing chunks of thousands. John W. Colby www.ColbyConsulting.com Robert Stewart wrote: > John, > > I don't think your machine, or any for that matter, have the memory > to load 50 million records, extract 40 million, transform them and > then load them back to the 50 million and update the database from > the recordset. Datasets are for working with a small number of > records in memory, not millions. > > Robert > > > At 12:00 PM 11/13/2009, you wrote: >> Date: Fri, 13 Nov 2009 11:39:14 -0500 >> From: jwcolby <jwcolby at colbyconsulting.com> >> Subject: Re: [dba-VB] What to do, what to do? >> To: "Discussion concerning Visual Basic and related programming >> issues." <dba-vb at databaseadvisors.com> >> Message-ID: <4AFD8BB2.5030203 at colbyconsulting.com> >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed >> >> Shamil, >> >> Given what you now know about what I want to do... >> >> Why does each table get created in it's own DataSet. I thought a >> data set could have multiple >> tables. I envisioned the following structure out in SQL Server: >> >> tblHSID - contains 6.4 million records. Source of records for tblMergeSrc. >> tblMergeSrc - Contains X thousand records for Y hundred >> people. Working table in memory. >> tblMergedRecs - the single record merged into will be created here >> (or copied here). Working table >> in memory. >> tblPurgedRecs - The multiple records merged FROM will be archived >> here. Working table in memory. >> >> tblHashPK - Hash for a person / address, PKs for person / >> address. There is one hash for each >> person / hash, there are multiple PKs for each person / >> address. One record for each record in >> tblHSID. >> >> tblHashCnt - Group by Hash, Cnt PK. Server two purposes, provides a >> "DISTINCT" hash for each person >> being processed as well as a count of how many records will be merged down. >> >> So... tblHSID is the source of the records to be >> processed. Contains ONLY people who actually have >> "duplicates", most people don't. >> >> The Top(N) records from tblHashCnt will select the Top(N) PEOPLE by >> number of records to be merged. >> Merge the ones with the most records first. >> >> That will be joined to tblHashPK to pull all of the PKs for the >> Top(N) people about to be processed. >> >> The PKs from that will be joined to tblHSID to pull a set of records >> to be merged. Copied into >> tblMergeSrc. This is the first time HSID records (records to be >> merged) will be moved into a table >> in a dataset. >> >> tblMergedRecs is empty but exists in the dataset. >> tblPurgedRecs is empty but exists in the dataset. >> >> The merge will most likely happen in tblMergeSrc although a new >> record could be created in >> tblMergedRecs to work in. Not decided yet. >> >> In the end, the merge happens in the dataset in memory. The actual >> ONE merged record for each >> person ends up in tblMergedRecs. >> >> The actual (multiple) records merged into other records are moved >> into tblPurgedRecs. This is an >> archive. >> >> The records in tblPurgedRecs are then deleted from tblMergedSrc, and >> eventually deleted out of >> tblHSID back in the database. >> >> The records for the next person are processed. Merged records moved >> to tblMergedRecs. "Purged" >> records moved to tblPurgedRecs. >> >> Deleted from tblMergeSrc. >> >> Wash, rinse repeat. >> >> Now... I don't care how many data sets I have but I just question >> why I would create a new dataset >> for each table above. It seems that having them all in a single >> dataset would minimize confusion >> and stuff. Logically it seems that working with multiple tables >> inside of a single DataSet object >> MIGHT be faster than moving data back and forth between tables in >> different DataSet objects, though >> maybe not. >> >> And of course... my book talks all about connections, then all about >> commands, and then all about >> parameters, then... >> >> How to work with a data reader. WTF over? What happened to my >> dataset and table objects? Not a >> WORD about those two objects. >> >> Sigh. >> >> John W. Colby > _______________________________________________ > dba-VB mailing list > dba-VB at databaseadvisors.com > http://databaseadvisors.com/mailman/listinfo/dba-vb > http://www.databaseadvisors.com > >