jwcolby
jwcolby at colbyconsulting.com
Fri Nov 13 10:39:14 CST 2009
Shamil, Given what you now know about what I want to do... Why does each table get created in it's own DataSet. I thought a data set could have multiple tables. I envisioned the following structure out in SQL Server: tblHSID - contains 6.4 million records. Source of records for tblMergeSrc. tblMergeSrc - Contains X thousand records for Y hundred people. Working table in memory. tblMergedRecs - the single record merged into will be created here (or copied here). Working table in memory. tblPurgedRecs - The multiple records merged FROM will be archived here. Working table in memory. tblHashPK - Hash for a person / address, PKs for person / address. There is one hash for each person / hash, there are multiple PKs for each person / address. One record for each record in tblHSID. tblHashCnt - Group by Hash, Cnt PK. Server two purposes, provides a "DISTINCT" hash for each person being processed as well as a count of how many records will be merged down. So... tblHSID is the source of the records to be processed. Contains ONLY people who actually have "duplicates", most people don't. The Top(N) records from tblHashCnt will select the Top(N) PEOPLE by number of records to be merged. Merge the ones with the most records first. That will be joined to tblHashPK to pull all of the PKs for the Top(N) people about to be processed. The PKs from that will be joined to tblHSID to pull a set of records to be merged. Copied into tblMergeSrc. This is the first time HSID records (records to be merged) will be moved into a table in a dataset. tblMergedRecs is empty but exists in the dataset. tblPurgedRecs is empty but exists in the dataset. The merge will most likely happen in tblMergeSrc although a new record could be created in tblMergedRecs to work in. Not decided yet. In the end, the merge happens in the dataset in memory. The actual ONE merged record for each person ends up in tblMergedRecs. The actual (multiple) records merged into other records are moved into tblPurgedRecs. This is an archive. The records in tblPurgedRecs are then deleted from tblMergedSrc, and eventually deleted out of tblHSID back in the database. The records for the next person are processed. Merged records moved to tblMergedRecs. "Purged" records moved to tblPurgedRecs. Deleted from tblMergeSrc. Wash, rinse repeat. Now... I don't care how many data sets I have but I just question why I would create a new dataset for each table above. It seems that having them all in a single dataset would minimize confusion and stuff. Logically it seems that working with multiple tables inside of a single DataSet object MIGHT be faster than moving data back and forth between tables in different DataSet objects, though maybe not. And of course... my book talks all about connections, then all about commands, and then all about parameters, then... How to work with a data reader. WTF over? What happened to my dataset and table objects? Not a WORD about those two objects. Sigh. John W. Colby www.ColbyConsulting.com Shamil Salakhetdinov wrote: > Hi John -- > > Here is a sample: > > using System; > using System.Collections.Generic; > using System.Text; > > using System.Data; > using System.Data.SqlClient; > > namespace TestConsole.Samples > { > public class DataSetTests > { > public static void Run() > { > string connectionString = "Data Source=HAMBURG\\SQL2005;"+ > "Initial Catalog=Northwind;User Id=sa;Password=N/A"; > using (SqlConnection connection = > new SqlConnection(connectionString)) > {