[dba-VB] What to do, what to do?

Fri Nov 13 10:39:14 CST 2009

Shamil,

Given what you now know about what I want to do...

Why does each table get created in it's own DataSet.  I thought a data set could have multiple 
tables.  I envisioned the following structure out in SQL Server:

tblHSID - contains 6.4 million records.  Source of records for tblMergeSrc.
tblMergeSrc - Contains X thousand records for Y hundred people.  Working table in memory.
tblMergedRecs - the single record merged into will be created here (or copied here).  Working table 
in memory.
tblPurgedRecs - The multiple records merged FROM will be archived here.  Working table in memory.

tblHashPK - Hash for a person / address, PKs for person / address.  There is one hash for each 
person / hash, there are multiple PKs for each person / address.  One record for each record in 
tblHSID.

tblHashCnt - Group by Hash, Cnt PK.  Server two purposes, provides a "DISTINCT" hash for each person 
being processed as well as a count of how many records will be merged down.

So... tblHSID is the source of the records to be processed.  Contains ONLY people who actually have 
"duplicates", most people don't.

The Top(N) records from tblHashCnt will select the Top(N) PEOPLE by number of records to be merged. 
  Merge the ones with the most records first.

That will be joined to tblHashPK to pull all of the PKs for the Top(N) people about to be processed.

The PKs from that will be joined to tblHSID to pull a set of records to be merged.  Copied into 
tblMergeSrc.  This is the first time HSID records (records to be merged) will be moved into a table 
in a dataset.

tblMergedRecs is empty but exists in the dataset.
tblPurgedRecs is empty but exists in the dataset.

The merge will most likely happen in tblMergeSrc although a new record could be created in 
tblMergedRecs to work in.  Not decided yet.

In the end, the merge happens in the dataset in memory.  The actual ONE merged record for each 
person ends up in tblMergedRecs.

The actual (multiple) records merged into other records are moved into tblPurgedRecs.  This is an 
archive.

The records in tblPurgedRecs are then deleted from tblMergedSrc, and eventually deleted out of 
tblHSID back in the database.

The records for the next person are processed.  Merged records moved to tblMergedRecs.  "Purged" 
records moved to tblPurgedRecs.

Deleted from tblMergeSrc.

Wash, rinse repeat.

Now... I don't care how many data sets I have but I just question why I would create a new dataset 
for each table above.  It seems that having them all in a single dataset would minimize confusion 
and stuff.  Logically it seems that working with multiple tables inside of a single DataSet object 
MIGHT be faster than moving data back and forth between tables in different DataSet objects, though 
maybe not.

And of course... my book talks all about connections, then all about commands, and then all about 
parameters, then...

How to work with a data reader.  WTF over?  What happened to my dataset and table objects?  Not a 
WORD about those two objects.

Sigh.

John W. Colby
www.ColbyConsulting.com

Shamil Salakhetdinov wrote:
> Hi John --
> 
> Here is a sample:
> 
> using System;
> using System.Collections.Generic;
> using System.Text;
> 
> using System.Data;
> using System.Data.SqlClient; 
>  
> namespace TestConsole.Samples
> {
>     public class DataSetTests
>     {
>         public static void Run()
>         {
>             string connectionString = "Data Source=HAMBURG\\SQL2005;"+
>                 "Initial Catalog=Northwind;User Id=sa;Password=N/A";
>             using (SqlConnection connection =
>                    new SqlConnection(connectionString))
>             {