[AccessD] deduplication

Mon Jan 15 08:18:10 CST 2007

John:
To proces this many records, and not have it running for days seems to me to
need an SP, and use of a temp table (a real, MSSQL in-memory temp table,
which is much faster than a writing to 'real' table). So, write an SP that
opens a cursor on the unique person-identifier.  Loop thru these,  then on
each, select the count of matching records; if more than 1, open another
cursor and loop those that match.  Write all the data in the matches to a
single record in a temp table, delete all  the matches from the main table,
then append the temp record to the real table.

My VBA is much better than my TSQL, so bear with this pseudo-code:

Declare rs cursor for select unique person-identifier from main_table

Fetch next from rs into @id
Count how many records in main_table have  person-identifier = @id

If > 1 then
	inset into #tmp 1 record with  person-identifier = @id, and all the
rest of the fields
	Declare rs2 cursor for select * from main_table where
person-identifier = @id
	for i= 1 to 650
	if fld(i) not null
		update #tmp set fld(i)=some_value 
	next I
	fetch next

	delete from main_table where  person-identifier = @id
	insert #tmp record into main_table
End if

Hth
Steve

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of JWColby
Sent: Monday, January 15, 2007 8:52 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] deduplication

Michael,

I am looking at normalizing the data.  However in many cases the client has
to determine the fields that belong in a "child" survey table.  Sometimes
you can just look at the field names and tell the groupings, sometimes not.
And of course, visually LOOKING at the data is simply not an option when
there are 50 million records and any given survey may only populate a
million of those records.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Michael R Mattys
Sent: Monday, January 15, 2007 7:46 AM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] deduplication

----- Original Message -----
From: "JWColby" <jwcolby at colbyconsulting.com> <snip>
> Thus my job is to now identify all three of these "john Colby" records 
> and merge all of the "survey" fields back into a single record, 
> deleting the two now un-needed extra records in the process.  If it 
> weren't for the fact that there are about 650 "survey" fields this 
> would not be such a big task.
<snip>
> It is specifically the ~650 fields that causes the issue for Access as 
> (I
> believe) Access cannot handle that many fields at a time.
<snip>

I think I'd still look at how to normalize the database.
Look at how to query the data such that an INSERT ...WHERE would create a
distinct table for Survey N.
Then write all 700 fields as the header to a csv Then use each survey-query
to populate the one line across.
Import into a new table when finished.

Michael R. Mattys
MapPoint & Access Dev
www.mattysconsulting.com 

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com