[AccessD] merging records

jwcolby jwcolby at colbyconsulting.com
Wed Nov 14 14:41:46 CST 2007


Precisely.  I will most likely run this in VB.Net since it has to eventually
process 50 million records.  

I long ago generated a hash field for the address, family (address+lastName)
and Person (Address+Family+FirstName) so I have hash fields to allow me to
find the "same person", ignoring such problems as John Colby and John W.
Colby at the same address.  

This project is one of the reasons I was asking the question about iterating
the fields of a class in .Net.  If I build a class which is instantiated
once for each record for a given person, I can then update the first
instance using the data in the subsequent instances, and when done, write
the first instance back to the table (or a new table).  

If I do it correctly the classes of the extraneous records can be told to
delete their record in the table after each "person" recordset is scrubbed.
I can put a system like this on autopilot to run over a week or month,
however long it might take.  And with 50 million records it is going to take
awhile.  But with a VB.Net program running in the background scrubbing the
table, I can continue to use the table, with increasing accuracy as the
table is being scrubbed.

None of the other providers for my client has ever attempted to do this, for
obvious reasons (it ain't easy!).

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of A.D.TEJPAL
Sent: Wednesday, November 14, 2007 2:50 PM
To: Access Developers discussion and problem solving
Cc: A.D.TEJPAL
Subject: Re: [AccessD] merging records

John,

    Apparently, each column carries only one significant value (over a group
of records) for each combination of person & address. Your objective is to
display only one compacted row per combination of person & address, showing
only the significant values for survey results in various columns across the
record.

    As a programmatic solution, the following course of action is suggested:

    1 - Let the source table be named T_Data. Its first four fields are ID
(PK), FirstName, LastName and Address, followed by large number of other
fields (like Smokes etc) meant to hold survey response.

    2 - Create an empty table named T_Result. Its structure should be
identical to that of T_Data.

    3 - Create a dummy table T_Dummy having one field. Populate it with one
record.

    Having taken the above steps, if you run sample subroutine
P_PopulateResultTable as given below, table T_Result will get populated with
the compacted survey results in desired format. You might like to try it out
and confirm whether it is in line with what you have been aiming at.

    Note - It has been tested on Access 2003 desktop (Access 2000 file
format). Reference required - DAO 3.6

Best wishes,
A.D.Tejpal
------------

Sample subroutine - for merging survey results T_data is source table.
Results are appended to T_Result.
'=====================================
Sub P_PopulateResultTable()
    ' This subroutine merges the survey
    ' results for each person in source table
    ' T_Data and appends the outcome into
    ' destination table T_Result. Structure of T_Result
    ' is identical to that of T_Data
    ' T_Dummy is a single field single record table.
    
    Dim Qst As String, Txt As String
    Dim Fnm As String, Qst2 As String
    Dim Fv As Variant
    
    Dim rst1 As DAO.Recordset
    Dim rst2 As DAO.Recordset
    Dim fd As Field
    Dim tdf As TableDef
    Dim db As DAO.Database
    
    Const SourceTable As String = "T_Data"
    Const DestnTable As String = "T_Result"
    Const DummyTable As String = "T_Dummy"
    
    ' Comma separated string of all field names
    ' that do not directly carry survey response
    Const ExemptFields As String = _
            "ID,FirstName,LastName,Address"
    
    Set db = DBEngine(0)(0)
    
    ' Clear destination table
    db.Execute "DELETE * FROM " & _
                    DestnTable & ";", dbFailOnError
    
    Qst = "SELECT FirstName, LastName, " & _
            "Address FROM " & SourceTable & _
            " GROUP BY FirstName, " & _
            "LastName, Address;"
    Set rst1 = db.OpenRecordset(Qst)
    
    Set tdf = db.TableDefs(SourceTable)
    
    Do Until rst1.EOF
        Qst = "INSERT INTO " & DestnTable & _
                " SELECT '" & _
                rst1.Fields("FirstName") & "' AS " & _
                "FirstName, '" & rst1.Fields("LastName") & _
                "' AS LastName, '" & rst1.Fields("Address") & _
                "' AS Address,"
        For Each fd In tdf.Fields
            Fnm = fd.Name
            If InStr(ExemptFields, Fnm) > 0 Then
            Else
                Qst2 = "SELECT " & Fnm & _
                        " FROM " & SourceTable & _
                        " WHERE FirstName = '" & _
                        rst1.Fields("FirstName") & _
                        "' AND LastName = '" & _
                        rst1.Fields("LastName") & _
                        "' AND Address = '" & _
                        rst1.Fields("Address") & _
                        "' AND Len(" & Fnm & ") > 0;"
                Set rst2 = db.OpenRecordset(Qst2)
                
                If rst2.RecordCount > 0 Then
                    Qst = Qst & " '" & rst2.Fields(0) & _
                            "' AS " & Fnm & ","
                Else
                    Qst = Qst & " Null AS " & Fnm & ","
                End If
                
            End If
        Next
        
        ' Remove trailing comma
        Qst = Left(Qst, Len(Qst) - 1)
        Qst = Qst & " FROM " & DummyTable & ";"
        
        ' Append to destination table
        db.Execute Qst, dbFailOnError
        
        rst1.MoveNext
    Loop
    
    rst1.Close
    rst2.Close
    Set rst1 = Nothing
    Set rst2 = Nothing
    Set fd = Nothing
    Set tdf = Nothing
    Set db = Nothing
End Sub
'=====================================

  ----- Original Message -----
  From: jwcolby
  To: 'Access Developers discussion and problem solving' 
  Sent: Wednesday, November 14, 2007 19:09
  Subject: Re: [AccessD] merging records


  A.D.,

  Thanks for the response.  Unfortunately it is not that simple, i.e. there
  are about 700 fields, of which about 600 are responses to query questions.
  Each of those 600 fields will need to be merged with the alternate record.
  for example:

  FName LName Addr Smokes Softdrink Car
  John  Colby 1723 N ''      ''
  John  Colby 1723  ''     Pepsi     ''
  John  Colby 1723  ''     ''        Ford Escort

  In at least one table there are 600 fields.  The fields are divided into
  "sets" of fields.  One set is about boats - State registered, length,
type,
  engine etc.  Another set is about medications taken - Zoloft, Aspirin,
Etc.
  Another set is about electronics purchased - stereo, cb, computers, cell
  phones etc.

  << SNIPPED to prevent overall size crossing limits >>

  John W. Colby
  Colby Consulting
  www.ColbyConsulting.com
--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com




More information about the AccessD mailing list