[dba-VB] Merge rows

Charlotte Foust cfoust at infostatsystems.com
Tue Sep 22 09:51:00 CDT 2009


John,

I'm confused.  Why would you get a better result from one ugly record
with all the data in it than from querying for a master record with
child records, perhaps in a union query, that contained both answer 1
and answer 2?  Is the client wanting to query across polls, so that if
the respondent answers answer1 in one poll and answer2 in another, they
want to see that client?  It sounds like totally meaningless information
(not an uncommon request, in my experience).

Charlotte Foust 

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Tuesday, September 22, 2009 7:00 AM
To: Discussion concerning Visual Basic and related programming issues.
Subject: Re: [dba-VB] Merge rows

Shamil,

 > OK, but in proposed solution there is no need to "string together 640
fields".

Yes you do.  See my post to Max for further explanation.  You might be
merging a half dozen records into one.  Each field of each record has to
be examined to see if there is data in it.  You can't simply update the
first record with the second if the first has data but the second
doesn't - you lose the data in the first.

 > Well, then you can create extension table for new fields.

And of course this is an option.  You have to remember that there will
be 50 million records.  When you start doing complex WHERE clauses to
and / or as many as a dozen fields to perform the select, things get
hairy.  Now you are adding in the join of two tables.  The sheer
quantity of records cause speed issues that you would not even have to
consider in situations with a thousand or even a million records.

John W. Colby
www.ColbyConsulting.com


Shamil Salakhetdinov wrote:
> Hi John,
> 
> OK, but in proposed solution there is no need to "string together 640 
> fields". Or do you mean that one static(/statically compiled) T-SQL 
> statement has some limitations? Then you can make a view with two 
> chars length of column names.
> 
> <<<
> the client wants to ADD more fields.
> Well, then you can create extension table for new fields.
> 
> BTW, you may try to use LINQ for SQL or ADO.NET Entity Framework to 
> query your database.
> 
> I'd be interested to see how LINQ for SQL or ADO.NET Entity Framework 
> handle such a table with 640 fields. Can you publish this table 
> structure plus say 100 records with cleaned-up or hashed name, 
> address, phone and other personal information?
> 
> Thank you.
> 
> --
> Shamil
> 
> P.S.
> 
> <<<
> This is truly a database from hell.
> Yes. Some people supposes that all the computing technology is coming 
> directly from the Hell - they promise a lot but they take even more 
> out - your life...
> 
> 
> -----Original Message-----
> From: dba-vb-bounces at databaseadvisors.com
> [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
> Sent: Tuesday, September 22, 2009 4:13 PM
> To: Discussion concerning Visual Basic and related programming issues.
> Subject: Re: [dba-VB] Merge rows
> 
> Shamil,
> 
> One of the problems I run into is the length of the string that can be

> manipulated in SQL Server.
> There is either a 4K or 8K limit (don't remember which) but if you try

> to string together 640 fields with long field names (and they are 
> loooong) you end up with an error message from SQL Server that the 
> string is too long.
> 
> This is truly a database from hell.
> 
> To make matters worse, the client wants to ADD more fields.
> 
> John W. Colby
> www.ColbyConsulting.com
> 
> 
> Shamil Salakhetdinov wrote:
>> Hi John,
>>
>> 3.086 million records seems to be manageable to be "deduped" in one 
>> go provided the source 50 million records  table is already prepared.
>>
>> For "deduping" you can use
>>
>> 1. One stored procedure which:
>>    a) will insert new individual attributes' record and
>>    b) will update not null values for existing individual attributes'
>> record.
>>
>> 2. One C#/VB.NET function to get records from the source 50 million
> records'
>> table and to "feed" the above stored procedure.
>>
>> That's a rough plan.
>>
>> Of course developing SP with 640 parameters and C#/VB.NET code 
>> working
> with
>> that SP is not an easy manual work - that SP and code could be
generated.
>>
>> Not sure yet here that SP can have 640 parameters therefore it may be
> needed
>> to have several SPs.
>>
>> To get source rows a T-SQL select expression with ROW_NUMBER() 
>> function should be used to have relatively small subsets of record 
>> batches to process. This selection can be done using multi-threaded
C#/VB.Net code .
> Do
>> you have any index over source 50 million rows table? - ROW_NUMBER() 
>> function will need such an index (I'm talking MS SQL 2005 here, maybe

>> in
> MS
>> SQL 2008 ROW_NUMBER() can be used without any indexes)...
>>
>> When source data deduped into ~3.086 million records table then that
> "small"
>> table can get a few the most selective columns' indexes, and probably

>> one compound natural key clustered index built using the most 
>> selective (/the most often used in queries) columns. If there are a 
>> few such columns at least one of which is used in all the queries 
>> then your application
> response
>> time should be finally measured in seconds... I hope...
>>
>> That's basically it.
>>
>> Please correct me if you suppose that the above plan can't be 
>> fulfilled because of some reasons I'm missing here.
>>
>> Thank you.
>>
>>
>> --
>> Shamil
>>  
>>
>> -----Original Message-----
>> From: dba-vb-bounces at databaseadvisors.com
>> [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
>> Sent: Tuesday, September 22, 2009 5:19 AM
>> To: Discussion concerning Visual Basic and related programming
issues.
>> Subject: Re: [dba-VB] Merge rows
>>
>> Not 50 million, though still a lot.  I just ran a query which shows 
>> 3.086 million individduals with multiple records.
>>
>> John W. Colby
>> www.ColbyConsulting.com
>>
>>
>> Max Wanadoo wrote:
>>> 50 Million!
>>>
>>> Max
>>>
>>>
>>> -----Original Message-----
>>> From: dba-vb-bounces at databaseadvisors.com
>>> [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Stuart
> McLachlan
>>> Sent: 21 September 2009 23:55
>>> To: Discussion concerning Visual Basic and related programming
issues.
>>> Subject: Re: [dba-VB] Merge rows
>>>
>>> On the assumption that you don't really care which record you get if
> there
>>> are conflicts, I'd
>>> build a new table from:
>>>
>>> Select Distinct, KeyField, Max(fld2),Max(fld3)..........
>>>
>  
> 
> _______________________________________________
> dba-VB mailing list
> dba-VB at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-vb
> http://www.databaseadvisors.com
> 
> 
_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com





More information about the dba-VB mailing list