[dba-VB] How I'm approaching the problem

Sat Jul 21 21:14:25 CDT 2007

Hi John:

Given:
XML = unbound
John = Bound

Therefore:
John <> XML

As for the quantity of record handling capabilities of XML, Banks use XML...
enough said.

Jim

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Thursday, July 19, 2007 2:02 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

My view of XML is that it just isn't viable for large data sets.  These data
sets contain 5 to 100 MILLION records, with 10 to 700 fields.  Now think
about XML where each field is wrapped with begin / end field name tags.  Any
given data table starts out at 300 megs of DATA.  Now wrap that in 2 Gigs of
XML trash...  Now multiply by 100 files...

I actually do end up parking the rejects, the client wants them for some
reason.  Eventually I will quietly delete them (they have never asked for me
to use them in any way).

In the end though the name / address stuff has to be processed separately.
I cannot simply merge it back in because (remember the 600 other fields) it
turns the inevitable table scan into a 24 hour experience.  Also the
original address may be valid and they just moved.  Stuff like that.

This is a HUGE process, although each individual piece is not so big.  It is
the sheer size of the data that makes it hard to manage.

It turns out that the import into SQL server is time consuming but not tough
once I bought a library to do that.  At least the ones I have done so far
are now easy.  The lib pulls the data into arrays and processes chunks.  I
haven't seen the code but I suspect that it does X records at a time.  The
resulting tables are large.  My biggest is 65 million records, 740 fields.
My next biggest is 98 million records, 149 fields.  In the end, the name /
address table is the same size regardless of which raw table the data came
from.