Jim Lawrence
accessd at shaw.ca
Sat Jul 21 21:14:25 CDT 2007
Hi John:

Given:     XML = unbound
           John = Bound
Therefore: John <> XML

As for XML's ability to handle large volumes of records: banks use XML... enough said.

Jim

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Thursday, July 19, 2007 2:02 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

My view of XML is that it just isn't viable for large data sets. These data sets contain 5 to 100 MILLION records, with 10 to 700 fields each. Now think about XML, where every field is wrapped in begin/end field-name tags. Any given data table starts out at around 300 megs of DATA. Now wrap that in 2 gigs of XML trash... then multiply by 100 files...

I actually do end up parking the rejects; the client wants them for some reason. Eventually I will quietly delete them (they have never asked me to use them in any way). In the end, though, the name/address data has to be processed separately. I cannot simply merge it back in, because (remember the 600 other fields) that turns the inevitable table scan into a 24-hour experience. Also, the original address may be valid and the person has simply moved. Stuff like that.

This is a HUGE process, although each individual piece is not so big; it is the sheer size of the data that makes it hard to manage. It turns out that the import into SQL Server is time consuming but not tough once I bought a library to do it. At least the imports I have done so far are now easy. The library pulls the data into arrays and processes it in chunks; I haven't seen the code, but I suspect it handles X records at a time.

The resulting tables are large. My biggest is 65 million records with 740 fields; my next biggest is 98 million records with 149 fields. In the end, the name/address table is the same size regardless of which raw table the data came from.
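
[Editor's note] For a rough sense of the per-field tag overhead John describes above, here is a small back-of-the-envelope sketch in VB.NET. The field count, average field-name length, and average value length are made-up assumptions, not figures from his actual tables; the only point is that element-per-field markup can easily be several times the size of the data itself.

' Back-of-the-envelope estimate of XML markup overhead for a wide row.
' All inputs below are hypothetical assumptions, not values from the thread.
Module XmlOverheadEstimate
    Sub Main()
        Dim fieldCount As Integer = 700   ' assumed number of fields per record
        Dim avgNameLen As Integer = 10    ' assumed average field-name length
        Dim avgValueLen As Integer = 5    ' assumed average raw value length

        ' Each field becomes <Name>value</Name>:
        ' "<" + name + ">" plus "</" + name + ">" = 2 * nameLen + 5 tag characters.
        Dim tagBytesPerRow As Long = CLng(fieldCount) * (2L * avgNameLen + 5L)
        Dim dataBytesPerRow As Long = CLng(fieldCount) * avgValueLen

        Console.WriteLine("Data bytes per row:   {0}", dataBytesPerRow)                        ' ~3,500
        Console.WriteLine("Tag bytes per row:    {0}", tagBytesPerRow)                         ' ~17,500
        Console.WriteLine("Markup-to-data ratio: {0:0.0}", tagBytesPerRow / dataBytesPerRow)   ' ~5x
    End Sub
End Module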
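[Editor's note] On the chunked import: the commercial library John bought is not named in the thread, so the following is only a minimal VB.NET sketch of the general "X records at a time" pattern using SqlBulkCopy. The connection string, destination table, column layout, batch size, and file path are all placeholders, and the CSV parsing assumes well-formed two-column lines.

' Minimal sketch of chunked bulk loading into SQL Server (not John's library).
Imports System.Data
Imports System.Data.SqlClient
Imports System.IO

Module ChunkedImport
    Sub Main()
        Const batchSize As Integer = 10000   ' rows pushed per flush (assumed)

        Dim table As New DataTable()
        table.Columns.Add("FirstName", GetType(String))
        table.Columns.Add("LastName", GetType(String))

        Using conn As New SqlConnection("Server=.;Database=Staging;Integrated Security=True")
            conn.Open()
            Using bulk As New SqlBulkCopy(conn)
                bulk.DestinationTableName = "dbo.RawImport"   ' placeholder table
                bulk.BatchSize = batchSize

                For Each line As String In File.ReadLines("C:\data\chunk.csv")   ' placeholder path
                    Dim parts() As String = line.Split(","c)
                    table.Rows.Add(parts(0), parts(1))

                    ' Flush every batchSize rows so the DataTable never holds
                    ' more than one chunk in memory at a time.
                    If table.Rows.Count >= batchSize Then
                        bulk.WriteToServer(table)
                        table.Clear()
                    End If
                Next

                ' Push whatever is left in the final, partial chunk.
                If table.Rows.Count > 0 Then bulk.WriteToServer(table)
            End Using
        End Using
    End Sub
End Module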