Jim Lawrence
accessd at shaw.ca
Sun Jul 22 22:01:21 CDT 2007
Hi John:

You have taken on a massively complex project single-handedly. When I was working full-time for a company and a similar-sized project appeared, I assigned at least 2 people to it. It seems that 2 people can do the work of three when they work together.

MS SQL people tend to think they're a little better than the standard Access grunts. Why that is so I have no idea. Considering that MS SQL developers have the luxury of working with a faster and better product, it is much easier to obtain positive results than from an equally complex project written totally in Access. That is why I write most of my new apps as a combination of an Access FE and an MS SQL BE: I get the best of both worlds. MS SQL is more rugged than the MDB and handles unbound connections without the absolute need for a complex locking scheme, as MS SQL is designed to work in this type of environment. It internally handles locking and multi-user access to a single record or group of records. It is a professional-level DB and is actually easier to work with. Unfortunately, while ADO is the best connection protocol for performance and reliability, if you do not know it, it is just another major item to learn. Throw learning .Net from scratch into the mix and you have to hold on with both hands just to keep your sanity.

I am amazed at how far you have come in such a short time. Nothing like a baptism by fire... If you are a little stressed, it is to be expected.

Hope your day has gone well.

Regards
Jim

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Sunday, July 22, 2007 6:54 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

;-)

Sorry if I snapped, buddy. This whole system is just a tad overwhelming. There are soooo many pieces and steps and things to do.
I am writing a system in VB.Net to automate the process: on a form I can specify the server, the name of a new database and table, and a directory where the files are stored, and the software will do the import from all those files into SQL Server. I am building another piece that exports a table (or fields in a table) out to a set of files in a directory, kind of the inverse of the first piece. By running those two pieces in order, I can import, export, address validate, and re-import all in one operation. Then I can also schedule the export / address validate / import on a periodic basis, with luck completely automated.

All of this has to have process logging so that if anything fails I can go see what failed, and where in the process. It also has to log to my billing database so that all this stuff gets billed to my client automatically whenever any piece of the process runs.

I am perhaps overly sensitive for a variety of reasons, starting with the fact that I have gotten a lot of flack on the SQL Server list about not understanding enough SQL Server to do this stuff (true, but when has that ever stopped me), how the wizards are toys meant for beginners and my needs far exceed their capabilities (also true), etc. I am struggling with learning two entirely new systems - SQL Server and VB.Net / ADO.Net - AND doing it on hardware / software that truly is inadequate (or barely adequate) for the task.

These databases are HUGE compared to any datasets I have encountered in the past. I am accustomed to doing systems with hundreds of tables but under a million records in the largest table. Here it is a handful of tables but tens of millions of records in each one. Desktop machines with a 32 bit OS / SQL Server just don't cut it. On the bright side, the quad core machines are out and a price war is on. The price of memory is dropping like a rock, and I can now build a dual processor 8 core system with 32 and up to 64 GB of RAM for a "reasonable" price.
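The directory-driven import loop described above can be sketched in a few lines. This is an illustrative sketch in Python (the actual tool is VB.Net, and the database and table names below are hypothetical): enumerate the files in a directory and generate one T-SQL BULK INSERT statement per CSV file, using a pipe field terminator to match the delimited files described later in the thread.

```python
import os


def build_import_commands(database, table, directory):
    """Generate one T-SQL BULK INSERT statement per .csv file in a
    directory.  Illustrative only: the real tool also creates the
    database/table, runs the statements, and logs each step."""
    commands = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".csv"):
            continue
        path = os.path.join(directory, name)
        commands.append(
            f"BULK INSERT [{database}].[dbo].[{table}] "
            f"FROM '{path}' "
            f"WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n', FIRSTROW = 2);"
        )
    return commands
```

Each generated statement would then be executed against the target server over an ADO/ADO.Net connection; BULK INSERT runs server-side, which is why the server name does not appear in the statement itself.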
It appears that I will be doing so before the end of the year. I found 64 bit SQL Server at a price I could afford, and now if I can get a copy of Windows 2003 x64 at a price I can afford (and get it to install and run - drivers are still an issue) I should finally have a SQL Server system with the oomph to handle my data.

I am a one man show, trying to do a pretty huge job (in my universe anyway), and I am a little stressed. But things are finally coming together.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Jim Lawrence
Sent: Sunday, July 22, 2007 6:42 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

Hi John:

Just a note: I have been working on bank hardware and software installation for about two years and have worked with, and been in conversation with, many of the senior software techs when updating their systems. And yes, they do use XML almost exclusively for transferring data from cashiers' stations, desktops and bank machines to their centralized storage and banking systems. It is so that, regardless of what type of OS or hardware is being used, information can be transferred seamlessly back and forth. The transport and/or VPN are all heavily encrypted, as you would expect.

I am sorry you felt insulted by the comments, but they were totally meant in a fun way and not in any way designed as a disparaging remark. At worst they were supposed to bring a smile... I was hoping you would get a laugh out of it. I must definitely be more careful in the future when a comment could potentially be taken in more than one way.

I think in your case there would not be any advantage to using XML for data manipulation, as it would just add an extra layer of complexity for no good reason.
Jim

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Saturday, July 21, 2007 9:51 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

LOL. I have been doing this stuff in SQL Server and VB.Net for about a year now and so far have not even used a form except to have a button to push. So much for bound. Bound is only applicable when the data is being entered / displayed. Mine is not. When you deal with 90 million records you don't "view it" (though I can if I want to, it is in SQL Server).

>As for the quantity of record handling capabilities of XML, Banks use XML...

XML is designed for pushing data around between systems. I would be very surprised if banks did much if any data storage or manipulation in XML. Banks have many different systems, none of which talk to each other. XML is a widely recognized standard, designed for pushing data around between systems that otherwise don't talk well, and is a natural fit (for that). Banks also have entire departments with monthly budgets for servers, people, and entire buildings to house said departments which could buy my entire house (in CASH) every week. I am one person, with one client, using this stuff. I run all my stuff on dual core desktop machines.

If you think this is not an issue, let me put it this way. AN ACTUAL DATA FILE FROM A CUSTOMER - actual field names and data sizes (characters):

FIPS CODE - 5
FIPS SUB CODE - 3
MUNICIPALITY CODE - 3
APN UNFORMATTED - 13
APN FORMATTED - 13
MORTGAGE INTEREST RATE TYPE CODE - 3

There are NINETY columns with names like this. Now, you can see that the data is MINISCULE compared to the size of the field names. NOW, turn that into XML. Now store NINETY MILLION RECORDS LIKE THAT. Now MANIPULATE NINETY MILLION XML RECORDS like that on a desktop machine with 4 GB of RAM.

>As for the quantity of record handling capabilities of XML, Banks use XML...
That is an incredibly ignorant statement. Enough said.

I actually use XML as we speak, to store process log data out to files on my machine. It is quick, easy and standard. But it is tiny amounts of data, perhaps a kilobyte per file processed.

>John <> XML

Not true. I just choose not to take 70 gigabytes of raw data in a nice neat | delimited CSV file, turn it into 700 gigabytes of XML, and then try to process that in 2 gigabytes of process memory on a desktop machine. THAT sounds like a recipe for tying up my server for the next... oh... two weeks.

It takes me roughly 90 seconds per file to import each CSV file into a SQL Server database. I have to get it into SQL Server; that is what my client hires me to do. The fact that I export the name / address back out for address processing immediately is irrelevant; I will do so every month for the rest of eternity. Why on god's green earth would I screw around turning this into a MONSTER XML file (or files) just so I could preprocess a handful of fields?

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Jim Lawrence
Sent: Saturday, July 21, 2007 10:14 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

Hi John:

Given: XML = unbound
John = Bound
Therefore: John <> XML

As for the quantity of record handling capabilities of XML, Banks use XML... enough said.

Jim

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Thursday, July 19, 2007 2:02 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

My view of XML is that it just isn't viable for large data sets. These data sets contain 5 to 100 MILLION records, with 10 to 700 fields. Now think about XML, where each field is wrapped with begin / end field name tags.
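The per-record overhead of those begin/end tags is easy to quantify. A small sketch (Python, purely illustrative; the six field names and widths are taken from the sample layout quoted earlier in this thread, with spaces replaced by underscores since XML element names cannot contain spaces) compares the bytes needed for one pipe-delimited record versus one element-per-field XML record:

```python
# Field names and data widths (characters) from the sample record layout.
fields = {
    "FIPS_CODE": 5,
    "FIPS_SUB_CODE": 3,
    "MUNICIPALITY_CODE": 3,
    "APN_UNFORMATTED": 13,
    "APN_FORMATTED": 13,
    "MORTGAGE_INTEREST_RATE_TYPE_CODE": 3,
}


def csv_record_bytes(widths):
    """Pipe-delimited: the data plus one delimiter per field boundary."""
    return sum(widths) + len(widths) - 1


def xml_record_bytes(names_widths):
    """Element-per-field XML, <NAME>data</NAME>: each field repeats its
    name twice plus 5 bytes of angle brackets and slash."""
    return sum(len(n) * 2 + 5 + w for n, w in names_widths.items())


csv_size = csv_record_bytes(list(fields.values()))
xml_size = xml_record_bytes(fields)
print(csv_size, xml_size, round(xml_size / csv_size, 1))
```

On just these six fields the XML form is roughly six times the size of the delimited form, and the ratio only gets worse as field names grow relative to their data, which is the argument being made here.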
Any given data table starts out at 300 megs of DATA. Now wrap that in 2 gigs of XML trash... Now multiply by 100 files...

I actually do end up parking the rejects; the client wants them for some reason. Eventually I will quietly delete them (they have never asked me to use them in any way). In the end, though, the name / address stuff has to be processed separately. I cannot simply merge it back in because (remember the 600 other fields) it turns the inevitable table scan into a 24 hour experience. Also, the original address may be valid and they just moved. Stuff like that.

This is a HUGE process, although each individual piece is not so big. It is the sheer size of the data that makes it hard to manage. It turns out that the import into SQL Server is time consuming but not tough, once I bought a library to do it. At least the ones I have done so far are now easy. The lib pulls the data into arrays and processes chunks. I haven't seen the code but I suspect that it does X records at a time.

The resulting tables are large. My biggest is 65 million records, 740 fields. My next biggest is 98 million records, 149 fields. In the end, the name / address table is the same size regardless of which raw table the data came from.

_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com
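[Editor's note: the chunked reading that the import library is suspected of doing above can be sketched as follows. This is an assumption about how such a library works, not its actual code, and it is in Python for brevity rather than VB.Net: rows are yielded in fixed-size batches so a multi-million-row pipe-delimited file never has to fit in memory at once.]

```python
import csv


def iter_chunks(path, chunk_size=10000):
    """Yield lists of up to chunk_size rows from a pipe-delimited file,
    keeping memory use flat regardless of the file's total size.
    chunk_size is an arbitrary illustrative value."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="|")
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:  # final partial batch
            yield chunk
```

Each yielded batch would then be handed to a bulk-load API as a unit, which matches the "pulls the data into arrays and processes chunks" behavior described above.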