JWColby
jwcolby at colbyconsulting.com
Wed May 9 15:59:37 CDT 2007
Whadayathinkimstupid? However... Once I am done with my import widget I will do these fixed price, and make a killing.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of William Hindman
Sent: Wednesday, May 09, 2007 4:51 PM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] Infutor Statistics - was RE: [dba-SQLServer] Bulk insert

...go ahead ...tell us you did this fixed price :)

William Hindman

----- Original Message -----
From: "JWColby" <jwcolby at colbyconsulting.com>
To: <dba-sqlserver at databaseadvisors.com>; "'Access Developers discussion and
problem solving'" <accessd at databaseadvisors.com>
Sent: Wednesday, May 09, 2007 1:01 PM
Subject: [AccessD] Infutor Statistics - was RE: [dba-SQLServer] Bulk insert

> Just an FYI. The table that I have been building this whole time contains
> 97.5 million records and exactly 149 (imported) fields, and requires 62.6
> gigabytes of data space inside SQL Server. It took 2 hours and 28 minutes
> just to build the autoincrement PK field after the table was finished
> importing records. The index space for the table (with just this single
> index) is 101 megabytes.
>
> There were 56 raw data files, which required 75 gigabytes of disk space to
> hold. There were 56 CSV files created after stripping out the spaces, which
> required 40.8 gigabytes of disk space to hold. Thus by my calculations,
> roughly 35 gigabytes of disk space was required to hold JUST THE SPACES in
> the original fixed-width files, with the real data occupying 40.8 GB. It is
> interesting to note that the raw data in the CSV files was 41 GB, while the
> data space required in SQL Server is 62 GB.
>
> As the process was built over time, I do not have accurate specs for each
> and every file, but the process of stripping the spaces off of the fields
> happened at about 1K records / second.
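The space-stripping step described above amounts to slicing each fixed-width record at the positions given by the spec table, trimming the padding, and joining the fields with pipes. A minimal sketch of that logic (in Python rather than the thread's VBA; the field names, positions, and widths here are hypothetical, not the actual 149-field Infutor spec):

```python
# Sketch of the fixed-width -> pipe-delimited "CSV" conversion described in
# the thread. FIELD_SPEC stands in for the import spec table; the real one
# held ~149 (name, start, width) rows.

FIELD_SPEC = [           # (field name, 1-based start position, width)
    ("FirstName", 1, 10),
    ("LastName", 11, 15),
    ("Zip", 26, 5),
]

def strip_record(line: str) -> str:
    """Slice one fixed-width record and return a pipe-delimited row."""
    parts = []
    for _name, start, width in FIELD_SPEC:
        parts.append(line[start - 1:start - 1 + width].strip())
    return "|".join(parts)

def convert(fixed_lines):
    """Yield a header row of field names, then one stripped row per record."""
    yield "|".join(name for name, _start, _width in FIELD_SPEC)
    for line in fixed_lines:
        yield strip_record(line.rstrip("\n"))
```

Doing this line-at-a-time is what keeps memory flat even at 97.5 million records; throughput is then bounded by string slicing and disk I/O, consistent with the ~1K records/second figure reported for the interpreted VBA version.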
> Given 97.5 million records, this equates to 97.5 thousand seconds to do the
> space stripping, which is just over 27 hours. That of course is done in a
> VBA application. Again, I don't have accurate specs for all of the bulk
> inserts; however, those that I recorded the times for summed to 71.2
> million records, which took 4674 seconds (1.3 hours) to import using a
> BULK INSERT statement, which equates to approximately 15K records / second.
> Remember that this BULK INSERT is importing precleaned data with pipe
> delimiters. Also remember that while the BULK INSERT itself took 1.3 hours,
> due to the lack of automation in feeding the Sproc file names I had to
> manually edit the Sproc each time I wanted to import a new file, so the
> actual import took much longer, since I wasn't necessarily watching the
> computer as the last Sproc run finished.
>
> So there you go, that is what I have been trying to accomplish these last
> few weeks.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
> -----Original Message-----
> From: dba-sqlserver-bounces at databaseadvisors.com
> [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of JWColby
> Sent: Wednesday, May 09, 2007 11:02 AM
> To: dba-sqlserver at databaseadvisors.com
> Subject: Re: [dba-SQLServer] Bulk insert
>
> Robert,
>
> That is much appreciated. Why don't you work with me instead? I am all for
> doing it "in SQL Server" (I own and use SQL Server 2005), but it has to be
> something that I understand and can modify, or I will not be able to use it
> in the end, which would make your efforts wasted.
>
> My high-level spec:
>
> 1) Use a "from directory" which is monitored to pick up files from.
> 1a) The from directory changes for each import data set.
> 2) Each file in the "from directory" will contain identically formatted
> data, but the number of records in the file may vary.
> 3) Each file will be defined by an import spec table, which contains
> Field Name, Field Width, and data type.
> If no data type is specified, then VarChar() is used. The size of the
> VarChar() field is determined by the spec table. This allows me to spec
> only those data types (in advance) that I know are not VarChar(), while
> letting me spec as many as I need in advance of the import.
> 4) Once the table is built and populated, a new field called PKID needs to
> be built. PKID will be INT (32 bit) PRIMARY KEY, IDENTITY.
>
> Understand that at this time I actually have a functioning system. It takes
> a pair of tables which specify the file info (from / to directories, field
> delimiter, etc.) and the field info (field name / start position in the
> fixed-width data / field length). This program (written in VBA) does a
> conversion from fixed width to a pipe-delimited "CSV" file, reading a line,
> stripping off the spaces, and writing the stripped data lines back out to a
> CSV file in the "TO directory", complete with the first line containing
> field names.
>
> At the moment I have a hand-constructed table in a hand-constructed
> database, created by an initial use of the wizard from inside SQL Server,
> pulling in the first CSV file created by my program. Once that table is
> created, I use a hand-created BULK INSERT Sproc to import the CSV files.
> Once the table is fully populated with the contents of all the files, I
> hand-build an ALTER TABLE query to add a PKID INT PRIMARY KEY, IDENTITY.
>
> As you can see, anywhere you see "hand created", that is an area that needs
> to be automated. My thought is that creating the table initially will be
> relatively easy, and in fact I know how: building up a make-table query
> with the field names and widths taken from the spec table. I just did not
> do that because I did not have the time. The next issue is dynamically
> creating the Sproc that does the Bulk Insert.
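The "hand created" pieces above are exactly the ones that lend themselves to code generation from the spec table. A sketch of that generation step (Python is used here as a stand-in for the driving application; the table name, file path, and the assumption of a pipe delimiter with a header row follow the thread, but none of this is the actual code discussed):

```python
# Sketch of generating the hand-built SQL from the spec table: a CREATE TABLE
# from (field name, width, optional type) rows, and a BULK INSERT per
# pipe-delimited file. Table and file names are illustrative.

def build_create_table(table, spec):
    """spec: list of (name, width, sql_type_or_None); None -> VARCHAR(width),
    matching spec item 3's 'default to VarChar()' rule."""
    cols = []
    for name, width, sql_type in spec:
        cols.append(f"[{name}] {sql_type or f'VARCHAR({width})'}")
    return f"CREATE TABLE [{table}] (\n  " + ",\n  ".join(cols) + "\n);"

def build_bulk_insert(table, csv_path):
    """BULK INSERT for a pipe-delimited file whose first row is field names,
    so FIRSTROW = 2 skips the header."""
    return (
        f"BULK INSERT [{table}] FROM '{csv_path}'\n"
        "WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n', FIRSTROW = 2);"
    )

# The PK added after the load, per spec item 4 (table name hypothetical):
ADD_PK = "ALTER TABLE [Infutor] ADD PKID INT IDENTITY(1,1) PRIMARY KEY;"
```

With something like this, a directory watcher only has to loop over the files in the "from directory", emit one BULK INSERT per file, and run ADD_PK once at the end, removing the manual Sproc edit between files.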
> Now, whether the process of importing the data (inside SQL Server) strips
> off the spaces is really somewhat irrelevant at this point, since I have
> working code to do this. It is not blazing fast at about 1000 lines /
> second (for 150 fields), but it is "fast enough". If I port that to VB.Net
> I hope / expect to get a speed increase. The BULK INSERT Sproc that I
> hand-build is currently running at about 12K records / sec (for 150
> fields).
>
> In the end, this really needs to be an external application driving SQL
> Server functionality. I need a place to go to fill in the import spec
> table, set the from / to directories, set up the name of the table, etc.
> My heartache to this point has been the inability to get the SQL Server
> built-in import stuff to import the data without the external strip /
> rebuild step, or at least without an elaborate dance to get around any
> limitations of SQL Server to do that stuff for me.
>
> The very next thing I have to do is start exporting just the name /
> address (and PK) of this 100 million record table back out for CASS / NCOA
> processing. This requires exporting 2 million records at a time to a
> destination directory, with a unique file name (Infutor01.csv,
> Infutor02.csv...). Once my CASS program finishes processing, I will have a
> new set of files in yet another directory that I need to pull back in to
> SQL Server. Those files will not require the space-stripping piece, since
> they will not be fixed width.
>
> I do appreciate all of the advice from all the folks out there that have
> contributed. I am slowly but surely learning the pieces and parts that I
> need to do this part of my job.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
> -----Original Message-----
> From: dba-sqlserver-bounces at databaseadvisors.com
> [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Robert L.
> Stewart
> Sent: Wednesday, May 09, 2007 9:31 AM
> To: dba-sqlserver at databaseadvisors.com
> Subject: Re: [dba-SQLServer] Bulk insert
>
> John,
>
> I am working on an example of doing this with SQL Server for you. But,
> since I have the same amount of free time as you do, it is going to take a
> week or so to complete. I am using the same concept as you did, with a
> couple of tables to hold the import spec and the column definitions for
> the spec. I am going to build it for only one table, to show how it can be
> done. The rest will be up to you if you want to expand it.
>
> Robert
>
> At 09:27 PM 5/8/2007, you wrote:
>>Date: Tue, 8 May 2007 13:40:59 -0400
>>From: "JWColby" <jwcolby at colbyconsulting.com>
>>Subject: Re: [dba-SQLServer] Bulk insert
>>To: <dba-sqlserver at databaseadvisors.com>
>>Message-ID: <00d901c79198$0a703210$657aa8c0 at m6805>
>>Content-Type: text/plain; charset="us-ascii"
>>
>>Jim,
>>
>>At this point it is not necessary. I built a preprocessor in a few hours
>>using my toy (Access). My toy application handles everything exactly as
>>described. Someday (soon I hope) I will port that to VB.Net, which I hope
>>will be much quicker in the preprocessing department. Then I will be
>>considered by some as being a real man, playing with real tools. ;-)
>>Others will still consider me a child playing with toys, because I didn't
>>take it straight to C#. SOMEDAY (far in the future) perhaps I will embed
>>those pieces directly in CLR programming inside of SQL Server 2005. Then
>>I will be able to look down my nose at those children still playing with
>>toys.
>>
>>For now it works, and with the addition of driving the Sproc from the VBA
>>it will be an integrated application like what I described.
>>
>>John W.
>>Colby

> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com

--
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com
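The chunked CASS / NCOA export described earlier in the thread (2 million records per file, sequentially named Infutor01.csv, Infutor02.csv, ...) reduces to a small plan over PKID ranges, which is one reason the IDENTITY PK is built first. A sketch of that slicing (the helper and its naming scheme are illustrative; only the 2-million chunk size and file names come from the thread):

```python
# Sketch of planning the 2M-record-per-file export described in the thread.
# Assumes a dense IDENTITY PKID starting at 1, so each file is a PKID range.

CHUNK = 2_000_000

def export_plan(total_records, chunk=CHUNK):
    """Yield (file name, first PKID, last PKID) for each chunk-sized slice."""
    n_files = -(-total_records // chunk)          # ceiling division
    for i in range(n_files):
        first = i * chunk + 1
        last = min((i + 1) * chunk, total_records)
        yield (f"Infutor{i + 1:02d}.csv", first, last)
```

Each (first, last) pair would then drive a WHERE PKID BETWEEN first AND last query feeding one output file; 97.5 million records comes to 49 files, the last one partial.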