JWColby
jwcolby at colbyconsulting.com
Tue May 8 09:09:34 CDT 2007
Eric,

Because... There are FIFTY SIX FILES and ONE HUNDRED AND FIFTY fields in this specific data set. There were SEVEN HUNDRED fields in the last data set. It sounds easy to "just insert into a temporary table and then move to the real table," but part of that process involves wrapping a strip() around each field. I am not a SQL kind of guy, and no one here is offering to provide me with code (inside of SQL Server) that GENERICALLY goes out and gets the list of fields, builds up dynamically "in code" a SQL statement that wraps that strip() around each field (appending to the destination table of course), and then executes that.

I am in the business of taking files and converting them. I am given the data files, but I am also given a spreadsheet (or occasionally just a printout) containing the field names, their positions in the fixed width string, and the width of each field. I have to get that data into a table, then use THAT data to parse the fixed width string. This is NOT a one-off thing. If I am going to do this, it has to work tomorrow, with a completely different data set, a different directory on the disk where the data is stored, a different table name, different field names, a different number of fields. I can do that in VBA; someday soon I will be able to do it in VB.Net. I cannot do this inside of SQL Server. Go listen to the "Agile programming" discussion on the AccessD list and see my position.

I am a little bit frustrated because I laid out the entire problem back in the beginning. This has to be a SYSTEM, one that processes, from start to finish, a SET of HUGE fixed width data files. "A bulk insert process that involved importing the data (with spaces) into a temporary file and then inserting those records into the actual table" is not a SYSTEM, it is one small part of the whole problem. That method would work just fine ONCE, on ONE FILE.
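For what it's worth, the generic "wrap strip() around every field" statement described above can be built up mechanically from a field list. Here is a minimal sketch (Python purely for illustration; the table and column names are hypothetical, and on the SQL Server side the column list would presumably be read from INFORMATION_SCHEMA.COLUMNS rather than hard-coded):

```python
# Sketch: build an INSERT ... SELECT that wraps RTRIM() around every field.
# Table and column names below are made up for illustration; a real version
# would pull the column list from INFORMATION_SCHEMA.COLUMNS.

def build_strip_sql(src_table, dest_table, columns):
    """Return a T-SQL statement copying src_table to dest_table,
    trimming trailing spaces from every column."""
    col_list = ", ".join(f"[{c}]" for c in columns)
    trimmed = ", ".join(f"RTRIM([{c}])" for c in columns)
    return (f"INSERT INTO [{dest_table}] ({col_list}) "
            f"SELECT {trimmed} FROM [{src_table}]")

sql = build_strip_sql("TempImport", "RealTable", ["FName", "LName", "Zip"])
print(sql)
```

The generated string could then be handed to EXEC / sp_executesql in a sproc, so the same code works tomorrow against a different table with a different set of fields.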
It quickly turns into hundreds of manual edits: create the original table, create the temp table, create the SQL statement to move from the temp to the real table, insert file names into the query that pulls the data from the text files into the temp table, etc. AND THEN, DO IT ALL OVER AGAIN for the next set of files. I CANNOT DO THAT. Remember, I am OLD. I will die of old age before I can finish my job. Not to mention I have other clients as well expecting some small part of my time, AND I have kids who also want to see me.

It took me a few hours to write a set of classes to do the data conversion, to strip the files. I have a table that holds the location of the source and converted data files, as well as statistics about how long the process took, etc. The client gives me a "spec," which I get into a table, telling me where each field starts, its length, and its name. Code picks it up from there. Open a file, strip the spaces, line by line, write to a different file in a different directory, log the progress. ATM it does NOT do so, but soon the code will then pick up the converted file and execute a sproc that bulk inserts that converted file into the destination table.

Once that is done I will have a SYSTEM where I build a pair of tables (the from / to directories, and the field defs for an import), click a button, and come back in a few hours to have 100 million records from 56 files stripped and inserted into a table in SQL Server, with progress logged into a table: file name processed, start time to strip, finish time to strip, number of records in file, start time to bulk insert, end time to bulk insert (maybe), number of records bulk inserted. The next set of files... I place in a new directory, build records in the two tables, push a button, and come back in a few hours to 150 million records stripped and inserted.

John W.
Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Eric Barro
Sent: Tuesday, May 08, 2007 9:33 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

John,

Why not try the method I proposed with the bulk insert process that involved importing the data (with spaces) into a temporary file and then inserting those records into the actual table? That would be the "one fell swoop" method you're looking for that SQL Server handles really well. This is the same method I used to grab data from Peoplesoft to import into an Employees table in SQL Server.

Eric

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of JWColby
Sent: Tuesday, May 08, 2007 6:25 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

Billy,

The issue is not "handling fixed width" but stripping off the trailing spaces in the process. The environment that I work in isn't "import a file and use it," but rather "import 56 files containing a HUNDRED MILLION records of 150 (or SEVEN HUNDRED) fields into a single table and use it." The SPACES in this last file were more than 40% of the total volume of the file. I CANNOT import all the spaces and then go back and strip them off in situ. The extra space in the database and the extra time to do the strip make that a non-starter. And this is where (so far) the bulk insert method has failed me. Nobody has shown me how to strip the spaces on the way in, and still keep the time up.
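The strip-on-the-way-in step being described in this thread amounts to: parse each fixed-width line against the field spec (name, start position, width), trim the padding, and write out a delimited line. A minimal sketch, in Python for illustration only (the author's actual code is Access / VBA; the spec values, file names, and pipe delimiter here are all hypothetical):

```python
# Sketch of the strip step: parse each fixed-width line using a field spec
# (name, 1-based start position, width) as it would come from the client's
# spreadsheet, trim the space padding, and write a delimited line out.
# Spec values and the pipe delimiter are illustrative, not from the thread.

FIELD_SPEC = [  # (name, start, width)
    ("FName", 1, 10),
    ("LName", 11, 15),
    ("Zip",   26, 5),
]

def strip_line(line, spec, delim="|"):
    """Cut one fixed-width line into fields, strip padding, rejoin."""
    fields = [line[start - 1:start - 1 + width].strip()
              for _, start, width in spec]
    return delim.join(fields)

def strip_file(src_path, dest_path, spec):
    """Convert a whole file, returning the record count for the progress log."""
    count = 0
    with open(src_path) as src, open(dest_path, "w") as dest:
        for line in src:
            dest.write(strip_line(line.rstrip("\n"), spec) + "\n")
            count += 1
    return count
```

The converted file could then be loaded with a plain BULK INSERT using FIELDTERMINATOR = '|', which is the two-stage pipeline John describes below.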
I have written an external application, in Access / VBA but headed towards VB.NET, which opens these HUGE (often four GIGABYTE) files (remember there were 56 of these files in the last batch), and then reads each line, pulls each field out, trims off the spaces, assembles it into a string with a field delimiter, and writes each line back to a new file. That new file imports cleanly into SQL Server using a BULK INSERT sql statement embedded in a sproc, which I am confident I will get working from my same external application, allowing me to have that one app strip and import into SQL Server, logging all results. SQL Server may very well be able to do all this stuff in one fell swoop, who knows? Certainly not me.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com [mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Billy Pang
Sent: Tuesday, May 08, 2007 1:11 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

John:

This is a bit late so not sure if you resolved your issues with BCP yet. However, I would like to point out that bcp does indeed handle fixed width datatypes. See following illustration:

1) create a database called test

2) run the following script to create a table called table1

CREATE TABLE [Table1] (
    [asdf] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [asdf2] [int] NULL ,
    [asdf3] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [asdf4] [int] NULL ,
    [asf] [int] IDENTITY (1, 1) NOT NULL ,
    CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ( [asf] ) ON [PRIMARY]
) ON [PRIMARY]

3) populate the table with 2 records

insert into table1 values('1',2,'3',4)
insert into table1 values('a',5,'b',6)

4) ok.. now use bcp to export the data using trusted connection

bcp "test..table1" out "c:\yo_exported_data.txt" -T -c

there are two records in the data file. note: fixed width preserved!
5) use bcp to create a format file

bcp "test..table1" format -T -c -f "c:\yo_format_file.txt"

6) now use bcp to import the records you just exported, using the format file you just created

bcp "test..table1" in "c:\yo_exported_data.txt" -T -c -f "c:\yo_format_file.txt"

7) count the records in the table

select count(*) from table1

there are four records

8) view the records in the table

select * from table1

sometimes bcp is a bit of a PITA but it works like a dream once you get it working. hardest part is creating the format file but once you get bcp to do that for you, the rest pretty much writes itself.

HTH
Billy

On 5/1/07, David Lewis <DavidL at sierranevada.com> wrote:
>
> John:
>
> There are a few websites concerned with SQL Server, with active forums.
> I recommend you check them out. One is sqlcentral.com, the other is
> sswug.
>
> You are right, Access is a 'toy' when compared to SQL Server. One
> problem you are having is that you don't have the luxury of a learning
> curve -- but that is not the fault of the tool. Hang in there. D
>
>
> And that is exactly what I am doing and it is soooooo KLUDGY!!!
>
> I am running a program I wrote to read the data out of the original
> fixed width file, because SQL Server won't handle fixed width files and
> strip off the spaces. How incredibly stupid is THAT? Is there ANYONE
> out there who WANTS those spaces? So I am already using a KLUDGE to get
> data into SQL Server. Now I export it out to a perfectly valid CSV
> file only to discover that SQL Server BCP and Bulk Insert don't even
> look at (understand) quotes around comma delimited fields.
>
> But ACCESS does. But Access is a TOY, remember? What exactly does
> that make SQL Server, that it needs a toy to feed it data?
>
> This has been an exercise in discovering just how brain dead the data
> import processes are (or can be, anyway) for SQL Server. This is NOT
> rocket science.
> I am able to write a utility to open / import /
> mangle / export it back out to another file in VBA. How tough can it
> be to do this import inside of SQL Server natively?
>
> I have no idea how widespread this kind of file is, but I can tell you
> that that is all I see EVER in the industry I am dealing with. HUGE
> files, fixed width, space padded right. And I can tell you they have
> been a royal PITA to get into SQL Server.
>
> At least now I have my own utility that can get these input files
> into the format I need, even if it is in Access / VBA. My next step is
> to port this to VB.Net so that I can do it a little more "natively".
> Once I get a little more familiar with VB.Net I want to look at
> storing the data right into a recordset in ADO and then writing that
> back to SQL Server. If that is too slow (I suspect that it will be)
> then I can still do what I do now and import / mangle / write to file
> and then run a stored procedure to do a Bulk Insert from the file I
> create.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
>
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
>

--
Billy Pang
http://dbnotes.blogspot.com/
"Once the game is over, the King and the pawn go back in the same box." - Italian proverb