[dba-SQLServer] Bulk insert

JWColby jwcolby at colbyconsulting.com
Tue May 8 09:09:34 CDT 2007


Eric,

Because...

There are FIFTY SIX FILES, ONE HUNDRED AND FIFTY fields in this specific
data set.  There were SEVEN HUNDRED fields in the last data set.  It sounds
easy to "just insert into a temporary table and then move to the real table"
but part of that process involves installing a strip() around each field.  I
am not a SQL kind of guy and no one here is offering to provide me with code
"inside of SQL Server) that GENERICALLY goes out and gets the list of
fields, builds up dynamically "in code" a sql statement that inserts that
"strip()" around each field appending to the destination table of course,
and then executes that.  

I am in the business of taking files and converting them.  I am given the
data files, but I am also given a spreadsheet (or occasionally just a
printout) containing the field names, positions in the fixed width string
and the width of each field.  I have to get that data into a table, then use
THAT data to parse the fixed width string.

This is NOT a once off thing.  If I am going to do this, it has to work
tomorrow, with a completely different data set, different directory on the
disk where the data is stored, different table name, different field name,
different number of fields.  I can do that in VBA, someday soon I will be
able to do that in VB.Net.  I cannot do this inside of SQL Server.

Go listen to the "Agile programming" discussion on the AccessD list and see
my position.

I am a little bit frustrated because I laid out the entire problem back in
the beginning.  This has to be a SYSTEM, that processes, from start to
finish, a SET of HUGE fixed width data files.  "A bulk insert process that
involved importing the data (with spaces) into a temporary file and then
inserting those records into the actual table" is not a SYSTEM, it is one
small part of the whole problem.  That method would work just fine ONCE, on
ONE FILE.  It quickly turns into hundreds of manual edits to create the
original table, create the temp table, create the SQL statement to move from
the temp to the real table, to insert file names into the query that pulls
the data from the text files into the temp table etc.  AND THEN, DO IT ALL
OVER AGAIN for the next set of files.  I CANNOT DO THAT.  Remember, I am
OLD.  I will die of old age before I can finish my job.  Not to mention I
have other clients as well expecting some small part of my time.  AND I have
kids who also want to see me.

It took me a few hours to write a set of classes to do the data conversion
to strip the files.  I have a table that holds the location of the source
and converted data files as well as statistics about how long etc.  The
client gives me a "spec" which I get into a table to tell me where each
field starts and its length and its name.  Code picks it up from there.
Open a file, strip the spaces, line by line, write to a different file in a
different directory, log the progress.  ATM it does NOT do so but soon the
code will then pick up the converted file and execute a Sproc that bulk
inserts that converted file into the destination table.  Once that is done I
will have a SYSTME where I build a pair of tables, from / to dir and the
field defs for an import.  Click a button and come back in a few hours to
have 100 million records from 56 files stripped and inserted into a table in
SQL Server, progress logged into a table, file name processed, start time to
strip, finish time to strip, number of records in file, start time to bulk
insert, end time to bulk insert (maybe), number of records bulk inserted.

The next set of files... I place in a new directory, build records in the
two tables, push a button and come back in a few hours to 150 million
records stripped and inserted.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Eric Barro
Sent: Tuesday, May 08, 2007 9:33 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

John,

Why not try the method I proposed with the bulk insert process that involved
importing the data (with spaces) into a temporary file and then inserting
those records into the actual table? That would be the "one fell swoop"
method you're looking for that SQL server handles really well. This is the
same method I used to grab data from Peoplesoft to import into an Employees
table in SQL server.

Eric 

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of JWColby
Sent: Tuesday, May 08, 2007 6:25 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

Billy, 

The issue is not "handling fixed width" but in stripping off the trailing
spaces in the process.  The environment that I work in isn't "import a file
and use it", but rather "import 56 files containing a HUNDRED MILLION
records of 150 (or SEVEN HUNDRED) fields into a single table and use it".
The SPACES in this last file were more than 40% of the total volume of the
file.

I CANNOT import all the spaces and then go back and strip them off in situ.
The extra space in the database and the extra time to do the strip makes
that a non starter.  

And this is where (so far) the bulk insert method has failed me.  Nobody has
shown me how to strip the spaces on the way in, and still keep the time up.

I have written an external application, in Access / VBA but headed towards
VB.NET which opens these HUGE (often four GIGABYTE) files (remember there
were 56 of these files in the last batch), and then reads each line, pulls
each field out, trims off the spaces, assembles it into a string with a
field delimiter, and writes each line back to a new file.  That new file
imports cleanly into SQL Server using a BULK INSERT sql statement embedded
in a sproc, which I am confident I will get working from my same external
application to allow me to have that one app strip and import into SQL
Server, logging all results.  

SQL Server may very well be able to do all this stuff in one fell swoop, who
knows?  Certainly not me.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of Billy Pang
Sent: Tuesday, May 08, 2007 1:11 AM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

John:

This is a bit late so not sure if you resolved your issues with BCP yet.
However, I would like to point out that bcp does indeed handle fixed width
datatypes.  See following illustration:

1) create a database called test

2) run the following script to create a table called table1

CREATE TABLE [Table1] (
 [asdf] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,  [asdf2]
[int] NULL ,  [asdf3] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,  [asdf4] [int] NULL ,  [asf] [int] IDENTITY (1, 1) NOT NULL ,  CONSTRAINT
[PK_Table1] PRIMARY KEY  CLUSTERED  (
  [asf]
 )  ON [PRIMARY]
) ON [PRIMARY]

3) populate the table with 2 records

insert into table1 values('1',2,'3',4)
insert into table1 values('a',5,'b',6)

4) ok.. now use bcp to export the data using trusted connection

bcp "test..table1" out "c:\yo_exported_data.txt" -T -c

there are two records in the data file.  note: fixed width preserved!

5) use bcp to create a format file

bcp "test..table1" format -T -c -f "c:\yo_format_file.txt"

6) now use bcp to import the records you just exported using the format file
you just created

bcp "test..table1" in "c:\yo_exported_data.txt" -T -c -f
"c:\yo_format_file.txt"

7) count the records in the table
select count(*) from table1

there are four records

8) view the records in the tables.

select * from table1

sometimes bcp is a bit of pita but it works like a dream once you get it
working.  hardest part is creating the format file but once you get bcp to
do that for you, the rest pretty much writes itself.

HTH

Billy




On 5/1/07, David Lewis <DavidL at sierranevada.com> wrote:
>
>
>
> John:
>
> There are a few websites concerned with sql server, with active forums.
> I recommend you check them out.  One is sqlcentral.com, the other is 
> sswug.
>
> You are right, access is a 'toy' when compared to sql server.  One 
> problem you are having is that you don't have the luxury of a learning 
> curve -- but that is not the fault of the tool.  Hang in there.  D
>
>
> And that is exactly what I am doing and it is soooooo KLUDGY!!!
>
> I am running a program I wrote to read the data out of the original 
> fixed width file because SQL Server won't handle fixed width files and 
> strip off the spaces.  How incredibly stupid is THAT?  Is there ANYONE 
> out there who WANTS those spaces?  So I am already using KLUDGE to get 
> data into SQL Server.  Now I export it out to a perfectly valid CSV 
> file only to discover that SQL Server BCP and Bulk Insert don't even 
> look at (understand) quotes around comma delimited fields.
>
> But ACCESS does.  But Access is a TOY remember?  What exactly does 
> that make SQL Server that it needs a toy to feed it data?
>
> This has been an exercise in discovering just how brain dead the data 
> import processes are (or can be anyway) for SQL Server.  This is NOT 
> rocket science.  I am able to write a utility to open / import / 
> mangle / export it back out to another file in VBA.  How tough can it 
> be to do this import inside of SQL Server natively?
>
> I have no idea how widespread this kind of file is but I can tell you 
> that that is all I see EVER in the industry I am dealing with.  HUGE 
> files, fixed width, space padded right.  And I can tell you they have 
> been a royal PITA to get into SQL Server.
>
> At least now I have my own utility that can get I these input files 
> into the format I need, even if it is in ACCESS/ VBA.  My next step is 
> to port this to VB.Net so that I can do it a little more "natively".
> Once I get a little more familiar with VB.Net I want to look at 
> storing the data right into a recordset in ADO and then write that 
> back to SQL Server.  If that is too slow (I suspect that it will be) 
> then I can still do what I do now and import / mangle / write to file 
> and then run a stored procedure to do a Bulk Insert from the file I 
> create.
>
> John W. Colby
> Colby Consulting
> www.ColbyConsulting.com
>
>
> _______________________________________________
> dba-SQLServer mailing list
> dba-SQLServer at databaseadvisors.com
> http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
> http://www.databaseadvisors.com
>
>


--
Billy Pang
http://dbnotes.blogspot.com/
"Once the game is over, the King and the pawn go back in the same box." -
Italian proverb _______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com

No virus found in this incoming message.
Checked by AVG Free Edition. 
Version: 7.5.467 / Virus Database: 269.6.5/793 - Release Date: 5/7/2007 2:55
PM
 

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com




More information about the dba-SQLServer mailing list