[dba-SQLServer] Bulk insert

Robert L. Stewart rl_stewart at highstream.net
Wed May 9 11:40:58 CDT 2007


See answers/comments below...

At 10:02 AM 5/9/2007, you wrote:
>Date: Wed, 9 May 2007 11:01:56 -0400
>From: "JWColby" <jwcolby at colbyconsulting.com>
>Subject: Re: [dba-SQLServer] Bulk insert
>To: <dba-sqlserver at databaseadvisors.com>
>Message-ID: <002501c7924b$01f69c10$657aa8c0 at m6805>
>Content-Type: text/plain;       charset="us-ascii"
>
>Robert,
>
>That is much appreciated.  Why don't you work with me instead?


We talked about working together on some things before, but nothing
ever came of it. The idea of the 'sample' was to give you an idea of
how to do it so you could proceed on your own.

I have a full-time job as a SQL DBA and developer now. Plus I am also doing
6 new .Net web sites and 2 new installs of my Social Services Management (SSMS)
software this month. And I am rewriting the SSMS system in .Net. And,
finally, to top all that off, my fiancée and her daughter are having their
interviews for their visas to come here on June 18, so I have to get everything
ready for them too. So, a sample is the best I can do for you.

>   I am all for
>doing it "in SQL Server" (I own and use SQL Server 2005) but it has to be
>something that I understand and can modify or I will not be able to use it
>in the end, which would make your efforts wasted.

I don't consider doing samples a wasted effort, unless they are ignored.

>
>
>My high level spec:
>
>1) Use a "from directory" which is monitored to pick up files from.

I was not planning on using the SQL Agent and setting up a job to
do it, but it could be done. A SQL Agent job would have to be
created for each 'client/job'.
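
If you did go the Agent route, a minimal sketch with the msdb job procs
might look like this (the job name, database, and the SP it calls are all
hypothetical placeholders):

USE msdb;
GO

-- One job per 'client/job'; the step just calls whatever SP does the load
EXEC dbo.sp_add_job
     @job_name = N'Load ClientX files';

EXEC dbo.sp_add_jobstep
     @job_name      = N'Load ClientX files',
     @step_name     = N'Run the load',
     @subsystem     = N'TSQL',
     @database_name = N'MyDb',                          -- hypothetical database
     @command       = N'EXEC dbo.usp_LoadClientFiles;'; -- hypothetical SP

EXEC dbo.sp_add_jobschedule
     @job_name          = N'Load ClientX files',
     @name              = N'Nightly',
     @freq_type         = 4,        -- daily
     @freq_interval     = 1,
     @active_start_time = 010000;   -- 01:00:00

EXEC dbo.sp_add_jobserver
     @job_name = N'Load ClientX files';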

>
>1a) The from directory changes for each import data set.

That should be part of the table definition for the 'client job' setup.
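
For instance, the 'client job' setup could live in a pair of tables like
these (the names and columns are just a sketch based on what you described,
not anything you already have):

CREATE TABLE dbo.ImportJob (
    ImportJobID    INT IDENTITY(1,1) PRIMARY KEY,
    JobName        VARCHAR(100) NOT NULL,
    FromDirectory  VARCHAR(255) NOT NULL,   -- monitored pickup directory
    ToDirectory    VARCHAR(255) NOT NULL,   -- where converted files land
    FieldDelimiter VARCHAR(5) NOT NULL DEFAULT '|',
    TargetTable    SYSNAME NOT NULL         -- table the data loads into
);

CREATE TABLE dbo.ImportJobField (
    ImportJobID INT NOT NULL REFERENCES dbo.ImportJob (ImportJobID),
    FieldName   SYSNAME NOT NULL,
    StartPos    INT NOT NULL,    -- start position in the fixed-width line
    FieldWidth  INT NOT NULL,
    DataType    SYSNAME NULL,    -- NULL = default to VARCHAR(FieldWidth)
    PRIMARY KEY (ImportJobID, FieldName)
);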

>2) Each file in the "from directory" will contain identical formatted data,
>but the number of records in the file may vary.

It would not matter if the format is the same.

>3) Each file will be defined by an import spec table, which contains Field
>Name, Field Width and data type.  If no data type is specified, then
>VarChar() is used.  The size of the VarChar() field is determined by the
>spec table.  This allows me to only have to spec data types (in advance)
>that I know are not VarChar(), but allows me to spec as many as I need in
>advance of the import.

This should be defined per file type, i.e., per format of the data.

You would build a SQL statement in code from the information in the
import spec column definition table, then use EXEC sp_executesql to
execute the statement and build the table.
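
A rough sketch of that, assuming spec tables like the ones above, using
the usual variable-concatenation trick (object names cannot be passed as
parameters, so the whole statement goes into one string):

DECLARE @JobID INT, @Table SYSNAME, @Sql NVARCHAR(MAX);
SET @JobID = 1;   -- hypothetical job id

SELECT @Table = TargetTable
FROM dbo.ImportJob
WHERE ImportJobID = @JobID;

-- Build the column list; a missing data type defaults to VarChar(width)
SET @Sql = N'CREATE TABLE dbo.' + QUOTENAME(@Table) + N' (';

SELECT @Sql = @Sql + QUOTENAME(FieldName) + N' '
            + COALESCE(DataType,
                       N'VARCHAR(' + CAST(FieldWidth AS NVARCHAR(10)) + N')')
            + N', '
FROM dbo.ImportJobField
WHERE ImportJobID = @JobID
ORDER BY StartPos;

SET @Sql = LEFT(@Sql, LEN(@Sql) - 1) + N')';  -- drop the trailing comma

EXEC sp_executesql @Sql;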

>4) Once the table is built and populated, a new field called PKID needs to
>be built.  PKID will be INT (32 bit) PRIMARY KEY, IDENTITY.

This is a simple ALTER TABLE statement. But it must also be built as
a string and executed as above.
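
For example, again with a placeholder table name:

DECLARE @Table SYSNAME, @Sql NVARCHAR(MAX);
SET @Table = N'Infutor';   -- hypothetical target table

SET @Sql = N'ALTER TABLE dbo.' + QUOTENAME(@Table)
         + N' ADD PKID INT IDENTITY(1,1) NOT NULL'
         + N' CONSTRAINT ' + QUOTENAME(N'PK_' + @Table)
         + N' PRIMARY KEY CLUSTERED';

-- Note: adding an IDENTITY column to a populated 100-million-row table
-- rewrites the data, so expect it to run a while.
EXEC sp_executesql @Sql;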


>Understand that at this time I actually have a functioning system.  It takes
>a pair of tables which specify the file info (from / to directories, field
>delimiter etc) and the field info (field name / start position in the fixed
>width data / field length).  This program (written in VBA) does a conversion
>from fixed width to a pipe delimited "CSV" file, reading a line, stripping
>off the spaces, and writing the stripped data lines back out to a CSV file
>in the "TO Directory", complete with the first line containing field names.
>At the moment I have a hand constructed table in a hand constructed
>database, which is created by an initial use of the wizard from inside of
>SQL Server, pulling in the first CSV files created by my program.  Once that
>table is created, I use a hand created BULK INSERT Sproc to import the CSV
>files.  Once the table is fully populated with the contents of all the
>files, I hand build an ALTER TABLE query to build a PKID INT PRIMARY KEY,
>IDENTITY.
>
>As you can see, anywhere you see "hand created", that is an area that needs
>to be automated.  My thoughts are that creating the table initially will be
>relatively easy, and in fact I know how, building up a make table query with
>the field names and widths taken from the spec table.  I just did not do
>that because I did not have the time.  The next issue is dynamically
>creating the Sproc that does the Bulk Insert.
>
>Now, whether the process of importing the data (inside of SQL Server)
>strips off the spaces is somewhat irrelevant at this point, since I
>have working code to do this.  It is not blazing fast at about 1000 lines /
>second (for 150 fields) but it is "fast enough".  If I port that to VB.Net I
>hope / expect to get a speed increase.  The BULK INSERT Sproc that I hand
>build is currently running about 12K records / sec (for 150 fields).
>
>In the end, this really needs to be an external application driving SQL
>Server functionality.  I need a place to go to fill in the import spec
>table, set the from / to directories, set up the name of the table etc.  My
>heartache to this point has been the inability to get the SQL Server
>built-in import stuff to import the data without the external strip /
>rebuild step, or at least without an elaborate dance to get around any
>limitations of SQL Server to do that stuff for me.

Correct, you would still need a front end to pass the parameters
into the SP that actually does the work.
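
A sketch of such an SP (the name is hypothetical; BULK INSERT will not
take a variable for the file name, so it also has to be dynamic, and the
delimiter and header row here match the pipe-delimited CSVs you described):

CREATE PROCEDURE dbo.usp_BulkLoadFile
    @TargetTable SYSNAME,
    @FilePath    VARCHAR(255)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Sql NVARCHAR(MAX);

    SET @Sql = N'BULK INSERT dbo.' + QUOTENAME(@TargetTable)
             + N' FROM ''' + REPLACE(@FilePath, '''', '''''') + N''''
             + N' WITH (FIELDTERMINATOR = ''|'','
             + N' ROWTERMINATOR = ''\n'','
             + N' FIRSTROW = 2)';   -- skip the field-name header line

    EXEC sp_executesql @Sql;
END

The front end would then just loop the files in the 'to' directory and
call something like:

EXEC dbo.usp_BulkLoadFile @TargetTable = 'Infutor',
                          @FilePath = 'C:\ToDir\somefile.csv';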

>The very next thing I have to do is start exporting just the name / address
>(and PK) of this 100 million record table back out for CASS / NCOA
>processing.  This requires exporting 2 million records at a time, to a
>destination directory, with a unique file name (Infutor01.csv,
>Infutor02.csv...).  Once my CASS program finishes processing I will have a
>new set of files in yet another directory that I need to pull back in to SQL
>Server.  Those files will not require the space stripping piece since they
>will not be fixed width.

Writing them out will be relatively easy. The record count per file
would probably be a parameter you pass in to the SP that does it.
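
A sketch of the export loop, walking the PKID in chunks and shelling out
to bcp (xp_cmdshell has to be enabled, and the server, database, column
names, and paths here are all placeholders):

DECLARE @ChunkSize INT, @MinID INT, @MaxID INT, @FileNum INT,
        @Cmd VARCHAR(2000);

SET @ChunkSize = 2000000;   -- records per file; this becomes the parameter
SET @FileNum = 1;
SELECT @MinID = MIN(PKID), @MaxID = MAX(PKID) FROM dbo.Infutor;

WHILE @MinID <= @MaxID
BEGIN
    SET @Cmd = 'bcp "SELECT PKID, FName, LName, Addr, City, St, Zip'
             + ' FROM MyDb.dbo.Infutor WHERE PKID BETWEEN '
             + CAST(@MinID AS VARCHAR(12)) + ' AND '
             + CAST(@MinID + @ChunkSize - 1 AS VARCHAR(12))
             + '" queryout "C:\Export\Infutor'
             + RIGHT('0' + CAST(@FileNum AS VARCHAR(4)), 2)  -- Infutor01, 02...
             + '.csv" -c -t, -T -S MyServer';

    EXEC master.dbo.xp_cmdshell @Cmd;

    SET @MinID = @MinID + @ChunkSize;
    SET @FileNum = @FileNum + 1;
END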


>I do appreciate all of the advice from all the folks out there that have
>contributed.  I am slowly but surely learning the pieces and parts that I
>need to do this part of my job.
>
>John W. Colby
>Colby Consulting
>www.ColbyConsulting.com




