[dba-SQLServer] Bulk insert

JWColby jwcolby at colbyconsulting.com
Wed May 9 13:33:49 CDT 2007


Marty,

My only question about this method is the loading of all the records in one
go before processing.  Remember that I am doing raw files that can be 4
gigabytes of text, up to (in this case) 4 million records and (in this case)
149 fields.  These files are HUGE by desktop standards.

My method uses a single-line read / process / write loop and is thus pretty
much guaranteed to handle any size file, any number of records, any number
of fields.
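
In outline the loop looks something like this (sketched in C# purely for
illustration; the file paths and the ProcessLine body are placeholders, not
the production stripping logic):

using System.IO;

class StreamStripper
{
    // Read / process / write one line at a time; only a single record is
    // ever in memory, so file size and field count do not matter.
    // Paths and ProcessLine are illustrative placeholders.
    static void Run(string inPath, string outPath)
    {
        using (StreamReader reader = new StreamReader(inPath))
        using (StreamWriter writer = new StreamWriter(outPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(ProcessLine(line));
            }
        }
    }

    // Placeholder for the real per-record work (e.g. stripping the
    // padding spaces from each of the 149 fields).
    static string ProcessLine(string line)
    {
        return line.TrimEnd();
    }
}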



John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of
MartyConnelly
Sent: Wednesday, May 09, 2007 2:04 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

Another option is to use SqlBulkCopy, a class that comes with .NET Framework
2.0.  There is a SqlBulkCopy example in the book below that uses a CSV
import.  Tricky to set up, but it works.  This new 2.0 class is designed to
call the SQL SMO layer under the covers, which replaces SQL DMO.
William Vaughn's Hitchhiker's Guide to Visual Studio and SQL Server (7th
Edition)

Here is a C# example that loads a text file into a DataTable and streams it
into SqlBulkCopy.  This handles 40,000 rows a second, 8,000 if you apply
indexing.
// Requires: System, System.Data, System.Data.SqlClient,
// System.Diagnostics, System.IO
private void button1_Click(object sender, EventArgs e)
{
        Stopwatch sw = new Stopwatch();
        sw.Start();

        // Two-column DataTable; IID auto-increments as rows are added.
        DataTable table = new DataTable();
        table.Columns.Add(new DataColumn("File", typeof(string)));
        table.Columns.Add(new DataColumn("IID", typeof(int)));
        table.Columns[1].AutoIncrement = true;
        table.Columns[1].AutoIncrementSeed = 1;
        table.Columns[1].AutoIncrementStep = 1;

        // One file name per line in the source file.
        using (StreamReader sr = new StreamReader("c:\\filelist.txt"))
        {
                while (!sr.EndOfStream)
                {
                        table.Rows.Add(sr.ReadLine());
                }
        }

        sw.Stop();
        Debug.Write(sw.Elapsed.TotalSeconds + " seconds for " +
                table.Rows.Count + " = " +
                Convert.ToDouble(table.Rows.Count) / sw.Elapsed.TotalSeconds +
                " rows per second loaded to datatable");

        // Reset so the second figure times only the bulk copy.
        sw.Reset();
        sw.Start();

        using (SqlConnection sqlcon = new SqlConnection("data
source=lon0371xns;initial catalog=SonarBackup;integrated security=sspi"))
        using (SqlBulkCopy bc = new SqlBulkCopy(sqlcon))
        {
                bc.DestinationTableName = "FileList";
                bc.NotifyAfter = 5000;   // fire SqlRowsCopied every 5000 rows
                bc.SqlRowsCopied += new
SqlRowsCopiedEventHandler(bc_SqlRowsCopied);
                bc.ColumnMappings.Add(new SqlBulkCopyColumnMapping("File", "File"));
                bc.ColumnMappings.Add(new SqlBulkCopyColumnMapping("IID", "IID"));
                bc.BulkCopyTimeout = 500;   // seconds
                sqlcon.Open();
                bc.WriteToServer(table);
        }
        sw.Stop();

        Debug.Write(sw.Elapsed.TotalSeconds + " seconds for " +
                table.Rows.Count + " = " +
                Convert.ToDouble(table.Rows.Count) / sw.Elapsed.TotalSeconds +
                " rows per second loaded to db");
}

// Progress handler referenced above (not shown in the original post; a
// minimal stand-in):
private void bc_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
{
        Debug.WriteLine(e.RowsCopied + " rows copied");
}



JWColby wrote:

>>The only caveat here is if you have empty fields in your file, a single
>>space is inserted instead of a null.
>
>What is it with all "the only caveat here" stuff?  I am sure that there
>is a darned good reason.
>
>In the end it is just easier to roll your own rather than work around the
>issues that the built-in stuff seems to have.  I have 150 fields (in this
>data set).  Somehow I have to do an update on all 150 fields.  I suppose I
>could have my converter run 150 update queries to do each column.  Or 700
>update queries to do the next data set.  Or just do the stripping of the
>spaces external to SQL Server and be done with it.  Either way I still have
>to use my toy.
>
>Once I move up to VB.Net I will be able to use threads to do the stripping
>and the BULK INSERT Sproc in parallel.
>
>BTW, I have to do something very similar all over again once I get the data
>in.  I will need to export the entire table back out, 2 million record sets
>of data to delimited files for CASS / NCOA processing, dumping 100 million
>records out into ~50 files (just the address data this time).  The CASS /
>NCOA process theoretically will process all files placed into an input
>directory (input to that program), dumping the processed files into an
>output directory (output from that program).  At which point I have to pull
>all of the CASS / NCOAd files BACK out of that output directory into yet
>another table.  And that is just the "pre-processing".
>
>You might be getting a clue by now why I do not want to be manually doing
>all the crapola involved with the solutions that do not involve an external
>control process.  Someday fairly soon I will have a completely automated
>system for doing all this.  I will be back to blowing bubbles and poking at
>Charlotte with a big stick.
>
>John W. Colby
>Colby Consulting
>www.ColbyConsulting.com
>
>-----Original Message-----
>From: dba-sqlserver-bounces at databaseadvisors.com
>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of
>MartyConnelly
>Sent: Tuesday, May 08, 2007 10:27 PM
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Bulk insert
>
>Uhh, there is a one-line fix to remove trailing blanks in SQL; the
>defaults differ across SQL Server versions and between nchar and varchar.
>
>SET ANSI_PADDING OFF
>
>When a table is created with the setting turned on (the default), spaces
>are not trimmed when data is inserted into that table.  When ANSI_PADDING
>is off, the spaces are trimmed.
>
>So if you SET ANSI_PADDING OFF, create your table, then set it back on
>again, when you bcp the data into the table, the excess trailing spaces
>will be eliminated.  The only caveat here is if you have empty fields in
>your file, a single space is inserted instead of a null.  If this is the
>case with your data file, you will need to do an update to set columns to
>null when len(yourcolumn) = 0.
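>
>In sketch form (the table and column names are made up for illustration;
>note that LEN() ignores trailing spaces, so LEN(' ') = 0 catches the
>single-space "empties"):
>
>SET ANSI_PADDING OFF;
>GO
>CREATE TABLE dbo.ImportTarget (Addr1 varchar(100) NULL,
>        Addr2 varchar(100) NULL);
>GO
>SET ANSI_PADDING ON;
>GO
>-- bcp the data file in here, then:
>UPDATE dbo.ImportTarget SET Addr1 = NULL WHERE LEN(Addr1) = 0;
>UPDATE dbo.ImportTarget SET Addr2 = NULL WHERE LEN(Addr2) = 0;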
>
>See BOL
>http://msdn2.microsoft.com/en-us/library/ms188340.aspx
>
>http://msdn2.microsoft.com/en-us/library/ms187403.aspx
>

-- 
Marty Connelly
Victoria, B.C.
Canada

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com



