[dba-SQLServer] Bulk insert

jwcolby jwcolby at colbyconsulting.com
Fri May 11 19:00:48 CDT 2007


>Go with a big 64-bit CPU box and 64-bit Windows Vista Business; that
>will give you access to 128 GB of onboard RAM.

Yep.  And about $20K later I will have a honkin machine.  And someday I will
do that, but not until the money starts to roll in from doing all this stuff
on the cheap.


John W. Colby
Colby Consulting
www.ColbyConsulting.com
-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of
MartyConnelly
Sent: Friday, May 11, 2007 7:24 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

Go with a big 64-bit CPU box and 64-bit Windows Vista Business; that
will give you access to 128 GB of onboard RAM.


JWColby wrote:

>Marty,
>
>My only question about this method is the loading of all the records in
>one go before processing.  Remember that I am doing raw files that can
>be 4 gigabytes of text, in this case up to 4 million records with 149
>fields each.  These files are HUGE by desktop standards.
>
>My method uses a single line read / process / write and thus is pretty 
>much guaranteed to handle any size file, any number of records, any 
>number of fields.
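>
>A minimal sketch of that loop in C# (Clean() is just a stand-in for
>whatever per-line transform a given file needs):
>
>using System.IO;
>
>static void ProcessFile(string inPath, string outPath) {
>        using (StreamReader reader = new StreamReader(inPath))
>        using (StreamWriter writer = new StreamWriter(outPath)) {
>                string line;
>                // Only one line is ever in memory, so file size and
>                // record count never matter.
>                while ((line = reader.ReadLine()) != null) {
>                        writer.WriteLine(Clean(line));
>                }
>        }
>}
>
>// Stand-in transform: trim each comma-delimited field.
>static string Clean(string line) {
>        string[] fields = line.Split(',');
>        for (int i = 0; i < fields.Length; i++) {
>                fields[i] = fields[i].Trim();
>        }
>        return string.Join(",", fields);
>}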
>
>
>
>John W. Colby
>Colby Consulting
>www.ColbyConsulting.com
>
>-----Original Message-----
>From: dba-sqlserver-bounces at databaseadvisors.com
>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of 
>MartyConnelly
>Sent: Wednesday, May 09, 2007 2:04 PM
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Bulk insert
>
>Another option is to use SqlBulkCopy, a class that comes with .NET
>Framework 2.0.  There is a SqlBulkCopy example that uses a CSV import
>in the book below.  Tricky to set up, but it works.  This new 2.0 class
>is designed to call the SQL SMO layer under the covers, which replaces
>SQL DMO.
>William Vaughn's Hitchhiker's Guide to Visual Studio and SQL Server
>(7th Edition)
>
>Here is a C# code example that loads a text file into a DataTable and
>feeds it to SqlBulkCopy.  This handles 40,000 rows a second, or about
>8,000 if you apply indexing.
>
>// Requires: using System; using System.Data; using System.Data.SqlClient;
>// using System.Diagnostics; using System.IO;
>
>private void button1_Click(object sender, EventArgs e) {
>        Stopwatch sw = new Stopwatch();
>        sw.Start();
>
>        // Build a DataTable with the file name column and an
>        // auto-incrementing integer key.
>        DataTable table = new DataTable();
>        table.Columns.Add(new DataColumn("File", typeof(string)));
>        table.Columns.Add(new DataColumn("IID", typeof(int)));
>        table.Columns[1].AutoIncrement = true;
>        table.Columns[1].AutoIncrementSeed = 1;
>        table.Columns[1].AutoIncrementStep = 1;
>
>        // One row per line of the input file; IID fills itself in.
>        using (StreamReader sr = new StreamReader("c:\\filelist.txt")) {
>                while (!sr.EndOfStream) {
>                        table.Rows.Add(sr.ReadLine());
>                }
>        }
>
>        sw.Stop();
>        Debug.Write(sw.Elapsed.TotalSeconds + " seconds for " +
>                table.Rows.Count + " = " +
>                Convert.ToDouble(table.Rows.Count) / sw.Elapsed.TotalSeconds +
>                " rows per second loaded to datatable");
>
>        // Reset so the second timing covers only the bulk copy.
>        sw.Reset();
>        sw.Start();
>
>        string connStr = "data source=lon0371xns;initial catalog=" +
>                "SonarBackup;integrated security=sspi";
>        using (SqlConnection sqlcon = new SqlConnection(connStr))
>        using (SqlBulkCopy bc = new SqlBulkCopy(sqlcon)) {
>                bc.DestinationTableName = "FileList";
>                bc.NotifyAfter = 5000;  // fire SqlRowsCopied every 5000 rows
>                bc.SqlRowsCopied += new SqlRowsCopiedEventHandler(bc_SqlRowsCopied);
>                bc.ColumnMappings.Add(new SqlBulkCopyColumnMapping("File", "File"));
>                bc.ColumnMappings.Add(new SqlBulkCopyColumnMapping("IID", "IID"));
>                sqlcon.Open();
>                bc.BulkCopyTimeout = 500;  // seconds
>                bc.WriteToServer(table);
>        }
>        sw.Stop();
>
>        Debug.Write(sw.Elapsed.TotalSeconds + " seconds for " +
>                table.Rows.Count + " = " +
>                Convert.ToDouble(table.Rows.Count) / sw.Elapsed.TotalSeconds +
>                " rows per second loaded to db");
>}
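>
>(bc_SqlRowsCopied is the progress handler for NotifyAfter; a minimal
>version just logs the running count:)
>
>void bc_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e) {
>        // Fires every NotifyAfter rows during WriteToServer.
>        Debug.WriteLine(e.RowsCopied + " rows copied");
>}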
>
>
>
>JWColby wrote:
>
>  
>
>>>The only caveat here is if you have empty fields in your file, a
>>>single space is inserted instead of a null.
>>
>>What is it with all "the only caveat here" stuff?  I am sure that
>>there is a darned good reason.
>>
>>In the end it is just easier to roll your own rather than work around
>>the issues that the built-in stuff seems to have.  I have 150 fields
>>(in this data set).  Somehow I have to do an update on all 150 fields.  
>>I suppose I could have my converter run 150 update queries to do each 
>>column.  Or 700 update queries to do the next data set.  Or just do 
>>the stripping of the spaces external to SQL Server and be done with 
>>it.  Either way I still have to use my toy.
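>>
>>(Those 150 updates would not have to be written by hand, though; a
>>loop over the column names can generate and run them.  A rough C#
>>sketch, where the table name, column list, and connection string are
>>placeholders, and NULLIF folds the strip-spaces and set-to-null steps
>>into one statement:)
>>
>>using System.Data.SqlClient;
>>
>>static void TrimAllColumns(string connStr, string tableName,
>>                           string[] columns) {
>>        using (SqlConnection con = new SqlConnection(connStr)) {
>>                con.Open();
>>                foreach (string col in columns) {
>>                        // RTRIM strips trailing spaces; NULLIF turns
>>                        // the resulting empty strings into nulls.
>>                        string sql = "UPDATE [" + tableName + "] SET [" + col +
>>                                "] = NULLIF(RTRIM([" + col + "]), '')";
>>                        using (SqlCommand cmd = new SqlCommand(sql, con)) {
>>                                cmd.CommandTimeout = 0;  // big table
>>                                cmd.ExecuteNonQuery();
>>                        }
>>                }
>>        }
>>}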
>>
>>Once I move up to VB.Net I will be able to use threads to do the 
>>stripping and the BULK INSERT Sproc in parallel.
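>>
>>(The shape of that would be something like the following, in C# rather
>>than VB.Net; StripFile and the BulkInsertFile sproc are placeholders
>>for the real strip routine and BULK INSERT wrapper:)
>>
>>using System.Data.SqlClient;
>>using System.Threading;
>>
>>static void LoadPass(string connStr) {
>>        // Worker thread: strip spaces from the next raw file while...
>>        Thread stripper = new Thread(delegate() {
>>                StripFile("raw2.txt", "clean2.txt");
>>        });
>>        stripper.Start();
>>
>>        // ...this thread bulk inserts the file cleaned on the last pass.
>>        using (SqlConnection con = new SqlConnection(connStr))
>>        using (SqlCommand cmd = new SqlCommand(
>>                "exec BulkInsertFile 'clean1.txt'", con)) {
>>                con.Open();
>>                cmd.CommandTimeout = 0;  // bulk loads can run long
>>                cmd.ExecuteNonQuery();
>>        }
>>
>>        stripper.Join();  // finish the strip before starting the next pass
>>}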
>>
>>BTW, I have to do something very similar all over again once I get the
>>data in.  I will need to export the entire table back out, in sets of
>>2 million records, to delimited files for CASS / NCOA processing,
>>dumping 100 million records out into ~50 files (just the address data
>>this time).  The CASS / NCOA process theoretically will process all
>>files placed into an input directory (input to that program), dumping
>>the processed files into an output directory (output from that
>>program).  At which point I have to pull all of the CASS / NCOAd files
>>BACK out of that output directory into yet another table.  And that is
>>just the "pre-processing".
>>
>>You might be getting a clue by now why I do not want to be manually 
>>doing all the crapola involved with the solutions that do not involve 
>>an external control process.  Someday fairly soon I will have a 
>>completely automated system for doing all this.  I will be back to 
>>blowing bubbles and poking at Charlotte with a big stick.
>>
>>John W. Colby
>>Colby Consulting
>>www.ColbyConsulting.com
>>
>>-----Original Message-----
>>From: dba-sqlserver-bounces at databaseadvisors.com
>>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of 
>>MartyConnelly
>>Sent: Tuesday, May 08, 2007 10:27 PM
>>To: dba-sqlserver at databaseadvisors.com
>>Subject: Re: [dba-SQLServer] Bulk insert
>>
>>Uhh, there is a one-line fix to remove trailing blanks in SQL; the
>>defaults differ across SQL Server versions and between nchar and
>>varchar.
>>
>>SET ANSI_PADDING OFF
>>
>>When a table is created with the setting turned on (the default),
>>trailing spaces are not trimmed when data is inserted into that table.
>>When ANSI_PADDING is off, the spaces are trimmed.
>>
>>So if you SET ANSI_PADDING OFF, create your table, then set it back on
>>again, when you bcp the data into the table, the excess trailing
>>spaces will be eliminated.  The only caveat here is if you have empty
>>fields in your file, a single space is inserted instead of a null.  If
>>this is the case with your data file, you will need to do an update to
>>set columns to null when len(yourcolumn) = 0.
>>
>>See BOL
>>http://msdn2.microsoft.com/en-us/library/ms188340.aspx
>>
>>http://msdn2.microsoft.com/en-us/library/ms187403.aspx
>>
>>    
>>

--
Marty Connelly
Victoria, B.C.
Canada

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver
http://www.databaseadvisors.com



