jwcolby
jwcolby at colbyconsulting.com
Fri May 11 19:00:48 CDT 2007
>Go with a big 64 bit CPU box and Windows Vista 64 bit Business Version
>that will give you access to 128 Gig of onboard RAM

Yep. And about $20K later I will have a honkin' machine. And someday I
will do that, but not until the money starts to roll in from doing all
this stuff on the cheap.

John W. Colby
Colby Consulting
www.ColbyConsulting.com

-----Original Message-----
From: dba-sqlserver-bounces at databaseadvisors.com
[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of MartyConnelly
Sent: Friday, May 11, 2007 7:24 PM
To: dba-sqlserver at databaseadvisors.com
Subject: Re: [dba-SQLServer] Bulk insert

Go with a big 64 bit CPU box and Windows Vista 64 bit Business Version
that will give you access to 128 Gig of onboard RAM

JWColby wrote:

>Marty,
>
>My only question about this method is the loading of all the records at
>one go before processing. Remember that I am doing raw files that can
>be 4 gigabytes of text, up to (in this case) 4 million records and (in
>this case) 149 fields. These files are HUGE by desktop standards.
>
>My method uses a single-line read / process / write and thus is pretty
>much guaranteed to handle any size file, any number of records, any
>number of fields.
>
>John W. Colby
>Colby Consulting
>www.ColbyConsulting.com
>
>-----Original Message-----
>From: dba-sqlserver-bounces at databaseadvisors.com
>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of
>MartyConnelly
>Sent: Wednesday, May 09, 2007 2:04 PM
>To: dba-sqlserver at databaseadvisors.com
>Subject: Re: [dba-SQLServer] Bulk insert
>
>Another option is to use SqlBulkCopy, a class that comes with .NET
>Framework 2.0. There is a SqlBulkCopy example in the book below that
>uses a CSV import. Tricky to set up, but it works. This new 2.0 class
>is designed to call the SQL SMO layer underneath the covers -- SMO
>replaces SQL DMO.
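[For reference, the server-side approach the thread's subject line refers to
looks roughly like this -- a minimal BULK INSERT sketch. The table name,
file path, and delimiters are hypothetical placeholders, not taken from
John's actual setup:]

```sql
-- Load one raw delimited file into a staging table in a single pass.
-- dbo.RawImport and the path are illustrative; adjust terminators to the feed.
BULK INSERT dbo.RawImport
FROM 'C:\Data\chunk001.txt'
WITH (
    FIELDTERMINATOR = ',',    -- comma-delimited source file
    ROWTERMINATOR   = '\n',   -- one record per line
    TABLOCK,                  -- table lock allows minimally logged load
    BATCHSIZE       = 100000  -- commit every 100,000 rows
);
```

[Wrapping a statement like this in a stored procedure, with the file name as
a parameter, is one way to drive it per-file from an external control
process such as the one John describes.]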
>William Vaughn's Hitchhiker's Guide to Visual Studio and SQL Server
>(7th Edition)
>
>Here is a C# code example that loads a file into a DataTable and feeds
>it to SqlBulkCopy. This handles 40,000 rows a second, 8,000 if you
>apply indexing.
>
>private void button1_Click(object sender, EventArgs e)
>{
>    Stopwatch sw = new Stopwatch();
>    sw.Start();
>    DataTable table = new DataTable();
>    table.Columns.Add(new DataColumn("File", typeof(string)));
>    table.Columns.Add(new DataColumn("IID", typeof(int)));
>    table.Columns[1].AutoIncrement = true;
>    table.Columns[1].AutoIncrementSeed = 1;
>    table.Columns[1].AutoIncrementStep = 1;
>
>    // Read one file name per line into the DataTable; the IID column
>    // fills itself via auto-increment.
>    using (StreamReader sr = new StreamReader("c:\\filelist.txt"))
>    {
>        while (!sr.EndOfStream)
>        {
>            table.Rows.Add(sr.ReadLine());
>        }
>    }
>
>    sw.Stop();
>    Debug.Write(sw.Elapsed.TotalSeconds + " seconds for "
>        + table.Rows.Count + " = "
>        + Convert.ToDouble(table.Rows.Count) / sw.Elapsed.TotalSeconds
>        + " rows per second loaded to datatable");
>    sw.Reset();   // restart timing so the bulk copy is measured separately
>    sw.Start();
>
>    SqlConnection sqlcon = new SqlConnection(
>        "data source=lon0371xns;initial catalog=SonarBackup;integrated security=sspi");
>    SqlBulkCopy bc = new SqlBulkCopy(sqlcon);
>    bc.DestinationTableName = "FileList";
>    bc.NotifyAfter = 5000;
>    // bc_SqlRowsCopied is a progress handler defined elsewhere in the form
>    bc.SqlRowsCopied += new SqlRowsCopiedEventHandler(bc_SqlRowsCopied);
>    bc.ColumnMappings.Add(new SqlBulkCopyColumnMapping("File", "File"));
>    bc.ColumnMappings.Add(new SqlBulkCopyColumnMapping("IID", "IID"));
>    sqlcon.Open();
>    bc.BulkCopyTimeout = 500;
>    bc.WriteToServer(table);
>    sqlcon.Close();
>    sw.Stop();
>
>    Debug.Write(sw.Elapsed.TotalSeconds + " seconds for "
>        + table.Rows.Count + " = "
>        + Convert.ToDouble(table.Rows.Count) / sw.Elapsed.TotalSeconds
>        + " rows per second loaded to db");
>}

>JWColby wrote:
>
>>>>The only caveat here is if you have empty fields in your file, a
>>>>single space is inserted instead of a null.
>>
>>What is it with all "the only caveat here" stuff? I am sure that
>>there is a darned good reason.
>>
>>In the end it is just easier to roll your own rather than work around
>>the issues that the built-in stuff seems to have. I have 150 fields
>>(in this data set). Somehow I have to do an update on all 150 fields.
>>I suppose I could have my converter run 150 update queries to do each
>>column. Or 700 update queries to do the next data set. Or just do
>>the stripping of the spaces external to SQL Server and be done with
>>it. Either way I still have to use my toy.
>>
>>Once I move up to VB.Net I will be able to use threads to do the
>>stripping and the BULK INSERT sproc in parallel.
>>
>>BTW, I have to do something very similar all over again once I get the
>>data in. I will need to export the entire table back out, 2 million
>>record sets of data, to delimited files for CASS / NCOA processing,
>>dumping 100 million records out into ~50 files (just the address data
>>this time). The CASS / NCOA process theoretically will process all
>>files placed into an input directory (input to that program), dumping
>>the processed files into an output directory (output from that
>>program). At which point I have to pull all of the CASS / NCOAd files
>>BACK out of that output directory into yet another table. And that is
>>just the "pre-processing".
>>
>>You might be getting a clue by now why I do not want to be manually
>>doing all the crapola involved with the solutions that do not involve
>>an external control process. Someday fairly soon I will have a
>>completely automated system for doing all this. I will be back to
>>blowing bubbles and poking at Charlotte with a big stick.
>>
>>John W. Colby
>>Colby Consulting
>>www.ColbyConsulting.com
>>
>>-----Original Message-----
>>From: dba-sqlserver-bounces at databaseadvisors.com
>>[mailto:dba-sqlserver-bounces at databaseadvisors.com] On Behalf Of
>>MartyConnelly
>>Sent: Tuesday, May 08, 2007 10:27 PM
>>To: dba-sqlserver at databaseadvisors.com
>>Subject: Re: [dba-SQLServer] Bulk insert
>>
>>Uhh, there is a one-line fix to remove trailing blanks in SQL; the
>>defaults differ between SQL Server versions and between NChar and
>>VarChar:
>>
>>SET ANSI_PADDING OFF
>>
>>When a table is created with the setting turned on (the default),
>>spaces are not trimmed when data is inserted into that table. When
>>ANSI_PADDING is off, the spaces are trimmed.
>>
>>So if you SET ANSI_PADDING OFF, create your table, then set it back on
>>again, when you bcp the data into the table, the excess trailing
>>spaces will be eliminated. The only caveat here is if you have empty
>>fields in your file, a single space is inserted instead of a null. If
>>this is the case with your data file, you will need to do an update to
>>set columns to null when len(yourcolumn) = 0.
>>
>>See BOL
>>http://msdn2.microsoft.com/en-us/library/ms188340.aspx
>>http://msdn2.microsoft.com/en-us/library/ms187403.aspx

--
Marty Connelly
Victoria, B.C.
Canada

_______________________________________________
dba-SQLServer mailing list
dba-SQLServer at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-sqlserver

http://www.databaseadvisors.com
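
[Putting Marty's ANSI_PADDING advice and the null cleanup together, a
minimal sketch. The table and column names are hypothetical stand-ins for
John's 150-field staging table:]

```sql
-- Create the staging table with ANSI_PADDING off so trailing spaces are
-- trimmed on insert, then restore the default for everything else.
SET ANSI_PADDING OFF;
CREATE TABLE dbo.RawImport (Field001 varchar(50), Field002 varchar(50));
SET ANSI_PADDING ON;

-- After the bcp / BULK INSERT, empty source fields arrive as a single
-- space. LEN() ignores trailing spaces, so this catches them:
UPDATE dbo.RawImport SET Field001 = NULL WHERE LEN(Field001) = 0;

-- With 150 columns, the per-column updates need not be hand-written;
-- this emits one UPDATE statement per character column to run afterwards.
SELECT 'UPDATE dbo.RawImport SET ' + QUOTENAME(COLUMN_NAME) +
       ' = NULL WHERE LEN(' + QUOTENAME(COLUMN_NAME) + ') = 0;'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'RawImport'
  AND DATA_TYPE IN ('char', 'varchar');
```

[Generating the statements from INFORMATION_SCHEMA.COLUMNS is one way to
avoid maintaining 150 (or 700) hand-written update queries per data set.]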