JWColby
jwcolby at colbyconsulting.com
Tue May 1 09:20:32 CDT 2007
I have been struggling to get large data files to import into SQL Server in a manner that won't take weeks to complete. I was trying to use CSV files to get around the trailing (and occasionally leading) spaces in the data fields caused by fixed-width / space-padded fields in the source text files. So I created a program in MS Access to read the source file line by line, carve out each data element, strip the spaces, and then, once an entire line was assembled, write it back out to a new CSV file.

The next issue I ran into is that SQL Server's BULK INSERT doesn't know how to deal with text qualifiers, which are commonly used in CSV files to encapsulate commas inside of data. In other words, if a comma is the delimiter, then the data is enclosed in quotes: "51 Some Street, Ste2001","Some City". In THIS case, since I control the intermediary file, I am able to change from a comma delimiter to a | delimiter and remove the quotes. Thus the previous example turns into: 51 Some Street, Ste2001|Some City|. This can be imported into SQL Server without an issue, since I can specify the FIELDTERMINATOR parameter to the bulk insert in a query.

Using BULK INSERT pulled my import up from 300-400 records per second to almost one thousand records per second. Not blazing, but with 100 million records to import it is a significant improvement in total time to import. Rough sketches of both pieces (the cleanup code and the bulk insert) are below my sig.

John W. Colby
Colby Consulting
www.ColbyConsulting.com
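Here is a minimal sketch of the cleanup step. The file paths and the three field positions (1-30, 31-50, 51-60) are made up for illustration; the real widths come from whatever layout spec ships with the source file:

' Read a fixed-width source file, trim the space padding from each
' field, and write the line back out pipe-delimited.
Sub ConvertFixedWidthToPipe()
    Dim intIn As Integer, intOut As Integer
    Dim strLine As String
    Dim strOut As String

    intIn = FreeFile
    Open "C:\Data\Source.txt" For Input As #intIn    ' hypothetical path
    intOut = FreeFile
    Open "C:\Data\Clean.txt" For Output As #intOut   ' hypothetical path

    Do While Not EOF(intIn)
        Line Input #intIn, strLine
        ' Carve out each fixed-width field, strip the padding,
        ' and glue the pieces back together with | delimiters.
        strOut = Trim$(Mid$(strLine, 1, 30)) & "|" & _
                 Trim$(Mid$(strLine, 31, 20)) & "|" & _
                 Trim$(Mid$(strLine, 51, 10))
        Print #intOut, strOut
    Loop

    Close #intIn
    Close #intOut
End Sub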
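And a sketch of firing the BULK INSERT itself from Access, here via an ADO connection (a pass-through query works just as well). The server, database, and table names are placeholders; note that the file path is resolved on the SQL Server machine, not on the client:

' Execute a BULK INSERT against SQL Server with | as the field
' terminator, matching the cleaned-up intermediary file.
Sub BulkInsertCleanFile()
    Dim cn As Object
    Dim strSQL As String

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
            "Initial Catalog=MyDb;Integrated Security=SSPI;"   ' hypothetical

    strSQL = "BULK INSERT dbo.tblImport " & _
             "FROM 'C:\Data\Clean.txt' " & _
             "WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n')"
    cn.Execute strSQL

    cn.Close
    Set cn = Nothing
End Sub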