[AccessD] Infutor Statistics - was RE: [dba-SQLServer] Bulk insert

Gustav Brock Gustav at cactus.dk
Wed May 9 12:18:28 CDT 2007


Hi John

Thanks for sharing. Quite a story.
It stresses that BULK INSERT is mandatory for this level of data sizes.

One thing that strikes me, however, is the slow performance of your space stripping; 1000 lines/s is not very much. But I guess you do more than just removing spaces.

/gustav

>>> jwcolby at colbyconsulting.com 09-05-2007 19:01 >>>
Just an FYI.  The table that I have been building this whole time contains
97.5 million records, exactly 149 (imported) fields and requires 62.6
Gigabytes of data space inside of SQL Server. It took 2 hours and 28 minutes
just to build the auto increment PK field after the table was finished
importing records.  The index space for the table (with just this single
index) is 101 Megabytes.  

There were 56 raw data files which required 75 gigabytes of disk space to
hold.  There were 56 CSV files created after stripping out the spaces, which
required 40.8 Gigabytes of disk space to hold.  Thus by my calculations, 35
gigs of disk space was required to hold JUST THE SPACES in the original
fixed width file, with the real data occupying 40.8 GB.  It is interesting
to note that the raw data in the CSV file was 41gb while the data space
required in SQL Server is 62 gb.

As the process was built over time, I do not have accurate specs for each
and every file, but the process of stripping the spaces off of the fields
happened at about 1K records / second.  Given 97.5 million records, this
equates to 97.5 thousand seconds to do the space stripping, which is 27.77
hours.  That of course is done in a VBA application.  Again I don't have
accurate specs for all of the bulk inserts, however those that I recorded
the times for summed to 71.2 million records, which took 4674 seconds (1.3
hours) to import using a BULK INSERT statement, which equates to
approximately 15K records / second.  Remember that this BULK INSERT is
importing precleaned data with pipe delimiters.  Also remember that the BULK
INSERT itself took 1.3 hours but due to the lack of automation in feeding
the Sproc file names, I had to manually edit the SPROC each time I wanted to
import a new file so the actual import took much longer, since I wasn't
necessarily watching the computer as the last SPROC run finished.

So there you go, that is what I have been trying to accomplish this last few
weeks.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 





More information about the AccessD mailing list