Gustav Brock
Gustav at cactus.dk
Wed May 9 12:18:28 CDT 2007
Hi John Thanks for sharing. Quite a story. It stresses that BULK INSERT is mandatory for this level of data sizes. One thing that strikes me, however, is the slow performance of your space stripping; 1000 lines/s is not very much. But I guess you do more than just removing spaces. /gustav >>> jwcolby at colbyconsulting.com 09-05-2007 19:01 >>> Just an FYI. The table that I have been building this whole time contains 97.5 million records, exactly 149 (imported) fields and requires 62.6 Gigabytes of data space inside of SQL Server. It took 2 hours and 28 minutes just to build the auto increment PK field after the table was finished importing records. The index space for the table (with just this single index) is 101 Megabytes. There were 56 raw data files which required 75 gigabytes of disk space to hold. There were 56 CSV files created after stripping out the spaces, which required 40.8 Gigabytes of disk space to hold. Thus by my calculations, 35 gigs of disk space was required to hold JUST THE SPACES in the original fixed width file, with the real data occupying 40.8 GB. It is interesting to note that the raw data in the CSV file was 41gb while the data space required in SQL Server is 62 gb. As the process was built over time, I do not have accurate specs for each and every file, but the process of stripping the spaces off of the fields happened at about 1K records / second. Given 97.5 million records, this equates to 97.5 thousand seconds to do the space stripping, which is 27.77 hours. That of course is done in a VBA application. Again I don't have accurate specs for all of the bulk inserts, however those that I recorded the times for summed to 71.2 million records, which took 4674 seconds (1.3 hours) to import using a BULK INSERT statement, which equates to approximately 15K records / second. Remember that this BULK INSERT is importing precleaned data with pipe delimiters. Also remember that the BULK INSERT itself took 1.3 hours but due to the lack of automation in feeding the Sproc file names, I had to manually edit the SPROC each time I wanted to import a new file so the actual import took much longer, since I wasn't necessarily watching the computer as the last SPROC run finished. So there you go, that is what I have been trying to accomplish this last few weeks. John W. Colby Colby Consulting www.ColbyConsulting.com