jwcolby
jwcolby at colbyconsulting.com
Mon Jul 23 12:37:33 CDT 2007
Shamil, I am likewise using SQLBulkCopy to import data from CSV files using a third party library to handle the conversion of the file to a stream and vv. I am quite impressed. Using my hand written code for doing something similar I was getting between around 1k records / second loading "row by row" whereas using SQLBulkCopy and this third party lib I am getting about 16K records / second = about 60 seconds for 1 million records. That is quite usable. I am doing tables of 50 to 100 million records so the total process can still run 60 to 90 minutes but it was taking 2 days. John W. Colby Colby Consulting www.ColbyConsulting.com -----Original Message----- From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Shamil Salakhetdinov Sent: Monday, July 23, 2007 1:25 PM To: dba-vb at databaseadvisors.com Subject: Re: [dba-VB] How I'm approaching the problem All, To keep "XML - go/no go discussion on" I just wanted to post some real stats on MS SQL 2000/2005 and XML I have got today out of the project I'm working currently on - this is a real database (84 tables), real production data, quantity of source rows in DB = [RowsQty]/3. The test gets data from MS SQL db tables via stored procedures into in memory List<...> collections via SqlReader (.NET) etc. - usual modern multi-tier architecture. Test PC is Intel Dual Core 3Mhz + 2GM RAM with MS Windows 2003 Server - nothing special for these days. In more details the test does the following: MS SQL 2000 Database -> SqlDataReader (.NET) -> DL custom classes/list (C#) -> BL custom classes/list (C#) -> XmlSerializer (.NET) -> XML file (written via TextWriter (.NET) -> deserialize using XmlSerailizer and TextReader (..NET) -> serialize into another XML file using XmlSerializer and TextWriter (.NET) Below are some results sorted descending by elapsed time in seconds (Note: table names are "censored" - it's a real project and I'm not sure customer will like to see real table names published) Performance of this test looks here very impressive - have a look of stats (Note: results are cross tested by using serialized data to store back into db then extract again etc. - no mistakes here. And BTW loading these data using stored procedures row by row takes quite some time (several hours) but when loading these data using special .NET Framework class (System.Data.SqlClient.SqlBulkCopy), which is working much like BCP then the loading of this db takes less than 10 minutes - total qty of rows in all tables is 1,131,004 (one million...) Elapsed time for this test: 122,421875 seconds (2 minutes), total elements (rows) processed - 3,393,012 (three million...) I will be experimenting more in the coming days (end of August probably) and I will post more results... And this is not pure experimenting/"mad scientist games" - this code will be used in production for stress testing, for data replication etc. etc. Finally stats: DB -> Memory -> XML file -> Memory -> another Xml File (to use in first cross-test): TableName Elapsed Time (s) RowsQty ====== ========== ===== Table53 20.7500000 260286 Table71 12.6562500 327819 Table29 08.2500000 105000 Table79 07.4843750 204243 Table80 07.3750000 299556 Table84 06.7656250 262908 Table82 06.6718750 263337 Table75 06.5000000 250155 Table37 05.4062500 318555 Table68 04.0156250 189810 Table69 03.7500000 83490 Table72 03.1562500 74598 Table81 03.0312500 199287 Table73 02.8281250 135150 Table38 02.6406250 108636 Table30 01.8593750 108726 Here are individual steps stats: Database -> custom objects in List<...> Elapsed Time(s) RowsQty ========== ====== 05.9218750 86762 03.7656250 109273 02.6406250 35000 02.2968750 68081 02.2812500 99852 02.2343750 87636 02.1875000 87779 02.0312500 83385 01.7656250 106185 01.4062500 24866 01.3906250 63270 01.2343750 27830 01.0468750 45050 01.0312500 66429 Memory (custom objects) -> XML files ElapsedTime(s) RowsQty ========= ======= 08.8750000 86762 05.0468750 109273 03.2812500 35000 02.9375000 99852 02.7187500 83385 02.6718750 68081 02.5156250 87779 02.4687500 87636 02.1250000 106185 01.5000000 27830 01.4218750 63270 01.1562500 66429 01.00000 24866 00.9687500 45050 00.8750000 36212 00.6718750 36242 XML File -> Memory (custom objects) ElapsedTime(s) RowsQty ======= ===== 05.9531250 86762 03.8437500 109273 02.5156250 68081 02.3281250 35000 02.1562500 99852 02.0625000 87636 01.9687500 87779 01.7500000 83385 01.5156250 106185 01.2031250 63270 01.0156250 27830 00.8437500 66429 00.8125000 45050 00.7812500 36212 And here is "microscopic" generic DAL, which is built using only three(!) ADO.NET classes SqlConnection, SqlCommand, IDataReader used here: public int ExecuteNonQuery(DbCommand cmd) { return ExecuteNonQuery(cmd, false); } public IDataReader ExecuteReader(DbCommand cmd) { return ExecuteReader(cmd, CommandBehavior.Default); } public IDataReader ExecuteReader(DbCommand cmd, CommandBehavior behavior) { return cmd.ExecuteReader(behavior); } That's it. -- Shamil -----Original Message----- From: dba-vb-bounces at databaseadvisors.com [mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Jim Lawrence Sent: Monday, July 23, 2007 7:01 AM To: dba-vb at databaseadvisors.com Subject: Re: [dba-VB] How I'm approaching the problem Hi John: You have taken on a massively complex project single-handedly. When I was working full-time for a company and a similar sized project appeared I assign at least 2 people to the project. It seems that 2 people can do the work of three when they work together. MS SQL people tend to think their a little better than the standard Access grunts. Why that is so I have no idea. Considering that MS SQL developers have the luxury of working with a faster and better product that is much easier to obtain positive results than from an equally complex project written totally in Access. That is why I write most of my new apps in a combination of Access FE and MS SQL BE because I get the best of all worlds. MS SQL is more rugged than the MDB, handles unbound connections without the absolute need for a complex locking scheme as MS SQL is designed to work it this type of environment. It internally handles locking, multi-access to a single record or group of records. It is a professional level DB and is actually easier to work with. Unfortunately, ADO is the best connection protocol for performance and reliability but if you do not know it, it is just another major item to learn. If we throw learn .Net from scratch into the mix and you have to hold on with both hands just to keep your sanity. I am amazed at how far you have come in such a short time. Nothing like a baptism in fire... If you are a little stressed, it is to be expected. Hope your day has gone well. Regards Jim <<< tail of thread trimmed>> _______________________________________________ dba-VB mailing list dba-VB at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/dba-vb http://www.databaseadvisors.com