[dba-VB] How I'm approaching the problem

jwcolby jwcolby at colbyconsulting.com
Mon Jul 23 12:37:33 CDT 2007


Shamil,

I am likewise using SQLBulkCopy to import data from CSV files using a third
party library to handle the conversion of the file to a stream and vv.  I am
quite impressed.  Using my hand written code for doing something similar I
was getting between around 1k records / second loading "row by row" whereas
using SQLBulkCopy and this third party lib I am getting about 16K records /
second = about 60 seconds for 1 million records.  That is quite usable.

I am doing tables of 50 to 100 million records so the total process can
still run 60 to 90 minutes but it was taking 2 days.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Shamil
Salakhetdinov
Sent: Monday, July 23, 2007 1:25 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

All,

To keep "XML - go/no go discussion on" I just wanted to post some real stats
on MS SQL 2000/2005 and XML I have got today out of the project I'm working
currently on - this is a real database (84 tables), real production data,
quantity of source rows in DB = [RowsQty]/3. The test gets data from MS SQL
db tables via stored procedures into in memory List<...> collections via
SqlReader (.NET) etc. - usual modern multi-tier architecture.

Test PC is Intel Dual Core 3Mhz + 2GM RAM with MS Windows 2003 Server -
nothing special for these days.

In more details the test does the following:

MS SQL 2000 Database
 -> SqlDataReader (.NET)
  -> DL custom classes/list (C#)
    -> BL custom classes/list (C#)
     -> XmlSerializer (.NET)
       -> XML file (written via TextWriter (.NET)
        -> deserialize using XmlSerailizer and TextReader (..NET)
         -> serialize into another XML file using XmlSerializer and
TextWriter (.NET) 

Below are some results sorted descending by elapsed time in seconds  (Note:
table names are "censored" - it's a real project and I'm not sure customer
will like to see real table names published)

Performance of this test looks here very impressive - have a look of stats
(Note: results are cross tested by using serialized data to store back into
db then extract again etc. - no mistakes here. And BTW loading these data
using stored procedures row by row takes quite some time (several hours) but
when loading these data using special .NET Framework class
(System.Data.SqlClient.SqlBulkCopy), which is working much like BCP then the
loading of this db takes less than 10 minutes - total qty of rows in all
tables is 1,131,004 (one million...)

Elapsed time for this test: 122,421875 seconds (2 minutes), total elements
(rows) processed - 3,393,012 (three million...)

I will be experimenting more in the coming days (end of August probably) and
I will post more results... 

And this is not pure experimenting/"mad scientist games" - this code will be
used in production for stress testing, for data replication etc. etc.

Finally stats: DB -> Memory -> XML file -> Memory -> another Xml File (to
use in first cross-test):

TableName   Elapsed Time (s)	RowsQty
======       ==========        =====
Table53	  20.7500000	260286
Table71	  12.6562500	327819
Table29	  08.2500000	105000
Table79	  07.4843750	204243
Table80	  07.3750000	299556
Table84	   06.7656250	262908
Table82	   06.6718750	263337
Table75	   06.5000000	250155
Table37	   05.4062500	318555
Table68	   04.0156250	189810
Table69	   03.7500000	83490
Table72	   03.1562500	74598
Table81	   03.0312500	199287
Table73	   02.8281250	135150
Table38	   02.6406250	108636
Table30	   01.8593750	108726


Here are individual steps stats:

Database -> custom objects in List<...>

Elapsed Time(s)	RowsQty
==========    ======
05.9218750	86762
03.7656250	109273
02.6406250	35000
02.2968750	68081
02.2812500	99852
02.2343750	87636
02.1875000	87779
02.0312500	83385
01.7656250	106185
01.4062500	24866
01.3906250	63270
01.2343750	27830
01.0468750	45050
01.0312500	66429


Memory (custom objects) -> XML files

ElapsedTime(s)	RowsQty
=========       =======
08.8750000	86762
05.0468750	109273
03.2812500	35000
02.9375000	99852
02.7187500	83385
02.6718750	68081
02.5156250	87779
02.4687500	87636
02.1250000	106185
01.5000000	27830
01.4218750	63270
01.1562500	66429
01.00000 	24866
00.9687500	45050
00.8750000	36212
00.6718750	36242


XML File -> Memory (custom objects)

ElapsedTime(s)	RowsQty
=======             =====
05.9531250	86762
03.8437500	109273
02.5156250	68081
02.3281250	35000
02.1562500	99852
02.0625000	87636
01.9687500	87779
01.7500000	83385
01.5156250	106185
01.2031250	63270
01.0156250	27830
00.8437500	66429
00.8125000	45050
00.7812500	36212


And here is "microscopic" generic DAL, which is built using only three(!)
ADO.NET classes SqlConnection, SqlCommand, IDataReader  used here:


public int ExecuteNonQuery(DbCommand cmd) {
    return ExecuteNonQuery(cmd, false);
}

public IDataReader ExecuteReader(DbCommand cmd) {
    return ExecuteReader(cmd, CommandBehavior.Default); }

public IDataReader ExecuteReader(DbCommand cmd, CommandBehavior behavior) {
    return cmd.ExecuteReader(behavior);
}

That's it.

--
Shamil
 
-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Jim Lawrence
Sent: Monday, July 23, 2007 7:01 AM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

Hi John:

You have taken on a massively complex project single-handedly. When I was
working full-time for a company and a similar sized project appeared I
assign at least 2 people to the project. It seems that 2 people can do the
work of three when they work together.

MS SQL people tend to think their a little better than the standard Access
grunts. Why that is so I have no idea. Considering that MS SQL developers
have the luxury of working with a faster and better product that is much
easier to obtain positive results than from an equally complex project
written totally in Access. That is why I write most of my new apps in a
combination of Access FE and MS SQL BE because I get the best of all worlds.

MS SQL is more rugged than the MDB, handles unbound connections without the
absolute need for a complex locking scheme as MS SQL is designed to work it
this type of environment. It internally handles locking, multi-access to a
single record or group of records. It is a professional level DB and is
actually easier to work with. Unfortunately, ADO is the best connection
protocol for performance and reliability but if you do not know it, it is
just another major item to learn.

If we throw learn .Net from scratch into the mix and you have to hold on
with both hands just to keep your sanity. I am amazed at how far you have
come in such a short time. Nothing like a baptism in fire... If you are a
little stressed, it is to be expected.

Hope your day has gone well.

Regards
Jim
    
<<< tail of thread trimmed>>

_______________________________________________
dba-VB mailing list
dba-VB at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-vb
http://www.databaseadvisors.com




More information about the dba-VB mailing list