[dba-VB] How I'm approaching the problem

jwcolby jwcolby at colbyconsulting.com
Tue Jul 24 15:29:16 CDT 2007


Shamil,

>John, I'm sorry I have to suspend my participation in this thread for
several days - a very urgent project is here waiting for release... 

Understood. I will answer now so that my reply is waiting for you when you
are back.

>I meant does your customer need to make bulk loading any faster or not?

The short answer is yes.  

The long answer is that I did not really buy the library to make the speed
faster; I bought it to make the implementation (coding) faster by not having
to handle this part myself.  

However, to answer your question directly, I am moving to a model where I
charge the client for each operation, i.e. for computer time.  IOW, in each
process that imports or exports data, the amount of time required to perform
that process is logged, and will (eventually) be fed directly into a
billing program.  So the client will be billed for X hours of computer time,
where A hours are used for import of raw data lists, B hours for address
export, C hours for address validation (by a different server in my
office), D hours for importing the data back into the SQL Server database,
etc.  I will not be breaking the time down on the bill; I envision billing
line items like 

$XXX for address import of raw data list XYZ
$YYY for address validation of list QRS

Etc.

With luck (and hard work), someday soon these processes will run totally
automatically, with no manual input from me.  Thus I will be essentially
charging for "process", not my personal time.

This billing method allows the client to understand the overall costs.  And
by logging the actual time required for each part, I can understand where my
bottlenecks are, plan for upgrades to handle my requirements, etc.  
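A minimal sketch of the kind of per-process time logging described above - the function name, field names, and example operation are all my own invention, not John's actual system - might look like this:

```python
import time
from datetime import datetime

# Hypothetical in-memory log; in the real system this would presumably be
# a table in the SQL Server database that later feeds the billing program.
process_log = []

def timed_process(client, operation, func):
    """Run one billable operation, recording its elapsed wall-clock time."""
    start = time.perf_counter()
    result = func()
    process_log.append({
        "client": client,
        "operation": operation,
        "started": datetime.now(),
        "elapsed_seconds": time.perf_counter() - start,
    })
    return result

# Example: a stand-in for a real step such as "address export".
timed_process("list XYZ", "address export", lambda: sum(range(100_000)))
for entry in process_log:
    print(entry["operation"], round(entry["elapsed_seconds"], 3), "seconds")
```

Each logged entry can then be summed per client and per operation to produce the billing line items above.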

So the faster it works the less I make and the more the client saves.  8-( 

Alternatively, the faster it runs the more lists I can run on a given
machine in a given period of time.  8-)

I am really trying to sell the customer on how efficient it is (vs. his old
method) to import his lists, and then export / address validate / reimport
them.  The client does not understand the mechanics behind the process, nor
does he want or care to.  What he does understand is that it used to cost
him (literally) $1.25 / thousand names to get addresses validated.  I can do
it for $0.12 / thousand names, or about 1/10th his previous cost.  Not only
can I do it for that price, I can make a killing doing it at that price, IF
I can move the frequency up to once per month or even once per quarter.  
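As a quick check of the arithmetic (the rates are the ones quoted above; the 5-million-name list size is a hypothetical example of mine):

```python
old_rate = 1.25  # dollars per thousand names (previous provider)
new_rate = 0.12  # dollars per thousand names (John's service)

print(round(new_rate / old_rate, 3))  # 0.096 -- roughly 1/10th the cost

# For a hypothetical list of 5 million names:
names = 5_000_000
old_cost = names / 1000 * old_rate  # 6250.0 dollars
new_cost = names / 1000 * new_rate  # ~600.0 dollars
print(old_cost, new_cost)
```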

The lists have to be reprocessed every X time units, perhaps monthly,
perhaps quarterly.  Again, by being able to demonstrate efficiency, I can
convince the client that it is cost effective to use my service, and also
cost effective to perform this service monthly instead of every six months
or every year.  He gets to sell his business by marketing "addresses
cleaned monthly", and believe me, that is important to his clients.  I doubt
he will drop his price to pass on the savings, so he ends up making more
/ thousand names since he is now paying far less for that process.  

His previous provider was so expensive that he literally did the address
validation on each order and passed the cost through.  I want a business
model where he can afford (and he CAN!) to do this process monthly on each
of his lists.  Thus I get an income stream that I can depend on, and I get a
process that does not cost me MY TIME, but rather sells the client my
computer's time.  "More work, more / faster computers" instead of "more
work, more of MY TIME".  

I want to be sitting on the beach while my computers are earning me money.

So yea, it always pays to do something faster, whether through my ability to
make more money on each hour of machine time, or through being more
efficient, selling that efficiency to the client, and getting more work.

John W. Colby
Colby Consulting
www.ColbyConsulting.com 
-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Shamil
Salakhetdinov
Sent: Tuesday, July 24, 2007 3:37 PM
To: dba-vb at databaseadvisors.com
Subject: Re: [dba-VB] How I'm approaching the problem

Hello John,

<<<
I don't understand the context - would what have any economical sense?
>>>
I meant: does your customer need to make bulk loading any faster or not? You
know, one may fly Paris - NY using ordinary Boeing planes; one could (in the
past) use Concorde or even the Tu-144 (!?)
(http://en.wikipedia.org/wiki/Tupolev_Tu-144) - IOW, does your customer still
need a Concorde for their task, or is the current very good Boeing solution
good enough for them now and for a long time in the future?

I also mean that we could try to find the bottleneck (if any) of the current
solution and "break" it.

Some tests (Dual Core 3GHz, 2GB, W2003 Server, simple IDE/SATA 7400rpm HDD)
- the raw performance of reading a TAB-delimited CSV file in .NET:

Total Counter = 1011852 (1 million...)
ElapsedTime = 3,359375 seconds
ElapsedTime = 0,0559895833333333 minutes

Total Counter = 10006092 (10 million...)
ElapsedTime = 32,890625 seconds
ElapsedTime = 0,548177083333333 minutes

Total Counter = 100010952 (100 million...)
ElapsedTime = 327,875 seconds
ElapsedTime = 5,46458333333333 minutes


Splitting of CSV file lines into array added...

Total Counter = 1011852 (1 million...)
ElapsedTime = 8,375 seconds
ElapsedTime = 0,139583333333333 minutes

... (I didn't have time to test other cases of 10 and 100 million source
file lines - please do if you have time...)
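For perspective, Shamil's figures above work out to a fairly stable raw throughput (the commas in his output are decimal separators; this is just arithmetic on the numbers he reports):

```python
# (lines_read, elapsed_seconds) from the ReadLine()-only runs above
raw_runs = [
    (1_011_852, 3.359375),
    (10_006_092, 32.890625),
    (100_010_952, 327.875),
]
for lines, seconds in raw_runs:
    print(f"{lines / seconds:,.0f} lines/sec")  # roughly 300,000-305,000 each

# With Split() added (1-million-line run): throughput drops to roughly
# 121,000 lines/sec, i.e. the split costs more time than the read itself.
split_rate = 1_011_852 / 8.375
print(f"{split_rate:,.0f} lines/sec with split")
```

So reading scales linearly with file size, and the field split, not the disk read, dominates the parsing cost on this hardware.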


Of course that is just reading the file - the open question is how closely
this raw file-reading performance can be approached when such a file is bulk
loaded into an MS SQL Server database.

John, I'm sorry I have to suspend my participation in this thread for
several days - a very urgent project is here waiting for release...


I must say I'm very interested to return to this thread next week - sorry
for any inconvenience...

--
Shamil

P.S. Simple C# code used for testing (watch line wraps):


using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            long MAX_LIMIT = 1000000; // 1 million; append zeros for the 10M / 100M runs
            string s = @"F:\Temp\XL\ConsoleApplication1\IN\testfile.txt";
            long totalCounter = 0;
            long lineCounter = 0;
            DateTime startTime = DateTime.Now;

            // Re-read the file from the top until at least MAX_LIMIT lines
            // have been processed in total.
            while (totalCounter < MAX_LIMIT)
            {
                System.IO.TextReader rdr = new System.IO.StreamReader(s);
                lineCounter = 0;
                string inline = "";
                char delimiter = '\t'; // TAB
                while ((inline = rdr.ReadLine()) != null)
                {
                    // The split result is deliberately unused; we are only
                    // measuring the cost of parsing each line into fields.
                    string[] fields = inline.Split(delimiter);
                    ++totalCounter;
                    ++lineCounter;
                }
                rdr.Close();
            }

            DateTime endTime = DateTime.Now;
            TimeSpan elapsedTime = endTime - startTime;
            Console.WriteLine("File Line Counter = {0}", lineCounter);
            Console.WriteLine("Total Counter = {0}", totalCounter);
            Console.WriteLine("ElapsedTime = {0} seconds", elapsedTime.TotalSeconds);
            Console.WriteLine("ElapsedTime = {0} minutes", elapsedTime.TotalMinutes);
        }
    }
}
 



