[dba-VB] Count a character in a string

Shamil Salakhetdinov shamil at users.mns.ru
Sun Apr 13 05:08:53 CDT 2008


Hi Max,

Our postings crossed - I have just posted RegEx test results...

N.B.: when testing any of these methods, both memory consumption and execution
speed should be taken into consideration:

- high memory consumption that buys the fastest results can be ignored when
solving small tasks within "tiny" applications/utilities, but
- high memory consumption can "unexpectedly" cause nasty side effects in
large application systems.

One example I ran into several months ago was "tricky" to observe, and it
was not easy to find out why it could happen in .NET at all: I got excessive
memory consumption that looked like a memory leak (in .NET!) because of "lazy
loading". That "lazy loading" loaded quite a lot of data when it wasn't
needed, in a loop processing millions of records. This is what 'Page File
Usage History' in Task Manager looked like:

   /|   /|   /|   .
  / |  / |  / |  .
 /  | /  | /  | .
/   |/   |/   |/


And in normal mode, without a heavy application system workload, everything
worked well with the same data...

Recap:

- Split(...) approach could cause a side effect similar to the above for
very large input strings;

// 10 sec for 20,000,000 iterations
string[] recordLine = s.Split('|');
count = recordLine.Length-1;  

- Replace(...) could also result in the above side effect;

// 10 sec for 20,000,000 iterations
string temp = s.Replace("|", 
        Microsoft.VisualBasic.Constants.vbNullString);
count = s.Length - temp.Length;

- RegEx(...) seems to be the slowest: unsatisfactorily slow for large input
strings/heavy system workload;

// ? sec (unfinished) for 20,000,000 iterations
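// '|' is escaped here: unescaped, it is the regex alternation operator
// and matches the empty string at every position of s
Regex rx = new Regex(@"\|");
count = rx.Matches(s).Count;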

- char array iteration using a char index and (XOR or char comparison) gives
the fastest results and is 100% safe from a memory consumption point of
view....

//  * using XOR:

// ~3 sec for 20,000,000 iterations
for (int index = 0; index < s.Length; index++) 
    if ((s[index] ^ '|') == 0) count++;

//  * using char comparison...

// ~3 sec for 20,000,000 iterations
for (int index = 0; index < s.Length; index++) 
    if ((s[index] == '|')) count++;
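
For anyone who wants to repeat the measurements, here is a minimal timing
sketch built on System.Diagnostics.Stopwatch. It is not the code used for the
figures above. The class name CountBenchmark, the sample string and the
iteration count are assumptions made for illustration, and an empty string
literal stands in for vbNullString in the Replace variant. Timings will
depend on the input string and the machine.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CountBenchmark
{
    // Split on the separator: n separators give n + 1 pieces.
    public static int BySplit(string s) { return s.Split('|').Length - 1; }

    // Remove the separators and compare lengths.
    public static int ByReplace(string s) { return s.Length - s.Replace("|", "").Length; }

    // Count regex matches of a literal (escaped) pipe.
    public static int ByRegex(string s) { return Regex.Matches(s, @"\|").Count; }

    // Walk the string by index and compare each char.
    public static int ByLoop(string s)
    {
        int count = 0;
        for (int index = 0; index < s.Length; index++)
            if (s[index] == '|') count++;
        return count;
    }

    // Calls one method 'iterations' times and returns elapsed milliseconds.
    public static long Time(Func<string, int> method, string s, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) method(s);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string s = "alpha|beta|gamma|delta"; // hypothetical sample record
        int iterations = 1000000;            // hypothetical; the post used 20,000,000

        Console.WriteLine("Split:   {0} ms", Time(BySplit, s, iterations));
        Console.WriteLine("Replace: {0} ms", Time(ByReplace, s, iterations));
        Console.WriteLine("Regex:   {0} ms", Time(ByRegex, s, iterations));
        Console.WriteLine("Loop:    {0} ms", Time(ByLoop, s, iterations));
    }
}
```

Elapsed time alone says nothing about memory consumption, so Task Manager (or
a profiler) still has to be watched separately while the Split and Replace
variants run on very large strings.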

Please correct me if you find mistakes in the above results...
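
One thing worth checking in the RegEx measurement: in .NET regular expression
syntax '|' is the alternation operator, so it has to be escaped to count
literal pipes. A small sketch of the difference follows. The class and method
names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPipeDemo
{
    // Unescaped: the pattern is "empty OR empty", which matches the empty
    // string at every position, including the end of the input.
    public static int CountUnescaped(string s)
    {
        return new Regex("|").Matches(s).Count;
    }

    // Escaped: Regex.Escape turns "|" into a pattern for a literal pipe.
    public static int CountEscaped(string s)
    {
        return new Regex(Regex.Escape("|")).Matches(s).Count;
    }

    public static void Main()
    {
        string s = "a|b";
        Console.WriteLine(CountUnescaped(s)); // 4: empty matches at positions 0..3
        Console.WriteLine(CountEscaped(s));   // 1: the single literal '|'
    }
}
```

With the unescaped pattern the result is always s.Length + 1, and building
that many Match objects for a large input may account for part of the slowness.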

Any takers to find quicker code for .NET VB or C# or C++/CLI? That would be
a good weekend exercise on code optimization techniques...
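
As a starting point for that exercise, here are two more candidates that
allocate no intermediate strings or arrays: a String.IndexOf loop and a
foreach over the chars. This is only a sketch; the class name CountCandidates
is an assumption, and no timings are claimed for either variant until they
are measured.

```csharp
using System;

public static class CountCandidates
{
    // Jump from one separator to the next with String.IndexOf(char, int).
    public static int ByIndexOf(string s)
    {
        int count = 0;
        int pos = s.IndexOf('|');
        while (pos >= 0)
        {
            count++;
            // pos + 1 may equal s.Length; IndexOf then returns -1.
            pos = s.IndexOf('|', pos + 1);
        }
        return count;
    }

    // Enumerate the chars directly instead of indexing.
    public static int ByForeach(string s)
    {
        int count = 0;
        foreach (char c in s)
            if (c == '|') count++;
        return count;
    }

    public static void Main()
    {
        string s = "a|b||c|"; // hypothetical sample with four separators
        Console.WriteLine(ByIndexOf(s)); // 4
        Console.WriteLine(ByForeach(s)); // 4
    }
}
```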

Thank you. 

--
Shamil
 

-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Max Wanadoo
Sent: Sunday, April 13, 2008 1:14 PM
To: 'Discussion concerning Visual Basic and related programming issues.'
Subject: Re: [dba-VB] Count a character in a string

John,

Further to my example RegExpr code that I just posted: would you have time
to compare how long it takes with how long these two take (posted by
others)?


> One:
> Dim strRecordLine() As String
> strRecordLine = Split(yourstringhere, "|")
> NumberOfSeparatorCharacters = UBound(strRecordLine)
> 
> Two:
> Dim strTemp As String
> strTemp = Replace(yourstring, "|", "")
> NumberOfCharacters = Len(yourstring) - Len(strTemp)

It would be really nice to get a handle on which of these is the faster for
a very large file.

Thanks

Max
