Shamil Salakhetdinov
shamil at users.mns.ru
Sun Apr 13 05:08:53 CDT 2008
Hi Max,
Our postings crossed: I have just posted the RegEx test results...
N.B.: when testing this or that method, both memory consumption and execution
speed should be taken into consideration:
- high memory consumption in exchange for the fastest results can be tolerated
when solving small tasks in "tiny" applications/utilities, but
- in large application systems, high memory consumption is exactly what can
"unexpectedly" cause nasty side effects.
One example I ran into several months ago was so "tricky" to observe, and it
was not easy to find out why it could ever happen in .NET: I got a memory
leak/excessive memory consumption (in .NET!) because of "lazy loading". That
"lazy loading" loaded quite a lot of data that wasn't needed, inside a loop
processing millions of records. This is what I saw when I watched 'Page File
Usage History' in Task Manager:
[Sketch: 'Page File Usage History' as a sawtooth: usage climbs steadily, drops
sharply, then climbs again, repeating several times]
In normal mode, without a heavy application system workload, everything
worked well with the same data...
Recap:
- the Split(...) approach could cause a side effect similar to the above for
very large input strings, since it allocates an array plus a new string for
every field;
// 10 sec for 20,000,000 iterations
string[] recordLine = s.Split('|');
count = recordLine.Length-1;
- Replace(...) could also cause the above side effect, since it builds a new
copy of the string;
// 10 sec for 20,000,000 iterations
// Removing every '|' and comparing lengths gives the separator count.
string temp = s.Replace("|", string.Empty);
count = s.Length - temp.Length;
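- RegEx(...) seems to be the slowest: unsatisfactorily slow for large input
strings/heavy system workload. Note that an unescaped "|" is the regex
alternation operator, which matches the empty string at every position
(s.Length + 1 matches) instead of counting '|' characters, so the pattern has
to be escaped as @"\|";
// ? sec (unfinished) for 20,000,000 iterations (measured with unescaped "|")
Regex rx = new Regex(@"\|");
count = rx.Matches(s).Count;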
- iterating over the string's characters by index (with XOR or char
comparison) gives the fastest results and is 100% safe from the memory
consumption point of view, as it allocates no new objects...
// * using XOR (a char XOR-ed with itself gives 0):
// ~3 sec for 20,000,000 iterations
for (int index = 0; index < s.Length; index++)
    if ((s[index] ^ '|') == 0) count++;
// * using char comparison:
// ~3 sec for 20,000,000 iterations
for (int index = 0; index < s.Length; index++)
    if (s[index] == '|') count++;
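Because the N.B. above asks for memory as well as speed, a harness along these lines could report both for each of the four approaches. This is a minimal sketch, not the code that produced the timings above: it assumes .NET 5 or later (top-level statements and GC.GetAllocatedBytesForCurrentThread()), and the sample string, iteration count and helper names (Measure, CountBySplit and so on) are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: 1,000 fields joined by '|', i.e. 999 separators.
string sample = string.Join("|", Enumerable.Repeat("field", 1000));

// "\|" escapes '|', which is otherwise the regex alternation operator.
Regex pipeRegex = new Regex(@"\|");

int CountBySplit(string s) => s.Split('|').Length - 1;

int CountByReplace(string s) => s.Length - s.Replace("|", string.Empty).Length;

int CountByRegex(string s) => pipeRegex.Matches(s).Count;

int CountByLoop(string s)
{
    int count = 0;
    for (int index = 0; index < s.Length; index++)
        if (s[index] == '|') count++;
    return count;
}

// Runs one approach repeatedly and reports elapsed time plus the bytes
// allocated on the current thread while it ran.
void Measure(string name, Func<string, int> counter, int iterations)
{
    counter(sample); // warm-up call so JIT compilation is not timed
    long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
    var watch = Stopwatch.StartNew();
    int result = 0;
    for (int i = 0; i < iterations; i++)
        result = counter(sample);
    watch.Stop();
    long bytes = GC.GetAllocatedBytesForCurrentThread() - bytesBefore;
    Console.WriteLine($"{name,-8} count={result} {watch.ElapsedMilliseconds} ms {bytes:N0} bytes allocated");
}

const int Iterations = 1_000; // hypothetical; raise it for steadier timings

Measure("Split", CountBySplit, Iterations);
Measure("Replace", CountByReplace, Iterations);
Measure("Regex", CountByRegex, Iterations);
Measure("Loop", CountByLoop, Iterations);
```

The loop should report close to zero allocated bytes, while Split, Replace and Regex allocate on every call, which is the memory side of the comparison that wall-clock times alone do not show.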
Please correct me if you find any mistakes in the above results...
Any takers to find quicker code for .NET (VB, C# or C++/CLI)? That would be
a good weekend exercise in code optimization techniques...
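Since the question below is about a very large file, one way to keep memory flat regardless of file size is to read the file in fixed-size chunks and apply the same char loop to each chunk. A minimal sketch, assuming the data is a text file readable with StreamReader; CountCharInFile and the 64 KB default buffer are hypothetical choices, not something timed in this thread.

```csharp
using System;
using System.IO;

// Counts occurrences of `target` in a text file by reading fixed-size
// chunks, so only one buffer is held in memory whatever the file size.
long CountCharInFile(string path, char target, int bufferSize = 64 * 1024)
{
    char[] buffer = new char[bufferSize];
    long count = 0;
    using var reader = new StreamReader(path);
    int read;
    // Read returns the number of chars copied into the buffer; 0 means end of file.
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int index = 0; index < read; index++)
            if (buffer[index] == target) count++;
    }
    return count;
}

// Usage with a temporary file holding three lines of "a|b|c".
string tempPath = Path.GetTempFileName();
File.WriteAllText(tempPath, "a|b|c\na|b|c\na|b|c\n");
Console.WriteLine(CountCharInFile(tempPath, '|')); // prints 6
File.Delete(tempPath);
```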
Thank you.
--
Shamil
-----Original Message-----
From: dba-vb-bounces at databaseadvisors.com
[mailto:dba-vb-bounces at databaseadvisors.com] On Behalf Of Max Wanadoo
Sent: Sunday, April 13, 2008 1:14 PM
To: 'Discussion concerning Visual Basic and related programming issues.'
Subject: Re: [dba-VB] Count a character in a string
John,
Further to the example RegExp code that I just posted:
would you have time to compare how long it takes with how long these two
(posted by others) take?
> One:
> Dim strRecordLine() As String
> strRecordLine = Split(yourstringhere, "|")
> ' Split returns a zero-based array, so UBound already equals the separator count
> NumberOfSeparatorCharacters = UBound(strRecordLine)
>
> Two:
> Dim strTemp As String
> strTemp = Replace(yourstring, "|", "")
> NumberOfCharacters = Len(yourstring) - Len(strTemp)
It would be really nice to get a handle on which of these is the fastest for
a very large file.
Thanks
Max