Shamil Salakhetdinov
shamil at users.mns.ru
Sat Sep 29 02:50:59 CDT 2007
Hi All,
Here is some more weekend coding on the subject task: two different
non-RegEx strategies whose execution times differ by a factor of about
three to four (29 s vs. 9 s unoptimized, 28 s vs. 7 s optimized):
// 1st case - code not optimized by the C# compiler
+ StringJammerTestConsoleApplication.StringJammer1
1000000 cycles started at 11:26:21
- StringJammerTestConsoleApplication.StringJammer1
1000000 cycles finished at 11:26:50
+ StringJammerTestConsoleApplication.StringJammer2
1000000 cycles started at 11:26:50
- StringJammerTestConsoleApplication.StringJammer2
1000000 cycles finished at 11:26:59
// 2nd case - code optimized by the C# compiler
+ StringJammerTestConsoleApplication.StringJammer1
1000000 cycles started at 11:29:47
- StringJammerTestConsoleApplication.StringJammer1
1000000 cycles finished at 11:30:15
+ StringJammerTestConsoleApplication.StringJammer2
1000000 cycles started at 11:30:15
- StringJammerTestConsoleApplication.StringJammer2
1000000 cycles finished at 11:30:22
And here is the code (watch for line wraps in string constants! All other
wrapped code lines should be OK because C# uses semicolons and curly
braces - another advantage of posting C# code):
To run it from the VS IDE, set:
Project -> Build -> Allow Unsafe Code
Project -> Build -> Optimize Code
If anybody has time to write and test-run a RegEx equivalent or any other
variant of this "string jamming/camel casing" code, it would be very
interesting to see it here - I would not be surprised if there are even
speedier ways to jam strings without switching to C++ or assembler.
And my current hypothesis, based on the article I referenced here
yesterday, is that RegEx should run (considerably) slower: it would be
great if this hypothesis proves wrong. Thanks.
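For anyone who wants a starting point for the RegEx comparison, here is a
minimal sketch of one possible RegEx-based equivalent. It has not been
timed, so it says nothing yet about the hypothesis. The class name
RegexJammer and its string-returning Jam method are assumptions made for
illustration (the classes below use a ref parameter instead), and it uses
invariant-culture casing where the classes below use the current culture.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical RegEx-based jammer (not part of the posted code).
public static class RegexJammer
{
    // One run of ASCII letters is one "word"; everything else is a separator.
    private static readonly Regex Word =
        new Regex("[A-Za-z]+", RegexOptions.Compiled);

    public static string Jam(string input)
    {
        StringBuilder result = new StringBuilder(input.Length);
        foreach (Match m in Word.Matches(input))
        {
            string word = m.Value;
            // Upper-case the first letter of each word, lower-case the rest.
            result.Append(char.ToUpperInvariant(word[0]));
            result.Append(word.Substring(1).ToLowerInvariant());
        }
        return result.ToString();
    }
}
```

Calling RegexJammer.Jam("%idiotic_Field*name&!@") should give
IdioticFieldName, the same result as the two strategies below.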
------ cut here ----
using System;
using System.Collections.Generic;
using System.Text;
namespace StringJammerTestConsoleApplication
{
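    /// <summary>
    /// StringJammer abstract class
    /// </summary>
    public abstract class StringJammer
    {
        // One flag per possible char value (1 = ASCII letter), so indexing
        // with any char, including non-ASCII ones, stays within bounds.
        // A new array is already zero-filled, so only the letters are set.
        protected static byte[] sieve = new byte[char.MaxValue + 1];
        protected void Init()
        {
            for (int i = (int)'A'; i <= (int)'Z'; i++)
                sieve[i] = 1;
            for (int i = (int)'a'; i <= (int)'z'; i++)
                sieve[i] = 1;
        }
        public abstract void Jam(ref string stringToJam);
    }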
    /// <summary>
    /// StringJammer1 class - first string jamming strategy
    /// </summary>
    public class StringJammer1 : StringJammer
    {
        public StringJammer1() { Init(); }
        public override void Jam(ref string stringToJam)
        {
            StringBuilder result = new StringBuilder(stringToJam.Length);
            // True when the next letter starts a new word.
            bool upperCase = true;
            foreach (char c in stringToJam.ToCharArray())
            {
                // Non-letters are dropped and mark a word boundary.
                if (sieve[(int)c] == 0) upperCase = true;
                else if (upperCase)
                {
                    result.Append(c.ToString().ToUpper());
                    upperCase = false;
                }
                else result.Append(c.ToString().ToLower());
            }
            stringToJam = result.ToString().Trim();
        }
    }
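    /// <summary>
    /// StringJammer2 class - second string jamming strategy
    /// </summary>
    public class StringJammer2 : StringJammer
    {
        public StringJammer2() { Init(); }
        // NOTE: this overwrites the characters of the passed string in
        // place, so the caller must pass a private copy (as Main does),
        // never a string literal or a string shared with other code.
        public override void Jam(ref string stringToJam)
        {
            unsafe
            {
                fixed (char* c = stringToJam)
                {
                    bool upperCase = true;
                    // i reads every char; j writes the kept letters (j <= i).
                    int i = 0, j = 0;
                    while (i < stringToJam.Length)
                    {
                        if (sieve[c[i]] == 0)
                        {
                            c[i++] = ' ';
                            upperCase = true;
                        }
                        else if (upperCase)
                        {
                            c[j++] = Char.ToUpper(c[i++]);
                            upperCase = false;
                        }
                        else
                        {
                            c[j++] = Char.ToLower(c[i++]);
                        }
                    }
                    // Blank out the unused tail so Trim() can remove it.
                    while (j < stringToJam.Length) c[j++] = ' ';
                }
                stringToJam = stringToJam.Trim();
            }
        }
    }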
    /// <summary>
    /// Test
    /// </summary>
    class Program
    {
        static void Main(string[] args)
        {
            const long MAX_CYCLES = 1000000;
            string[] test = {"John colby ",
                "%idiotic_Field*name&!@",
                " # hey#hey#Hey,hello_world$%#",
                "@#$this#is_a_test_of_the-emergency-broadcast-system $()# "};
            StringJammer[] jammers = { new StringJammer1(), new StringJammer2() };
            for (int k = 0; k < jammers.Length; k++)
            {
                long cyclesQty = MAX_CYCLES;
                Console.WriteLine("+ {0} {1:D} cycles started at {2}",
                    jammers[k].GetType().ToString(),
                    MAX_CYCLES, DateTime.Now.ToLongTimeString());
                while (cyclesQty > 0)
                {
                    for (int i = 0; i < test.Length; i++)
                    {
                        // Fresh copy each time: StringJammer2 writes into
                        // the string it is given.
                        string result = new StringBuilder(test[i]).ToString();
                        jammers[k].Jam(ref result);
                        // Print the results during the first cycle only.
                        if (cyclesQty == MAX_CYCLES)
                        {
                            Console.WriteLine(test[i] + " => {" + result + "}");
                        }
                    }
                    --cyclesQty;
                }
                Console.WriteLine("- {0} {1:D} cycles finished at {2}\n",
                    jammers[k].GetType().ToString(),
                    MAX_CYCLES, DateTime.Now.ToLongTimeString());
            }
        }
    }
}
------ cut here ----
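The test above timestamps each run with DateTime.Now to the nearest
second, which is coarse for runs of 7 to 29 seconds. Below is a rough
sketch of how the same loop could be timed with
System.Diagnostics.Stopwatch instead. The names JamTimer, Time and JamSafe
are hypothetical and not part of the code above; JamSafe is just a sample
workload, a variant of the second strategy that compacts letters in a
char[] copy instead of writing into the string through a pointer. No
timings are claimed for it.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper (not part of the posted code).
public static class JamTimer
{
    // Sample workload: same idea as StringJammer2, but on a char[] copy,
    // so no unsafe code is needed and the input string is left untouched.
    public static string JamSafe(string input)
    {
        char[] buf = input.ToCharArray();
        int j = 0;                 // write position, never ahead of i
        bool upperCase = true;     // next letter starts a new word
        for (int i = 0; i < buf.Length; i++)
        {
            char c = buf[i];
            bool isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!isLetter) upperCase = true;
            else if (upperCase)
            {
                buf[j++] = char.ToUpperInvariant(c);
                upperCase = false;
            }
            else buf[j++] = char.ToLowerInvariant(c);
        }
        return new string(buf, 0, j);
    }

    // Runs jam over all inputs 'cycles' times and returns the elapsed time.
    public static TimeSpan Time(Func<string, string> jam, string[] inputs, int cycles)
    {
        // One untimed pass so JIT compilation is not part of the measurement.
        foreach (string s in inputs) jam(s);

        Stopwatch sw = Stopwatch.StartNew();
        for (int n = 0; n < cycles; n++)
            foreach (string s in inputs) jam(s);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, JamTimer.Time(JamTimer.JamSafe, test, 1000000).TotalMilliseconds
would report the elapsed milliseconds for the same test strings and cycle
count used in Main.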
--
Shamil
>> Folks,
>>
>> I am looking for a regex expression (preferably with explanation) for
>> taking an expression and creating a camel case (or PascalCase)
>> expression.
>>
>> I get CSV files with headers in them. All too often the eejits that
>> created the databases they came from used embedded spaces or other
>> special use characters (!@#$%^&* etc) in their field names. I need to
>> strip these special characters out completely. I also need to upper
>> case the valid alpha character that follows any of these special
>> characters.
>>
>> John colby becomes JohnColby
>> %idiotic_Field*name becomes IdioticFieldName
>>
>> Etc.
>>
>> It appears that Regex is the key (I am doing this in VB.Net) but until
>> today I have never really tried to use RegEx and it ain't pretty!
>>
>> Any help in this would be much appreciated.
>>
>> John W. Colby
>> Colby Consulting
>> www.ColbyConsulting.com
>>