Shamil Salakhetdinov
shamil at users.mns.ru
Sat Sep 29 02:50:59 CDT 2007
Hi All,

Here is some more weekend coding for the subject task: two different non-RegEx strategies, whose execution times differ by a factor of 3 to 4 (29 s vs. 9 s without compiler optimization, 28 s vs. 7 s with it, for 1,000,000 cycles each).

The build settings involved:

Project -> Build -> Allow Unsafe Code
Project -> Build -> Optimize Code

// 1st case - code not optimized by the C# compiler
+ StringJammerTestConsoleApplication.StringJammer1 1000000 cycles started at 11:26:21
- StringJammerTestConsoleApplication.StringJammer1 1000000 cycles finished at 11:26:50

+ StringJammerTestConsoleApplication.StringJammer2 1000000 cycles started at 11:26:50
- StringJammerTestConsoleApplication.StringJammer2 1000000 cycles finished at 11:26:59

// 2nd case - code optimized by the C# compiler
+ StringJammerTestConsoleApplication.StringJammer1 1000000 cycles started at 11:29:47
- StringJammerTestConsoleApplication.StringJammer1 1000000 cycles finished at 11:30:15

+ StringJammerTestConsoleApplication.StringJammer2 1000000 cycles started at 11:30:15
- StringJammerTestConsoleApplication.StringJammer2 1000000 cycles finished at 11:30:22

And here is the code (watch line wraps for string constants! All the other wrapped code lines should be OK because of the semicolons and curly brackets used in C# - another advantage of posting C# code).

To run it from the VS IDE, set:

Project -> Build -> Allow Unsafe Code
Project -> Build -> Optimize Code

If anybody has time to write and test-run a RegEx equivalent, or any other variant of the "string jamming/camel casing" code, that would be very interesting to see here - I would not be surprised if there were even speedier ways to jam strings without switching to C++ and assembler. My current hypothesis, based on the article I referenced here yesterday, is that RegEx should run (considerably) slower: it would be great if this hypothesis proved wrong.

Thanks.
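
As one possible "other variant" of the kind asked for above, here is a minimal sketch that keeps the single-pass idea but writes into a char[] buffer instead of using unsafe pointers or per-character strings. The class name CharBufferJammer and its static Jam method are hypothetical and not part of the posted code; the sketch assumes only ASCII letters count as valid characters, as in the posted sieve, and it has not been timed against StringJammer1 or StringJammer2.

```csharp
using System;

// Hypothetical helper, not part of the posted code.
public static class CharBufferJammer
{
    public static string Jam(string input)
    {
        // The result can never be longer than the input.
        char[] buffer = new char[input.Length];
        int length = 0;
        bool upperCase = true;

        foreach (char c in input)
        {
            // Assumption: only ASCII letters are kept, as in the posted sieve.
            bool isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

            if (!isLetter)
            {
                // Separator: drop it and capitalise the next letter.
                upperCase = true;
            }
            else if (upperCase)
            {
                buffer[length++] = char.ToUpperInvariant(c);
                upperCase = false;
            }
            else
            {
                buffer[length++] = char.ToLowerInvariant(c);
            }
        }

        // Only the filled part of the buffer becomes the result.
        return new string(buffer, 0, length);
    }
}
```

With this sketch, CharBufferJammer.Jam("John colby ") returns "JohnColby" and CharBufferJammer.Jam("%idiotic_Field*name&!@") returns "IdioticFieldName". It needs neither Allow Unsafe Code nor a private copy of the input string.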
------ cut here ----
using System;
using System.Collections.Generic;
using System.Text;

namespace StringJammerTestConsoleApplication
{
    /// <summary>
    /// StringJammer abstract class
    /// </summary>
    public abstract class StringJammer
    {
        // Lookup table: 1 for ASCII letters, 0 for everything else.
        // NOTE: valid indexes are 0..254, so a character with a code
        // above 254 throws IndexOutOfRangeException.
        protected static byte[] sieve = new byte[255];

        protected void Init()
        {
            sieve.Initialize();
            for (int i = (int)'A'; i <= (int)'Z'; i++) sieve[i] = 1;
            for (int i = (int)'a'; i <= (int)'z'; i++) sieve[i] = 1;
        }

        public abstract void Jam(ref string stringToJam);
    }

    /// <summary>
    /// StringJammer1 class - first string jamming strategy
    /// </summary>
    public class StringJammer1 : StringJammer
    {
        public StringJammer1()
        {
            Init();
        }

        public override void Jam(ref string stringToJam)
        {
            StringBuilder result = new StringBuilder(stringToJam.Length);
            bool upperCase = true;
            foreach (char c in stringToJam.ToCharArray())
            {
                if (sieve[(int)c] == 0)
                    upperCase = true;   // separator: skip it, capitalise the next letter
                else if (upperCase)
                {
                    result.Append(c.ToString().ToUpper());
                    upperCase = false;
                }
                else
                    result.Append(c.ToString().ToLower());
            }
            stringToJam = result.ToString().Trim();
        }
    }

    /// <summary>
    /// StringJammer2 class - second string jamming strategy
    /// </summary>
    public class StringJammer2 : StringJammer
    {
        public StringJammer2()
        {
            Init();
        }

        public override void Jam(ref string stringToJam)
        {
            // Overwrites the characters of the passed string in place through
            // a pinned pointer, so the caller must pass a private copy.
            unsafe
            {
                fixed (char* c = stringToJam)
                {
                    bool upperCase = true;
                    int i = 0, j = 0;
                    while (i < stringToJam.Length)
                    {
                        if (sieve[c[i]] == 0)
                        {
                            c[i++] = ' ';
                            upperCase = true;
                        }
                        else if (upperCase)
                        {
                            c[j++] = Char.ToUpper(c[i++]);
                            upperCase = false;
                        }
                        else
                        {
                            c[j++] = Char.ToLower(c[i++]);
                        }
                    }
                    // Blank out the tail so that Trim() can cut it off.
                    while (j < stringToJam.Length) c[j++] = ' ';
                }
                stringToJam = stringToJam.Trim();
            }
        }
    }

    /// <summary>
    /// Test
    /// </summary>
    class Program
    {
        static void Main(string[] args)
        {
            const long MAX_CYCLES = 1000000;
            string[] test = {"John colby ", "%idiotic_Field*name&!@", " # hey#hey#Hey,hello_world$%#", "@#$this#is_a_test_of_the-emergency-broadcast-system $()# "};
            StringJammer[] jummers = { new StringJammer1(), new StringJammer2() };

            for (int k = 0; k < jummers.Length; k++)
            {
                long cyclesQty = MAX_CYCLES;
                Console.WriteLine("+ {0} {1:D} cycles started at {2}",
                    jummers[k].GetType().ToString(),
                    MAX_CYCLES,
                    DateTime.Now.ToLongTimeString());
                while (cyclesQty > 0)
                {
                    for (int i = 0; i < test.Length; i++)
                    {
                        // Work on a fresh copy: StringJammer2 modifies its argument in place.
                        string result = new StringBuilder(test[i]).ToString();
                        jummers[k].Jam(ref result);
                        // Print the results once, on the first cycle only.
                        if (cyclesQty == MAX_CYCLES)
                        {
                            Console.WriteLine(test[i] + " => {" + result + "}");
                        }
                    }
                    --cyclesQty;
                }
                Console.WriteLine("- {0} {1:D} cycles finished at {2}\n",
                    jummers[k].GetType().ToString(),
                    MAX_CYCLES,
                    DateTime.Now.ToLongTimeString());
            }
        }
    }
}
------ cut here ----

--
Shamil

>> Folks,
>>
>> I am looking for a regex expression (preferably with explanation) for
>> taking an expression and creating a camel case (or PascalCase)
>> expression.
>>
>> I get CSV files with headers in them. All too often the eejits that
>> created the databases they came from used embedded spaces or other
>> special use characters (!@#$%^&* etc) in their field names. I need to
>> strip these special characters out completely. I also need to upper
>> case the valid alpha character that follows any of these special
>> characters.
>>
>> John colby becomes JohnColby
>> %idiotic_Field*name becomes IdioticFieldName
>>
>> Etc.
>>
>> It appears that Regex is the key (I am doing this in VB.Net) but until
>> today I have never really tried to use RegEx and it ain't pretty!
>>
>> Any help in this would be much appreciated.
>>
>> John W. Colby
>> Colby Consulting
>> www.ColbyConsulting.com
>>
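
For the regex asked about in the quoted message, one possible shape is sketched below in C# (the System.Text.RegularExpressions classes it uses are the same ones available from VB.Net). The pattern [A-Za-z]+ matches each maximal run of ASCII letters, so everything between two matches (spaces, punctuation, digits) is simply never copied. The class name RegexJammer is hypothetical, and lower-casing the rest of each word is an assumption taken from the posted StringJammer code, not from the quoted request. No timing has been done for this sketch.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the posted code.
public static class RegexJammer
{
    // One match = one maximal run of ASCII letters ("word").
    private static readonly Regex Word =
        new Regex("[A-Za-z]+", RegexOptions.Compiled);

    public static string Jam(string input)
    {
        StringBuilder result = new StringBuilder(input.Length);

        foreach (Match word in Word.Matches(input))
        {
            string text = word.Value;
            // Upper-case the first letter of the run...
            result.Append(char.ToUpperInvariant(text[0]));
            // ...and lower-case the rest (assumption: same rule as StringJammer1).
            result.Append(text.Substring(1).ToLowerInvariant());
        }

        return result.ToString();
    }
}
```

With this sketch, RegexJammer.Jam("John colby ") returns "JohnColby" and RegexJammer.Jam("%idiotic_Field*name") returns "IdioticFieldName", matching the two examples in the quoted message.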