Gustav Brock
Gustav at cactus.dk
Mon Oct 1 07:11:06 CDT 2007
Hi Max

No, they are not identical. Here is the corrected version, which now runs slightly faster:

Function CamelCaseBestSoFar4VBAPoking()
    '
    ' ' This times at 4 mins 16 seconds
    ' This times at 3 mins 40 seconds
    Dim tStartTime As Date, tEndTime As Date, tLapsedTime As Date, iLoop As Long, iVarLoop As Integer
    Dim iLenLoop As Integer, iVars As Integer
    Dim bFlipCase As Boolean, str2Parse As String, strResult As String, strBit As String
    Dim varStr(5), lngPos As Long
    varStr(1) = "John colby " ' Result wanted: "JohnColby"
    varStr(2) = "%idiotic_Field*name&!@" ' Result wanted: "IdioticFieldName"
    varStr(3) = " # hey#hey#Hey,hello_world$%#" ' Result wanted: "HeyHeyHeyHelloWorld"
    varStr(4) = "@#$this#is_a_test_of_the-emerGency-broadcast-system" ' Result wanted: "ThisIsATestOfTheEmergencyBroadcastSystem"
    varStr(5) = "thisisastringwithnobadchars" ' Result wanted: "Thisisastringwithnobadchars"
    iVars = 5
    tStartTime = Now
    For iLoop = 1 To conIterations
        For iVarLoop = 1 To iVars
            ' str2Parse = varStr(iVarLoop): bFlipCase = True: strResult = Left(Space(255), Len(str2Parse)): lngPos = 0
            str2Parse = LCase(varStr(iVarLoop)): bFlipCase = True: strResult = Space(Len(str2Parse)): lngPos = 0
            For iLenLoop = 1 To Len(str2Parse)
                ' strBit = LCase(Mid(str2Parse, iLenLoop, 1))
                strBit = Mid(str2Parse, iLenLoop, 1)
                If InStr(conBadChars, strBit) = 0 Then
                    lngPos = lngPos + 1
                    If bFlipCase = True Then strBit = UCase(strBit): bFlipCase = False
                    Mid(strResult, lngPos, 1) = strBit
                Else
                    bFlipCase = True
                End If
            Next iLenLoop
            strResult = Trim(strResult) ' drop any spaces left in string
            'Debug.Print strResult
        Next iVarLoop
    Next iLoop
    tEndTime = Now
    tLapsedTime = tEndTime - tStartTime
    'MsgBox tLapsedTime: Debug.Print tLapsedTime
End Function

/gustav

>>> max.wanadoo at gmail.com 01-10-2007 12:34 >>>
Hi Gustav,
I have tried your version of poking the values into a string using the MID. The result is 31 seconds LONGER than the previous version. Both versions are below. Identical (I think) apart from the poking bit.
Max

Option Compare Database
Option Explicit

Const conBadChars As String = "!£$%^&*()_-+@'#~?><|\, " ' space also in this string
Const conIterations As Long = 1000000 ' one million iterations

Function CamelCaseBestSoFar4VBAPoking()
    ' This times at 4 mins 16 seconds
    Dim tStartTime As Date, tEndTime As Date, tLapsedTime As Date, iLoop As Long, iVarLoop As Integer
    Dim iLenLoop As Integer, iVars As Integer
    Dim bFlipCase As Boolean, str2Parse As String, strResult As String, strBit As String
    Dim varStr(5), lngPos As Long
    varStr(1) = "John colby " ' Result wanted: "JohnColby"
    varStr(2) = "%idiotic_Field*name&!@" ' Result wanted: "IdioticFieldName"
    varStr(3) = " # hey#hey#Hey,hello_world$%#" ' Result wanted: "HeyHeyHeyHelloWorld"
    varStr(4) = "@#$this#is_a_test_of_the-emerGency-broadcast-system" ' Result wanted: "ThisIsATestOfTheEmergencyBroadcastSystem"
    varStr(5) = "thisisastringwithnobadchars" ' Result wanted: "Thisisastringwithnobadchars"
    iVars = 5
    tStartTime = Now
    For iLoop = 1 To conIterations
        For iVarLoop = 1 To iVars
            str2Parse = varStr(iVarLoop): bFlipCase = True: strResult = Left(Space(255), Len(str2Parse)): lngPos = 0
            For iLenLoop = 1 To Len(str2Parse)
                strBit = LCase(Mid(str2Parse, iLenLoop, 1))
                If InStr(conBadChars, strBit) = 0 Then
                    lngPos = lngPos + 1
                    If bFlipCase = True Then strBit = UCase(strBit): bFlipCase = False
                    Mid(strResult, lngPos, 1) = strBit
                Else
                    bFlipCase = True
                End If
            Next iLenLoop
            strResult = Trim(strResult) ' drop any spaces left in string
            'Debug.Print strResult
        Next iVarLoop
    Next iLoop
    tEndTime = Now
    tLapsedTime = tEndTime - tStartTime
    'MsgBox tLapsedTime: Debug.Print tLapsedTime
End Function

Function CamelCaseBestSoFar4VBA()
    ' This times at 3 mins 45 seconds
    Dim tStartTime As Date, tEndTime As Date, tLapsedTime As Date, iLoop As Long, iVarLoop As Integer
    Dim iLenLoop As Integer, iVars As Integer
    Dim bFlipCase As Boolean, str2Parse As String, strResult As String, strBit As String
    Dim varStr(5)
    varStr(1) = "John colby " ' Result wanted: "JohnColby"
    varStr(2) = "%idiotic_Field*name&!@" ' Result wanted: "IdioticFieldName"
    varStr(3) = " # hey#hey#Hey,hello_world$%#" ' Result wanted: "HeyHeyHeyHelloWorld"
    varStr(4) = "@#$this#is_a_test_of_the-emerGency-broadcast-system" ' Result wanted: "ThisIsATestOfTheEmergencyBroadcastSystem"
    varStr(5) = "thisisastringwithnobadchars" ' Result wanted: "Thisisastringwithnobadchars"
    iVars = 5
    tStartTime = Now
    For iLoop = 1 To conIterations
        For iVarLoop = 1 To iVars
            str2Parse = LCase(varStr(iVarLoop)): bFlipCase = True: strResult = ""
            For iLenLoop = 1 To Len(str2Parse)
                strBit = Mid(str2Parse, iLenLoop, 1)
                If InStr(conBadChars, strBit) = 0 Then
                    If bFlipCase = True Then strBit = UCase(strBit): bFlipCase = False
                    strResult = strResult & strBit
                Else
                    bFlipCase = True
                End If
            Next iLenLoop
            'Debug.Print strResult
        Next iVarLoop
    Next iLoop
    tEndTime = Now
    tLapsedTime = tEndTime - tStartTime
    'MsgBox tLapsedTime: Debug.Print tLapsedTime
End Function

-----Original Message-----
From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Gustav Brock
Sent: Monday, October 01, 2007 9:57 AM
To: accessd at databaseadvisors.com
Subject: Re: [AccessD] Use Regex - Create Camel Case

Hi Max and Shamil

Max, the main reason for VBA running slow on string concatenation is this construct:

    strResult = strResult & strBit

Indeed for long strings (some 10K) this is so slow that it is hard to believe. The traditional work-around is to create a dummy target string and then replace the chars one by one using Mid().

Here's an example (though path/file names seldom are so long that it matters):

Public Function TrimFileName( _
    ByVal strFileName As String) _
    As String

' Replaces characters in strFileName that are
' not allowed by Windows as a file name.
' Truncates length of strFileName to clngFileNameLen.
'
' 2000-12-07. Gustav Brock, Cactus Data ApS, Copenhagen
' 2002-05-22. Replaced string concatenating with Mid().

    ' No special error handling.
    On Error Resume Next

    ' String containing all not allowed characters.
    Const cstrInValidChars As String = "\/:*?""<>|"
    ' Replace character for not allowed characters.
    Const cstrReplaceChar As String * 1 = "-"
    ' Maximum length of a file name.
    Const clngFileNameLen As Long = 255

    Dim lngLen As Long
    Dim lngPos As Long
    Dim strChar As String
    Dim strTrim As String

    ' Strip leading and trailing spaces.
    strTrim = Left(Trim(strFileName), clngFileNameLen)
    lngLen = Len(strTrim)
    For lngPos = 1 To lngLen Step 1
        strChar = Mid(strTrim, lngPos, 1)
        If InStr(cstrInValidChars, strChar) > 0 Then
            Mid(strTrim, lngPos) = cstrReplaceChar
        End If
    Next

    TrimFileName = strTrim

End Function

Shamil, I have not been working with this in C# but I think you are on the right track using arrays. I hope to find some time to experiment with your code examples.

As we all know, for validation of a user input in a textbox, speed is of zero practical importance, but from time to time your task is to manipulate not one but thousands of strings and then it matters.

/gustav

>>> max.wanadoo at gmail.com 30-09-2007 10:52 >>>
Hi Shamil,
Clearly your compiled solution is by far the quickest solution. I have tried all sorts of VBA solutions including looking at XOR, IMP, EQV, bitwise solutions, but their overheads were considerable. The best I can come up with in VBA is below. One million iterations on my Dell Inspiron comes in at 3 min 52 secs.
If John didn't want to Hump it, then RegExpr appears to be the answer within pure VBA.

Max

Function dbc2()
    Const conGoodChars As String = "abcdefghijklmnopqrstuvwxyz" ' valid characters
    Const conBadChars As String = "£$%^&*()_-+@'#~?><|\, " ' space also in this string
    Const conLoops As Long = 1000000
    Dim tStartTime As Date, tEndTime As Date, tLapsedTime As Date, iLoop As Long, iVars As Integer, iVarLoop As Integer
    Dim iLen As Integer, strTemp As String, bFlipCase As Boolean, str2Parse As String, strResult As String, strBit As String
    Dim varStr(5)
    varStr(1) = "John colby "
    varStr(2) = "%idiotic_Field*name&!@"
    varStr(3) = " # hey#hey#Hey,hello_world$%#"
    varStr(4) = "@#$this#is_a_test_of_the-emerGency-broadcast-system"
    varStr(5) = "thisisastringwithnobadchars"
    iVars = 5
    tStartTime = Now
    For iLoop = 1 To conLoops
        For iVarLoop = 1 To iVars
            strResult = ""
            str2Parse = LCase(varStr(iVarLoop))
            str2Parse = UCase(Left(str2Parse, 1)) & Mid(str2Parse, 2)
            For iLen = 1 To Len(str2Parse)
                strBit = Mid(str2Parse, iLen, 1)
                If InStr(conBadChars, strBit) = 0 Then
                    If bFlipCase = True Then strBit = UCase(strBit): bFlipCase = False
                    strResult = strResult & strBit
                Else
                    bFlipCase = True
                End If
            Next iLen
            'Debug.Print strResult
        Next iVarLoop
    Next iLoop
    tEndTime = Now
    tLapsedTime = tEndTime - tStartTime
    MsgBox tLapsedTime: Debug.Print tLapsedTime
End Function

-----Original Message-----
From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Shamil Salakhetdinov
Sent: Sunday, September 30, 2007 9:01 AM
To: 'Access Developers discussion and problem solving'
Subject: Re: [AccessD] Use Regex - Create Camel Case

<<<
However for more complicated string operations like validating an email address, a regex would be very suitable and doable in one line vs. many, many lines the other way.
>>>
Hi Mike,

That's clear, and John's task is to get the speediest solution.
--
Shamil

-----Original Message-----
From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Michael Bahr
Sent: Sunday, September 30, 2007 6:42 AM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] Use Regex - Create Camel Case

Hi Shamil, yes, regexes are slower in .Net due, I believe, to all the object overhead. For simple string operations regexes would probably not be efficient BUT would be easier to write. However for more complicated string operations like validating an email address, a regex would be very suitable and doable in one line vs. many, many lines the other way.

Mike...

> Hi All,
>
> I wanted to note: I have seen somewhere an article about RegEx being
> considerably slower than a mere strings comparison etc. I cannot find
> this article now, can you?
>
> Here is a similar article on ColdFusion and Java (watch line wraps) -
>
> http://www.bennadel.com/blog/410-Regular-Expression-Finds-vs-String-Finds.htm
>
> The info above should be also true for C#/VB.NET (just remember there
> are no miracles in this world)...
>
> John, this could be critical information for you because of your
> computers processing zillion gigabytes of data - if that slowness of
> RegEx vs. string comparison operation proves to be true then mere
> chars/strings comparison and simple iteration over source string's
> char array could be the most effective solution, which will save you
> hours and hours of computing time:
>
> - define a 256 bytes long table (I guess you use extended ASCII (256
>   chars max) only John - right?)
>   with the chars to be stripped out marked by 1;
> - define upperCase flag;
> - allocate destination string, which is as long as the source one -
>   use StringBuilder;
> - iterate source string and use current char's ASCII code as an index
>   of a cell of array mentioned above:
>     a) if the array's cell has value > 0 then the source char should
>        be stripped out/skipped; set uppercase flag = true;
>     b) if the array's cell has zero value and uppercase flag = true
>        then uppercase current source char and copy it to the destination
>        StringBuilder's string; set uppercase flag = false;
>     c) if the array's cell has zero value and uppercase flag = false
>        then lower case current source char and copy it to the destination
>        StringBuilder's string;
>
>
> Here is C# code:
>
>
> private static string[] delimiters = " |%|*|$|@|!|#|&|^|_|-|,|.|;|:|(|)".Split('|');
> private static byte[] sieve = new byte[256]; // one cell per extended ASCII code (0-255)
> private static bool initialized = false;
>
> static void JamOutBadChars()
> {
>     if (!initialized)
>     {
>         sieve.Initialize();
>         foreach (string delimiter in delimiters)
>         {
>             sieve[(int)delimiter.Substring(0, 1).ToCharArray()[0]] = 1;
>         }
>         initialized = true;
>     }
>     string[] test = {"John colby ",
>                      "%idiotic_Field*name&!@",
>                      " # hey#hey#Hey,hello_world$%#",
>                      "@#$this#is_a_test_of_the-emergency-broadcast-system "};
>
>     foreach (string source in test)
>     {
>         StringBuilder result = new StringBuilder(source.Length);
>         bool upperCase = true;
>         foreach (char c in source.ToCharArray())
>         {
>             if (sieve[(int)c] > 0) upperCase = true;
>             else if (upperCase)
>             {
>                 result.Append(c.ToString().ToUpper());
>                 upperCase = false;
>             }
>             else result.Append(c.ToString().ToLower());
>         }
>         Console.WriteLine(source + " => {" + result + "}");
>     }
> }
>
> --
> Shamil
>
>
> -----Original Message-----
> From: accessd-bounces at databaseadvisors.com
> [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Michael Bahr
> Sent: Friday, September 28, 2007 10:25 PM
> To: Access Developers discussion and problem solving
> Subject: Re: [AccessD] Use Regex - Create Camel Case
>
> Hi John, here is one way to do it (although there are many ways to get
> the same end result). Mind you this is air code but hopefully should
> be enough to get you going. You will need to create the main loop
> within your code.
>
> Create a list of all delimiters that are used in your CSV files such as
>     delimiters = '%|*|$|@|!|#|&|^|_|-|,|.|;|:| '
>
> then run through your CSV files line by line evaluating the line
> saving the line into an array
>     thisarray = Split(line, delimiters)
>
> then run through the array performing a Ucase on the first letter of
> each word
>     newline = ""
>     For item=1 to ubound
>         newline = newline & whatEverToCapFirstChar(item)
>     Next item
>
> where ubound is the array size
>
>
> Now here are two scripts that do the same thing, one is Perl and the
> other is TCL. Both of these languages are open source and free and
> can be gotten at http://www.activestate.com/Products/languages.plex
>
> Perl:
>
> my $delimiters = '/:| |\%|\*|\$|\@|\!|\#|\&|^|_|-|,|\./';
> my @test = ("John colby",
>             "%idiotic_Field*name",
>             "hey#hey#Hey,hello_world",
>             "this#is_a_test_of_the-emergency-broadcast-system");
>
> foreach my $item (@test) {
>     my $temp = "";
>     my @list = split ($delimiters, $item);
>     foreach my $thing (@list) {
>         $temp .= ucfirst($thing);
>     }
>     print "$temp\n";
>
> }
>
> Result
> d:\Perl>pascalcase.pl
> JohnColby
> IdioticFieldName
> HeyHeyHeyHelloWorld
> ThisIsATestOfTheEmergencyBroadcastSystem
>
> TCL:
>
> set delimiters {%|*|$|@|!|#|&|^|_|-|,|.|;|:|\ "}
> set test [list {John colby} {%idiotic_Field*name} {hey#hey#Hey,hello_world} {this#is_a_test_of_the-emergency-broadcast-system}]
>
>
> foreach item $test {
>     set str ""
>     set mylist [split $item, $delimiters]
>     foreach thing $mylist {
>         set s [string totitle $thing]
>         set str "$str$s"
>     }
>     puts $str
>
> }
>
> Results
> D:\VisualTcl\Projects>tclsh pascalcase.tcl
> JohnColby
> IdioticFieldName
> HeyHeyHeyHelloWorld
> ThisIsATestOfTheEmergencyBroadcastSystem
>
>
> hth, Mike...
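
For comparison, here is a minimal Python sketch of the lookup-table idea Shamil describes above: a 256-entry table marks the characters to strip, the source is scanned once with an uppercase flag, and the kept characters are collected and joined once instead of being concatenated one at a time. Python is not used anywhere in this thread, and the names STRIP_CHARS, SIEVE and camel_case are made up for illustration. The delimiter set is copied from the C# example; keeping characters with codes above 255 unchanged is an assumption, since the thread only considers extended ASCII.

```python
# Hypothetical names: STRIP_CHARS, SIEVE and camel_case do not appear in the thread.
# The delimiter set is the one used in the C# example.
STRIP_CHARS = " %*$@!#&^_-,.;:()"

# 256-entry lookup table: 1 marks a character to strip, 0 a character to keep.
SIEVE = bytearray(256)
for ch in STRIP_CHARS:
    SIEVE[ord(ch)] = 1


def camel_case(source: str) -> str:
    """Drop marked characters and uppercase the first kept character after each run."""
    pieces = []        # kept characters, joined once at the end
    upper_next = True  # the first kept character is always uppercased
    for ch in source:
        code = ord(ch)
        # Assumption: characters with codes above 255 are never stripped.
        if code < 256 and SIEVE[code]:
            upper_next = True
        elif upper_next:
            pieces.append(ch.upper())
            upper_next = False
        else:
            pieces.append(ch.lower())
    return "".join(pieces)


if __name__ == "__main__":
    for text in ("John colby ", "%idiotic_Field*name&!@"):
        print(repr(text), "=>", camel_case(text))
```

Collecting the pieces in a list and joining them once plays the same role as the StringBuilder in the C# version and the preallocated Mid() target in the VBA version: the result is not rebuilt on every character.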