DWUTKA at marlow.com
Tue Jun 1 11:27:09 CDT 2004
Just an FYI, I've done a lot of 'playing around' with text files, and one of the issues I have noted is that it's not the size of the text file, or the string, that is the real speed issue. It's the joining of a string, or parsing.

For example, I recently had a little project where the users wanted to parse out various fields from a table to a pipe-delimited text file. I whipped up a little VB code and kicked it off. I ran it on the entire system, which had about 65k records. It started doing about a thousand a second, and then began to slow down. It kept slowing down, down to 10 records a second (or so) around 30k.

The speed issue was this:

    strTemp=strTemp & rs.Fields(i).value

What is causing the delay is that you are taking XXXXXXXXXXXXXXXXXXXXXX and setting it to XXXXXXXXXXXXXXXX+YYYY. As the string gets longer and longer, it forces the CPU to work overtime to process that line. It isn't being smart and simply adding the one string to the end; it is literally 'rebuilding' the string over and over. As soon as I changed the code to write each line to the file, instead of adding it to another string, it boosted up to about 2k a second, and stayed there through all 65k records.

In the case of token replacement, I would think that Replace would work fine on the entire string, because I'm willing to bet that it is a faster string manipulation process than the roll-your-own types. I would think...

Drew

-----Original Message-----
From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Jürgen Welz
Sent: Monday, May 31, 2004 3:54 PM
To: accessd at databaseadvisors.com
Subject: RE: [AccessD] Replacing multiple tokens in a text file?

Line by line is OK if you're certain no tokens are split between lines. I once had to parse several hundred email log files from a news group to cull data for an Access database. Each file was typically 300 to 800 K in size.
I started by reading entire files into a string variable and then running the string manipulation code to parse, but quickly determined that if I split the files into halves, each half took less than half the time to parse. I then split the file by halves, and halved the halves in a binary splitting procedure, breaking on a space, until the files were down to about 5 K. At that size, the time taken for splitting the string into quickly digested chunks was adding more time to the procedure than it saved. A bit of testing ultimately proved that a file size of around 7 to 8 K resulted in the best overall performance with the typical kind of data that needed to be processed.

If I were to do it over again, I might be tempted to read in chunks at a time, finding the last delimiter and saving from there on in a temp variable to prepend to the next chunk to be read. I'm not certain that a line at a time will give the best performance, but I do know from experience that excessively large strings take exponentially longer to process. The time for parsing a typical single file went from something like 5 to 10 minutes to under 30 seconds by splitting into smaller chunks before processing.

Although VBA string variables can hold a billion characters, that would probably take insanely long to process as a single string. If your text files are only a few hundred characters, I'd try timing the token operation on the full files.

Ciao
Jürgen Welz
Edmonton, Alberta
jwelz at hotmail.com

>From: "Christopher Hawkins" <clh at christopherhawkins.com>
>Reply-To: Access Developers discussion and problem solving <accessd at databaseadvisors.com>
>To: accessd at databaseadvisors.com
>Subject: [AccessD] Replacing multiple tokens in a text file?
>Date: Mon, 31 May 2004 12:19:57 -0600
>
>I suppose the title says it all. ;)
>
>Given a text file with numerous (all different) tokens in it, how
>would I replace them? I mean, I know how to replace tokens in a
>string, but I've never had to use a text file for a string before.
>
>Should I stuff the entire text file into an array and wash it through
>my detokenizing code until no more tokens are found?
>
>Should I read the file line-by-line, replacing tokens as I go?
>
>I'm just not sure how to get started here.
>
>-Christopher-
>
>--
>_______________________________________________
>AccessD mailing list
>AccessD at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/accessd
>Website: http://www.databaseadvisors.com
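
For anyone reading this thread later, here is a rough sketch of the chunked approach Jürgen describes: read a block, cut at the last delimiter, and carry the tail over to the next block. It also writes each processed piece straight to the output instead of growing one big string, which is Drew's point. It is written in Python rather than VBA purely for illustration. The function name `replace_tokens_chunked`, the `{NAME}`-style tokens, the 8 K default chunk size and the newline delimiter are all assumptions, not anything posted in the thread, and the sketch only works if no token contains the delimiter.

```python
import io


def replace_tokens_chunked(reader, writer, tokens, chunk_size=8192, delimiter="\n"):
    """Copy reader to writer, replacing each token key with its value.

    Assumption: no token contains `delimiter`, so a token can never
    straddle the cut point between two processed pieces.
    """
    carry = ""  # tail of the previous chunk, after its last delimiter
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        text = carry + chunk
        cut = text.rfind(delimiter)
        if cut == -1:
            # No delimiter seen yet: keep accumulating until one shows up.
            carry = text
            continue
        end = cut + len(delimiter)
        piece, carry = text[:end], text[end:]
        for token, value in tokens.items():
            piece = piece.replace(token, value)
        # Write the finished piece immediately instead of appending it
        # to an ever-growing result string.
        writer.write(piece)
    # Whatever follows the last delimiter is processed at end of input.
    for token, value in tokens.items():
        carry = carry.replace(token, value)
    writer.write(carry)


if __name__ == "__main__":
    source = io.StringIO("Dear {NAME},\nyour order {ID} shipped.\n")
    result = io.StringIO()
    replace_tokens_chunked(source, result, {"{NAME}": "Ann", "{ID}": "42"}, chunk_size=5)
    print(result.getvalue(), end="")
```

The tiny `chunk_size` in the demo is only there to force tokens across chunk boundaries. With real files, the best size would need timing on the actual data, as Jürgen found.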