Jürgen Welz
jwelz at hotmail.com
Mon May 31 15:53:37 CDT 2004
Line by line is OK if you're certain no tokens are split between lines.

I once had to parse several hundred email log files from a newsgroup to cull data for an Access database. Each file was typically 300 to 800 K in size. I started by reading entire files into a string variable and then running the string manipulation code to parse, but quickly determined that if I split the files into halves, each half took less than half the time to parse. I then split the file by halves, and halved the halves in a binary splitting procedure, breaking on a space, until the pieces were down to about 5 K. At that size, the time taken for splitting the string into quickly digested chunks was adding more time to the procedure than it saved. A bit of testing ultimately showed that a chunk size of around 7 to 8 K resulted in the best overall performance with the typical kind of data that needed to be processed.

If I were to do it over again, I might be tempted to read in a chunk at a time, finding the last delimiter and saving everything after it in a temp variable to prepend to the next chunk to be read.

I'm not certain that a line at a time will give the best performance, but I do know from experience that excessively large strings take disproportionately longer to process. The time for parsing a typical single file went from something like 5 to 10 minutes to under 30 seconds by splitting into smaller chunks before processing. Although VBA string variables can hold a billion characters, that would probably take insanely long to process as a single string.

If your text files are only a few hundred characters, I'd try timing the token operation on the full files.

Ciao
Jürgen Welz
Edmonton, Alberta
jwelz at hotmail.com

>From: "Christopher Hawkins" <clh at christopherhawkins.com>
>Reply-To: Access Developers discussion and problem solving <accessd at databaseadvisors.com>
>To: accessd at databaseadvisors.com
>Subject: [AccessD] Replacing multiple tokens in a text file?
>Date: Mon, 31 May 2004 12:19:57 -0600
>
>I suppose the title says it all. ;)
>
>Given a text file with numerous (all different) tokens in it, how
>would I replace them? I mean, I know how to replace tokens in a
>string, but I've never had to use a text file for a string before.
>
>Should I stuff the entire text file into an array and wash it through
>my detokenizing code until no more tokens are found?
>
>Should I read the file line-by-line, replacing tokens as I go?
>
>I'm just not sure how to get started here.
>
>-Christopher-
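
The chunk-at-a-time idea from the reply (read a block, cut at the last delimiter, and carry the remainder over to the next block) might look roughly like the sketch below. It is written in Python rather than VBA purely for illustration. The function names, the `{NAME}`-style tokens, and the 8 K default are assumptions, not anything from the original thread; the 8 K figure only echoes the 7 to 8 K reported above and may not be the best size on other data or in other languages. The sketch also assumes that no token contains the delimiter and that no replacement value contains another token.

```python
import io


def _replace_all(text, replacements):
    """Apply every token -> value replacement to one piece of text."""
    for token, value in replacements.items():
        text = text.replace(token, value)
    return text


def replace_tokens_chunked(src, dst, replacements, chunk_size=8192, delimiter=" "):
    """Copy src to dst, replacing tokens one chunk at a time.

    Hypothetical helper: src and dst are text-mode file objects.
    Assumes no token contains the delimiter, so cutting each chunk
    at its last delimiter can never split a token in two.
    """
    carry = ""  # text after the last delimiter, held for the next chunk
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        text = carry + block
        cut = text.rfind(delimiter)
        if cut == -1:
            # No delimiter yet: keep accumulating until one shows up.
            carry = text
            continue
        head, carry = text[:cut + 1], text[cut + 1:]
        dst.write(_replace_all(head, replacements))
    # Whatever is left after the final delimiter still needs processing.
    dst.write(_replace_all(carry, replacements))
```

To use it, open the input and output files in text mode and pass them in along with a dictionary of tokens and their values. Because each chunk is cut at a delimiter, the result should match a replace over the whole file held in one string, while no single string grows much beyond the chunk size. Timing a few chunk sizes on real files, as the reply suggests, is still the only way to find the best one.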