[AccessD] Replacing multiple tokens in a text file?

Mon May 31 15:53:37 CDT 2004

Line by line is ok if you're certain no tokens are split between lines.

I once had to parse several hundred email log files from a newsgroup to 
cull data for an Access database.  Each file was typically 300 to 800 K in 
size.  I started by reading entire files into a string variable and then 
running the string manipulation code to parse them, but quickly determined 
that if I split a file into halves, each half took less than half the time 
to parse.

I then split each file into halves, and halved the halves in a binary 
splitting procedure, breaking on a space, until the chunks were down to 
about 5 K.  At that size, splitting the string into quickly digested chunks 
was adding more time to the procedure than it saved.  A bit of testing 
ultimately showed that a chunk size of around 7 to 8 K gave the best 
overall performance with the typical kind of data that needed to be 
processed.
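
The binary splitting described above might look something like the 
following.  This is only a minimal sketch in Python, not the original code: 
the function name split_on_space and the max_len parameter are made up for 
illustration, and the 8 K default simply reflects the 7 to 8 K figure 
mentioned above.  The same logic should carry over to VBA using InStrRev 
and Mid$.

```python
def split_on_space(text, max_len=8 * 1024):
    """Recursively halve text at a space until every piece is <= max_len.

    Hypothetical helper for illustration. Pieces are cut only at spaces,
    so no word is split; joining the pieces gives back the original text.
    """
    if len(text) <= max_len:
        return [text]

    mid = len(text) // 2
    # Prefer the last space at or before the midpoint...
    cut = text.rfind(" ", 0, mid + 1)
    if cut <= 0:
        # ...otherwise take the first space after it.
        cut = text.find(" ", mid)
    if cut <= 0:
        # No usable space at all: leave this piece oversized rather
        # than break a word.
        return [text]

    # The space stays at the start of the right-hand piece.
    return split_on_space(text[:cut], max_len) + split_on_space(text[cut:], max_len)
```

The threshold is worth tuning by timing real data, since the best size 
depends on what the parsing code does with each piece.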

If I were to do it over again, I might be tempted to read the file a chunk 
at a time, finding the last delimiter in each chunk and saving everything 
after it in a temp variable to prepend to the next chunk read.  I'm not 
certain that a line at a time will give the best performance, but I do know 
from experience that excessively large strings take disproportionately 
longer to process: parse time grew much faster than string length.  The 
time for parsing a typical single file went from something like 5 to 10 
minutes to under 30 seconds by splitting into smaller chunks before 
processing.
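
The chunk-and-carry-over idea above could be sketched roughly as follows.  
Again this is Python for illustration only, under stated assumptions: the 
names replace_tokens_chunked and apply_replacements are hypothetical, the 
delimiter is assumed to be a single space, and tokens are assumed never to 
contain the delimiter (otherwise a token could still straddle a cut).

```python
import io


def apply_replacements(text, replacements):
    """Replace each token with its value, one token after another.

    Assumes no replacement value contains another token; otherwise a
    later pass could rewrite an earlier substitution.
    """
    for token, value in replacements.items():
        text = text.replace(token, value)
    return text


def replace_tokens_chunked(src, dst, replacements, delimiter=" ",
                           chunk_size=8 * 1024):
    """Copy src to dst in chunks, replacing tokens as it goes.

    Whatever follows the last delimiter in a chunk is held back in
    `carry` and prepended to the next chunk, so a token cut in two by
    a chunk boundary is reassembled before it is processed.
    """
    carry = ""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buffer = carry + chunk
        cut = buffer.rfind(delimiter)
        if cut == -1:
            # No delimiter yet: keep accumulating.
            carry = buffer
            continue
        head, carry = buffer[:cut + 1], buffer[cut + 1:]
        dst.write(apply_replacements(head, replacements))
    # Flush whatever was held back after the final chunk.
    dst.write(apply_replacements(carry, replacements))


if __name__ == "__main__":
    # Tiny chunk size to force tokens across chunk boundaries.
    source = io.StringIO("Dear {NAME}, order {ID} has shipped.")
    target = io.StringIO()
    replace_tokens_chunked(source, target, {"{NAME}": "Ann", "{ID}": "42"},
                           chunk_size=7)
    print(target.getvalue())
```

With real files, src and dst would be file objects opened in text mode; 
the chunk size is a starting point to be tuned by timing.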

Although a VBA string variable can hold roughly 2 billion characters, a 
string anywhere near that size would probably take insanely long to process 
in one piece.  If your text files are only a few hundred characters, I'd 
simply try timing the token operation on the full files.

Ciao
Jürgen Welz
Edmonton, Alberta
jwelz at hotmail.com

>From: "Christopher Hawkins" <clh at christopherhawkins.com>
>Reply-To: Access Developers discussion and problem 
>solving<accessd at databaseadvisors.com>
>To: accessd at databaseadvisors.com
>Subject: [AccessD] Replacing multiple tokens in a text file?
>Date: Mon, 31 May 2004 12:19:57 -0600
>
>I suppose the title says it all.  ;)
>
>Given a text file with numerous (all different) tokens in it, how
>would I replace them?  I mean, I know how to replace tokens in a
>string, but I've never had to use a text file for a string before.
>
>Should I stuff the entire text file into an array and wash it through
>my detokenizing code until no more tokens are found?
>
>Should I read the file line-by-line, replacing tokens as I go?
>
>I'm just not sure how to get started here.
>
>-Christopher-
