[AccessD] Replacing multiple tokens in a text file?

DWUTKA at marlow.com DWUTKA at marlow.com
Tue Jun 1 11:27:09 CDT 2004


Just an FYI, I've done a lot of 'playing around' with text files, and one of
the issues I have noted is that it's not the size of the text file, or the
string, that is the real speed issue.  It's the repeated joining
(concatenating) of strings, or the parsing.

For example, I recently had a little project where the users wanted various
fields from a table written out to a pipe-delimited text file.  I whipped
up a little VB code, and kicked it off.  I ran it on the entire system,
which had about 65k records.  It started doing about a thousand a second, and
then began to slow down.  It kept slowing down, down to 10 records a second
(or so) around 30k.  The speed issue was this:

strTemp = strTemp & rs.Fields(i).Value

What causes the delay is that you are taking XXXXXXXXXXXXXXXXXXXXXX and
setting it to XXXXXXXXXXXXXXXXXXXXXX+YYYY.  As the string gets longer and
longer, each append costs more.  VB isn't being smart and simply adding the
new piece to the end; it is literally 'rebuilding' the whole string
(allocating a new one and copying everything into it) over and over.  As
soon as I changed the code to write each line to the file, instead of adding
it to another string, it boosted up to about 2k a second, and stayed there
through all 65k records.
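
For anyone who wants to see the shape of that change, here is a minimal
sketch of the write-each-line approach.  It is not my original code: the
procedure name and parameters are made up for illustration, and it assumes rs
is an open DAO or ADO recordset positioned on the first record.

```vb
Option Explicit

' Hypothetical helper: writes every row of rs to a pipe-delimited file.
' Each record is built in a short, per-line string and written at once,
' so no string ever grows beyond the length of one output line.
Public Sub ExportPipeDelimited(ByVal rs As Object, ByVal strPath As String)
    Dim intFile As Integer
    Dim i As Long
    Dim strLine As String

    intFile = FreeFile
    Open strPath For Output As #intFile

    Do Until rs.EOF
        strLine = ""
        For i = 0 To rs.Fields.Count - 1
            If i > 0 Then strLine = strLine & "|"
            ' Appending "" turns a Null field into an empty string.
            strLine = strLine & (rs.Fields(i).Value & "")
        Next i
        Print #intFile, strLine   ' write the finished line, then forget it
        rs.MoveNext
    Loop

    Close #intFile
End Sub
```

The per-line string still uses &, but it is reset for every record, so the
copying cost stays small and constant instead of growing with the file.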

In the case of token replacement, I would think that Replace would work
fine on the entire string, because I'm willing to bet that it is a faster
string manipulation process than the roll-your-own types.  I would think...
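
A rough sketch of that idea, untested against real data: read the whole file
into one string and run the built-in Replace once per token.  The function
names and the token and value arrays are hypothetical, and it assumes a plain
single-byte (ANSI) text file.

```vb
Option Explicit

' Hypothetical helper: reads an entire ANSI text file into one string.
Public Function ReadWholeFile(ByVal strPath As String) As String
    Dim intFile As Integer
    Dim strText As String

    intFile = FreeFile
    Open strPath For Binary Access Read As #intFile
    strText = Space$(LOF(intFile))   ' Get fills the string to its length
    Get #intFile, , strText
    Close #intFile

    ReadWholeFile = strText
End Function

' Hypothetical helper: runs Replace once per token over the whole string.
' varTokens and varValues are assumed to be parallel arrays.
Public Function ReplaceTokens(ByVal strText As String, _
                              ByVal varTokens As Variant, _
                              ByVal varValues As Variant) As String
    Dim i As Long

    For i = LBound(varTokens) To UBound(varTokens)
        strText = Replace(strText, varTokens(i), varValues(i))
    Next i

    ReplaceTokens = strText
End Function
```

It could be called along the lines of
ReplaceTokens(ReadWholeFile(strPath), Array("[NAME]"), Array("Drew")), where
the token text is just an example.  Timing it against a roll-your-own loop on
a real file would settle the bet.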

Drew

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com]On Behalf Of Jürgen Welz
Sent: Monday, May 31, 2004 3:54 PM
To: accessd at databaseadvisors.com
Subject: RE: [AccessD] Replacing multiple tokens in a text file?


Line by line is ok if you're certain no tokens are split between lines.
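
For reference, a line-by-line version might look like the sketch below.  The
procedure name, paths and token arrays are assumptions for illustration, and
it only works under the condition above, that no token spans a line break.

```vb
Option Explicit

' Hypothetical helper: copies strInPath to strOutPath one line at a time,
' replacing each token in varTokens with the matching entry in varValues.
Public Sub ReplaceTokensByLine(ByVal strInPath As String, _
                               ByVal strOutPath As String, _
                               ByVal varTokens As Variant, _
                               ByVal varValues As Variant)
    Dim intIn As Integer
    Dim intOut As Integer
    Dim strLine As String
    Dim i As Long

    intIn = FreeFile
    Open strInPath For Input As #intIn
    intOut = FreeFile              ' next free number after intIn is in use
    Open strOutPath For Output As #intOut

    Do Until EOF(intIn)
        Line Input #intIn, strLine
        For i = LBound(varTokens) To UBound(varTokens)
            strLine = Replace(strLine, varTokens(i), varValues(i))
        Next i
        Print #intOut, strLine
    Loop

    Close #intOut
    Close #intIn
End Sub
```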

I once had to parse several hundred email log files from a news group to 
cull data for an Access database.  Each file was typically 300 to 800 K in 
size.  I started by reading entire files into a string variable and then 
running the string manipulation code to parse, but quickly determined that
if I split the files into halves, each half took less than half the time to 
parse.

I then split the file by halves, and halved the halves in a binary splitting
procedure breaking on a space until the pieces were down to about 5 K.  At 
that size, the time taken for splitting the string into quickly digested 
chunks was adding more time to the procedure than it saved.  A bit of 
testing ultimately proved that a chunk size of around 7 to 8 K resulted in 
the best overall performance with the typical kind of data that needed to be
processed.

If I were to do it over again, I might be tempted to read in chunks at a 
time, finding the last delimiter and saving from there on in a temp variable
to prepend to the next chunk to be read.  I'm not certain that a line at a 
time will give the best performance but I do know from experience that 
excessively large strings take disproportionately longer to process.  The 
time for parsing a typical single file went from something like 5 to 10 
minutes to under 30 seconds by splitting into smaller chunks before 
processing.
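
The chunk-and-carry idea could be sketched roughly as follows.  This is not
code I have run against those log files: the 8192-byte chunk size is an
assumption based on the 7 to 8 K figure above, ProcessChunk is a hypothetical
stand-in for the real parsing code, and it assumes single-byte (ANSI) text.

```vb
Option Explicit

' Assumed chunk size, taken from the 7 to 8 K sweet spot reported above.
Private Const CHUNK_SIZE As Long = 8192

' Hypothetical stand-in for the real parsing or token code.
Private Sub ProcessChunk(ByVal strChunk As String)
    Debug.Print Len(strChunk)
End Sub

Public Sub ProcessFileInChunks(ByVal strPath As String, _
                               Optional ByVal strDelim As String = " ")
    Dim intFile As Integer
    Dim lngRemaining As Long
    Dim lngCut As Long
    Dim strBuffer As String
    Dim strCarry As String
    Dim strWork As String

    intFile = FreeFile
    Open strPath For Binary Access Read As #intFile
    lngRemaining = LOF(intFile)

    Do While lngRemaining > 0
        ' Get fills a string to its current length, so size it first.
        If lngRemaining < CHUNK_SIZE Then
            strBuffer = Space$(lngRemaining)
        Else
            strBuffer = Space$(CHUNK_SIZE)
        End If
        Get #intFile, , strBuffer
        lngRemaining = lngRemaining - Len(strBuffer)

        ' Prepend whatever was left over from the previous chunk.
        strWork = strCarry & strBuffer
        lngCut = InStrRev(strWork, strDelim)

        If lngRemaining = 0 Then
            ' Last chunk: nothing more to wait for, process it all.
            ProcessChunk strWork
            strCarry = ""
        ElseIf lngCut = 0 Then
            ' No delimiter yet: carry everything into the next read.
            strCarry = strWork
        Else
            ' Process through the last delimiter, keep the tail.
            lngCut = lngCut + Len(strDelim) - 1
            ProcessChunk Left$(strWork, lngCut)
            strCarry = Mid$(strWork, lngCut + 1)
        End If
    Loop

    Close #intFile
End Sub
```

Passing vbCrLf as strDelim would break on whole lines instead of spaces; a
delimiter that straddles two reads simply ends up in the carry and is
rejoined on the next pass.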

Although VBA string variables can hold a billion characters, that would 
probably take insanely long to process as a single string.  If your text 
files are only a few hundred characters, I'd try timing the token operation 
on the full files.


Ciao
Jürgen Welz
Edmonton, Alberta
jwelz at hotmail.com





>From: "Christopher Hawkins" <clh at christopherhawkins.com>
>Reply-To: Access Developers discussion and problem 
>solving<accessd at databaseadvisors.com>
>To: accessd at databaseadvisors.com
>Subject: [AccessD] Replacing multiple tokens in a text file?
>Date: Mon, 31 May 2004 12:19:57 -0600
>
>I suppose the title says it all.  ;)
>
>Given a text file with numerous (all different) tokens in it, how
>would I replace them?  I mean, I know how to replace tokens in a
>string, but I've never had to use a text file for a string before.
>
>Should I stuff the entire text file into an array and wash it through
>my detokenizing code until no more tokens are found?
>
>Should I read the file line-by-line, replacing tokens as I go?
>
>I'm just not sure how to get started here.
>
>-Christopher-
>
>
>--
>_______________________________________________
>AccessD mailing list
>AccessD at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/accessd
>Website: http://www.databaseadvisors.com




