[AccessD] Delimiter Value

John W. Colby jwcolby at gmail.com
Tue Mar 3 11:06:16 CST 2015


Arthur,

To be quite honest, the concept of double-character delimiters just never 
occurred to me; however, a couple of things come to mind.

1) I do not write the code that actually writes the delimiters into the 
file or strips them back out.  I simply use an existing "widget" of some 
kind.  For example, SQL Server has BCP, which has built into it, buried way 
down inside, the ability to do this kind of thing.

So how do I know that attempting to use a double character will even 
work in terms of getting the thing into the file?  Likewise, how do I 
know that the "other end" will understand how to use a double character 
to get it back out?

CSV is a "standard."  For better or worse, it was extensively jawboned 
among a number of people and then a standard was written.  I have never 
actually seen a double character used as the delimiter in any file 
handed to me.

2) All too often we are using this to send tons of data in large files to 
someone.  Even the overhead of "," adds a lot to the size of the file.  
Take a file of 20 fields times a million records and look at how much 
"extra" baggage (overhead) has to be handled.  You are talking roughly 
40 million extra characters in the file, just to handle a "once in a 
blue moon" issue.

I hand off files of anywhere from 500K to 2 million records to a 
third-party program.  I had to jump through hoops to get them to handle 
the pipe character instead of the CSV standard.  Before I asked them to 
switch to the pipe, I routinely ran into double-quote (") issues (from 
nicknames inserted into first-name fields).  Since the switch to pipes 
(many years ago), I have never run into a delimiter issue.
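John's size argument and his nickname problem can both be checked with a small Python sketch. It assumes the "standard" CSV side wraps every field in double quotes (text qualifiers), which is how the 40 million figure works out (20 fields x 2 quotes x 1,000,000 records), and that the pipe file uses no qualifiers at all. The function names and sample data are hypothetical, and Python's `csv` module only stands in for whatever export tool (BCP or otherwise) actually writes the file.

```python
import csv
import io


def to_csv_line(fields: list[str]) -> str:
    """CSV with every field wrapped in double quotes; embedded quotes are doubled."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(fields)
    return buf.getvalue().rstrip("\n")


def to_pipe_line(fields: list[str]) -> str:
    """Pipe-separated, no text qualifiers. A field containing '|' is rejected, not corrupted."""
    for field in fields:
        if "|" in field:
            raise ValueError(f"field contains the pipe delimiter: {field!r}")
    return "|".join(fields)


def overhead(line: str, fields: list[str]) -> int:
    """Characters in the line that are not field data (separators, qualifiers, escapes)."""
    return len(line) - sum(len(f) for f in fields)


# Hypothetical record with a nickname typed into the first-name field.
record = ['Robert "Bob"', "Smith"]
print(to_csv_line(record))   # "Robert ""Bob""","Smith"
print(to_pipe_line(record))  # Robert "Bob"|Smith

# Overhead for a 20-field record of plain data, scaled to a million records.
plain = [f"value{i}" for i in range(20)]
csv_extra = overhead(to_csv_line(plain), plain) * 1_000_000    # 19 commas + 40 quotes per record
pipe_extra = overhead(to_pipe_line(plain), plain) * 1_000_000  # 19 pipes per record
print(csv_extra - pipe_extra)  # 40000000
```

The pipe version stays simple only because it refuses, rather than escapes, a field containing `|`; that is the "once in a blue moon" case traded away for smaller files with no quotes to mangle.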

John W. Colby

On 3/3/2015 11:47 AM, Arthur Fuller wrote:
> This is an interesting thread but I do have one question, which derives
> from personal experience writing a code generator that had to find markers
> in templates and substitute either literal or iterated text in place of
> said markers.
>
> I opted for two-character delimiters; they could have been pretty much any
> given character expressed twice. As it happens, I opted for "\\marker\\",
> which admittedly could have created problems when attempting to express
> UNCs etc., but I escaped these in the standard way.
>
> As previously pointed out, the use of the cent-sign may not be universal,
> and that is a problem.
>
> But my larger question is this: why are you restricting delimiters to
> single-character markers? That seems to me to be the path to trouble no
> matter which native (human or typographical) language you are using.
> Ultimately, this problem comes down to the likelihood of any given marker
> appearing within the text of interest, so we have to go for the
> unlikeliest combinations. The smallest unit is one character, but I am
> arguing in favor of two-character delimiters, while also providing the
> standard escape syntax of doubling the delimiter when you mean it as part
> of the text rather than as a delimiter.
>
> I've just Googled "the number of official languages in the world", and
> India and China come out way on top, with a few hundred in each nation.
> Just to make this much more painfully clear, "language" is distinguished
> from "dialect", which is to say that even though I have lived almost my
> entire life in English-speaking Canada, I can also tell Cicero English from
> New Orleans English from Oxford English, and also readily distinguish Quebec
> French from Parisian and even Marseilles French. I'm shakier on Dutch, but I
> can tell Mandarin from Cantonese in fewer than 5 seconds, and also
> Shanghainese from both of these. You might conjecture that I have an ear for
> language; perhaps that is true. After all, I can convincingly pronounce two
> of the trickiest words in Dutch, but that's because I've visited the
> Netherlands about six times and picked up a little more on each visit. The
> two most difficult words in Dutch are the word for vacuum cleaner
> (stofzuiger) and the name of a coastal town that was used in WWII to
> distinguish Nazi spies from legitimate Dutch citizens (Scheveningen). It
> required more practice for me to get these two words than to apprehend and
> duplicate the tones in Cantonese, but I got there eventually.
>
> Sorry for the extensive sidetrack. I just wanted to emphasize that
> single-character delimiters are bound to cause trouble in a large number of
> languages in the world. Facing this ugly fact, the developer has a couple
> of choices: narrow the translation geography to a few well-chosen languages
> of immediate interest to the app, or strive for a broader translation
> strategy. In my opinion, the only viable path to the latter approach is to
> expand the concept of a delimiter beyond a single character, and also to
> provide the standard escape syntax (e.g. in my preferred choice, "\\" is
> the delimiter, except when repeated, in which case the second occurrence is
> to be interpreted as literal text).
>
> This syntax and notation, I hope, will sidestep the problem that the "cents"
> sign may be misinterpreted on non-English (and in fact
> non-Western-European) keyboards.
>
> Arthur
>
> On Tue, Mar 3, 2015 at 11:06 AM, Gustav Brock <gustav at cactus.dk> wrote:
>
>> Hi Tina
>>
>> That character could be: ¤
>> I've seen it on every keyboard but have never seen it used for anything.
>>
>> However, I've never had issues with the "|" pipe sign, so why make things
>> more strange than necessary.
>>
>> /gustav
>>
>> -----Original Message-----
>> From: accessd-bounces at databaseadvisors.com
>> [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Tina Norris Fields
>> Sent: 3 March 2015 17:02
>> To: Access Developers discussion and problem solving
>> Subject: Re: [AccessD] Delimiter Value (was: Automatic Update Function)
>>
>> This is an intriguing part of the discussion.  One perspective is to find
>> a symbol that will not accidentally be typed because it's not on a standard
>> keyboard.  The other perspective is to find a symbol that can easily be
>> used (because it is on a standard keyboard), but is not commonly used for
>> most typing and coding.  Fascinating.  I like the pipe symbol because it
>> fits with the second perspective.  I would like an
>> ALT+ASCII code symbol that is easy to remember and doesn't appear on a
>> standard keyboard, too, because it reduces the threat of accidental
>> typing.  Hmmm - pondering.
>> TNF
>>
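Arthur's two-character scheme, as he describes it for his code-generator templates, can be sketched in Python. This is an interpretation, not his generator: the `render` function and the `values` mapping are hypothetical, `\\` means two backslash characters, and a doubled delimiter (four backslashes) is read as one literal `\\`. Doubling works here because markers come in open/close pairs; used as a plain field separator in a data file, the same rule would be ambiguous, since an empty field also produces a doubled delimiter.

```python
from typing import Mapping

# Two backslash characters, matching the "\\marker\\" convention in the thread.
DELIM = "\\\\"


def render(template: str, values: Mapping[str, str], delim: str = DELIM) -> str:
    """Replace <delim>name<delim> markers with values[name].

    A doubled delimiter is emitted as one literal delimiter.
    Raises KeyError for an unknown marker name, ValueError for an unclosed marker.
    """
    out: list[str] = []
    i = 0
    step = len(delim)
    while i < len(template):
        if template.startswith(delim * 2, i):
            # Escape: the doubled delimiter stands for the delimiter itself.
            out.append(delim)
            i += 2 * step
        elif template.startswith(delim, i):
            # Opening delimiter: the marker name runs to the next delimiter.
            end = template.find(delim, i + step)
            if end == -1:
                raise ValueError(f"unclosed marker starting at index {i}")
            out.append(values[template[i + step:end]])
            i = end + step
        else:
            # Ordinary text, including a lone backslash, passes through unchanged.
            out.append(template[i])
            i += 1
    return "".join(out)


print(render(r"Dear \\first\\ \\last\\,", {"first": "Tina", "last": "Fields"}))
# Dear Tina Fields,
print(render(r"Share: \\\\\\host\\\share", {"host": "srv01"}))
# Share: \\srv01\share
```

The second call shows the UNC case Arthur mentions: the escaped pair yields the leading `\\`, and the single backslash before `share` is ordinary text because only the two-character sequence is special.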


