[AccessD] Delimiter Value

Arthur Fuller fuller.artful at gmail.com
Tue Mar 3 13:32:14 CST 2015


Gustav.

Mind you, I have only two computers and attendant keyboards handy, but
neither keyboard has "¤" available. In fact, prior to your email, I don't
think that I have ever seen this character -- which may well qualify it as
an excellent delimiter.
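
A rare character is only a good delimiter if it really never shows up in the data, and that can be checked before committing to it. What follows is a minimal Python sketch of such a check, not anything from an actual app: the function name unused_delimiters, the candidate list, and the sample rows are all made up for illustration.

```python
def unused_delimiters(rows, candidates=("|", "\u00a4", "\t", "~")):
    """Return the candidate delimiters that never occur in any field.

    rows is an iterable of records, each a sequence of strings.
    candidates may hold one- or multi-character strings.
    """
    seen = set()
    for row in rows:
        for field in row:
            # Remember every candidate that appears inside this field.
            seen.update(c for c in candidates if c in field)
    # Preserve the caller's preference order.
    return [c for c in candidates if c not in seen]


# Hypothetical sample data: a nickname in quotes, a comma, and a stray pipe.
sample = [
    ["John", 'William "Bill" Smith', "a|b"],
    ["Tina", "Fields, T.", "plain text"],
]

safe = unused_delimiters(sample)
```

With this sample the pipe is rejected because it occurs in one field, while "¤" (written as "\u00a4" in the code), the tab, and the tilde remain usable. The same check works for two-character candidates, since it only tests for substrings.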

JWC:

You're quite right, in terms of your specific app, that a double-character
delimiter would impose significant overhead. In my particular case, a
code-generator, this was never a problem, given that my average template
was <= 1k of text. So back to Gustav's suggestion. Maybe that "¤" thing is
a great candidate for a delimiter.
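
To put rough numbers on the overhead question, here is a small Python sketch. It counts only the separator bytes, assuming (fields - 1) separators per record, and ignores quoting and line endings. The helper name delimiter_overhead is hypothetical. One thing it makes visible: "¤" is a single byte in Latin-1 but two bytes in UTF-8, so in a UTF-8 file it costs as much as a two-character ASCII delimiter.

```python
def delimiter_overhead(fields, records, delimiter, encoding="utf-8"):
    """Bytes spent on delimiters alone for a file of the given shape.

    Assumes one delimiter between adjacent fields, so (fields - 1)
    delimiters per record; quotes and newlines are not counted.
    """
    per_record = (fields - 1) * len(delimiter.encode(encoding))
    return per_record * records


# The file shape from the quoted message: 20 fields, one million records.
single = delimiter_overhead(20, 1_000_000, "|")
double = delimiter_overhead(20, 1_000_000, "||")
currency_utf8 = delimiter_overhead(20, 1_000_000, "\u00a4")
currency_latin1 = delimiter_overhead(20, 1_000_000, "\u00a4", encoding="latin-1")
```

Under these assumptions a single pipe costs 19 million bytes and a doubled pipe 38 million, which is in the neighbourhood of the 40 million figure quoted below. For a template of about 1k the same arithmetic gives a negligible number, which is why it never mattered in the code generator.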

A,

On Tue, Mar 3, 2015 at 12:06 PM, John W. Colby <jwcolby at gmail.com> wrote:

> Arthur,
>
> To be quite honest the concept of double character delimiters just never
> occurred to me, however a couple of things come to mind.
>
> 1) I do not write the code that actually writes the delimiters into the
> file or strips it back out.  I simply use an existing "widget" of some
> kind.  For example SQL Server has BCP that has built into it, buried way
> down inside, the ability to do this kind of thing.
>
> So how do I know that attempting to use a double character will even work
> in terms of getting the thing into the file?  Likewise, how do I know that
> the "other end" will understand how to use a double character to get it
> back out?
>
> CSV is a "standard".  For better or worse it was extensively jawboned
> amongst some number of people and then a standard written.  I have never
> actually seen a double character used as the delimiter in any file handed
> to me.
>
> 2) All too often we are using this to send tons of data in large files to
> someone.  Even the overhead of "," adds a lot to the size of the file.
> Take a file of 20 fields times a million records and look at how much
> "extra" baggage (overhead) has to be handled.  You are talking roughly 40
> million extra characters in the file, just to handle a "once in a blue moon
> issue".
>
> I hand off files of anywhere from 500K to 2 million records to a third
> party program.  I had to jump through hoops to get them to handle the pipe
> character instead of the CSV standard.  Before I asked them to switch to
> the pipe, I routinely ran into " issues (in nicknames inserted into first
> name fields).  Since the switch to pipes (many years ago), I have never run
> into a delimiter issue.
>
> John W. Colby
>
> On 3/3/2015 11:47 AM, Arthur Fuller wrote:
>
>> This is an interesting thread but I do have one question, which derives
>> from personal experience writing a code generator that had to find markers
>> in templates and substitute either literal or iterated text in place of
>> said markers.
>>
>> I opted for two-character delimiters; they could have been pretty much any
>> given character expressed twice. As it happens, I opted for "\\marker\\",
>> which admittedly could have created problems when attempting to express
>> UNCs etc., but I escaped these in the standard way.
>>
>> As previously pointed out, the use of the cent-sign may not be universal,
>> and that is a problem.
>>
>> But my larger question is this: why are you restricting delimiters to
>> single-character markers? That seems to me to be the path to trouble no
>> matter which native (human or typographical) language you are using.
>> Ultimately, this problem resolves to the likelihood of any given marker
>> being used within the text of interest; and so we have to go for the
>> unlikeliest combinations, the smallest unit being one character, but I am
>> arguing in favor of two-character delimiters, while also providing for the
>> standard escape-clause syntax of repeating the delimiter twice if you mean
>> it not as a delimiter but part of the text.
>>
>> I've just Googled "the number of official languages in the world", and
>> India and China come out way on top, with a few hundred in each nation.
>> Just to make this much more painfully clear, "language" is distinguished
>> from "dialect", which is to say that even though I have lived almost all my
>> life in English-speaking Canada, I can also distinguish Cicero-English from
>> New Orleans English from Oxford English, and also readily discern Quebec
>> français from Parisian and even Marseilles. I'm shakier on Dutch, but I can
>> discern Mandarin from Cantonese in fewer than 5 seconds, and also
>> Shanghainese from both of these. You might conjecture that I have an ear for
>> language; perhaps that is true. After all, I can convincingly pronounce two
>> of the trickiest words in Dutch, but that's because I've visited the
>> Netherlands about six times, and picked up a little more on each visit. The
>> two most difficult words in Dutch are the word for vacuum cleaner
>> (stofzuiger) and the name of a town on the coast whose name was used in WWII
>> to distinguish Nazi spies from legitimate Dutch citizens (Scheveningen). It
>> required more practice for me to get these two words than to apprehend and
>> duplicate the tones in Cantonese; but I got there, eventually.
>>
>> Sorry for the extensive sidetrack. I just wanted to emphasize that
>> single-character delimiters are bound to cause trouble in a large number of
>> languages in the world. Facing this ugly fact, the developer has a couple
>> of choices: narrow the translation-geography to a few well-chosen languages
>> of immediate interest to the app of interest, or strive for a broader
>> translation-strategy; in my opinion the only viable path to the latter
>> approach is to expand the concept of a delimiter beyond single-character
>> representation, and in addition to provide the standard escape-syntax (e.g.
>> in my preferred choice, "\\" is the delimiter, except when repeated, in
>> which case the second occurrence is to be interpreted as literal text).
>>
>> This syntax and notation, I hope, will sidestep the risk that the "cents"
>> sign will be misinterpreted on non-English (and in fact
>> non-Western-European) keyboards.
>>
>> Arthur
>>
>> On Tue, Mar 3, 2015 at 11:06 AM, Gustav Brock <gustav at cactus.dk> wrote:
>>
>>> Hi Tina
>>>
>>> That character could be: ¤
>>> I've seen it on every keyboard but never seen it used for anything.
>>>
>>> However, I've never had issues with the "|" pipe sign, so why make things
>>> more strange than necessary?
>>>
>>> /gustav
>>>
>>> -----Original message-----
>>> From: accessd-bounces at databaseadvisors.com [mailto:
>>> accessd-bounces at databaseadvisors.com] On behalf of Tina Norris Fields
>>> Sent: 3 March 2015 17:02
>>> To: Access Developers discussion and problem solving
>>> Subject: Re: [AccessD] Delimiter Value (was: Automatic Update Function)
>>>
>>> This is an intriguing part of the discussion.  One perspective is to find
>>> a symbol that will not accidentally be typed because it's not on a standard
>>> keyboard.  The other perspective is to find a symbol that can easily be
>>> used (because it is on a standard keyboard), but is not commonly used for
>>> most typing and coding.  Fascinating.  I like the pipe symbol because it
>>> fits with the second perspective.  I would like an ALT+ASCII code symbol
>>> that is easy to remember and doesn't appear on a standard keyboard, too,
>>> because it reduces the threat of accidental typing.  Hmmm - pondering.
>>> TNF
>>>
>>>



-- 
Arthur

