[dba-Tech] Something I just learned

Erwin Craps - IT Helps Erwin.Craps at ithelps.be
Thu Mar 31 03:41:11 CST 2005


Hummpff...

It's incredible: I still sometimes receive (paper) mailings where
accented characters are replaced by others...

New technology, same problems, over and over and over again ever since
I started in IT (since MS-DOS 1.1).
I suggest we finally get rid of these recurring language problems and
give a free course to every earthling so we would all speak and write
the same language.

Of course, aliens who would consider living on Earth are obliged to
follow the free course too.

:-)

Erwin
 

-----Original Message-----
From: dba-tech-bounces at databaseadvisors.com
[mailto:dba-tech-bounces at databaseadvisors.com] On Behalf Of
MartyConnelly
Sent: Thursday, March 31, 2005 1:40 AM
To: Discussion of Hardware and Software issues
Subject: Re: [dba-Tech] Something I just learned

Just a word of warning about some of this: you will run into it at some
point, since Unicode files can be UTF-8 or UTF-16.
There are the Windows typographic characters known by their HTML4
character entity names, such as &mdash;, &lsquo;, &trade; and so on (em
dash, curly quotes, etc.). These have in fact been around for a while, and
are understood even by a number of older browsers that do not support
UTF-8 and would not be able to understand the corresponding Unicode
&#bignumber; representations. These have been around from before the
Unicode standard was set.
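
To make the entity-name versus &#bignumber; distinction concrete, here is a small Python sketch (Python is my choice for illustration, it is not from the thread). It uses the standard library's HTML4 entity table to show the numeric reference behind each of the three names mentioned above.

```python
import html
from html.entities import name2codepoint  # HTML4 entity name -> code point

# The three entity names used as examples in the message above.
names = ("mdash", "lsquo", "trade")

# Build (named form, numeric form, character) for each entity.
rows = [(f"&{n};", f"&#{name2codepoint[n]};", chr(name2codepoint[n])) for n in names]

for named, numeric, char in rows:
    print(named, numeric, char)

# Both spellings decode to the same character.
same = all(html.unescape(named) == html.unescape(numeric) for named, numeric, _ in rows)
```

A browser that knows the names but not Unicode can still render the named form, which is the point being made about older browsers.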

Now consider the Western European "MS-Windows" character set,
windows-1252. This is a special cause of confusion: all of the
displayable character code values of iso-8859-1 coincide with the same
codes in this Windows code page, but additionally the Windows code page
assigns displayable characters in the area which the iso-8859-n codes
reserve for control functions. In Unicode, those characters have code
values above 255.
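
The overlap and the difference can be checked with Python's built-in codecs. This is only a sketch; the variable names are mine. It decodes the same bytes as iso-8859-1 ("latin-1") and as windows-1252 ("cp1252").

```python
# Bytes 0xA0-0xFF decode to the same characters under both encodings.
upper = bytes(range(0xA0, 0x100))
same_upper = upper.decode("latin-1") == upper.decode("cp1252")

# Byte 0x99 is where they part ways.
as_latin1 = b"\x99".decode("latin-1")  # U+0099, a C1 control character
as_cp1252 = b"\x99".decode("cp1252")   # U+2122, the trademark sign

# Every byte in 0x80-0x9F that cp1252 assigns maps to a code point above 255.
# errors="ignore" drops the few byte values cp1252 leaves unassigned.
c1_area = bytes(range(0x80, 0xA0)).decode("cp1252", errors="ignore")
all_above_255 = all(ord(ch) > 255 for ch in c1_area)

print(same_upper, hex(ord(as_latin1)), hex(ord(as_cp1252)), all_above_255)
```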

I am scratching my head about this because these Windows typographic
characters, ANSI 128-159, are treated as control characters and are
considered illegal characters in XML. For example, "™" (decimal 153,
hex 99) should become the Unicode escaped character "&#8482;", but some
UTF-8 conversion programs don't do this conversion properly, so it
screws up your XML parsers with illegal characters. I am almost tempted
to do everything in UTF-16.

The Windows characters that cause the problem run from ANSI decimal 128
to 159.
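
One way to clean up text that has already picked up these stray control characters is sketched below in Python. The function names repair_c1 and to_ascii_xml are hypothetical helpers, and the sketch assumes the bad characters came from windows-1252 bytes that were read as iso-8859-1.

```python
def repair_c1(text: str) -> str:
    """Reinterpret characters U+0080-U+009F as windows-1252 bytes."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # a few byte values are unassigned in cp1252; keep as-is
        out.append(ch)
    return "".join(out)


def to_ascii_xml(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


broken = b"Acme\x99".decode("latin-1")  # trademark byte read as a control char
fixed = repair_c1(broken)
print(to_ascii_xml(fixed))  # Acme&#8482;
```

Note that to_ascii_xml does not escape &, < or >, so it is not a full XML escaper, only the character-reference step.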

If that isn't enough, some little darlings changed the ISO-8859-1 spec
to handle the Euro character, and you now have to look at Latin-9
(ISO-8859-15).
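
To see exactly which byte values changed meaning, the two encodings can be compared byte by byte. A short Python sketch using the built-in codecs (the list name diffs is mine):

```python
# Collect every byte value that decodes differently in the two encodings.
diffs = []
for value in range(256):
    raw = bytes([value])
    old = raw.decode("iso8859-1")
    new = raw.decode("iso8859-15")
    if old != new:
        diffs.append((value, old, new))

for value, old, new in diffs:
    print(f"0x{value:02X}: {old} -> {new}")
```

The Euro sign sits at 0xA4 in ISO-8859-15, whereas windows-1252 puts it at 0x80, so the same byte can mean three different things depending on which of the three encodings is assumed.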

I still haven't grokked all this yet. I still have to hunt through XML
files with a hex editor to see what is going on.
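
As a rough alternative to eyeballing a hex dump, the Python sketch below reports the offset of each byte that is not valid UTF-8. The function name invalid_utf8_offsets is hypothetical, and the sample bytes are made up for illustration. It re-decodes the remainder after each bad byte, so it is simple rather than fast.

```python
def invalid_utf8_offsets(data: bytes) -> list:
    """Return (offset, byte value) for each byte that breaks UTF-8 decoding."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is clean
        except UnicodeDecodeError as err:
            offset = pos + err.start  # err.start is relative to the slice
            bad.append((offset, data[offset]))
            pos = offset + 1  # skip the bad byte and keep scanning
    return bad


# Made-up sample: a raw windows-1252 trademark byte inside otherwise valid UTF-8.
sample = b"<a>Acme\x99 caf\xc3\xa9</a>"
for offset, value in invalid_utf8_offsets(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

For a real file, read it in binary mode (open(path, "rb")) and pass the bytes in.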

Steve Erbach wrote:

>I had been wondering how to insert Unicode characters into a document.
>There's a nifty web site (
>http://www.visibone.com/htmlref/char/cer.htm ) that shows the HTML 
>numeric codes for the entire Unicode set. I then went into Microsoft 
>Word 2003 and found that if you know the hexadecimal number for a 
>Unicode character (265B, for example) then all you have to do is type 
>that number and press Alt-X, and the number will be converted to the 
>Unicode character, in this case, a Black chess Queen.
>
>There's also the entire Unicode set in Word under Insert | Symbol. The 
>Symbols tab has a Font list. I picked the Arial Unicode MS font. There 
>is another drop down list with "subsets" of the Unicode list. So you 
>could jump to Miscellaneous Dingbats and locate the Black Chess Queen 
>that way.
>
>The Alt-X shortcut works in Word, WordPad, and Windows Messenger, but 
>not in Access.
>
>  
>
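
The hex-number-to-character step behind the Alt-X trick can be reproduced outside Word. A minimal Python sketch, with variable names of my own choosing:

```python
import unicodedata

hex_code = "265B"                # the number typed before pressing Alt-X
queen = chr(int(hex_code, 16))   # hex string -> Unicode character

name = unicodedata.name(queen)       # official Unicode character name
round_trip = f"{ord(queen):04X}"     # character -> hex string again

print(queen, name, round_trip)
```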

--
Marty Connelly
Victoria, B.C.
Canada





