MartyConnelly
martyconnelly at shaw.ca
Wed Mar 30 17:40:22 CST 2005
Just a word of warning about some of this. You will run into it at some point, since Unicode files can be UTF-8 or UTF-16.

There are the Windows typographic characters known by their HTML4 character entity names, such as &mdash;, &lsquo;, &trade; and so on (em dash, curly quotes, etc.). These have in fact been around for a while, and are understood even by a number of older browsers that do not support UTF-8 and would not be able to understand the corresponding Unicode &#bignumber; representations. They have been around since before the Unicode standard was set.

Now consider the Western European "MS-Windows" character set, windows-1252. This is a special cause of confusion: all of the displayable character code values of iso-8859-1 coincide with the same codes in this Windows code page, but the Windows code page additionally assigns displayable characters in the area which the iso-8859-n codes reserve for control functions. In Unicode, those characters have code values above 256.

I am scratching my head about this because these Windows typographic characters, ANSI 128-159, fall in the control-character range and are considered illegal characters in XML. For example, ™ (decimal 153, hex 99) should be written as the Unicode escaped character "&#8482;", but some UTF-8 conversion programs don't do this conversion properly, so it screws up your XML parsers with illegal characters. I am almost tempted to do everything in UTF-16. The Windows characters that cause the problem run from ANSI decimal 128 to 159.

If that isn't enough, some little darlings revised ISO-8859-1 to handle the Euro character, so you now also have to look at Latin-9, i.e. ISO-8859-15. I still haven't grokked all this yet. I still have to hunt through XML files with a hex editor to see what is going on.

Steve Erbach wrote:

>I had been wondering how to insert Unicode characters into a document.
>There's a nifty web site (
>http://www.visibone.com/htmlref/char/cer.htm ) that shows the HTML
>numeric codes for the entire Unicode set.
>I then went into Microsoft
>Word 2003 and found that if you know the hexadecimal number for a
>Unicode character (265B, for example) then all you have to do is type
>that number and press Alt-X, and the number will be converted to the
>Unicode character, in this case, a Black chess Queen.
>
>There's also the entire Unicode set in Word under Insert | Symbol. The
>Symbols tab has a Font list. I picked the Arial Unicode MS font. There
>is another drop down list with "subsets" of the Unicode list. So you
>could jump to Miscellaneous Dingbats and locate the Black Chess Queen
>that way.
>
>The Alt-X shortcut works in Word, WordPad, and Windows Messenger, but
>not in Access.
>
>
>

-- 
Marty Connelly
Victoria, B.C.
Canada
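
P.S. For anyone who wants to poke at the ANSI 128-159 problem without a hex editor, here is a minimal Python sketch of what I think is going on. It is only an illustration under some assumptions: the sample bytes and the three helper names are made up for this example, and "cp1252" is simply Python's codec name for windows-1252. The idea is that the same bytes give real typographic characters when decoded as windows-1252, C1 control characters when decoded as iso-8859-1, and an outright error when treated as UTF-8.

```python
# Sketch only: RAW and the helper names are hypothetical, chosen for illustration.

# Bytes as a Windows program might write them:
# 0x99 = trademark sign, 0x96 = en dash, 0x80 = euro sign in windows-1252.
RAW = b"Widget\x99 \x96 5\x80"


def decode_windows_text(raw: bytes) -> str:
    """Decode bytes as windows-1252 so 0x80-0x9F become real characters."""
    return raw.decode("cp1252")


def to_ascii_with_char_refs(text: str) -> bytes:
    """Replace every non-ASCII character with a decimal &#NNNN; reference.

    Note: this does not escape &, < or >; that is a separate step.
    """
    return text.encode("ascii", "xmlcharrefreplace")


def has_c1_controls(text: str) -> bool:
    """True if the text contains code points 128-159 (the C1 control range)."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


text = decode_windows_text(RAW)   # correct reading of the bytes
wrong = RAW.decode("latin-1")     # iso-8859-1 reading: 0x99 stays U+0099

print(to_ascii_with_char_refs(text))   # b'Widget&#8482; &#8211; 5&#8364;'
print(has_c1_controls(wrong))          # True
print(has_c1_controls(text))           # False

# The same bytes are not valid UTF-8 at all, which is one way parsers choke.
try:
    RAW.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)

# The Euro sign sits at a different byte in each legacy encoding.
print(b"\x80".decode("cp1252") == b"\xa4".decode("iso-8859-15"))  # True
```

Writing the output as plain ASCII with numeric references sidesteps the encoding declaration entirely, at the cost of less readable files. A check like has_c1_controls() on already-decoded text is a cheap way to spot a file that was read with the wrong code page.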