Erwin Craps - IT Helps
Erwin.Craps at ithelps.be
Thu Mar 31 03:41:11 CST 2005
Hummpff... It's incredible: I still sometimes receive (paper) mailings where accented characters are replaced by others... New technology, same problems, over and over and over again ever since I've been in IT (since MS-DOS 1.1). I suggest we finally get rid of these recurring language problems and give every earthling a free course so we would all speak and write the same language. Of course, aliens who would consider living on Earth are obligated to take the free course too. :-)

Erwin

-----Original Message-----
From: dba-tech-bounces at databaseadvisors.com [mailto:dba-tech-bounces at databaseadvisors.com] On Behalf Of MartyConnelly
Sent: Thursday, March 31, 2005 1:40 AM
To: Discussion of Hardware and Software issues
Subject: Re: [dba-Tech] Something I just learned

Just a word of warning about some of this; you will run into it at some point, since Unicode files can be UTF-8 or UTF-16.

There are the Windows typographic characters known by their HTML4 character entity names, such as &mdash;, &lsquo;, &trade; and so on (em dash, left single quote, trade mark sign, etc.). These have in fact been around for a while, and are understood even by a number of older browsers that do not support UTF-8 and would not be able to understand the corresponding Unicode &#bignumber; representations. These have been around since before the Unicode standard was set.

Now consider the Western European "MS-Windows" character set, windows-1252. This is a special cause of confusion: all of the displayable character code values of iso-8859-1 coincide with the same codes in this Windows code page, but additionally, the Windows code page assigns displayable characters in the area (decimal 128-159) that the iso-8859-n codes reserve for control functions. In Unicode, those characters have code values above 255.
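To make the windows-1252 / iso-8859-1 overlap concrete, here is a minimal sketch in Python (an illustration added here, not something posted in the thread). It assumes Python 3.9+ and its built-in `cp1252`, `iso-8859-1` and `utf-8` codecs; the function names `decode_both`, `cp1252_to_xml_safe` and `is_valid_utf8` are hypothetical. It decodes the same bytes both ways, shows the windows-1252 characters landing above 255, escapes them as XML numeric character references, and shows that a raw 0x99 byte is not valid UTF-8.

```python
"""Sketch: the same bytes read as iso-8859-1 vs windows-1252 (Python 3.9+)."""


def decode_both(raw: bytes) -> list[tuple[int, int, int]]:
    """Return (byte, iso-8859-1 code point, windows-1252 code point) per byte."""
    latin1 = raw.decode("iso-8859-1")  # 0x80-0x9F become C1 controls U+0080-U+009F
    cp1252 = raw.decode("cp1252")      # 0x80-0x9F become typographic characters
    return [(b, ord(a), ord(w)) for b, a, w in zip(raw, latin1, cp1252)]


def cp1252_to_xml_safe(raw: bytes) -> str:
    """Decode windows-1252 bytes, then write every non-ASCII char as &#NNNN;.

    Note: Python's strict cp1252 codec rejects the five unassigned bytes
    0x81, 0x8D, 0x8F, 0x90 and 0x9D with UnicodeDecodeError.
    """
    text = raw.decode("cp1252")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes form valid UTF-8 (a lone 0x99 byte does not)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # euro, left/right double quotes, en dash, trade mark in windows-1252
    sample = bytes([0x80, 0x93, 0x94, 0x96, 0x99])
    for b, latin1_cp, cp1252_cp in decode_both(sample):
        print(f"0x{b:02X}: iso-8859-1 U+{latin1_cp:04X}, windows-1252 U+{cp1252_cp:04X}")
    print(cp1252_to_xml_safe(sample))  # &#8364;&#8220;&#8221;&#8211;&#8482;
    print(is_valid_utf8(sample))       # False
```

For byte 0x99 the loop prints `0x99: iso-8859-1 U+0099, windows-1252 U+2122`: the same byte is a control code in one character set and the trade mark sign in the other. Escaping to `&#NNNN;` keeps the output pure ASCII, which any XML parser accepts regardless of the declared encoding.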
I am scratching my head about this because these Windows typographic characters, ANSI decimal 128-159, fall in the control-character range, and control characters are considered illegal (or at least discouraged) in XML. For example, ™ is decimal 153 (hex 99) in windows-1252 and should be converted to the Unicode character U+2122 (&#8482;), but some UTF-8 conversion programs don't do this conversion properly, so the illegal characters screw up your XML parsers. I am almost tempted to do everything in UTF-16.

If that isn't enough, some little darlings revised ISO-8859-1 to handle the Euro character, so you now also have to look at Latin-9, or ISO-8859-15. I still haven't grokked all this yet; I still have to hunt through XML files with a hex editor to see what is going on.

Steve Erbach wrote:
>I had been wondering how to insert Unicode characters into a document.
>There's a nifty web site (
>http://www.visibone.com/htmlref/char/cer.htm ) that shows the HTML
>numeric codes for the entire Unicode set. I then went into Microsoft
>Word 2003 and found that if you know the hexadecimal number for a
>Unicode character (265B, for example) then all you have to do is type
>that number and press Alt-X, and the number will be converted to the
>Unicode character, in this case, a Black Chess Queen.
>
>There's also the entire Unicode set in Word under Insert | Symbol. The
>Symbols tab has a Font list. I picked the Arial Unicode MS font. There
>is another drop-down list with "subsets" of the Unicode list. So you
>could jump to Miscellaneous Dingbats and locate the Black Chess Queen
>that way.
>
>The Alt-X shortcut works in Word, WordPad, and Windows Messenger, but
>not in Access.

--
Marty Connelly
Victoria, B.C.
Canada

_______________________________________________
dba-Tech mailing list
dba-Tech at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/dba-tech
Website: http://www.databaseadvisors.com