MartyConnelly
martyconnelly at shaw.ca
Sun Jun 13 14:12:55 CDT 2004
I was reading an XML article on encoding where the author stated he couldn't get this to work:
http://www.topxml.com/code/default.asp?p=3&id=v20010810181946

It might be useful to someone. I didn't know you could do this with a stream, and took a guess at how it was handling binaries. There are other ways to do this, but this is a short method.

Essentially, the code below takes a text or XML file and changes the encoding from UTF-8 to UTF-16 (Unicode). It uses the ADODB Stream object and its Charset property. I haven't seen this written up anywhere. The trick is to read the text out of the ADODB stream and write it back in. Just loading and saving the file results in a double BOM and garbage.

I am guessing, but you may be able to go back and forth between character set encodings, assuming you are not doing something ridiculous like converting Thai Unicode to ASCII. This would include Chinese Big5, JIS and various ISO encodings.

See the input file samples, which contain characters from about 20 languages in two encodings. Just for Martin there is even Irish Gaelic; of course Scots Gaelic is known as "The Gaelic".
http://www5.brinkster.com/mconnelly/xmltest/testUTF-8.xml
http://www5.brinkster.com/mconnelly/xmltest/testUTF-16.xm

To play around you will need the files with proper BOM markers:
http://www5.brinkster.com/mconnelly/xmltest/testUTF-16.zip

    ' For straight text files, use an empty TopLine instead:
    ' Const TopLine = ""
    ' For XML files, this processing instruction replaces the original
    ' one so the declared encoding matches the new file encoding.
    Const TopLine = "<?xml version=""1.0"" encoding=""utf-16"" ?>"

    Sub ReadUTF8SaveFileInUTF16()
        Dim stm As ADODB.Stream   ' needs a reference to ADO (2.7 here)
        Dim strData As String
        Dim lngEnd As Long

        Set stm = New ADODB.Stream
        stm.Open
        stm.Type = adTypeText
        stm.Charset = "UTF-8"
        stm.LoadFromFile "XM8_UTF_vb.xml"
        stm.Position = 0
        strData = stm.ReadText()

        ' Swap the XML declaration for TopLine. The block below can be
        ' removed for straight text files rather than XML. Searching for
        ' the closing "?>" avoids assuming that the old and new
        ' declarations have the same length ("utf-8" vs "utf-16").
        lngEnd = InStr(strData, "?>")
        If Len(TopLine) > 0 And lngEnd > 0 Then
            strData = TopLine & Mid$(strData, lngEnd + 2)
        End If

        ' Rewind and discard the old contents, so that no stale bytes
        ' are left over if the new encoding turns out shorter.
        stm.Position = 0
        stm.SetEOS

        ' Set the output file character set, e.g.
        ' "Unicode", "iso-8859-1", "ascii", "Big5", "hebrew".
        ' For a list of the character set strings known to a system, see
        ' the subkeys of HKEY_CLASSES_ROOT\MIME\Database\Charset
        ' in the Windows Registry.
        stm.Charset = "Unicode"
        stm.WriteText strData
        stm.SaveToFile "test16.xml", adSaveCreateOverWrite
        stm.Close
        Set stm = Nothing
    End Sub

-- 
Marty Connelly
Victoria, B.C.
Canada
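
The guess above, that the same stream technique can go back and forth between other character sets, could be tried with a small generalized helper. This is only a sketch under assumptions: the name ConvertFileCharset and its parameters are hypothetical and not part of the original post, it uses late binding so no ADO reference is needed, and it uses a separate output stream rather than rewriting the input stream in place. It does not touch any XML declaration, so for XML files the encoding attribute would still need to be updated separately.

```vb
' Hypothetical helper (not from the original post): read a text file in
' one charset and save it in another using two ADODB.Stream objects.
Sub ConvertFileCharset(ByVal inPath As String, ByVal outPath As String, _
                       ByVal fromCharset As String, ByVal toCharset As String)
    ' ADO enum values, declared locally because late binding is used.
    Const adTypeText As Long = 2
    Const adSaveCreateOverWrite As Long = 2

    Dim src As Object
    Dim dst As Object
    Dim strData As String

    ' Decode the input file into a VB string.
    Set src = CreateObject("ADODB.Stream")
    src.Open
    src.Type = adTypeText
    src.Charset = fromCharset
    src.LoadFromFile inPath
    strData = src.ReadText()      ' default argument reads the whole stream
    src.Close

    ' Encode the string into a fresh stream and save it.
    Set dst = CreateObject("ADODB.Stream")
    dst.Open
    dst.Type = adTypeText
    dst.Charset = toCharset
    dst.WriteText strData
    dst.SaveToFile outPath, adSaveCreateOverWrite
    dst.Close
End Sub
```

A call such as ConvertFileCharset "XM8_UTF_vb.xml", "test16.xml", "UTF-8", "Unicode" would then correspond to the UTF-8 to UTF-16 case. Characters that the target charset cannot represent will not survive the conversion, so the results are worth checking in a hex viewer.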