MartyConnelly
martyconnelly at shaw.ca
Fri Mar 3 13:59:45 CST 2006
Just some thoughts on this
The XML PI (processing instruction) statement
<?xml version="1.0" encoding="iso-8859-1"?>
is not required if the XML file is UTF-8 or UTF-16, i.e. the file starts
with a proper BOM, at least for the MS XML parsers. I wish MS would spell
out all of its XML defaults; they assume developers know them by osmosis.

Watch out for encoding statements: they assume you know where the file is
coming from. If no encoding is specified, the parser assumes UTF-8 and/or
does a BOM check. If you add an encoding as above, you are stating that the
file was originally created as "iso-8859-1" (Western European, close to
what Windows calls ANSI). Any characters outside this range will be invalid
or may not be interpreted correctly. It used to do funny things to Euro and
UK pound symbols (the Euro sign is not in iso-8859-1 at all).
You can however change the PI on the fly with statements like
pi = xmlDoc.createProcessingInstruction("xml", "version=\"1.0\"");
xmlDoc.insertBefore(pi, xmlDoc.childNodes.item(0));
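The two lines above are JScript. For Access developers, a rough VBA equivalent might look like the sketch below. It assumes a reference to Microsoft XML, v4.0; SetXmlDeclaration is a made-up helper name, not something from MSXML. It also replaces an existing declaration instead of inserting a second one, which is my addition.

```vba
' Sketch only. Assumes a reference to Microsoft XML, v4.0.
' SetXmlDeclaration is a hypothetical helper name.
Sub SetXmlDeclaration(xmlDoc As MSXML2.DOMDocument40, strEncoding As String)
    Dim pi As MSXML2.IXMLDOMProcessingInstruction
    Dim firstNode As MSXML2.IXMLDOMNode

    ' Build <?xml version="1.0" encoding="..."?>
    Set pi = xmlDoc.createProcessingInstruction("xml", _
        "version=""1.0"" encoding=""" & strEncoding & """")
    Set firstNode = xmlDoc.firstChild

    If firstNode Is Nothing Then
        ' Empty document: the declaration becomes the first node
        xmlDoc.appendChild pi
    ElseIf firstNode.nodeType = NODE_PROCESSING_INSTRUCTION _
           And firstNode.nodeName = "xml" Then
        ' A declaration already exists: swap it for the new one
        xmlDoc.replaceChild pi, firstNode
    Else
        ' No declaration yet: put the new one ahead of everything else
        xmlDoc.insertBefore pi, firstNode
    End If
End Sub
```

Call it as SetXmlDeclaration xmlDoc, "UTF-8" before saving. As far as I know, MSXML uses the encoding named in the declaration when it saves the document, but test that on your own files.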
If you edit an XML file in Notepad, watch whether you save it as ANSI or
UTF-8 (Unicode); this may cause grief by changing the BOM.
In most cases US users will get away with iso-8859-1 encoding, but if you
start bringing in files from international sites or Unix boxes, it will
give you problems.
See info on xml encodings.
http://www.topxml.com/code/default.asp?p=3&id=v20010810181946
There are quick and dirty ways to bulk change encodings via the ADO Stream
Charset property; I posted some code in the archives.
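This is not the archived code, just a minimal sketch of the ADO Stream idea: read the file as text in one charset and write it back out in another. ConvertFileEncoding and its parameter names are made up for illustration. It is late bound, so it needs no ADO reference.

```vba
' Sketch only. ConvertFileEncoding is a hypothetical helper name.
Sub ConvertFileEncoding(strFileIn As String, strFileOut As String, _
                        strCharsetIn As String, strCharsetOut As String)
    Const adTypeText As Long = 2
    Const adSaveCreateOverWrite As Long = 2
    Dim stmIn As Object
    Dim stmOut As Object

    ' Read the source file as text, decoding it with the input charset
    Set stmIn = CreateObject("ADODB.Stream")
    stmIn.Type = adTypeText
    stmIn.Charset = strCharsetIn
    stmIn.Open
    stmIn.LoadFromFile strFileIn

    ' Write the same text to a second stream using the output charset
    Set stmOut = CreateObject("ADODB.Stream")
    stmOut.Type = adTypeText
    stmOut.Charset = strCharsetOut
    stmOut.Open
    stmOut.WriteText stmIn.ReadText
    stmOut.SaveToFile strFileOut, adSaveCreateOverWrite

    stmOut.Close
    stmIn.Close
End Sub
```

For example: ConvertFileEncoding "C:\XML\in.xml", "C:\XML\out.xml", "iso-8859-1", "utf-8" (the paths are placeholders). This only re-encodes the bytes; it does not touch any encoding="..." statement inside the file, so fix that separately. The UTF-8 output will also start with a BOM.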
There is a difference between well-formed and valid XML. Well-formed is
a syntax check on the XML (i.e. matching tags). Valid also means that the
XML elements and attributes comply with an XML schema or DTD.

Here is some validation code that might help you out. The special error
code displays the XML character in error; I hated trying to count the
error line and position number in a file to determine the character in
error. There is also a routine to check the file's BOM marker.
'ValidXMLCheck "C:\XML\Gil Encodings\encUTF8_noBOM.xml"
Sub ValidXMLCheck(strxmlfilepath As String)
    'Requires a reference to Microsoft XML, v4.0
    Dim xmlMessage As MSXML2.DOMDocument40
    Dim oXMLError As IXMLDOMParseError
    Dim lngErrCode As Long
    Set xmlMessage = New MSXML2.DOMDocument40
    xmlMessage.async = False
    xmlMessage.validateOnParse = True 'true by default
    xmlMessage.resolveExternals = False
    'Set xmlMessage.schemas = xmlSchema
    'After loading the XML document, call the Validate method of the
    'DOMDocument. If there is an error validating against the schema,
    'Validate returns a parse error object; keep its error code.
    xmlMessage.Load strxmlfilepath
    lngErrCode = xmlMessage.validate().errorCode
    If xmlMessage.parseError.errorCode <> 0 Then
        'The load itself failed (not well-formed, bad encoding, etc.)
        Debug.Print " Reason: " & xmlMessage.parseError.reason
        Set oXMLError = xmlMessage.parseError
        reportParseError oXMLError
    ElseIf lngErrCode <> 0 Then
        'Loaded fine, but Validate complained (e.g. no DTD or schema)
        Debug.Print strxmlfilepath & " loaded, but Validate returned error " & lngErrCode
    Else
        Debug.Print strxmlfilepath & " file OK"
    End If
End Sub
Public Function reportParseError(err As IXMLDOMParseError)
    'this is not set up to count tabs used as whitespace
    Dim s As String
    Dim r As String
    'pad with spaces so the caret lands under the character in error
    If err.linepos > 1 Then s = Space$(err.linepos - 1)
    r = "XML Error loading " & err.url & " * " & err.reason
    Debug.Print r
    'show character position of error; tired of counting on screen
    If (err.Line > 0) Then
        r = "at line " & err.Line & ", character " & err.linepos & vbCrLf & _
            err.srcText & vbCrLf & s & "^"
        Debug.Print r
    End If
    Debug.Print "url=" & err.url & vbCrLf
End Function
Sub CheckBOM(Optional strFileIn As Variant, Optional strIn As Variant)
    'CheckBOM "C:\XML\Gil Encodings\encUTF8_NoDecl.xml"
    On Error GoTo Err_handler
    Dim lpBuffer() As Byte
    Dim intFreeFile As Integer
    If Not IsMissing(strFileIn) Then
        intFreeFile = FreeFile
        Open strFileIn For Binary Access Read Lock Read As #intFreeFile
        ReDim lpBuffer(3) 'bytes 0 to 3: the first four bytes of the file
        Get #intFreeFile, , lpBuffer
        Close #intFreeFile
    ElseIf Not IsMissing(strIn) Then
        'Can't make this work since VBA is always converting the string
        'to UTF-16LE
        lpBuffer = Left$(strIn, 4)
    Else
        MsgBox "Nothing To Do"
        Exit Sub
    End If
    If lpBuffer(0) = 255 And lpBuffer(1) = 254 Then
        Debug.Print "File is UTF-16 Little Endian"
    ElseIf lpBuffer(0) = 254 And lpBuffer(1) = 255 Then
        Debug.Print "File is UTF-16 Big Endian"
    ElseIf lpBuffer(0) = 239 And lpBuffer(1) = 187 And lpBuffer(2) = 191 Then
        Debug.Print "File is UTF-8"
    'No BOM: start trying to figure it out by other means. This will only
    'work on xml files that start with "<?" (60 = "<", 63 = "?")
    ElseIf lpBuffer(0) = 60 And lpBuffer(1) = 0 And lpBuffer(2) = 63 And _
           lpBuffer(3) = 0 Then
        Debug.Print "File is UTF-16 Little Endian"
    ElseIf lpBuffer(0) = 0 And lpBuffer(1) = 60 And lpBuffer(2) = 0 And _
           lpBuffer(3) = 63 Then
        Debug.Print "File is UTF-16 Big Endian"
    ElseIf lpBuffer(0) = 60 And lpBuffer(1) = 63 Then
        Debug.Print "File can be in UTF-8, ASCII, ISO-8859-?, Shift-JIS, etc"
    Else
        Debug.Print "Can't seem to figure out the Character encoding"
    End If
Err_Exit:
    On Error Resume Next
    Close #intFreeFile
    Exit Sub
Err_handler:
    MsgBox Err.Number & " - " & Err.Description
    Resume Err_Exit
End Sub
Jim DeMarco wrote:
>Darren,
>
>We've been using this:
>
><?xml version="1.0" encoding="iso-8859-1"?>
>
>But I'm pretty sure you can get by with just:
>
><?xml version="1.0"?>
>
>Once you navigate the file below this processing instruction you're on
>your own as far as defining elements etc.
>
>May I ask why the concern?
>
>HTH
>
>Jim
>
>-----Original Message-----
>From: accessd-bounces at databaseadvisors.com
>[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Darren DICK
>Sent: Wednesday, March 01, 2006 11:19 PM
>To: AccessD
>Subject: [AccessD] A2000: XML Q
>
>Hello all
>Cross Posted to dba_SQL List
>What is the minimum header information i need to include before 'my
>data' starts to get a 'Well formed' xml doc?
>Eg the stuff that looks like
><?xml version="1.0"?><Report xmlns="Invoice2"
>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>
>
>
>etc etc
>
>
>
>many thanks
>
>DD
>
--
Marty Connelly
Victoria, B.C.
Canada