[AccessD] A2000: XML Q

MartyConnelly martyconnelly at shaw.ca
Fri Mar 3 13:59:45 CST 2006


Just some thoughts on this.
The XML PI (processing instruction) statement <?xml version="1.0"
encoding="iso-8859-1"?>
is not required if the XML file is UTF-8 or UTF-16, i.e. the file starts
with a proper BOM,

at least for the MS XML parsers. I wish MS would spell out all of its XML defaults;
they assume developers know them by osmosis.

Watch out for encoding statements: they assume you know where the file is
coming from. If no encoding is specified,
the parser assumes UTF-8 and/or does a BOM check. If you add an
encoding as above, you are stating that the file was
originally created as ANSI "iso-8859-1" (Western European).
Any characters outside this range will be invalid or may not be
interpreted correctly. It used to do funny things to Euro and UK pound
symbols.
You can however change the PI on the fly with statements like (JScript syntax):
var pi = xmlDoc.createProcessingInstruction("xml", "version=\"1.0\"");
xmlDoc.insertBefore(pi, xmlDoc.childNodes.item(0));
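
To make the Euro and pound trouble concrete, here is a small sketch in Python (chosen only because it is easy to run anywhere; it is not part of the MSXML code in this post, and the helper name encodable is made up for illustration). It shows that iso-8859-1 has no Euro sign at all, that Windows-1252 stores it as a single byte, and that the pound sign is one byte in iso-8859-1 but two bytes in UTF-8, which is why a UTF-8 pound read as iso-8859-1 shows up as "Â£".

```python
# Illustrative only: how the pound and Euro signs behave under three encodings.
pound, euro = "\u00a3", "\u20ac"


def encodable(ch, encoding):
    """Return the encoded bytes, or None if the character has no mapping."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


for name in ("iso-8859-1", "cp1252", "utf-8"):
    print(name, encodable(pound, name), encodable(euro, name))
```

A declaration that says iso-8859-1 on a file that is really Windows-1252 or UTF-8 is the usual way these two symbols get mangled.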

If you edit an XML file in Notepad, watch whether you save it as ANSI or
UTF-8 (Unicode); this may cause grief by changing the BOM.
In most cases US users will get away with this type of encoding
(iso-8859-1), but if you start bringing in files from international sites
or Unix boxes,
this will give you problems.
See info on XML encodings:
http://www.topxml.com/code/default.asp?p=3&id=v20010810181946

There are quick and dirty ways to bulk change encodings via ADO Stream
charsets; I posted some code in the archives.
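
The ADO Stream code itself is in the archives and is not reproduced here. As a rough stand-in, the same idea (read with one charset, write with another) can be sketched in Python. The function name convert_encoding and its parameters are assumptions for illustration only. Note that this rewrites the bytes but does not touch any encoding="..." attribute in the XML declaration, which would still need to match the new encoding.

```python
def convert_encoding(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Re-encode a text file from src_encoding to dst_encoding.

    Raises UnicodeDecodeError if the bytes are not valid in src_encoding,
    and UnicodeEncodeError if a character has no mapping in dst_encoding.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

Looping this over a folder of files gives the "bulk" part; writing with "utf-8-sig" instead of "utf-8" would add a UTF-8 BOM.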

There is a difference between well-formed and valid XML. Well-formed is
a syntax check on the XML (i.e. matching tags).
Valid also means that the XML data elements and attributes comply with an
XML schema or DTD.
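
As a small illustration of the well-formed half of that distinction, here is a hedged Python sketch using only the standard library parser, which checks syntax but does not validate against a DTD or schema. The function name check_well_formed is hypothetical. It returns None for well-formed input, otherwise the parser message with a caret under the reported position.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a message with a caret marker."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        line, column = exc.position  # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        source = lines[line - 1] if 0 < line <= len(lines) else ""
        return f"{exc}\n{source}\n{' ' * column}^"


print(check_well_formed("<a><b></b></a>"))  # well-formed
print(check_well_formed("<a><b></a>"))      # mismatched tag
```

A document can pass this check and still be invalid against its schema; that second step needs a validating parser such as MSXML.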

Here is some validation code that might help you out. The special error
code displays the XML character in error;
I hated trying to count the error line and position number in a file to
determine the character in error.
There is also a routine to check the file's BOM marker.

'ValidXMLCheck "C:\XML\Gil Encodings\encUTF8_noBOM.xml"
Sub ValidXMLCheck(strxmlfilepath As String)
Dim xmlMessage As MSXML2.DOMDocument40
Dim oXMLError As IXMLDOMParseError
Dim lngErrCode As Long

Set xmlMessage = New MSXML2.DOMDocument40
xmlMessage.async = False
xmlMessage.validateOnParse = True 'true by default
xmlMessage.resolveExternals = False
'Set xmlMessage.schemas = xmlSchema
'After loading the XML document, call the Validate method of the
'DOMDocument. If there is an error validating against the schema,
'there will be a parse error:

xmlMessage.Load (strxmlfilepath)
lngErrCode = xmlMessage.validate()
If xmlMessage.parseError.errorCode <> 0 Then
    Debug.Print " Reason: " & xmlMessage.parseError.reason
    Set oXMLError = xmlMessage.parseError
    reportParseError oXMLError
Else
    Debug.Print strxmlfilepath & " file OK"
End If

End Sub

Public Function reportParseError(err As IXMLDOMParseError)
'this is not setup to count tabs used as whitespace
  Dim s As String
  Dim r As String
  Dim i As Long
 
   s = ""
  For i = 1 To err.linepos - 1
    s = s & " "
  Next
  r = "XML Error loading " & err.url & " * " & err.reason
  Debug.Print r
  'show character position of error; tired of counting on screen
  If (err.Line > 0) Then
    r = "at line " & err.Line & ", character " & err.linepos & vbCrLf & _
         err.srcText & vbCrLf & s & "^"
    Debug.Print r
  End If
  Debug.Print "url=" & err.url & vbCrLf
End Function


Sub CheckBOM(Optional strFileIn As Variant, Optional strIn As Variant)
'checkbom "C:\XML\Gil Encodings\encUTF8_NoDecl.xml"
On Error GoTo Err_handler
Dim strInputData As String * 4
Dim lpBuffer() As Byte
Dim intFreeFile As Integer

  If Not IsMissing(strFileIn) Then
    intFreeFile = FreeFile
    Open strFileIn For Binary Access Read Lock Read As #intFreeFile Len = 4
    ReDim lpBuffer(3) 'four bytes, indexes 0 to 3
    Get #intFreeFile, , lpBuffer
    Close #intFreeFile
  ElseIf Not IsMissing(strIn) Then
    'Can't make this work since VBA is always converting the string to UTF-16LE
    lpBuffer = Left$(strIn, 4)
  Else
    MsgBox "Nothing To Do"
    Exit Sub
  End If
 
  If lpBuffer(0) = 255 And lpBuffer(1) = 254 Then
    Debug.Print "File is UTF-16 Little Endian"
  ElseIf lpBuffer(0) = 254 And lpBuffer(1) = 255 Then
    Debug.Print "File is UTF-16 Big Endian"
  ElseIf lpBuffer(0) = 239 And lpBuffer(1) = 187 And lpBuffer(2) = 191 Then
    Debug.Print "File is UTF-8"
  'Start trying to figure out by other means; this will only work on xml
  'files that start with "<?"
  ElseIf lpBuffer(0) = 60 And lpBuffer(1) = 0 And lpBuffer(2) = 63 And _
         lpBuffer(3) = 0 Then
    Debug.Print "File is UTF-16 Little Endian"
  ElseIf lpBuffer(0) = 0 And lpBuffer(1) = 60 And lpBuffer(2) = 0 And _
         lpBuffer(3) = 63 Then
    Debug.Print "File is UTF-16 Big Endian"
  ElseIf lpBuffer(0) = 60 And lpBuffer(1) = 63 Then '60 = "<", 63 = "?"
    Debug.Print "File can be in UTF-8, ASCII, ISO-8859-?, Shift-JIS, etc"
  Else
    Debug.Print "Can't seem to figure out the Character encoding"
  End If

Err_Exit:
  On Error Resume Next
  Close #intFreeFile
  Exit Sub
Err_handler:
  Select Case Err.Number
  Case Else
    MsgBox Err.Number & " - " & Err.Description
  End Select
  Resume Err_Exit
End Sub
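
For anyone who wants to try the same sniffing logic outside VBA, here is a minimal Python sketch of the byte checks CheckBOM performs. The function name sniff_encoding and the returned labels are made up for illustration. Like the VBA routine, it looks only at the first four bytes and does not attempt to recognise UTF-32.

```python
import codecs


def sniff_encoding(head):
    """Guess an encoding family from the first bytes of an XML file."""
    if head.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "UTF-8 (BOM)"
    if head.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "UTF-16 Little Endian (BOM)"
    if head.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "UTF-16 Big Endian (BOM)"
    # No BOM: fall back on the shape of a leading "<?"
    if head.startswith(b"<\x00?\x00"):
        return "UTF-16 Little Endian (no BOM)"
    if head.startswith(b"\x00<\x00?"):
        return "UTF-16 Big Endian (no BOM)"
    if head.startswith(b"<?"):
        return "UTF-8, ASCII, ISO-8859-?, Shift-JIS, etc"
    return "unknown"


def sniff_file(path):
    """Read the first four bytes of a file and guess its encoding family."""
    with open(path, "rb") as f:
        return sniff_encoding(f.read(4))
```

Working on raw bytes sidesteps the problem noted in the strIn branch above, where the string has already been converted before it can be inspected.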



Jim DeMarco wrote:

>Darren,
>
>We've been using this:
>
><?xml version="1.0" encoding="iso-8859-1"?>
>
>But I'm pretty sure you can get by with just:
>
><?xml version="1.0"?> 
>
>Once you navigate the file below this processing instruction you're on
>your own as far as defining elements etc.
>
>May I ask why the concern?
>
>HTH
>
>Jim
>
>-----Original Message-----
>From: accessd-bounces at databaseadvisors.com
>[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Darren DICK
>Sent: Wednesday, March 01, 2006 11:19 PM
>To: AccessD
>Subject: [AccessD] A2000: XML Q
>
>Hello all
>Cross Posted to dba_SQL List
>What is the minimum header information i need to include before 'my
>data' starts to get a 'Well formed' xml doc?
>Eg the stuff that looks like
><?xml version="1.0"?><Report xmlns="Invoice2"
>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>
> 
>
>etc etc
>
> 
>
>many thanks
>
>DD
>

-- 
Marty Connelly
Victoria, B.C.
Canada





