[AccessD] A2000: XML Q

Darren DICK darrend at nimble.com.au
Sat Mar 4 05:57:14 CST 2006


Hi Marty
Thanks for the response
Way over my head
I have forwarded it to my SQL gurus (soon to be XML gurus, I guess)

Many thanks


Darren
------------------------------
T: 0424 696 433
 

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of MartyConnelly
Sent: Saturday, 4 March 2006 7:00 AM
To: Access Developers discussion and problem solving
Subject: Re: [AccessD] A2000: XML Q

Just some thoughts on this.
The XML PI (processing instruction) statement <?xml version="1.0" encoding="iso-8859-1"?>
is not required if the XML file is UTF-8 or UTF-16, i.e. the file starts with a proper BOM
(strictly, a UTF-16 file must start with a BOM, while for UTF-8 the BOM is optional),
at least for the MS XML parsers. I wish MS would spell out all of its XML
defaults; they assume developers know them by osmosis.
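To see the "no declaration needed" case outside MSXML, here is a small sketch in Python rather than VBA, assuming Python 3 and its standard-library xml.etree.ElementTree parser (expat), which may not share every MSXML default. It parses the same document, with no <?xml ...?> line at all, once as UTF-8 without a BOM and once as UTF-16 with a BOM.

```python
import xml.etree.ElementTree as ET

# One small document with NO <?xml ...?> declaration at all.
doc = "<price>£12</price>"

# UTF-8, no BOM: what the parser assumes when nothing is declared.
utf8_bytes = doc.encode("utf-8")

# UTF-16: Python's "utf-16" codec writes a BOM first, in native byte order.
utf16_bytes = doc.encode("utf-16")

for label, data in (("UTF-8, no BOM", utf8_bytes), ("UTF-16 with BOM", utf16_bytes)):
    root = ET.fromstring(data)
    print(label, "->", root.tag, root.text)
```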

Watch out for encoding declarations: they assume you know where the file is coming
from. If no encoding is specified, the parser assumes UTF-8 and/or does a BOM
check. If you add an encoding as above, you are stating the file was originally
created as ANSI "iso-8859-1" (Western European).
Any characters outside this range will be invalid or may not be interpreted
correctly. It used to do funny things to the Euro and UK pound symbols.
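A quick Python illustration of the pound-sign problem (Python rather than VBA so it runs anywhere; expat, not MSXML, does the parsing). The bytes are really UTF-8, but the declaration claims iso-8859-1, so each byte of the two-byte pound sign is read as a separate Latin-1 character. The last part shows that the Euro sign has no iso-8859-1 code at all.

```python
import xml.etree.ElementTree as ET

# Saved as UTF-8 (the pound sign becomes two bytes, C2 A3),
# but the declaration claims ISO-8859-1.
mislabelled = '<?xml version="1.0" encoding="iso-8859-1"?><p>£</p>'.encode("utf-8")

# The parser trusts the declaration and decodes each byte as one Latin-1 character.
print(ET.fromstring(mislabelled).text)   # Â£

# Same bytes with an honest declaration.
labelled = '<?xml version="1.0" encoding="utf-8"?><p>£</p>'.encode("utf-8")
print(ET.fromstring(labelled).text)      # £

# The Euro sign is simply outside ISO-8859-1.
try:
    "€".encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("Euro not representable in iso-8859-1:", exc.reason)
```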
You can, however, change the PI on the fly with statements like:
pi = xmlDoc.createProcessingInstruction("xml", "version=\"1.0\"");
xmlDoc.insertBefore(pi, xmlDoc.childNodes.item(0));
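Those statements work on a live MSXML DOM. As a rough Python counterpart (Python's xml.dom.minidom, not MSXML), minidom does not keep the original declaration as a node, so passing an encoding to toxml() writes a fresh <?xml ...?> line and encodes the output bytes to match it.

```python
from xml.dom import minidom

# Start from a document declared (and encoded) as UTF-8.
source = '<?xml version="1.0" encoding="utf-8"?><r>£</r>'.encode("utf-8")
doc = minidom.parseString(source)

# Re-serialize with a different declaration: minidom writes
# <?xml version="1.0" encoding="iso-8859-1"?> and encodes the body to match.
latin1_bytes = doc.toxml(encoding="iso-8859-1")
print(latin1_bytes)
```

Changing the declaration this way also changes the bytes, which is the point: the PI and the actual encoding have to agree.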

If you edit an XML file in Notepad, watch out whether you save it as ANSI or
UTF-8 (Unicode); it may cause grief by changing the BOM. In most cases US users will
get away with the iso-8859-1 encoding, but if you start bringing in
files from international sites or Unix boxes it will give you problems.
See this info on XML encodings:
http://www.topxml.com/code/default.asp?p=3&id=v20010810181946
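To make the Notepad point concrete, this Python sketch assumes US/Western Windows, where Notepad's "ANSI" means code page 1252, and the Notepad of that era, which wrote a BOM when saving as UTF-8. It compares the bytes for the same text under each choice, then shows that an undeclared ANSI file is rejected by a parser that defaults to UTF-8.

```python
import xml.etree.ElementTree as ET

text = "<note>£5</note>"

ansi = text.encode("cp1252")          # "ANSI": one byte per character, no BOM
utf8_plain = text.encode("utf-8")     # typical Unix tool: UTF-8, no BOM
utf8_bom = text.encode("utf-8-sig")   # Notepad-style "UTF-8": EF BB BF first

for label, data in (("ANSI", ansi), ("UTF-8", utf8_plain), ("UTF-8 + BOM", utf8_bom)):
    print(f"{label:12} {len(data):2} bytes  {data[:4].hex(' ')}")

# No declaration, so the parser assumes UTF-8, and the lone 0xA3 byte is invalid.
try:
    ET.fromstring(ansi)
except ET.ParseError as err:
    print("ANSI file rejected:", err)
```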

There are quick and dirty ways to bulk-change encodings via the ADO Stream
Charset property; I posted some code in the archives.
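The ADO Stream code itself isn't reproduced here. As a hedged Python stand-in for the same bulk re-encoding idea, the hypothetical helper transcode_xml below decodes the bytes with the source encoding, rewrites (or adds) the encoding in the declaration, and re-encodes; characters the target encoding can't hold become numeric character references.

```python
import re

# The encoding="..." (or '...') attribute inside a leading XML declaration.
_ENCODING_ATTR = re.compile(r"""^(<\?xml\s[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2""")
_XML_DECL = re.compile(r"^<\?xml\s")


def transcode_xml(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: re-encode XML bytes from src to dst and fix the declaration."""
    text = data.decode(src)
    if text.startswith("\ufeff"):  # a BOM decoded as a character; drop it
        text = text[1:]

    text, replaced = _ENCODING_ATTR.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{dst}{m.group(2)}", text, count=1
    )
    if not replaced:
        if _XML_DECL.match(text):
            # Declaration without an encoding attribute: add one before "?>".
            end = text.index("?>")
            text = f'{text[:end].rstrip()} encoding="{dst}"{text[end:]}'
        else:
            # No declaration at all: add one.
            text = f'<?xml version="1.0" encoding="{dst}"?>' + text

    # Characters the target encoding cannot hold become &#NNNN; references.
    return text.encode(dst, errors="xmlcharrefreplace")
```

For a folder of files you would loop over something like pathlib.Path(folder).glob("*.xml") and write transcode_xml(p.read_bytes(), "iso-8859-1", "utf-8") back out; the folder and encodings are placeholders.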

There is a difference between well-formed and valid XML. Well-formed is a syntax
check on the XML (i.e. matching tags); valid also means that the XML elements and
attributes comply with an XML schema or DTD.
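A minimal Python sketch of the well-formed half of that distinction, using the standard-library ElementTree parser as a stand-in for MSXML; is_well_formed is a hypothetical name. It only checks syntax. Checking validity against a schema or DTD needs a validating parser, which the Python standard library does not provide (the third-party lxml package does, for example).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Syntax check only: matching tags, a single root element, legal characters."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<Report/>"))                              # True: no declaration needed
print(is_well_formed("<invoice><total>10</total></invoice>"))  # True
print(is_well_formed("<invoice><total>10</invoice>"))          # False: mismatched tag
print(is_well_formed("<a/><b/>"))                              # False: two root elements
```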

Here is some validation code that might help you out. The special error code
displays the XML character in error; I hated trying to count the error line and
position number in a file to determine the character in error.
There is also a routine to check the file's BOM marker.

'ValidXMLCheck "C:\XML\Gil Encodings\encUTF8_noBOM.xml"
Sub ValidXMLCheck(strxmlfilepath As String)
Dim xmlMessage As MSXML2.DOMDocument40
Dim oXMLError As IXMLDOMParseError
Dim lngErrCode As Long

Set xmlMessage = New MSXML2.DOMDocument40
xmlMessage.async = False
xmlMessage.validateOnParse = True 'true by default
xmlMessage.resolveExternals = False
'Set xmlMessage.schemas = xmlSchema
'After loading the XML document, call the Validate method of the
'DOMDocument. If there is an error validating against the schema,
'there will be a parse error:

xmlMessage.Load (strxmlfilepath)
lngErrCode = xmlMessage.validate()
If xmlMessage.parseError.errorCode <> 0 Then
    Debug.Print " Reason: " & xmlMessage.parseError.reason
    Set oXMLError = xmlMessage.parseError
    reportParseError oXMLError
Else
    Debug.Print strxmlfilepath & " file OK"
End If

End Sub

Public Function reportParseError(pe As IXMLDOMParseError)
'Note: this is not set up to count tabs used as whitespace
  Dim s As String
  Dim r As String

  'Pad with spaces so the caret lands under the character in error
  If pe.linepos > 1 Then s = Space$(pe.linepos - 1)
  r = "XML Error loading " & pe.url & " * " & pe.reason
  Debug.Print r
  'Show character position of error; tired of counting on screen
  If (pe.Line > 0) Then
    r = "at line " & pe.Line & ", character " & pe.linepos & vbCrLf & _
         pe.srcText & vbCrLf & s & "^"
    Debug.Print r
  End If
  Debug.Print "url=" & pe.url & vbCrLf
End Function
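The same caret trick in Python, as a rough sketch using the standard-library ElementTree parser rather than MSXML (report_parse_error is a hypothetical name). ParseError.position gives the line (1-based) and column (0-based), and copying any tabs from the offending line into the padding keeps the caret aligned, which covers the tab case the VBA comment mentions.

```python
import xml.etree.ElementTree as ET


def report_parse_error(xml_text: str) -> str:
    """Return 'OK', or the parser message plus the bad line with a caret under the error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line_no, col = err.position          # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        if not 1 <= line_no <= len(lines):
            return str(err)                  # e.g. error reported past the last line
        src = lines[line_no - 1]
        # Copy tabs (and turn everything else into spaces) so the caret lines up.
        pad = "".join("\t" if ch == "\t" else " " for ch in src[:col])
        return f"{err}\n{src}\n{pad}^"
    return "OK"


print(report_parse_error("<invoice>\n\t<total>10</invoice>"))
```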


'CheckBOM "C:\XML\Gil Encodings\encUTF8_NoDecl.xml"
Sub CheckBOM(Optional strFileIn As Variant, Optional strIn As Variant)
On Error GoTo Err_handler
Dim lpBuffer() As Byte
Dim intFreeFile As Integer

  If Not IsMissing(strFileIn) Then
    intFreeFile = FreeFile
    Open strFileIn For Binary Access Read Lock Read As #intFreeFile Len = 4
    ReDim lpBuffer(3)   'indexes 0-3: the first four bytes of the file
    Get #intFreeFile, , lpBuffer
    Close #intFreeFile
  ElseIf Not IsMissing(strIn) Then
    'Can't make this work, since VBA always converts the string to UTF-16LE
    lpBuffer = Left$(strIn, 4)
  Else
    MsgBox "Nothing To Do"
    Exit Sub
  End If
 
  If lpBuffer(0) = 255 And lpBuffer(1) = 254 Then
    Debug.Print "File is UTF-16 Little Endian"
  ElseIf lpBuffer(0) = 254 And lpBuffer(1) = 255 Then
    Debug.Print "File is UTF-16 Big Endian"
  ElseIf lpBuffer(0) = 239 And lpBuffer(1) = 187 And lpBuffer(2) = 191 Then
    Debug.Print "File is UTF-8"
  'No BOM: start trying to figure it out by other means. This will only work
  'on XML files that start with "<?" (60 = "<", 63 = "?")
  ElseIf lpBuffer(0) = 60 And lpBuffer(1) = 0 And lpBuffer(2) = 63 And _
    lpBuffer(3) = 0 Then
    Debug.Print "File is UTF-16 Little Endian"
  ElseIf lpBuffer(0) = 0 And lpBuffer(1) = 60 And lpBuffer(2) = 0 And _
    lpBuffer(3) = 63 Then
    Debug.Print "File is UTF-16 Big Endian"
  ElseIf lpBuffer(0) = 60 And lpBuffer(1) = 63 Then
    Debug.Print "File can be in UTF-8, ASCII, ISO-8859-?, Shift-JIS, etc"
  Else
    Debug.Print "Can't seem to figure out the Character encoding"
  End If

Err_Exit:
  On Error Resume Next
  Close #intFreeFile
  Exit Sub
Err_handler:
  Select Case Err.Number
  Case Else
    MsgBox Err.Number & " - " & Err.Description
  End Select
  Resume Err_Exit
End Sub
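For comparison, the same first-four-bytes logic in Python. This is a sketch: sniff_encoding and check_bom are hypothetical names, and like the VBA routine it ignores UTF-32, so a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16 Little Endian.

```python
def sniff_encoding(head: bytes) -> str:
    """Guess an XML file's encoding from its first four bytes (BOM, else '<?')."""
    if head.startswith(b"\xff\xfe"):
        return "UTF-16 Little Endian"
    if head.startswith(b"\xfe\xff"):
        return "UTF-16 Big Endian"
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    # No BOM: fall back to the '<?' pattern ('<' = 60, '?' = 63).
    if head.startswith(b"<\x00?\x00"):
        return "UTF-16 Little Endian"
    if head.startswith(b"\x00<\x00?"):
        return "UTF-16 Big Endian"
    if head.startswith(b"<?"):
        return "UTF-8, ASCII, ISO-8859-?, Shift-JIS, etc"
    return "Can't seem to figure out the character encoding"


def check_bom(path: str) -> str:
    """Read only the first four bytes of the file and classify them."""
    with open(path, "rb") as f:
        return sniff_encoding(f.read(4))
```

Usage would be along the lines of print(check_bom(r"C:\XML\Gil Encodings\encUTF8_NoDecl.xml")).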



Jim DeMarco wrote:

>Darren,
>
>We've been using this:
>
><?xml version="1.0" encoding="iso-8859-1"?>
>
>But I'm pretty sure you can get by with just:
>
><?xml version="1.0"?>
>
>Once you navigate the file below this processing instruction you're on 
>your own as far as defining elements etc.
>
>May I ask why the concern?
>
>HTH
>
>Jim
>
>-----Original Message-----
>From: accessd-bounces at databaseadvisors.com
>[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Darren DICK
>Sent: Wednesday, March 01, 2006 11:19 PM
>To: AccessD
>Subject: [AccessD] A2000: XML Q
>
>Hello all
>Cross Posted to dba_SQL List
>What is the minimum header information I need to include before 'my 
>data' starts to get a 'Well formed' XML doc?
>E.g. the stuff that looks like
><?xml version="1.0"?><Report xmlns="Invoice2"
>xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>
>etc etc
>
>many thanks
>
>DD
>
>--
>AccessD mailing list
>AccessD at databaseadvisors.com
>http://databaseadvisors.com/mailman/listinfo/accessd
>Website: http://www.databaseadvisors.com
>
>

--
Marty Connelly
Victoria, B.C.
Canada


