[AccessD] Parsing XML as a string?

Thu Nov 16 18:47:35 CST 2006

Grab a book on XPath syntax; it is quickest to learn via examples.

This code grabs the _FEE attributes by running an XPath query against the
XML DOM (it needs a reference to Microsoft XML, v4.0).

Sub xpathtest()
    Dim xmldoc As MSXML2.DOMDocument40
    Dim nodes As MSXML2.IXMLDOMNodeList
    Dim node As MSXML2.IXMLDOMNode
    Dim xmlError As IXMLDOMParseError
    Dim lngErrCode As Long

    Set xmldoc = New MSXML2.DOMDocument40
    xmldoc.setProperty "SelectionLanguage", "XPath"
    xmldoc.async = False
    xmldoc.validateOnParse = False
    'xmldoc.resolveExternals = False

    'Load returns False if the file cannot be read or parsed
    If Not xmldoc.Load("C:\Access files\xmltests\Pria Path\pria.xml") Then
        Debug.Print " Reason: " & xmldoc.parseError.reason
        Set xmlError = xmldoc.parseError
        reportParseError xmlError
        Set xmldoc = Nothing
        Exit Sub
    End If

    'validate returns a parse error object; keep its error code
    lngErrCode = xmldoc.validate.errorCode
    'Debug.Print lngErrCode

    'XPath element and attribute names are case-sensitive
    'grab all attributes of every _FEE element via XPath
    Set nodes = xmldoc.selectNodes("//_FEE/@*")

    'Set nodes = xmldoc.selectNodes("/fee/@*")
    'Set nodes = xmldoc.selectNodes("//_FEE")

    'loop through the nodes collection
    For Each node In nodes
        Debug.Print node.Text & " - " & node.nodeName
    Next node

    'Debug.Print xmldoc.xml
    Set nodes = Nothing
    Set xmldoc = Nothing
End Sub

Sub reportParseError(parseErr As IXMLDOMParseError)
    Dim s As String
    Dim r As String
    Dim i As Long

    'pad with spaces so the caret lines up under the error position
    s = ""
    For i = 1 To parseErr.linepos - 1
        s = s & " "
    Next

    r = "XML Error loading " & parseErr.url & " * " & parseErr.reason
    Debug.Print r

    'show character position of error; tired of counting
    If (parseErr.Line > 0) Then
        r = "at line " & parseErr.Line & ", character " & parseErr.linepos & vbCrLf & _
            parseErr.srcText & vbCrLf & s & "^"
        Debug.Print r
    End If
End Sub
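
Since each fee has to land in the right table field, it may help to read
each _FEE element as a description/amount pair instead of a flat list of
attributes. The following is only a sketch: the Sub name
ListFeesByDescription is made up for illustration, it assumes the same file
path as above, and it sets resolveExternals = False so the load does not
depend on reaching the DTD URL (that setting is my assumption, not
something the sender's format requires).

```vba
Sub ListFeesByDescription()
    'Hypothetical helper, not part of the original code
    Dim xmldoc As MSXML2.DOMDocument40
    Dim fees As MSXML2.IXMLDOMNodeList
    Dim fee As MSXML2.IXMLDOMElement
    Dim amt As MSXML2.IXMLDOMNode

    Set xmldoc = New MSXML2.DOMDocument40
    xmldoc.setProperty "SelectionLanguage", "XPath"
    xmldoc.async = False
    xmldoc.validateOnParse = False
    xmldoc.resolveExternals = False   'assumption: skip fetching the DTD

    If Not xmldoc.Load("C:\Access files\xmltests\Pria Path\pria.xml") Then
        Debug.Print "Load failed: " & xmldoc.parseError.reason
        Exit Sub
    End If

    'One line per fee: description first, then amount
    Set fees = xmldoc.selectNodes("//_FEE")
    For Each fee In fees
        Debug.Print fee.getAttribute("_Description") & " = " & _
                    fee.getAttribute("_Amount")
    Next fee

    'Or ask for a single fee directly with an XPath predicate
    Set amt = xmldoc.selectSingleNode( _
        "//_FEE[@_Description='TransferFee']/@_Amount")
    If Not amt Is Nothing Then
        Debug.Print "TransferFee: " & amt.Text
    End If

    Set fees = Nothing
    Set xmldoc = Nothing
End Sub
```

In place of Debug.Print you could write each pair to a recordset, using the
description to pick the field.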

"pria.xml"

<!DOCTYPE REQUEST_GROUP SYSTEM  
"http://iowalandrecords.org/portal/dtd/CLRIS_PRIA_Request.dtd">
<REQUEST_GROUP>
  <REQUESTING_PARTY/>
   <RECEIVING_PARTY/>
    <SUBMITTING_PARTY/>
     <REQUEST>
                <KEY _Name="SubmissionID" _Value="4281"/>
                <KEY _Name="GroupName" _Value="JFB_Exemption20060831"/>
                <KEY _Name="SubmissionNumber" _Value="000000061000004281"/>

<PRIA_REQUEST _Type="Other" _TypeOtherDescription="ESubmission">
          <PACKAGE>
            <PRIA_DOCUMENT _Code="" _NonRecordableIndicator="N" 
_Type="Deed" _InstrumentDate="20060831" 
_CountyOfRecordationName="Scott"  _StateOfRecordationName="IA">
            <GRANTOR _FirstName="First" _MiddleName="mi" 
_LastName="Last" _NameSuffix="" _UnparsedName="Last First mi" 
_Capacity="" NonPersonEntityIndicator="N"/>
            <GRANTEE _Capacity="" _NonPersonEntityIndicator="N"/>
            <PARTIES>
            <_RETURN_TO_PARTY>
            <NON_PERSON_ENTITY_DETAIL/>
            <CONTACT_DETAIL/>
            </_RETURN_TO_PARTY>
            </PARTIES>
            <EXECUTION/>
            <RECORDING_ENDORSEMENT>
            <_VOLUME_PAGE/>
            <_FEES _TotalAmount="31.00">
            <_FEE _Amount="15.00" _Description="StandardFee"/>
            <_FEE _Amount="3.00" _Description="DocMgmtFee"/>
            <_FEE _Amount="3.00" _Description="ERecordingFee"/>
            <_FEE _Amount="10.00" _Description="TransferFee"/>
            <_FEE _Amount="0.00" _Description="TransferTax"/>
            </_FEES>
            <_EXEMPTIONS _Description="Deed fulfilling contract"/>
            </RECORDING_ENDORSEMENT>
            <EMBEDDED_FILE _Description="" _EncodingType="B64Encode" 
_ID="4421" _Name="Cert.pdf" _NumberOfPages="1">
            <DOCUMENT>-BLAH  - IMAGE FILE - BLAH BLAH ==</DOCUMENT>
           </EMBEDDED_FILE>
          </PRIA_DOCUMENT>
         </PACKAGE>
        </PRIA_REQUEST>
       </REQUEST>
</REQUEST_GROUP>
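
For elements like GRANTOR that carry more than two attributes, you do not
need a rule about how many to expect; the DOM will hand back however many
are there. A rough sketch, again with a made-up Sub name
(ListGrantorAttributes) and the same assumed file path and resolveExternals
setting as my own choices:

```vba
Sub ListGrantorAttributes()
    'Hypothetical helper, not part of the original code
    Dim xmldoc As MSXML2.DOMDocument40
    Dim elems As MSXML2.IXMLDOMNodeList
    Dim elem As MSXML2.IXMLDOMNode
    Dim att As MSXML2.IXMLDOMNode
    Dim i As Long

    Set xmldoc = New MSXML2.DOMDocument40
    xmldoc.setProperty "SelectionLanguage", "XPath"
    xmldoc.async = False
    xmldoc.validateOnParse = False
    xmldoc.resolveExternals = False   'assumption: skip fetching the DTD

    If Not xmldoc.Load("C:\Access files\xmltests\Pria Path\pria.xml") Then
        Debug.Print "Load failed: " & xmldoc.parseError.reason
        Exit Sub
    End If

    'Walk every attribute of every GRANTOR, however many there are
    Set elems = xmldoc.selectNodes("//GRANTOR")
    For Each elem In elems
        For i = 0 To elem.Attributes.Length - 1
            Set att = elem.Attributes.Item(i)
            Debug.Print elem.nodeName & "." & att.nodeName & " = " & att.Text
        Next i
    Next elem

    Set elems = Nothing
    Set xmldoc = Nothing
End Sub
```

Change the XPath string to "//GRANTEE" or "//PRIA_DOCUMENT" to dump the
attributes of those elements the same way.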

Greg Smith wrote:

>John:
>
>Yes...I want a lot of the enclosed < /> fields, but I also need them with
>respect to their description, such as the FEES, of which (in this example)
>there are 5, each with its own description "StandardFee", etc.  So I'll
>not only need to get the data, such as the fee, but I have to be able to
>delineate which fee it is so it can go in the right table field.  For all
>practical purposes, I can ignore anything after the start of the
><DOCUMENT> because there's no data past that point.
>
>And you are also correct in that this doesn't fit today's standard xml
>format.  I've even asked the people who are sending it to me just WHAT
>type of xml is it, but not gotten an answer...which makes me believe they
>don't know either.  It works for what they want, so don't mess with it I
>guess...and, of course, out of 99 counties, mine is the ONLY one using an
>Access db program to do the Recorder's work (so far... :)).
>
>You'll note that some of the Elements (Fee) have two attributes, whereas
>some of them (Grantor) have many more.  So a rule that says only get the
>two would not work in all cases.  Although maybe breaking it down first by
>the < /> and then looking inside each of those...
>
>Greg
>
>>OK, so I assume that you want to get the items, enclosed by < /> where
>>there are two "values" separated by a space?
>>
>>You want all of them?
>>Just specific ones?
>>Is there more than one of the "big items" (defined as the entire thing
>>you sent me) in a single file?
>>
>>This looks trivial to parse based on the <> pairs as beginning / ending
>>a field.  This assumes that neither of these characters is found in the
>>image data.
>>
>>I am not intimate with XML, but I thought that XML had <FieldName> data
>></FieldName>.  This obviously doesn't.
>>
>>John W. Colby
>>Colby Consulting
>>www.ColbyConsulting.com
>>
>>-----Original Message-----
>>From: accessd-bounces at databaseadvisors.com
>>[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Greg Smith
>>Sent: Thursday, November 16, 2006 12:09 PM
>>To: accessd at databaseadvisors.com
>>Subject: Re: [AccessD] Parsing XML as a string?
>>
>>John:
>>
>>I sent you a full copy of the XML file offline.
>>
>>Greg
>>
>>    
>>
>>>Not having followed the original thread...
>>>
>>>It sounds like a good place for a pair of classes.  One class would
>>>hold each "snippet" based on the < characters.  A parent class would
>>>break down the string into these snippets, load them into the snippet
>>>classes and hold the snippet classes in a collection.  Once the huge
>>>string is parsed into snippets, the parent class can process them by
>>>iterating the collection of snippets doing whatever was required for
>>>each snippet.
>>>Once you have processed the snippets, you can write the results out to
>>> a table.
>>>
>>>That is obviously a "big picture".
>>>
>>>Can you paste a sample of the xml into an email so that I can see it.
>>>Sorry, I wasn't following the original discussion.
>>>
>>>John W. Colby
>>>Colby Consulting
>>>www.ColbyConsulting.com
>>>
>>>-----Original Message-----
>>>From: accessd-bounces at databaseadvisors.com
>>>[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Greg Smith
>>>Sent: Thursday, November 16, 2006 10:46 AM
>>>To: accessd at databaseadvisors.com
>>>Subject: [AccessD] Parsing XML as a string?
>>>
>>>Hi everyone!
>>>
>>>Ok...I admit that trying to import that XML file I had directly into
>>>Access may have SEEMED like a good, "easy", idea...at the time...but
>>>after looking around and from the comments here, the idea
>>>was...well...it sucked.
>>>If the XML they were sending to me were compatible then I might have
>>>had a chance...but it's just not feasible.  There actually wasn't any
>>>way to define it using a DTD/XSL/XSLT within my lifetime, so I'm going
>>>to have to use a different approach.
>>>
>>>The files they send as XML are not that large, so I could easily
>>>import them as text, separate out what I need and put it into the
>>>required tables. However, since they send it as a single string, it
>>>becomes harder to parse it since there are multiple duplicated 'keys'
>>>that I need to pull from it. And they're not necessarily in the same
>>>position all of the time.
>>>
>>>I could import it as a single string into a memo field, but I can't
>>>figure out how to dissect a memo field string like that.
>>>
>>>When I import it as text, I could break it down at the "<" characters,
>>>importing each one into a separate column, but I need them in rows,
>>>not columns, to search and find the strings of data I need.
>>>
>>>So, in summary, my only two choices (that I can think of) are:
>>>
>>>1.  Import the XML as a single string into a memo and somehow parse
>>>that into the data I need.
>>>2.  Import the XML as text, separating it on the "<" characters into
>>>columns, then somehow magically (transpose columns into rows?)
>>>transform that to usable information.
>>>
>>>ANY suggestions, short of retirement (although not a bad idea...),
>>>would be GREATLY appreciated!
>>>
>>>Thanks!
>>>
>>>Greg Smith
>>>

-- 
Marty Connelly
Victoria, B.C.
Canada