[AccessD] *&^*%%$ Outlook, &^%HTML, (*&@Computers!!

Bruce Bruen bbruen at bigpond.com
Sun Jun 8 01:35:27 CDT 2003


Hi List,

Does anyone know of a library that will help me parse an email that is
in BADLY formed HTML?  I need to find a table in the message, pull the
text from each cell and add the information to a database table.

I have tried using linked Outlook (and Exchange) tables, but the message
field only contains the plain text.  One of the pieces of info we are
looking for is a hyperlink, which M$ conveniently removes.

I have tried using MAPI and CDO libraries. And if I ever meet in a dark
alley the M$ decision maker who put that security misconception
together......

I have tried using Redemption, which lets me get at either the plaintext
or the HTML body of the message fine, but...

Now that I've got that far, I am having extreme difficulty with the parsing.

The "ideal" solution would be to have a template per sender identity
that would store the layout of the table in the message (and its
position) and a set of routines that would parse the message, find the
table beast, dig it out and populate the recordset row based on the
template.  
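A minimal sketch of what such a per-sender template might look like, in
Python rather than VBA purely for illustration.  Everything here is a
hypothetical name (SenderTemplate, find_table, rows_to_records), and it
assumes the tables have already been extracted as lists of rows of cell
text, and that each vendor's table starts with a recognisable header
cell so it can be found by content rather than by position.

```python
from dataclasses import dataclass


@dataclass
class SenderTemplate:
    sender: str       # sender identity the template applies to (hypothetical)
    header_cell: str  # text of the table's first header cell (assumed to exist)
    columns: tuple    # recordset field names, in table column order


def find_table(tables, template):
    """Return the first table whose first cell matches the template, else None."""
    wanted = template.header_cell.lower()
    for rows in tables:
        if rows and rows[0] and rows[0][0].strip().lower() == wanted:
            return rows
    return None


def rows_to_records(rows, template):
    """Map each row after the header row onto the template's field names."""
    records = []
    for cells in rows[1:]:
        if len(cells) != len(template.columns):
            continue  # skip rows that do not fit the stored layout
        records.append(dict(zip(template.columns, (c.strip() for c in cells))))
    return records
```

Each resulting dict would correspond to one recordset row; rows that do
not match the stored layout are skipped here rather than guessed at.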

Sounds simple eh?  Here's the complexity:
1.  The tables are in different positions in the message, depending on
how much useless advertising the vendor is spouting today.
2.  At least the tables are in constant formats!
3.  Depending on the vendor, the HTML of the mail is either fair, poor
or atrocious.  The most common occurrence is missing or unmatched
closing tags, for example
"<TR><TD>blah blah<TD>blah1 blah1<TD>blah3</Table>" - fine for web
browser companies with 2.3 gigadevelopers to hack around, but I'm only
one underpaid .....
4.  The cells contain more than one attribute.  This bit I'm OK with: I
can dig out and validate the part # against the description etc. with a
bit of work.
5.  In some cases we need to dig out the tag attributes, e.g. a hyperlink.

So, I'm looking for something I could call that would either "correct"
the HTML so I can parse it, or parse the HTML, bad as it is, and return
the info for processing, somewhat like the XML parser does.
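
To make the second option concrete, here is a rough sketch using the
tolerant HTMLParser class from Python's standard library (Python is used
only as an illustration, not as a suggestion for what fits an Access
project).  LooseTableParser and parse_tables are hypothetical names.  The
sketch assumes a new <TD> or <TR> implicitly closes the previous one,
that the table has an opening <TABLE> tag, and that tables are not
nested.

```python
from html.parser import HTMLParser


class LooseTableParser(HTMLParser):
    """Collect table cells from HTML whose <TR>/<TD> tags may never be closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # finished tables: list of rows, each a list of cells
        self._rows = None  # rows of the table being read; None outside a table
        self._row = None   # cells of the current row
        self._cell = None  # current cell: {"text": [...], "links": [...]}

    def _close_cell(self):
        if self._cell is not None:
            text = " ".join("".join(self._cell["text"]).split())
            self._row.append({"text": text, "links": self._cell["links"]})
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self._rows.append(self._row)
        self._row = None

    def _close_table(self):
        self._close_row()
        self.tables.append(self._rows)
        self._rows = None

    def handle_starttag(self, tag, attrs):  # tag names arrive lower-cased
        if tag == "table":
            if self._rows is not None:
                self._close_table()  # nested tables are not handled
            self._rows = []
        elif self._rows is None:
            return  # ignore everything outside a table
        elif tag == "tr":
            self._close_row()  # a new row implicitly ends the previous one
            self._row = []
        elif tag in ("td", "th"):
            if self._row is None:
                self._row = []  # cell with no <TR> before it
            self._close_cell()  # a new cell implicitly ends the previous one
            self._cell = {"text": [], "links": []}
        elif tag == "a" and self._cell is not None:
            href = dict(attrs).get("href")
            if href:
                self._cell["links"].append(href)

    def handle_endtag(self, tag):
        if self._rows is None:
            return
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table":
            self._close_table()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"].append(data)

    def close(self):
        super().close()  # flush any buffered trailing text first
        if self._rows is not None:
            self._close_table()  # table was never closed


def parse_tables(html):
    """Return every table found in html as rows of {"text", "links"} cells."""
    parser = LooseTableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Fed the sample row from point 3 wrapped in an opening <TABLE> tag,
parse_tables returns one table with one row of three cells whose text is
"blah blah", "blah1 blah1" and "blah3".  Any href on an <A> tag inside a
cell ends up in that cell's "links" list, which covers point 5.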

Any ideas?
Bruce


