[AccessD] Gather Information from Website

Bruce Bruen bbruen at bigpond.com
Wed Jun 11 09:03:51 CDT 2003


Dear Rich,

Just been there, done that!

Here are a few things to consider that I have learnt over the last few
days regarding this type of info gathering.

1. You'll need an algorithm to parse the html (or text) received to "dig
out" the info you need.  This is not trivial: each page we used needed
a completely different method of finding the data in the chaff.  As per
the great advice received from Shamil (bless his socks), I based what we
did on the mshtml.tlb method of parsing the actual html, as this gave me
the best way of finding the table I wanted and the best way of handling
the "multi-record" data, since each chunk of info we needed was in a
<tr></tr> pair.  Also, the html out there in gagaland is NOT standard,
and (so far) mshtml has coped with the (even worse) html we get in
auto-emails.  All the parsing we did was CODED specifically to the page
or email layout.  If you come up with a parameterised general parser I'd
be glad to see it!
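
By way of illustration, here's a bare-bones sketch of the mshtml
approach (late bound, so no reference needed; the "first table, first
two cells" logic is made up - every page needs its own):

Public Function ParseStatsTable(strHTML As String) As String
    Dim doc As Object                   'MSHTML document (late bound)
    Dim tbl As Object                   'the table we're after
    Dim row As Object                   'one <tr></tr> pair per record
    Dim strOut As String
    
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = strHTML        'let mshtml build the DOM from the raw html
    
    Set tbl = doc.getElementsByTagName("table")(0)  'first table - adjust per site
    For Each row In tbl.Rows
        strOut = strOut & row.Cells(0).innerText & vbTab & _
                 row.Cells(1).innerText & vbCrLf    'pull whichever cells you need
    Next row
    
    ParseStatsTable = strOut
End Function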

2. Beware of frames!  Some sites may need a fair bit of digging to get
to the actual html page you want.
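
If you do hit a frameset, the page you fetch is just a skeleton.  One
blunt-but-reliable way to dig out the inner url is a plain text search
(a sketch only - it assumes the src is double-quoted, and the url you
get back may be relative to the frameset's address):

Public Function FirstFrameSrc(strHTML As String) As String
    Dim p As Long                       'position within the <frame tag
    Dim q As Long                       'position of the closing quote
    
    p = InStr(1, strHTML, "<frame", vbTextCompare)
    If p = 0 Then Exit Function         'no frames on this page
    p = InStr(p, strHTML, "src=""", vbTextCompare)
    If p = 0 Then Exit Function         'frame with no src - give up
    p = p + 5                           'step past  src="
    q = InStr(p, strHTML, """")         'find the closing quote
    If q > p Then FirstFrameSrc = Mid$(strHTML, p, q - p)
End Function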

3. As Marty Connoly warned, the buggers will change the site layout, or
some chunk you are depending on to identify the data, just when you
don't want it to change.

4. Beware of page caching!  If you're coming through a proxy or a seedy
ISP, you need to ensure that the page you get is the one you are after,
and come up with a way of forcing the page refresh!  We haven't solved
this one yet!  Otherwise you just have to wait until you can get
"today's" page.  (This blew about 16 hours of budget, until one of the
notwork guys informed us that the local proxy was set to "permanently"
cache all pages accessed that weren't on their special list.)
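
For what it's worth, the two cheap tricks we know of are no-cache
request headers and a throwaway query string.  A sketch only - neither
beat our proxy, the "nocache" parameter name is made up, and it assumes
addr doesn't already carry a query string:

Public Function FetchFresh(addr As String) As String
    Dim xmlhttp As Object
    
    Set xmlhttp = CreateObject("Microsoft.XMLHTTP")
    'trick 1: a throwaway query string changes the cache key
    Call xmlhttp.Open("GET", addr & "?nocache=" & CStr(Timer), False)
    'trick 2: ask politely via request headers (must follow Open)
    Call xmlhttp.setRequestHeader("Cache-Control", "no-cache")
    Call xmlhttp.setRequestHeader("Pragma", "no-cache")
    Call xmlhttp.send
    FetchFresh = xmlhttp.responseText
    Set xmlhttp = Nothing
End Function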

5.  Shamil's (that Russian guy :-) ) excellent stuff is here
http://smsconsulting.spb.ru/shamil_s/topics/tableparser.htm

6. The way to get the page is via the MSXML libs (msxml4.dll - thanks to
Marty), viz...

Public Function UpdateSheet(addr As String) As String
    Dim strPageText As String
    Dim httpdoc As String
    Dim xmlhttp As Object
    
    On Error GoTo US_ERR
    
    Set xmlhttp = CreateObject("Microsoft.XMLHTTP") 'create the xmlhttp object
    Call xmlhttp.Open("GET", addr, False)           'open a connection
    Call xmlhttp.send                               'send a request to the server
    httpdoc = xmlhttp.responseText                  'grab the response as text
    strPageText = ProcessHTML(httpdoc)              'general preprocessing (mine
                                                    'removes <br>'s and other junk)
    
    UpdateSheet = strPageText

US_EXIT:
    Set xmlhttp = Nothing
    Exit Function

US_ERR:
    '.....your favourite error handler goes here
    Resume US_EXIT
End Function
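
Strung together with the table-parsing sketch from point 1, the nightly
grab boils down to something like this (the url is made up):

    Dim strRecords As String
    strRecords = ParseStatsTable(UpdateSheet("http://stats.example.com/batting.html"))
    Debug.Print strRecords              'one tab-separated line per <tr>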

Hth
Bruce


-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Lavsa, Rich
Sent: Wednesday, June 11, 2003 10:56 PM
To: 'accessd at databaseadvisors.com'
Subject: [AccessD] Gather Information from Website


Hello all,

Currently I am not developing this application; however, since the topic
of web pages in Access came up, I thought I'd throw out a bone to see if
I can gain any knowledge before I attempt this on my own.

My goal is to come up with something for a friend of mine who runs 3 to
4 fantasy baseball and football leagues every year and is thinking of
getting into hockey next.  He spends all his time looking on the
internet and in the local paper to get stats that he puts into an Excel
spreadsheet manually just about every day.  There is a large possibility
that he could miss something, or add up his points wrong, or simply
enter a stat in the wrong cell.  My thought was that this information is
already entered in databases and displayed on hundreds of websites.  So
since this information is already out there, why can't you just capture
the stats every night?  Of course there would be some work in setting up
the database and the rules of each league, but I thought it might be a
fun project - and the more I think about it, the more work I think it
will be.

So to my question...
Is it possible to navigate out to a website for stocks or sports stats
and download the information from the website into your database?  Has
anyone out there done anything like this?

Thanks in advance,
Rich
_______________________________________________
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com


