Bruce Bruen
bbruen at bigpond.com
Wed Jun 11 09:03:51 CDT 2003
Dear Rich, Just been there, done that! Here a few things to consider, that I have learnt over the last few days regarding this type of info gathering. 1. You'll need an algorithm to parse the html (or text) received to "dig out" the info you need. This is not trivial. Each page we used needed a completely different method of finding the data in the chaff. As per the great advice received from Shamil (bless his socks) I based what we did on the mshtml.tlb method of parsing the actual html as this gave me the best way of finding the table I wanted and the best way of handling the "multi-record" data as each chunk of info we needed was in a <tr></tr> pair. Also, the html out there in gagaland is NOT standard and (so far) mshtml has coped with the (even worse) html we get in auto-emails. All the parsing we did was CODED specific to the page or email layout. If you come up with a parameterised general parser I'd be glad to see it! 2. Beware of frames! Some sites may need a fair bit of digging to get to the actual html page you want. 3. As Marty Connoly warned - the buggers will change the site layout, or some chunk you are depending on to identify the data just when you don't want it to change. 4. Beware of page caching! If your coming through a proxy, or a seedy ISP you need to ensure that the page you get is the one you are after. And come up with a way of forcing the page refresh! We haven't solved this one yet! Other wise you just have to wait until you can get "today's" page. (This blew about 16 hours of budget - until one of the notwork guys informed us that the local proxy was set to "permanently" cache all pages accessed that weren't on their special list) 5. Shamil's (that Russian guy :-) ) excellent stuff is here http://smsconsulting.spb.ru/shamil_s/topics/tableparser.htm 6. The way to get the page is via the MSXML libs msxml4.dll (thanks to Marty) viz... Public Function UpdateSheet(addr As String) As String Dim strPageText As String Dim httpdoc As String Dim xmlhttp On Error GoTo US_ERR Set xmlhttp = CreateObject("Microsoft.XMLHTTP") 'Create the xmlhttp object Call xmlhttp.Open("GET", addr, False) 'Open a connection Call xmlhttp.send 'send a request to the server httpdoc = xmlhttp.responseText 'grab the response as text strPageText = ProcessHTML(httpdoc) 'perform some general preprocessing (mine removes <br>'s and other junk) UpdateSheet = strPageText US_EXIT: Set xmlhttp = Nothing Exit Function US_ERR: .....your favourite error handler End Function Hth Bruce -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of Lavsa, Rich Sent: Wednesday, June 11, 2003 10:56 PM To: 'accessd at databaseadvisors.com' Subject: [AccessD] Gather Information from Website Hello all, currently I am not developing this application, however since the topic of web pages in access came up I thought that maybe I'd throw out a bone to see if I can gain any knowledge before I attempt this on my own. My goal is to come up for a friend of mine that runs 3 to 4 fantasy baseball and football leagues every year and is thinking of getting into hockey next. he spends all his time looking on the internet and local paper to get stats they he puts into an excel spreadsheet manually just about every day. There is a large possibility that he could miss something, or add his points system wrong, or simply enter it in the wrong cell. My thought was that this information is already entered in databases, and is displayed on hundreds of websites. So I was thinking that since this information is already out there, why can't you just capture the stats every night. Of course there would be some work on setting up the database, setting up the rules of each league, however thought it might be a fun project and the more I think about it the more work I think it will be. So to my question... Is it possible to navigate out to a website for Stocks or Sport Stats, download the information from the website to your database? Has anyone out there done anything like this? Thanks in advance, Rich _______________________________________________ AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com