Drew Wutka
DWUTKA at Marlow.com
Mon Jun 2 18:56:08 CDT 2008
Just thought I'd share something I ran into today. I use subqueries a lot. I am putting the finishing touches on that 'online form' system I mentioned a while back. Last week, I finished the 'search' system. To give a little background, this form system allows the user to create their own forms (creating their own fields for the form, etc.). These fields are actually defined within the tables....not like fields in a table, but data in the tables. ie, I have a tblFormFields table, which defines the forms ID, Name, Type (which implies data type), ExtraInfo, etc. I then have a handful of data tables, which hold the data. (Each table is for a specific data type, so the field type determines which table the data is stored in). The data is stored with a DataEntryID (which represents another table, recording when and who is entering the data (also allows for roll back capability)). On top of that, I have a CurrentData table, which holds the DataEntryID and FieldID of the most current data for a Form Entry. Ya, ok, that all may sound complex. It is, and so building a search routine that queries that structure was a lot of fun, though not as difficult as I initially thought. Initial development testing was only done against a handful of records though..and today I am 'load testing' the system. I dumped ~16,000 form entries into the system. Everything was working great, EXCEPT the search feature. Running a search on that many records was taking FOREVER (I was killing the process after about 60 seconds). The query that was I began troubleshooting this with should return 19 records. There are two criteria (one that returns about 280 records, and one that returns about 1500 records, the mix of the two should return 19 records). The query displays a handful of fields from one table (tblFormEntries) which shows the basic info on a form entry (who created the form, when it was created, what form it is form, etc.), along with a few joins to show extra information (like linking the UserID for the form creator to the tblUser table to get their name), and one join for criteria (tying to the tblRootForms table (because this system allows for multiple form revisions)). So that initial part of the query would return ~16k records. I then had two subqueries in the WHERE statement (the one returning 280 records, and the one returning ~1500 records), which tied to the main portion through the FormEntryID. Running the query like that, as I said, was taking forever, longer then I was willing to wait. However, running the subqueries on their own were pretty much instantaneous. So how do I get the Primary query to 'run them first', so that it isn't trying each subquery 16,000 times for each subquery........ A join statement! I'm sure I've read about doing this somewhere, at sometime, but I had never actually tried it. Sure enough, by taking the subqueries OUT of the WHERE statement, and putting them in the FROM clause, whalla, the data I needed was pulled instantenously (which shows promise, since I plan on putting about 160k form entries in for the full load testing!). So, to give an example, let's say we have this table: tblPrimaryData PrimaryKey MorePrimaryData and this table: tblExtraData PrimaryKey SomeOtherKey SomeCriteriaField Now let's say that tblPrimaryData has a million records, and tblExtraData has a few hundred thousand. Let's say that we want a specific SomeOtherKey and specific SomeCriteriaField, and that produces 20 records (and due to proper indexes is instantaneous). We could write our SQL Statement as such: SELECT tblPrimaryData.PrimaryKey, MorePrimaryData FROM (SELECT PrimaryKey FROM tblExtraData WHERE SomeOtherKey=1 AND SomeCriteriaField="Hello World") AS C1 INNER JOIN tblPrimaryData ON C1.PrimaryKey=tblPrimaryData.PrimaryKey And Whalla, Jet is going to run that subquery first, producing the 20 records based on the subqueries criteria, and then it is ONLY going to pull up those 20 records from tblPrimaryData. Obviously the example above could be done with a regular join. But the principle is the same (My subqueries link 2 or 3 tables together to get their results, so they had to be subqueries). You can even then use criteria from the subqueries (WHERE C1.AnotherField=3 (that field would have to be in the subquery), and something else I thought was neat was that Access lets you see a query like that in the Query Builder (it lists the fields that are in the subquery, and would display it's name as C1). Anywho, just thought I'd share! Drew The information contained in this transmission is intended only for the person or entity to which it is addressed and may contain II-VI Proprietary and/or II-VI Business Sensitive material. If you are not the intended recipient, please contact the sender immediately and destroy the material in its entirety, whether electronic or hard copy. You are notified that any review, retransmission, copying, disclosure, dissemination, or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited.