Drew Wutka
DWUTKA at Marlow.com
Fri Jul 8 14:12:37 CDT 2011
It's based on how indexing, in general, works. If indexing is completely turned off, the search time should be the same either way. I actually built two separate indexing systems. The first was for a PDA version of a system I had, which included a company phone list. The normal application installed on all of our PC's allowed the user to search by first name, last name, or extension number. It was a single textbox search, worked great on the PC version. But when I built it for the palm V... WOW, was slow as dirt. The Palm database didn't have an indexing option. In essence, it was basically a flat file. So what I did, is I created my own index. I had three fields I wanted to search quickly with (First Name, Last Name, extension). I then formatted the system on the Palm to have 6 tables. The main table 3 times, sorted by one key field (so a main table sorted by first name, one sorted by last name, one sorted by extension), then 3 indexing tables, one for each main table. The 'phone list' (at the time) had anywhere from 450 to 900 employees. The data was pretty small though, just text fields, and therefore didn't take up that much space to triplicate it for the palm. (The sorting and table creation were done in process with the Palm synch process). The Index table had 3 fields: The first field was the first letter (which really this was an unnecessary field). So there were 26 records (A thru Z). The Second field was the position in the related main table where that letter started. (for the number index, it obviously had 10 records (with the first field being 0 to 9 (though I only really needed 3 or 4 of those numbers since our extensions were in a specific range))). Then the second field was a text field with a 104 character text field (26x4). Each group of 4 letters (there were 26 of them), represented the hexadecimal value (it may have been decimal, don't remember) of the starting point of the second letter past the first letter's starting point. So, for example, let's say a user searched for Mike, the code would look at the first letter, say 'It's an M, that's the 13th letter (or Asc(UCASE(strFirstLetter))-64), so it would jump to the 13th record in the index table (it was slow to search through each record, but you could jump to a record by it's position very quickly). So the Record in the First Name Index table would look something like this: (oh, if the 4 characters representing the second letter is 0000, then it knows that there are no records for that first/second letter combo... and the numeric representation is 1 off, so 0001 represents the first record of the first letter starting point) M,140,000100000000000000030000000000000010..... So now that I have this record, my code knows that the 9th group for four digits is 0010, and the starting point is 140, it now knows that records starting with MI (for MIKE) start at record 149 (140+10-1(for 1 off offset)). So now the code can jump to the 149th record of the main table (sorted by first name). If the search were to look for Mark (MA for first two letters), it would start at record 140 on the main table (140+1-1). So instead of scanning every record, a slow process taking about 30 seconds on a Palm V, now I am doing a jump to one record, a little math and string reading, and then a jump to a much closer spot to finish the search. So, while my methods/code are probably not identical to indexing in Jet or SQL, it is probably similar, so searching for abc* is able to directly use the index of the record, making very fast jumps through it, where as *xyz, it is going to have to scan every record still. The second indexing system I built was for the old AccessD archives I used to host. I had it hosted with Access as the backend, and posts needed to be able to be searched by word, so a post with a thousand words needed a thousand indexes. SQL allows for full text indexing.... again, my method may not be identical, but the concept is probably similar. What I did for my full text indexing, is as a 'post' came in to be archived, The code would break down each post into individual words. It would then seach a word index (had a table for each starting letter of each word), to see if that word was in the index, if the word exists (say the word is 'AND', it would look in tblAWords), it would take the key of that word, if it didn't exist, it added that word to the index, and then took the newly created key. Then, it would add a record to a tblAWordsToPosts (where 'A' is for search words starting with A, so there were 26 of these tables) with the ID of the search word, and the ID of the post. So if you were to search for 'Unbound Forms', the search would hit tblUWords to get the Key for 'Unbound', then it would hit tblFWords, to get the key for 'Forms', then it would take those two keys, and create a query to return records from tblPosts, where there were joined records with tblUWordsToPosts and tblFWordsToPosts. All in all, with hundreds of thousands of records, the searching of those records were getting done in a second or two, instead of a massively long search going through the memo fields themselves. Drew -----Original Message----- From: accessd-bounces at databaseadvisors.com [mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby Sent: Friday, July 08, 2011 12:25 PM To: Access Developers discussion and problem solving Subject: [AccessD] xyz* faster than *asd Does anyone know of a reason that LIKE is faster with the * in back instead of the front of a string to search by in the where clause? -- John W. Colby www.ColbyConsulting.com -- AccessD mailing list AccessD at databaseadvisors.com http://databaseadvisors.com/mailman/listinfo/accessd Website: http://www.databaseadvisors.com The information contained in this transmission is intended only for the person or entity to which it is addressed and may contain II-VI Proprietary and/or II-VI Business Sensitive material. If you are not the intended recipient, please contact the sender immediately and destroy the material in its entirety, whether electronic or hard copy. You are notified that any review, retransmission, copying, disclosure, dissemination, or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited.