[AccessD] xyz* faster than *asd

Fri Jul 8 14:12:37 CDT 2011

It's based on how indexing, in general, works.  If indexing is
completely turned off, the search time should be the same either way.

I actually built two separate indexing systems.

The first was for a PDA version of a system I had, which included a
company phone list.  The normal application installed on all of our PC's
allowed the user to search by first name, last name, or extension
number.  It was a single textbox search, worked great on the PC version.
But when I built it for the palm V... WOW, was slow as dirt.

The Palm database didn't have an indexing option. In essence, it was
basically a flat file.

So what I did, is I created my own index.  I had three fields I wanted
to search quickly with (First Name, Last Name, extension).

I then formatted the system on the Palm to have 6 tables.  The main
table 3 times, sorted by one key field (so a main table sorted by first
name, one sorted by last name, one sorted by extension), then 3 indexing
tables, one for each main table.

The 'phone list' (at the time) had anywhere from 450 to 900 employees.
The data was pretty small though, just text fields, and therefore didn't
take up that much space to triplicate it for the palm. (The sorting and
table creation were done in process with the Palm synch process).

The Index table had 3 fields:

The first field was the first letter (which really this was an
unnecessary field).  So there were 26 records (A thru Z).  The Second
field was the position in the related main table where that letter
started. (for the number index, it obviously had 10 records (with the
first field being 0 to 9 (though I only really needed 3 or 4 of those
numbers since our extensions were in a specific range))).  Then the
second field was a text field with a 104 character text field (26x4).
Each group of 4 letters (there were 26 of them), represented the
hexadecimal value (it may have been decimal, don't remember) of the
starting point of the second letter past the first letter's starting
point.

So, for example, let's say a user searched for Mike, the code would look
at the first letter, say 'It's an M, that's the 13th letter (or
Asc(UCASE(strFirstLetter))-64), so it would jump to the 13th record in
the index table (it was slow to search through each record, but you
could jump to a record by it's position very quickly).  So the Record in
the First Name Index table would look something like this: (oh, if the 4
characters representing the second letter is 0000, then it knows that
there are no records for that first/second letter combo... and the
numeric representation is 1 off, so 0001 represents the first record of
the first letter starting point)

M,140,000100000000000000030000000000000010.....

So now that I have this record, my code knows that the 9th group for
four digits is 0010, and the starting point is 140, it now knows that
records starting with MI (for MIKE) start at record 149 (140+10-1(for 1
off offset)).  So now the code can jump to the 149th record of the main
table (sorted by first name).  If the search were to look for Mark (MA
for first two letters), it would start at record 140 on the main table
(140+1-1).  So instead of scanning every record, a slow process taking
about 30 seconds on a Palm V, now I am doing a jump to one record, a
little math and string reading, and then a jump to a much closer spot to
finish the search.  

So, while my methods/code are probably not identical to indexing in Jet
or SQL, it is probably similar, so searching for abc* is able to
directly use the index of the record, making very fast jumps through it,
where as *xyz, it is going to have to scan every record still.

The second indexing system I built was for the old AccessD archives I
used to host. I had it hosted with Access as the backend, and posts
needed to be able to be searched by word, so a post with a thousand
words needed a thousand indexes.  SQL allows for full text indexing....
again, my method may not be identical, but the concept is probably
similar.

What I did for my full text indexing, is as a 'post' came in to be
archived, The code would break down each post into individual words.  It
would then seach a word index (had a table for each starting letter of
each word), to see if that word was in the index, if the word exists
(say the word is 'AND', it would look in tblAWords), it would take the
key of that word, if it didn't exist, it added that word to the index,
and then took the newly created key.  Then, it would add a record to a
tblAWordsToPosts (where 'A' is for search words starting with A, so
there were 26 of these tables) with the ID of the search word, and the
ID of the post.

So if you were to search for 'Unbound Forms', the search would hit
tblUWords to get the Key for 'Unbound', then it would hit tblFWords, to
get the key for 'Forms', then it would take those two keys, and create a
query to return records from tblPosts, where there were joined records
with tblUWordsToPosts and tblFWordsToPosts.  All in all, with hundreds
of thousands of records, the searching of those records were getting
done in a second or two, instead of a massively long search going
through the memo fields themselves.

Drew

-----Original Message-----
From: accessd-bounces at databaseadvisors.com
[mailto:accessd-bounces at databaseadvisors.com] On Behalf Of jwcolby
Sent: Friday, July 08, 2011 12:25 PM
To: Access Developers discussion and problem solving
Subject: [AccessD] xyz* faster than *asd

Does anyone know of a reason that LIKE is faster with the * in back
instead of the front of a string 
to search by in the where clause?

-- 
John W. Colby
www.ColbyConsulting.com
-- 
AccessD mailing list
AccessD at databaseadvisors.com
http://databaseadvisors.com/mailman/listinfo/accessd
Website: http://www.databaseadvisors.com
The information contained in this transmission is intended only for the person or entity 
to which it is addressed and may contain II-VI Proprietary and/or II-VI Business 
Sensitive material. If you are not the intended recipient, please contact the sender 
immediately and destroy the material in its entirety, whether electronic or hard copy. 
You are notified that any review, retransmission, copying, disclosure, dissemination, 
or other use of, or taking of any action in reliance upon this information by persons 
or entities other than the intended recipient is prohibited.