Drew Wutka
DWUTKA at Marlow.com
Mon Sep 15 12:01:37 CDT 2008
I've got a little different approach for you to try. Instead of running every record through code, to compare to every other record A slightly different approach would be to break down each title into words. So, let's say tblTitles is your table with titles, with TitleID (long integer ID), Title as the field with the title, and add a field called WordCount (long integer). Then create tblWords (WordID (Autonumber), and Word (text)), and tblWordsInTitles (TitleID, WordID). Then you would run the following functions (actually, run GetWordsFromTitles, RemovePunctuation is used inside that function): Function RemovePunctuation(ByVal strWord As String) As String Dim intChar As Long intChar = Asc(Right(strWord, 1)) Do Until intChar >= 65 And intChar <= 90 If Len(strWord) >= 1 Then RemovePunctuation = "" Exit Function End If strWord = Left(strWord, Len(strWord) - 1) intChar = Asc(Right(strWord, 1)) Loop RemovePunctuation = strWord End Function Function GetWordsFromTitles() Dim rs As ADODB.Recordset Dim strSQL As String Dim rsWord As ADODB.Recordset Dim strArray() As String Dim i As Long Dim intWordCount As Long Dim strWord As String strSQL = "SELECT TitleID, Title, WordCount FROM tblTitles WHERE WordCount=0" Set rs = New ADODB.Recordset rs.Open strSQL, CurrentProject.Connection, adOpenKeyset, adLockOptimistic If rs.EOF = False Then rs.MoveFirst Do Until rs.EOF = True strArray = Split(rs.Fields(1).Value) intWordCount = 0 For i = 0 To UBound(strArray) strWord = RemovePunctuation(UCase(strArray(i))) If strWord <> "" Then strSQL = "SELECT WordID FROM tblWords WHERE Word=""" & strWord & """" Set rsWord = New ADODB.Recordset rsWord.Open strSQL, CurrentProject.Connection, adOpenKeyset, adLockReadOnly If rsWord.EOF = True And rsWord.BOF = True Then rsWord.Close Set rsWord = Nothing Set rsWord = New ADODB.Recordset rsWord.Open "tblWords", CurrentProject.Connection, adOpenKeyset, adLockOptimistic rsWord.AddNew rsWord.Fields("Word").Value = strWord rsWord.Update Else rsWord.MoveFirst End If strSQL = "INSERT INTO tblWordsInTitles ([TitleID],[WordID]) VALUES(" & rs.Fields(0).Value & "," & rsWord.Fields("WordID").Value & ")" rsWord.Close Set rsWord = Nothing DoCmd.SetWarnings False DoCmd.RunSQL strSQL DoCmd.SetWarnings True intWordCount = intWordCount + 1 End If Next i rs.Fields("WordCount").Value = intWordCount rs.Update rs.MoveNext Loop rs.Close Set rs = Nothing MsgBox "Done" End Function Now you have a table with all the words from your titles, and you have a table with links between the words and the titles. You can then do a 'reverse lookup' of sorts, based on the commonality of the words, and the number of words. For example: SELECT tblTitles.TitleID, tblTitles.Title FROM tblWordsInTitles AS tblWordsInTitles_1 INNER JOIN (tblWordsInTitles INNER JOIN tblTitles ON tblWordsInTitles.TitleID = tblTitles.TitleID) ON tblWordsInTitles_1.WordID = tblWordsInTitles.WordID WHERE (((tblTitles.TitleID)<>1) AND ((tblWordsInTitles_1.TitleID)=1)) GROUP BY tblTitles.TitleID, tblTitles.Title HAVING (((Count(tblTitles.TitleID))>(0.8*4))); Would display any common titles with at least 80% of the words in the title with ID 1 (the WHERE clause designates the ID for the original title) and the HAVING clause is the .8 (80%) of 4 (WordCount for record with ID of 1)). If you need a running record of matched titles, then you would just have to run this query against each record. Now, this is in no way a perfect example, you may need to tweak things a bit. Maybe tweak it so that it ignores common words like in, the, a, etc. or catches words that are plurals of existing words. The advantage of this approach would be if you need to list 'common titles' on the fly. The code above would need to be run whenever new titles are added, but it'll only run against new titles (with WordCount=0), so it might take a minute on your first 18k records, but from then on it's a second to run for a handful of records. And any querying against the 'reverse word lookup' would be pretty quick since it's all indexed numbers. Just a thought. Drew The information contained in this transmission is intended only for the person or entity to which it is addressed and may contain II-VI Proprietary and/or II-VI Business Sensitive material. If you are not the intended recipient, please contact the sender immediately and destroy the material in its entirety, whether electronic or hard copy. You are notified that any review, retransmission, copying, disclosure, dissemination, or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited.