If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Fuzzy Search?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-23-06, 11:18
RogerWilco RogerWilco is offline
Registered User
 
Join Date: Oct 2003
Posts: 268
Fuzzy Search?

How do I do a fuzzy search? If I have a table of full names, I'd like the user to be able to do a search and find the record, "Charles Montgomery Burns" with "Monty Burns" or "Montgomry" (mispelling).

Every major web site does this kind of thing (Amazon, Google, etc).

Someone suggested SOUNDEX, but this really doesn't fit the bill. Misspellings often don't use the same sound signature as the originals. Plus, that doesn't handle multi-word searchable texts very well.

Others have suggested tries or suffix trees. If I went this route, wouldn't I have to preload all data out of the database and into this custom structure upon app startup? Is there any way around that? Also, this solution seems like it would require a lot of dev time (building a custom suffix tree with fuzzy lookup capabilities).

Is there a commonly known and acceptable solution to this?
Reply With Quote
  #2 (permalink)  
Old 01-23-06, 15:24
jfulton jfulton is offline
Registered User
 
Join Date: Apr 2005
Location: Baltimore, MD
Posts: 297
I think what you're looking for is beyond the built-in capabilities of MySQL. It sounds like you want some kind of synonym search? SOUNDEX() won't work...you'll need to harvest data and set up synonym relationships within your database. There must be documents on this - try googling.
Reply With Quote
  #3 (permalink)  
Old 01-24-06, 06:43
RogerWilco RogerWilco is offline
Registered User
 
Join Date: Oct 2003
Posts: 268
Quote:
Originally Posted by jfulton
I think what you're looking for is beyond the built-in capabilities of MySQL. It sounds like you want some kind of synonym search? SOUNDEX() won't work...you'll need to harvest data and set up synonym relationships within your database. There must be documents on this - try googling.
I've actually Googled quite a bit for an answer to this and I haven't found anything. I've even asked the database experts that I know and they don't know.

A synonym relationship wouldn't work. I want to get misspellings which can't be predefined.

I suspect I have to go the custom suffix tree route. Although this is a common feature from the user's perspective, only the big sites have this and I'm sure they don't blink at large dev time investments in this type of thing.
Reply With Quote
  #4 (permalink)  
Old 01-30-06, 08:28
MoonCrawler MoonCrawler is offline
Registered User
 
Join Date: Dec 2005
Location: Arnhem, Gld, NL
Posts: 21
possible in php!

Hi

Though i never tried/used it, i have seen a function in PHP that will help you with this problem.
Hold on,.. i''m searching...

Ha, got it..
It's called the 'Levenstein' method --> http://nl3.php.net/manual/en/function.levenshtein.php

I think that you can use this function to search for a string, and then return results that are close to the input the user gave. This function can make out if a string 'looks like' the one the users gave as a searchterm.

Hope it helps

Last edited by MoonCrawler; 01-30-06 at 08:32.
Reply With Quote
  #5 (permalink)  
Old 01-30-06, 10:23
jfulton jfulton is offline
Registered User
 
Join Date: Apr 2005
Location: Baltimore, MD
Posts: 297
That's definitely a cool and useful function MoonCrawler, but don't you need to have suggestions ready for misspelled words? And then offer the closest match from those suggestions. I guess if the word isn't found in the database, then you can select any words that are similar and then use the levenshtein function to offer the best suggestion. But that only solves half of the problem...

RogerWilco, I still think you may need some sort of synonym table. Getting "Montgomery Burns" from "Montgomry Burns" without one shouldn't be difficult, but how would you associate "Charles Montgomery Burns" with "Monty Burns"? I can't really see how a suffix tree would work for that, or am I missing something?
Reply With Quote
  #6 (permalink)  
Old 10-12-06, 11:42
RogerWilco RogerWilco is offline
Registered User
 
Join Date: Oct 2003
Posts: 268
Quote:
Originally Posted by RogerWilco
How do I do a fuzzy search? If I have a table of full names, I'd like the user to be able to do a search and find the record, "Charles Montgomery Burns" with "Monty Burns" or "Montgomry" (mispelling).

Every major web site does this kind of thing (Amazon, Google, etc).
I solved this a while ago. If anyone stumbles upon this thread and wants to know: use Lucene. It's a full indexing engine and it works really well.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On