Results 1 to 6 of 6

Thread: Fuzzy Search?

  1. #1
    Join Date
    Oct 2003
    Posts
    268

    Unanswered: Fuzzy Search?

    How do I do a fuzzy search? If I have a table of full names, I'd like the user to be able to do a search and find the record, "Charles Montgomery Burns" with "Monty Burns" or "Montgomry" (mispelling).

    Every major web site does this kind of thing (Amazon, Google, etc).

    Someone suggested SOUNDEX, but this really doesn't fit the bill. Misspellings often don't use the same sound signature as the originals. Plus, that doesn't handle multi-word searchable texts very well.

    Others have suggested tries or suffix trees. If I went this route, wouldn't I have to preload all data out of the database and into this custom structure upon app startup? Is there any way around that? Also, this solution seems like it would require a lot of dev time (building a custom suffix tree with fuzzy lookup capabilities).

    Is there a commonly known and acceptable solution to this?

  2. #2
    Join Date
    Apr 2005
    Location
    Baltimore, MD
    Posts
    297
    I think what you're looking for is beyond the built-in capabilities of MySQL. It sounds like you want some kind of synonym search? SOUNDEX() won't work...you'll need to harvest data and set up synonym relationships within your database. There must be documents on this - try googling.

  3. #3
    Join Date
    Oct 2003
    Posts
    268
    Quote Originally Posted by jfulton
    I think what you're looking for is beyond the built-in capabilities of MySQL. It sounds like you want some kind of synonym search? SOUNDEX() won't work...you'll need to harvest data and set up synonym relationships within your database. There must be documents on this - try googling.
    I've actually Googled quite a bit for an answer to this and I haven't found anything. I've even asked the database experts that I know and they don't know.

    A synonym relationship wouldn't work. I want to get misspellings which can't be predefined.

    I suspect I have to go the custom suffix tree route. Although this is a common feature from the user's perspective, only the big sites have this and I'm sure they don't blink at large dev time investments in this type of thing.

  4. #4
    Join Date
    Dec 2005
    Location
    Arnhem, Gld, NL
    Posts
    21

    possible in php!

    Hi

    Though i never tried/used it, i have seen a function in PHP that will help you with this problem.
    Hold on,.. i''m searching...

    Ha, got it..
    It's called the 'Levenstein' method --> http://nl3.php.net/manual/en/function.levenshtein.php

    I think that you can use this function to search for a string, and then return results that are close to the input the user gave. This function can make out if a string 'looks like' the one the users gave as a searchterm.

    Hope it helps
    Last edited by MoonCrawler; 01-30-06 at 09:32.

  5. #5
    Join Date
    Apr 2005
    Location
    Baltimore, MD
    Posts
    297
    That's definitely a cool and useful function MoonCrawler, but don't you need to have suggestions ready for misspelled words? And then offer the closest match from those suggestions. I guess if the word isn't found in the database, then you can select any words that are similar and then use the levenshtein function to offer the best suggestion. But that only solves half of the problem...

    RogerWilco, I still think you may need some sort of synonym table. Getting "Montgomery Burns" from "Montgomry Burns" without one shouldn't be difficult, but how would you associate "Charles Montgomery Burns" with "Monty Burns"? I can't really see how a suffix tree would work for that, or am I missing something?

  6. #6
    Join Date
    Oct 2003
    Posts
    268
    Quote Originally Posted by RogerWilco
    How do I do a fuzzy search? If I have a table of full names, I'd like the user to be able to do a search and find the record, "Charles Montgomery Burns" with "Monty Burns" or "Montgomry" (mispelling).

    Every major web site does this kind of thing (Amazon, Google, etc).
    I solved this a while ago. If anyone stumbles upon this thread and wants to know: use Lucene. It's a full indexing engine and it works really well.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •