Try searching Google for 'trigram'.
The basic idea is to build an index table that holds one row per three-character substring of each word: for 'pepsi' that means rows for 'pep', 'eps' and 'psi', each with an FK back to the 'pepsi' record.
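A minimal Python sketch of the index-building step, assuming the index table can be modelled as a list of (trigram, word_id) rows where word_id stands in for the FK to the indexed record. `trigrams` and `build_index` are hypothetical names, not taken from the original test code.

```python
# Minimal sketch (assumption): the index table is a list of (trigram, word_id)
# rows; word_id plays the role of the FK back to the indexed word's record.
# The helper names are hypothetical.

def trigrams(word):
    """Return the overlapping 3-character substrings of a word, in order."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


def build_index(words):
    """Build (trigram, word_id) rows for every word; word_id is its list position."""
    rows = []
    for word_id, word in enumerate(words):
        for tri in trigrams(word):
            rows.append((tri, word_id))
    return rows


words = ["pepsi", "coke", "sprite"]
index = build_index(words)
print(trigrams("pepsi"))  # ['pep', 'eps', 'psi']
```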
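Then break the search string into trigrams the same way and look each of them up in your index table. The more hits a record gets, the more 'relevant' the match, but only when comparing indexed words of equal length!!! (a longer word has more trigrams, so it can collect more hits).
It is not perfect!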
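A self-contained sketch of the whole lookup, using Python's built-in `sqlite3` module in place of SQL Server (an assumption for portability; the table names `word` and `trigram_index` and the helpers are hypothetical). It also assumes the search string is split on whitespace before taking trigrams, and it scores each record by the number of distinct trigrams it shares with the search string, with no length normalization, so the equal-length caveat above still applies.

```python
import sqlite3


def trigrams(word):
    """Overlapping 3-character substrings of a lower-cased word."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


def make_db(words):
    """Create an in-memory DB with a word table and a trigram index table (FK to word)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    con.execute("CREATE TABLE word (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
    con.execute(
        "CREATE TABLE trigram_index ("
        " tri TEXT NOT NULL,"
        " word_id INTEGER NOT NULL REFERENCES word(id))"
    )
    con.execute("CREATE INDEX ix_tri ON trigram_index(tri)")
    for text in words:
        cur = con.execute("INSERT INTO word(text) VALUES (?)", (text,))
        # One row per distinct trigram of the word, each pointing back at it.
        con.executemany(
            "INSERT INTO trigram_index(tri, word_id) VALUES (?, ?)",
            [(t, cur.lastrowid) for t in set(trigrams(text))],
        )
    return con


def search(con, phrase):
    """Rank indexed words by how many distinct search trigrams they contain."""
    # Assumption: trigrams are taken per whitespace-separated token.
    tris = {t for token in phrase.split() for t in trigrams(token)}
    if not tris:
        return []
    placeholders = ",".join("?" * len(tris))  # e.g. "?,?,?"
    sql = (
        "SELECT w.text, COUNT(*) AS hits "
        "FROM trigram_index ti JOIN word w ON w.id = ti.word_id "
        f"WHERE ti.tri IN ({placeholders}) "
        "GROUP BY w.id, w.text ORDER BY hits DESC, w.text"
    )
    return con.execute(sql, sorted(tris)).fetchall()


con = make_db(["pepsi", "coke", "sprite", "brooke"])
print(search(con, "can of pepsy"))  # [('pepsi', 2)]
```

With 'brooke' in the word list, `search(con, "i love koke")` returns a tie between brooke and coke, because 'oke' is the only trigram either of them shares with the search string.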
I have some old sort-of-trigram test code that uses dummy data: the 974 most popular US first names of 2004. I added 'pepsi', 'coke' and 'sprite' as names and tried a few searches:
'a big can of pepsi' - pepsi is the only hit
'can of pepsy' - pepsi is the only hit (likewise for 'depsi', 'pypsi', ...; 'pep', 'eps' and 'psi' are rare(!) in US names, so the one or two trigrams that survive the typo are enough)
'i prefer coke' - coke is top, followed by 7 others
'a big can of pepsi is my favorite' - pepsi is top, followed by sprite, trevor, gustavo and 11 others
'sprite is so nice' - sprite is top, followed by 18 others
'i love koke' - coke and brooke score equally
So far, so good. Now some failures:
'peksi' - no hits
'a big can of pepsi is my destiny' - destiny is the top hit, then pepsi, then 26 others
'marshmallow and pepsi' - marshall, mallory, then pepsi, then 73 others
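None of this is surprising. Nothing matches 'peksi' because the 'k' sits in the middle of the word and so lands in all three of pepsi's trigrams ('pek', 'eks', 'ksi' instead of 'pep', 'eps', 'psi'). In the other two searches, the wrong words share more trigrams with the search string than pepsi does, so the weighting favours them.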
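Both effects can be checked by comparing trigram sets directly. This is a small sketch assuming plain character trigrams of whitespace-separated tokens; the `trigrams` helper is hypothetical and the raw shared-trigram count is only an approximation of the original test code's weighting.

```python
def trigrams(word):
    """Set of overlapping 3-character substrings of a lower-cased word."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


# Typos: which of pepsi's trigrams survive each misspelling?
target = trigrams("pepsi")  # {'pep', 'eps', 'psi'}
for probe in ["pepsy", "depsi", "peksi"]:
    shared = sorted(trigrams(probe) & target)
    print(probe, shared)
# pepsy ['eps', 'pep']
# depsi ['eps', 'psi']
# peksi []

# Weighting: a longer indexed word that also appears in the search string
# can share more trigrams with it than the word you actually wanted.
query = "a big can of pepsi is my destiny"
q = set().union(*(trigrams(t) for t in query.split()))
for name in ["pepsi", "destiny"]:
    print(name, len(trigrams(name) & q))
# pepsi 3
# destiny 5
```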
Last edited by izyrider; 04-16-07 at 13:20.
Currently using SQL Server 2008 R2.