Results 1 to 5 of 5
  1. #1
    Join Date
    Nov 2004
    Posts
    2

    Unanswered: FULLTEXT search problem

    Hi all,

    I use MySQL 3.23.58 on Fedora Core 2. I have two varchar fields in a table, which I have indexed properly and observe the following problems with FULLTEXT:

    If the query finds one result it works ok. If the query finds more than one it returns none. For example I have 3 entries named 'hello', 'world' and 'world' respectively. FULLTEXT search for 'hello' returns entry 1, search for 'world' returns nothing!

    In addition, string matching works only for the exact string, ie search for 'hello' works, search for 'hell' does not work.

    Any help would be greatly appreciated.

    Vass

  2. #2
    Join Date
    Apr 2004
    Location
    Europe->Sweden->Stockholm
    Posts
    71
    In your table, 2/3 of all rows contains "world", and therefore MySQL treats it like a stopword. It does so if more than 50% of the rows matches, and instead of (possibly) a lot of results gives you no result at all. You can make a fulltext search IN BOOLEAN MODE to work around this, but often you don't want a result set that contains more than 50% of the rows anyway.

    MySQL Reference Manual, section 13.6:

    For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results. For example, although the word “MySQL” is present in every row of the articles table, a search for the word produces no results:
    Code:
    mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST (’MySQL’); 
    Empty set (0.00 sec)
    The search result is empty because the word “MySQL” is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large datasets, this is the most desirable behavior—a natural language query should not return every second row from a 1GB table. For small datasets, it may be less desirable. A word that matches half of rows in a table is less likely to locate relevant documents. In fact, it will most likely find plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that rows containing the word are assigned a low semantic value for the particular dataset in which they occur. A given word may exceed the 50% threshold in one dataset but not another. The 50% threshold has a significant implication when you first try full-text searching to see how it works: If you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more.

  3. #3
    Join Date
    Apr 2004
    Location
    Europe->Sweden->Stockholm
    Posts
    71
    Quote Originally Posted by vgs
    In addition, string matching works only for the exact string, ie search for 'hello' works, search for 'hell' does not work.
    Forgot to say something about this.

    Code:
    SELECT * FROM tab WHERE col='hell%'
    won't return rows where col is "hell", "hello" and "helloo", but where col is "hell%". However,
    Code:
    SELECT * FROM tab WHERE col LIKE 'hell%'
    will return rows where col is "hello", "hello", "helloo" and "hell%".

    I don't know if this was what you meant, but I hope

  4. #4
    Join Date
    Nov 2004
    Posts
    2

    Thumbs up thanks

    snorp,

    I should have read the manual more carefully, my apologies.
    Apparently, the IN BOOLEAN MODE parameter works from version 4.01 and up,
    which I found on another machine and it works like a charm now.

    Thank you very much for your reply, I resolved both issues with your help.

    Regards,

    Vass

  5. #5
    Join Date
    Apr 2004
    Location
    Europe->Sweden->Stockholm
    Posts
    71
    It's nice to be able to help. Np

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •