Results 1 to 4 of 4
  1. #1
    Join Date
    Feb 2010
    Posts
    4

    Unanswered: MySQL MATCH returns no relevance score

    Hi all,

    I'm stuck on a weird bug in some search code for my site. This is a simplified version that I've boiled the problem down to:

    This statement correctly returns an ID and a score, as you would expect:

    Code:
    SELECT id, match (title) against ('search term') as score FROM ArticleDetails ORDER BY score DESC
    But this formally identical statement just returns zeros for all scores:

    Code:
    SELECT id, match (xml) against ('search term') as score FROM ArticlesXML ORDER BY score DESC
    It thus seems to me that the problem must be outside of my queries, but where is not obvious to me. The column for the first example is varchar(255) , and the second is longtext. Could the answer be somewhere here or in the indexing?

    I've read through the relevant documentation and searched around, but no answers as yet. Any help would be much appreciated.
    Last edited by JohannShunt; 02-26-10 at 11:50.

  2. #2
    Join Date
    Feb 2010
    Posts
    4

    Fulltext search problem with XML content stored in longtext fields?

    After a lot of fiddling - testing one variable at a time in the structure of these tables, the character of their data etc - I think I've found the answer to this problem. There doesn't seem to be anything else out there on this, which makes me suspicious of the solution, but in case this is of use to anyone else:

    The fulltext search seems to trip up on XML content that is stored in the longtext column, and always return zeros for match results. To remedy this, I created another column, and inserted into this a version of the content with all of the XML tags stripped out. I indexed this column as fulltext. Now, a search performed on this table, but matching against the stripped version rather than the XML returns sensible (non-zero) results.

    If anyone out there can enlighten me on exactly what is going on here, it'd be appreciated.

  3. #3
    Join Date
    Feb 2010
    Posts
    4
    Hmm. The bug has returned without any obvious cause, and using my previous "solution". This is really weird. I've spent days of work on this now.

  4. #4
    Join Date
    Feb 2010
    Posts
    4
    Well in case it is of use to anyone else, the end of the story is that I discovered the problem was down to MySQL's policy of considering any word that appears in more than 50% of rows to be a stopword.

    To specify: the apparent unpredictability in tests was down to the fact that it was only search phrases in which every individual word was in more than 50% of rows that had the problem: only these returned a relevance of zero, since all words were treated as de facto "stopwords". Exacerbating the confusion, the query that I was running returned all rows rather than none every time it encountered one of these failing searches, meaning that I failed to see that all words were being treated as stopwords. Other very generic searches were fine as long as there was at least one "narrowing" term in less than 50% of rows. Although I was aware of the 50% policy, since I was thinking (incorrectly) on the level of phrases rather than individual words, I assumed such searches as "third world" could not possibly be in more than 50% of rows. As it turned out however, "third" and "world" both individually appear in more than 50% of rows.

    This wasted a lot of my time.... Hopefully others can avoid this problem.
    Last edited by JohannShunt; 03-04-10 at 14:19.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •