Unanswered: MySQL MATCH returns no relevance score
I'm stuck on a weird bug in some search code for my site. This is a simplified version that I've boiled the problem down to:
This statement correctly returns an ID and a score, as you would expect:
SELECT id, match (title) against ('search term') as score FROM ArticleDetails ORDER BY score DESC
But this formally identical statement just returns zeros for all scores:
SELECT id, match (xml) against ('search term') as score FROM ArticlesXML ORDER BY score DESC
It thus seems to me that the problem must be outside of my queries, but where is not obvious to me. The column for the first example is varchar(255) , and the second is longtext. Could the answer be somewhere here or in the indexing?
I've read through the relevant documentation and searched around, but no answers as yet. Any help would be much appreciated.
Fulltext search problem with XML content stored in longtext fields?
After a lot of fiddling - testing one variable at a time in the structure of these tables, the character of their data etc - I think I've found the answer to this problem. There doesn't seem to be anything else out there on this, which makes me suspicious of the solution, but in case this is of use to anyone else:
The fulltext search seems to trip up on XML content that is stored in the longtext column, and always return zeros for match results. To remedy this, I created another column, and inserted into this a version of the content with all of the XML tags stripped out. I indexed this column as fulltext. Now, a search performed on this table, but matching against the stripped version rather than the XML returns sensible (non-zero) results.
If anyone out there can enlighten me on exactly what is going on here, it'd be appreciated.
To specify: the apparent unpredictability in tests was down to the fact that it was only search phrases in which every individual word was in more than 50% of rows that had the problem: only these returned a relevance of zero, since all words were treated as de facto "stopwords". Exacerbating the confusion, the query that I was running returned all rows rather than none every time it encountered one of these failing searches, meaning that I failed to see that all words were being treated as stopwords. Other very generic searches were fine as long as there was at least one "narrowing" term in less than 50% of rows. Although I was aware of the 50% policy, since I was thinking (incorrectly) on the level of phrases rather than individual words, I assumed such searches as "third world" could not possibly be in more than 50% of rows. As it turned out however, "third" and "world" both individually appear in more than 50% of rows.
This wasted a lot of my time.... Hopefully others can avoid this problem.