If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Full text search with html entities

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-28-06, 12:21
ldelgado ldelgado is offline
Registered User
 
Join Date: Jul 2005
Posts: 3
Full text search with html entities

Hello,

I have a table with some fields indexed using fulltext but I'm having some issues on matching keywords that include HTML entities. MySQL is matching rows that are doesn't have my keyword but include the html entity:

HTML Code:
SELECT *, MATCH(cv.title,cv.intro,cv.text) AGAINST ('colaço') AS score FROM cms_contents ct LEFT JOIN cms_content_versions cv ON cv.id_content=ct.id_content WHERE NOW() BETWEEN ct.date_start AND ct.date_expire AND MATCH(cv.title,cv.intro,cv.text) AGAINST ('colaço') AND ct.id_lang = 1 AND ct.active='1' GROUP BY ct.id_content
On this query, I'm trying to find matches for the keyword "colaço" which includes an HTML entity.
When I run this query on the DB, not only it returns results with this keyword but also other keywords that include the entity 'ç'.
I don't have a clue why is this happening.

Does mySQL Fulltext index have issues with html entities?

I'm looking for a solution for this.
Thanks.
Reply With Quote
  #2 (permalink)  
Old 03-30-06, 11:33
ldelgado ldelgado is offline
Registered User
 
Join Date: Jul 2005
Posts: 3
Can anyone help me out?
Thanks
Reply With Quote
  #3 (permalink)  
Old 03-31-06, 18:53
jfulton jfulton is offline
Registered User
 
Join Date: Apr 2005
Location: Baltimore, MD
Posts: 297
I feel like somebody posted something like this a few months back. It was also an issue with the '&' character. It may be treated as a stopword (I'm pretty sure the '-' character does this). There are some issues when using full text searches. Try checking the mysql site and googling some more and if I come up with a bright idea, I'll let you know.
Reply With Quote
  #4 (permalink)  
Old 03-31-06, 19:31
jfulton jfulton is offline
Registered User
 
Join Date: Apr 2005
Location: Baltimore, MD
Posts: 297
After revisiting this, I think this is why your query wasn't working:

The parser used the '&' and the ';' characters as delimiters, so you were actually searching against 'cola', 'ccedil', and 'o'. Since 'o' is too short, it was dropped. Now you are just searching for 'cola' and 'ccedil' (not as one word anymore). This would explain why entries with 'ç' were returned. They should have had a lower score as well, right? Similarly, you should have also gotten back results with just 'cola' and not 'ccedil'. I think the trick is to encapsulate the match against in double quotes using boolean mode.

Code:
SELECT 
     *
     , MATCH(cv.title,cv.intro,cv.text) AGAINST ('"colaço"' IN BOOLEAN MODE) AS score 
FROM 
     cms_contents ct 
          LEFT JOIN cms_content_versions cv 
          ON cv.id_content=ct.id_content 
WHERE 
     NOW() BETWEEN ct.date_start AND ct.date_expire 
     AND MATCH(cv.title,cv.intro,cv.text) AGAINST ('"colaço"' IN BOOLEAN MODE) 
     AND ct.id_lang = 1 AND ct.active='1' 
GROUP BY ct.id_content
Let me know if that works. Good luck.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On