Results 1 to 7 of 7
  1. #1
    Join Date
    Mar 2007
    Posts
    212

    Question Unanswered: Full Text Search HELP

    Hi,

    I have the following tables:

    Code:
    CREATE TABLE `tbl_book_languages` (
      `language_id` int(10) unsigned NOT NULL,
      `book_id` int(10) unsigned NOT NULL,
      PRIMARY KEY  (`language_id`,`book_id`),
      CONSTRAINT `book_languages_book_id_FK` FOREIGN KEY (`book_id`) REFERENCES `tbl_books` (`book_id`) ON DELETE CASCADE,
      CONSTRAINT `book_languages_language_id_FK` FOREIGN KEY (`language_id`) REFERENCES `tbl_languages` (`language_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    CREATE TABLE `tbl_books` (
      `book_id` int(10) unsigned NOT NULL auto_increment,
      `book_add_date` datetime NOT NULL,
      `book_status` enum('Approved','Pending Approval') NOT NULL,
      `book_title` varchar(100) NOT NULL,
      `book_desc` text NOT NULL,
      `auth_id` int(10) unsigned NOT NULL,
      etc...
      PRIMARY KEY  (`book_id`),
      CONSTRAINT `books_auth_id_FK` FOREIGN KEY (`auth_id`) REFERENCES `tbl_authors` (`auth_id`) ON DELETE CASCADE
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    CREATE TABLE `tbl_languages` (
      `language_id` int(10) unsigned NOT NULL auto_increment,
      `language_name` varchar(150) NOT NULL,
      `language_type_id` int(10) unsigned NOT NULL,
      PRIMARY KEY  (`language_id`),
      UNIQUE KEY `language_name` (`language_name`),
      CONSTRAINT `language_type_id_FK` FOREIGN KEY (`language_type_id`) REFERENCES `tbl_language_types` (`language_type_id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    I wish to use mySQLs full text search so that I can implement a search query such as below where a user can search using a keyword that can be searched on book_title, book_desc and also since a book can be available in different languages a user can also search on book language. The languages a book is available in is implemented using the many to many table called tbl_book_languages:


    Code:
    SELECT books.book_title, books.book_desc, books.book_add_date
      FROM tbl_books as books
    INNER JOIN tbl_book_languages AS bk_lang  
            ON books.book_id = bk_lang.book_id   
         WHERE (books.book_status = 'Approved') AND   
               (DATE(books.book_add_date) = UTC_DATE()) AND 
    	   (bk_lang.language_id IN ("1,2,3")) AND 
         MATCH (books.book_title, books.book_desc) 
       AGAINST ("+beautiful +flower" IN BOOLEAN MODE)
      ORDER BY books.book_add_date DESC
    As you can see the tables are of type innodb. This means a full text index can't be implemented on tbl_books.


    1. I could change the table type to myISAM but then I would have to manage foreign keys programatically and I would also have to change all the other tables to myISAM which reference any foreign keys to tbl_books. This is a lot of messing about and defeats the purpose of a relational database especially to manage referential integrity.

    2. I could use the LIKE clause instead but then I have the performance overhead of doing a full table scan for each query and if it has a million records this will have a huge impact on speed.

    Does anybody have any other solutions which enable the database to manage referential integrity and also utilise a full text search???
    Last edited by ozzii; 02-01-09 at 09:27.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    maintain your tables as innodb, and set up an extra, reduntant mysql table containing only the FK (back to the original table) and the fulltext columns

    by the way, bk_lang.language_id IN ("1,2,3") will not do what you think it will do

    also, DATE(books.book_add_date) = UTC_DATE() is applying a function to a column, and this means that the optimizer will not be able to use the index (if any) on that column

    for datetime values, to select all rows for a specific date, use a range test with an ope-ended upper bound...
    Code:
     WHERE books.book_add_date >= UTC_DATE()
       AND books.book_add_date  < UTC_DATE() + INTERVAL 1 DAY
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Mar 2007
    Posts
    212
    OK i've set up a redundant table as per your advice:

    Code:
    CREATE TABLE `tbl_book_search` (
      `book_id` int(10) unsigned NOT NULL auto_increment,
      `auth_id` int(10) unsigned NOT NULL,
      `book_title` varchar(100) NOT NULL,
      `book_desc` text NOT NULL,
      PRIMARY KEY  (`book_id`),
      FULLTEXT KEY `idx_ft_booksearch_title_desc` (`book_title`,`book_desc`)
    ) ENGINE=MyISAM AUTO_INCREMENT=5
    I have changed the search query by including Distinct as the query was producing multiple records where a book was in multiple languages.

    Code:
    SELECT 
      DISTINCT bk_search.book_title
             , bk_search.book_desc
             , books.book_add_date
          FROM tbl_book_search as bk_search
    INNER JOIN tbl_book_languages AS bk_lang  
            ON bk_lang.book_id = bk_search.book_id 
    INNER JOIN tbl_books AS books
            ON books.book_id = bk_search.book_id  
         WHERE (books.book_status = 'Approved') AND   
               (DATE(books.book_add_date) >= UTC_DATE()) AND 
    	   (bk_lang.language_id IN (1,2,3))  AND 
         MATCH (bk_search.book_title, bk_search.book_desc) 
       AGAINST ("+beautiful +flower" IN BOOLEAN MODE)
      ORDER BY books.book_add_date DESC
    I have also removed the quotes from the IN clause and it seems to work now - Is that what you meant by saying:

    Quote Originally Posted by r937
    by the way, bk_lang.language_id IN ("1,2,3") will not do what you think it will do
    or did you mean something else?

    Do I also need to include other columns I may need to search by into this redundant table or would you simply reference them on the main table using a join as the other columns don't require a full text index e.g. book_add_date or auth_id? Also why do I need to include the FK in this redundant table as I can reference the FK by doing a join on the main table for example the following query would produce the same result set if am not wrong:

    Code:
    SELECT 
      DISTINCT books.book_title
             , books.book_desc
    	 , books.book_add_date
          FROM tbl_books as books
    INNER JOIN tbl_book_search AS bk_search
            ON bk_search.book_id = books.book_id  
    INNER JOIN tbl_book_languages AS bk_lang  
            ON bk_lang.book_id = books.book_id 
         WHERE (books.book_status = 'Approved') AND   
               (DATE(books.book_add_date) >= UTC_DATE()) AND 
    	   (bk_lang.language_id IN (1,2,3)) AND 
         MATCH (bk_search.book_title, bk_search.book_desc) 
       AGAINST ("+beautiful +flower" IN BOOLEAN MODE)
      ORDER BY books.book_add_date DESC
    Also with this solution I would need to update this search table each time a new book was added or deleted. How would this work if transaction control was implemented and a roll back occurred on the main table as the searchable table will not rollback?


    Thanks.
    Last edited by ozzii; 02-01-09 at 16:55.

  4. #4
    Join Date
    Mar 2007
    Posts
    212
    Still waiting for an answer to my last post. I need to determine if it necessary to add the FKs to the redundant table or not. Any help would be appreciated.

    Thanks.

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    yes, use FKs if you want the database to maintain the relationship between the redundant table and the main table

    no, don't bother, in situations where searching a periodic copy (e.g. cron job at 2 a.m.) is sufficient (how many books do you add each day? what's the chance that someone will fulltext-search for the same book on the day it was added?)

    as for returning other columns, i would just return the ids to the main query
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by r937
    yes, use FKs if you want the database to maintain the relationship between the redundant table and the main table

    no, don't bother, in situations where searching a periodic copy (e.g. cron job at 2 a.m.) is sufficient (how many books do you add each day? what's the chance that someone will fulltext-search for the same book on the day it was added?)

    as for returning other columns, i would just return the ids to the main query

    If I wanted to know the number of records returned by the above query I could run a count on the above query. However this would have to be done as a separate query. Is there any way to get a result set along with the number of records returned within one sql query? Basically I need to set up a paging script for the records returned. At present I can do this using 2 separate queries - one to retrieve the record count and another to retrieve the actual result set. This would be more easier if could get both i.e the number of records in total and the actual record set in one.

    I was wondering if that can actually be done. Any help would be appreciated.

    Thanks

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    look up the FOUND_ROWS function in da manual

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •