Page 1 of 2 12 LastLast
Results 1 to 15 of 17
  1. #1
    Join Date
    Mar 2007
    Posts
    212

    Unanswered: Case sensitive Full Text search syntax

    hi,

    does anybody know the syntax of doing a case sensitive fulltext search is?

    I have the following table:

    chemical_id int
    formula varchar
    name varchar
    fulltext(formula, name)

    The table contians a million chemical compounds. when searching on formula the search needs to be case sensitive but when searching on name the search needs to be case insensative. Also the formula search and name search will never be done in a single query.

    e.g.

    select id from Table where match(formula) against('"some formula"');

    the above query needs to be case sensitive because uppercase and lowercase of the same formula string can denote different chemicals.

    the second query would be as follows;

    select id from Table where match(name) against('"some chemical name"');

    the above query need not be case sensitive.

    anybody know what the correct syntax would be to achieve the above?

  2. #2
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    Fulltext searches are case insensitive by default (I believe), however in BOOLEAN MODE they are case sensitive.
    If you want to make the name a case sensitive search then use a case sensitive collation. e.g. latin1_swedish_cs

  3. #3
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    After re-reading your post I noticed that you have fulltext search on name AND formula, silly me! Also, I noted that the IN BOOLEAN MODE that should produce a case sensitive search in fact might not. I believe this was reported as a bug to MySQL. Give it a try and see if you get the desired results.
    Last edited by aschk; 07-31-07 at 04:46.

  4. #4
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    Check out :http://dev.mysql.com/doc/refman/5.0/...xt-search.html

    See the section as follows :
    By default, the search is performed in case-insensitive fashion. However, you can perform a case-sensitive full-text search by using a binary collation for the indexed columns. For example, a column that uses the latin1 character set of can be assigned a collation of latin1_bin to make it case sensitive for full-text searches.

    Thus, you have your answer, and it refers to my original post. You need to set the collation of the column to be case sensitive, but in particular binary sensitive (possibly because this is the only option, and swedish_cs won't work).

    Happy searching...!

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Obviously every application is different but in my experience most users just type in a string and hit return - they rarely bother with putting in upper and lower case text. They would expect to be shown a list of matches and they'd click the one they're after. Chemists however may be totally different to most users

    You could have a small sproc that first does the case insensitive search and puts these matches into a temporary store. Next it would look at just these chemicals and do a case sensitive search. If there are case sensitive matches then remove the other items. In this way you'd get the best of both worlds.

    Is the full text searching working fast enough for you now? What was the reason for it running slowly before?

    Mike

  6. #6
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by mike_bike_kite
    Obviously every application is different but in my experience most users just type in a string and hit return - they rarely bother with putting in upper and lower case text. They would expect to be shown a list of matches and they'd click the one they're after. Chemists however may be totally different to most users

    You could have a small sproc that first does the case insensitive search and puts these matches into a temporary store. Next it would look at just these chemicals and do a case sensitive search. If there are case sensitive matches then remove the other items. In this way you'd get the best of both worlds.

    Is the full text searching working fast enough for you now? What was the reason for it running slowly before?

    Mike
    The full text search works but its producing incorrect results to demonstrate below is an example on the mysql forum which some body posted but which clearly demonstrates the same problem am having with results.

    I use fulltext searching to allow for fast substring searches on
    certain fields in a database with about 3 million records (it's
    basically a data warehousing situation). Searches are ad hoc queries,
    so using fulltext indexing is my attempt to reduce the number of table
    scans. I have full control over the SQL syntax used. Users simply
    enter search terms.

    Assume a fulltext indexed field called "location" in table "foo", which
    contains street addresses. Here are some sample values that illustrate
    the issues I'm going to bring up:

    1) "21 SYCAMORE ROAD"
    2) "22 SYCAMORE RD"
    3) "10 SYCAMORE AVE"
    4) "12 SYCAMORE AVENUE"

    First, consider record #1. The following will show one matching
    result, as expected:

    select count(*) from foo where match(location) against('"21 sycamore"'
    in boolean mode);

    These, however, do NOT work:

    select count(*) from foo where match(location) against('"21 syca"' in
    boolean mode);

    select count(*) from foo where match(location) against('"21 syca*"' in
    boolean mode);

    select count(*) from foo where match(location) against('"21 syca"*' in
    boolean mode);

    I would expect them to work like this, which DOES work:

    select count(*) from foo where location like '21 syca%';

    To quell any comments about simply using the LIKE operator, let me
    provide another example.

    Let's assume I want to find all matching records for Sycamore Avenue.
    This would be record #3 and #4. However, one is abbreviated, and one
    is not.

    The LIKE operator method is this:

    select count(*) from foo where location like '%sycamore ave%';

    That works, but requires a table scan.

    select count(*) from foo where match(location) against('"sycamore ave"'
    in boolean mode);

    This will match record #3 only.

    select count(*) from foo where match(location) against('"sycamore
    avenue"' in boolean mode);

    This, of course, only matches record #4.

    The following matches zero rows:

    select count(*) from foo where match(location) against('"sycamore
    ave*"' in boolean mode);

    And this duplicates the first query of this example:

    select count(*) from foo where match(location) against('"sycamore
    ave"*' in boolean mode); .
    So, is there any syntax at all that behaves as I'm expecting?
    Basically, I guess I'm asking if there's any way to get the truncation
    operator to actually function inside a quoted expression just like the LIKE operator as follows:

    where field like '%sycamore ave%'

    so if there were "sycamore ave" and "sycamore avenue" strings in a table , both would be found.

  7. #7
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Originally Posted by ozzii
    Basically, I guess I'm asking if there's any way to get the truncation
    operator to actually function inside a quoted expression just like the LIKE operator as follows:

    where field like '%sycamore ave%'

    so if there were "sycamore ave" and "sycamore avenue" strings in a table , both would be found.
    Why not just have a table that contains alternative words and a query that will return all the possible alternatives of a given sentence. You'd then do your searches using the alternatives given to you ie

    Code:
    create table tmp_translate(
      word_from   varchar(20),
      word_to     varchar(20)
    );
    
    insert tmp_translate values ( 'rd','road' );
    insert tmp_translate values ( 'ave','avenue' );
    
    
    select distinct replace( '21 sycamore avenue', word_from, word_to )
    from tmp_translate
    union
    select distinct replace( '21 sycamore avenue', word_to, word_from )
    from tmp_translate;
    
    +-----------------------------------------------------+
    | replace( '21 sycamore avenue', word_from, word_to ) |
    +-----------------------------------------------------+
    | 21 sycamore avenue                                  |
    | 21 sycamore ave                                     |
    +-----------------------------------------------------+
    2 rows in set (0.00 sec)
    There are a number of problems with this code but I wrote this very quickly just to demonstrate the idea. You'd pass in your initial search string as a parameter and get back a bunch of new strings to try searching for.

    Mike

  8. #8
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by mike_bike_kite
    Why not just have a table that contains alternative words and a query that will return all the possible alternatives of a given sentence. You'd then do your searches using the alternatives given to you
    Standardizing the data isn't an option. I only used street addresses as an example. I am dealing with chemical names and formulas aswell as other fields where there is no standardized format.

    This begs the question. Why is there no syntax to perform the above in a single query i.e using a wild card prefix and suffix simultaneously on a search string. I mean oracle supports it so it can be done - just catn understand why the guys at mysql have ommited such a useful feature?

  9. #9
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Ozzii

    Most people on the forum will aim to provide you with an answer to whatever question you ask. My previous post supplied an answer to the last question you asked. If this wasn't your problem then perhaps you should phrase your questions a little better.

    If you're wanting a wildcard character like % then this wouldn't help because searching for "sycamore rd%" will never match "sycamore road". This means you will have to code it.

    If Oracle provides whatever functionality you need then why not simply use Oracle?

    Mike

    PS Why don't you look at whatever software they currently use (and actually like) and see how it gets round your problem. If this appears difficult to implement then why not simply hire an expert and get them to implement it for you. A number of us replied to your other post covering similar issues but you haven't replied to say whether this issue is now fixed.

    Edited: with small addition
    Last edited by mike_bike_kite; 08-04-07 at 05:38.

  10. #10
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    What would happen if you always ran the query in case sensitive but for your "name" field searches could simply be converted to upper case (or lower, if that's the way you sway ) thus, negating the sensitivity.
    Example:
    Code:
    select id from Table where match(formula) against('"some formula"');
    
    select id from Table where match(Upper(name)) against(Upper('"some chemical name"'));
    George
    Home | Blog

  11. #11
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Changing the case shouldn't make a difference I'm afraid. By default, the search is performed in case-insensitive fashion. However, you can perform a case-sensitive full-text search by using a binary collation for the indexed columns.

    Doing all that is fine but I thought he'd prefer a solution that would get the best of both worlds - hence my previous idea.

    Did you enjoy your holiday?

    Mike

  12. #12
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by mike_bike_kite
    By default, the search is performed in case-insensitive fashion
    Correct, aschk mentioned that in this post.

    My solution utilizes the case sensitive search by default but negates it for the name column through the use of Upper().

    Nice as your idea sounds about the linked words table, it would be a pain to implement (just think about how many permutations there are for each word that people may want to search!) - I opted for the QAD approach, not the ideal in this case.
    Quote Originally Posted by mike_bike_kite
    Did you enjoy your holiday?
    Yes I did, thank you for asking! We avoided the floods and rain and discovered sunshine in the UK! So much so that I got pretty badly sunburnt... On my feet
    George
    Home | Blog

  13. #13
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    My solution utilizes the case sensitive search by default but negates it for the name column through the use of Upper().
    The search is case-insensitive so it shouldn't matter whether you pass across "rd", "RD" or upper("rd").

    Nice as your idea sounds about the linked words table, it would be a pain to implement (just think about how many permutations there are for each word that people may want to search!)
    I don't know chemistry very well but I'm guessing that the common abbreviations would only apply to 10 - 20 words. If his requirement is for "rd" to match "road" then I can't think of many other ways of doing this.

    I opted for the QAD approach, not the ideal in this case.
    Obviously there are far too many abbreviations in computing though - what is a QAD approach? or an OP (in our other thread) for that matter?

    We avoided the floods and rain and discovered sunshine in the UK
    I've also just come back from a short motorcycle tour of the south coast (of the UK) - went to the submarine, motor and tank museums as well enjoying the beaches and forests - had a great time.

    Mike

  14. #14
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by mike_bike_kite
    The search is case-insensitive so it shouldn't matter whether you pass across "rd", "RD" or upper("rd").
    I'm suggesting we use aschk's suggestion and urn case sensitivity on then
    "rd" <> "RD" <<sensitive
    but
    Upper("rd") = Upper("RD") <<insensitive
    ---
    OP = Original Poster
    QAD = Quick And Dirty
    ---
    There's a submarine museum in the UK?
    George
    Home | Blog

  15. #15
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Originally Posted by georgev
    I'm suggesting we use aschk's suggestion and urn case sensitivity on then
    The average person just uses a single case (usually lower) when entering a string into a search box. I'm not sure about chemists though. Personally - if I entered something like "harry potter" into Amazon and it came back with nothing then I'd just go to another book store rather than fanny around with different cases. I guess your three options are:
    • Force case checking - I guess people like me would just have to enter a string and find no matches before re-entered the string and paying more attention to the case of each character. This would all be done while cursing the designer of the system under my breath. Of course there would be those users who try to be precise with the case but perhaps get the case of one letter wrong and therefore never find what they are after.
    • Ignore the case - I guess people like me would be content as we'd always find what we're looking for while those who enter the correct case for each letter would be met with more matches than they expected.
    • Mixing the two methods - by ignoring the case on the first pass to get all matches and then only reducing the subset to exact case matches if they exist - will find everything for users like me. The users who try to be precise with the case will get the exact matches they were expecting also.
    I guess it's a matter of taste as to which option you go for but you get the best of both worlds with the last option for no real performance loss.
    Originally Posted by georgev
    There's a submarine museum in the UK?
    The submarine museum near Portsmouth is worth going to if you're into subs (I am) and they have a full scale 1947 sub you can get a tour of. The tank museum was even better with just about everything you could imagine in there.

    Mike

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •