| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |
|

07-28-07, 06:56
|
|
Registered User
|
|
Join Date: Mar 2007
Posts: 194
|
|
|
Case sensitive Full Text search syntax
|
|
hi,
does anybody know the syntax of doing a case sensitive fulltext search is?
I have the following table:
chemical_id int
formula varchar
name varchar
fulltext(formula, name)
The table contians a million chemical compounds. when searching on formula the search needs to be case sensitive but when searching on name the search needs to be case insensative. Also the formula search and name search will never be done in a single query.
e.g.
select id from Table where match(formula) against('"some formula"');
the above query needs to be case sensitive because uppercase and lowercase of the same formula string can denote different chemicals.
the second query would be as follows;
select id from Table where match(name) against('"some chemical name"');
the above query need not be case sensitive.
anybody know what the correct syntax would be to achieve the above?
|
|

07-31-07, 03:35
|
|
Registered User
|
|
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
|
|
Fulltext searches are case insensitive by default (I believe), however in BOOLEAN MODE they are case sensitive.
If you want to make the name a case sensitive search then use a case sensitive collation. e.g. latin1_swedish_cs
|
|

07-31-07, 03:41
|
|
Registered User
|
|
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
|
|
|
|
After re-reading your post I noticed that you have fulltext search on name AND formula, silly me! Also, I noted that the IN BOOLEAN MODE that should produce a case sensitive search in fact might not. I believe this was reported as a bug to MySQL. Give it a try and see if you get the desired results.
|
Last edited by aschk; 07-31-07 at 03:46.
|

07-31-07, 03:49
|
|
Registered User
|
|
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
|
|
Check out : http://dev.mysql.com/doc/refman/5.0/...xt-search.html
See the section as follows :
By default, the search is performed in case-insensitive fashion. However, you can perform a case-sensitive full-text search by using a binary collation for the indexed columns. For example, a column that uses the latin1 character set of can be assigned a collation of latin1_bin to make it case sensitive for full-text searches.
Thus, you have your answer, and it refers to my original post. You need to set the collation of the column to be case sensitive, but in particular binary sensitive (possibly because this is the only option, and swedish_cs won't work).
Happy searching...!
|
|

07-31-07, 05:30
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Obviously every application is different but in my experience most users just type in a string and hit return - they rarely bother with putting in upper and lower case text. They would expect to be shown a list of matches and they'd click the one they're after. Chemists however may be totally different to most users
You could have a small sproc that first does the case insensitive search and puts these matches into a temporary store. Next it would look at just these chemicals and do a case sensitive search. If there are case sensitive matches then remove the other items. In this way you'd get the best of both worlds.
Is the full text searching working fast enough for you now? What was the reason for it running slowly before?
Mike
|
|

08-03-07, 12:56
|
|
Registered User
|
|
Join Date: Mar 2007
Posts: 194
|
|
Quote:
|
Originally Posted by mike_bike_kite
Obviously every application is different but in my experience most users just type in a string and hit return - they rarely bother with putting in upper and lower case text. They would expect to be shown a list of matches and they'd click the one they're after. Chemists however may be totally different to most users
You could have a small sproc that first does the case insensitive search and puts these matches into a temporary store. Next it would look at just these chemicals and do a case sensitive search. If there are case sensitive matches then remove the other items. In this way you'd get the best of both worlds.
Is the full text searching working fast enough for you now? What was the reason for it running slowly before?
Mike
|
The full text search works but its producing incorrect results to demonstrate below is an example on the mysql forum which some body posted but which clearly demonstrates the same problem am having with results.
Quote:
I use fulltext searching to allow for fast substring searches on
certain fields in a database with about 3 million records (it's
basically a data warehousing situation). Searches are ad hoc queries,
so using fulltext indexing is my attempt to reduce the number of table
scans. I have full control over the SQL syntax used. Users simply
enter search terms.
Assume a fulltext indexed field called "location" in table "foo", which
contains street addresses. Here are some sample values that illustrate
the issues I'm going to bring up:
1) "21 SYCAMORE ROAD"
2) "22 SYCAMORE RD"
3) "10 SYCAMORE AVE"
4) "12 SYCAMORE AVENUE"
First, consider record #1. The following will show one matching
result, as expected:
select count(*) from foo where match(location) against('"21 sycamore"'
in boolean mode);
These, however, do NOT work:
select count(*) from foo where match(location) against('"21 syca"' in
boolean mode);
select count(*) from foo where match(location) against('"21 syca*"' in
boolean mode);
select count(*) from foo where match(location) against('"21 syca"*' in
boolean mode);
I would expect them to work like this, which DOES work:
select count(*) from foo where location like '21 syca%';
To quell any comments about simply using the LIKE operator, let me
provide another example.
Let's assume I want to find all matching records for Sycamore Avenue.
This would be record #3 and #4. However, one is abbreviated, and one
is not.
The LIKE operator method is this:
select count(*) from foo where location like '%sycamore ave%';
That works, but requires a table scan.
select count(*) from foo where match(location) against('"sycamore ave"'
in boolean mode);
This will match record #3 only.
select count(*) from foo where match(location) against('"sycamore
avenue"' in boolean mode);
This, of course, only matches record #4.
The following matches zero rows:
select count(*) from foo where match(location) against('"sycamore
ave*"' in boolean mode);
And this duplicates the first query of this example:
select count(*) from foo where match(location) against('"sycamore
ave"*' in boolean mode); .
|
So, is there any syntax at all that behaves as I'm expecting?
Basically, I guess I'm asking if there's any way to get the truncation
operator to actually function inside a quoted expression just like the LIKE operator as follows:
where field like '%sycamore ave%'
so if there were "sycamore ave" and "sycamore avenue" strings in a table , both would be found.
|
|

08-03-07, 14:55
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
Originally Posted by ozzii
Basically, I guess I'm asking if there's any way to get the truncation
operator to actually function inside a quoted expression just like the LIKE operator as follows:
where field like '%sycamore ave%'
so if there were "sycamore ave" and "sycamore avenue" strings in a table , both would be found.
|
Why not just have a table that contains alternative words and a query that will return all the possible alternatives of a given sentence. You'd then do your searches using the alternatives given to you ie
Code:
create table tmp_translate(
word_from varchar(20),
word_to varchar(20)
);
insert tmp_translate values ( 'rd','road' );
insert tmp_translate values ( 'ave','avenue' );
select distinct replace( '21 sycamore avenue', word_from, word_to )
from tmp_translate
union
select distinct replace( '21 sycamore avenue', word_to, word_from )
from tmp_translate;
+-----------------------------------------------------+
| replace( '21 sycamore avenue', word_from, word_to ) |
+-----------------------------------------------------+
| 21 sycamore avenue |
| 21 sycamore ave |
+-----------------------------------------------------+
2 rows in set (0.00 sec)
There are a number of problems with this code but I wrote this very quickly just to demonstrate the idea. You'd pass in your initial search string as a parameter and get back a bunch of new strings to try searching for.
Mike
|
|

08-03-07, 15:55
|
|
Registered User
|
|
Join Date: Mar 2007
Posts: 194
|
|
Quote:
|
Originally Posted by mike_bike_kite
Why not just have a table that contains alternative words and a query that will return all the possible alternatives of a given sentence. You'd then do your searches using the alternatives given to you
|
Standardizing the data isn't an option. I only used street addresses as an example. I am dealing with chemical names and formulas aswell as other fields where there is no standardized format.
This begs the question. Why is there no syntax to perform the above in a single query i.e using a wild card prefix and suffix simultaneously on a search string. I mean oracle supports it so it can be done - just catn understand why the guys at mysql have ommited such a useful feature?
|
|

08-03-07, 16:54
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Ozzii
Most people on the forum will aim to provide you with an answer to whatever question you ask. My previous post supplied an answer to the last question you asked. If this wasn't your problem then perhaps you should phrase your questions a little better.
If you're wanting a wildcard character like % then this wouldn't help because searching for "sycamore rd%" will never match "sycamore road". This means you will have to code it.
If Oracle provides whatever functionality you need then why not simply use Oracle?
Mike
PS Why don't you look at whatever software they currently use (and actually like) and see how it gets round your problem. If this appears difficult to implement then why not simply hire an expert and get them to implement it for you. A number of us replied to your other post covering similar issues but you haven't replied to say whether this issue is now fixed.
Edited: with small addition
|
Last edited by mike_bike_kite; 08-04-07 at 04:38.
|

08-04-07, 05:58
|
|
www.gvee.co.uk
|
|
Join Date: Jan 2007
Location: UK
Posts: 10,156
|
|
What would happen if you always ran the query in case sensitive but for your "name" field searches could simply be converted to upper case (or lower, if that's the way you sway  ) thus, negating the sensitivity.
Example:
Code:
select id from Table where match(formula) against('"some formula"');
select id from Table where match(Upper(name)) against(Upper('"some chemical name"'));
|
|

08-04-07, 08:55
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Changing the case shouldn't make a difference I'm afraid. By default, the search is performed in case-insensitive fashion. However, you can perform a case-sensitive full-text search by using a binary collation for the indexed columns.
Doing all that is fine but I thought he'd prefer a solution that would get the best of both worlds - hence my previous idea.
Did you enjoy your holiday?
Mike
|
|

08-04-07, 15:01
|
|
www.gvee.co.uk
|
|
Join Date: Jan 2007
Location: UK
Posts: 10,156
|
|
Quote:
|
Originally Posted by mike_bike_kite
By default, the search is performed in case-insensitive fashion
|
Correct, aschk mentioned that in this post.
My solution utilizes the case sensitive search by default but negates it for the name column through the use of Upper().
Nice as your idea sounds about the linked words table, it would be a pain to implement (just think about how many permutations there are for each word that people may want to search!) - I opted for the QAD approach, not the ideal in this case.
Quote:
|
Originally Posted by mike_bike_kite
Did you enjoy your holiday?
|
Yes I did, thank you for asking! We avoided the floods and rain and discovered sunshine in the UK! So much so that I got pretty badly sunburnt... On my feet 
|
|

08-04-07, 15:51
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
|
My solution utilizes the case sensitive search by default but negates it for the name column through the use of Upper().
|
The search is case- insensitive so it shouldn't matter whether you pass across "rd", "RD" or upper("rd").
Quote:
|
Nice as your idea sounds about the linked words table, it would be a pain to implement (just think about how many permutations there are for each word that people may want to search!)
|
I don't know chemistry very well but I'm guessing that the common abbreviations would only apply to 10 - 20 words. If his requirement is for "rd" to match "road" then I can't think of many other ways of doing this.
Quote:
|
I opted for the QAD approach, not the ideal in this case.
|
Obviously there are far too many abbreviations in computing though - what is a QAD approach? or an OP (in our other thread) for that matter?
Quote:
|
We avoided the floods and rain and discovered sunshine in the UK
|
I've also just come back from a short motorcycle tour of the south coast (of the UK) - went to the submarine, motor and tank museums as well enjoying the beaches and forests - had a great time.
Mike
|
|

08-06-07, 03:13
|
|
www.gvee.co.uk
|
|
Join Date: Jan 2007
Location: UK
Posts: 10,156
|
|
Quote:
|
Originally Posted by mike_bike_kite
The search is case-insensitive so it shouldn't matter whether you pass across "rd", "RD" or upper("rd").
|
I'm suggesting we use aschk's suggestion and urn case sensitivity on then
"rd" <> "RD" <<sensitive
but
Upper("rd") = Upper("RD") <<insensitive
---
OP = Original Poster
QAD = Quick And Dirty
---
There's a submarine museum in the UK?
|
|

08-06-07, 14:45
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
Originally Posted by georgev
I'm suggesting we use aschk's suggestion and urn case sensitivity on then
|
The average person just uses a single case (usually lower) when entering a string into a search box. I'm not sure about chemists though. Personally - if I entered something like "harry potter" into Amazon and it came back with nothing then I'd just go to another book store rather than fanny around with different cases. I guess your three options are: - Force case checking - I guess people like me would just have to enter a string and find no matches before re-entered the string and paying more attention to the case of each character. This would all be done while cursing the designer of the system under my breath. Of course there would be those users who try to be precise with the case but perhaps get the case of one letter wrong and therefore never find what they are after.
- Ignore the case - I guess people like me would be content as we'd always find what we're looking for while those who enter the correct case for each letter would be met with more matches than they expected.
- Mixing the two methods - by ignoring the case on the first pass to get all matches and then only reducing the subset to exact case matches if they exist - will find everything for users like me. The users who try to be precise with the case will get the exact matches they were expecting also.
I guess it's a matter of taste as to which option you go for but you get the best of both worlds with the last option for no real performance loss.
Quote:
Originally Posted by georgev
There's a submarine museum in the UK?
|
The submarine museum near Portsmouth is worth going to if you're into subs (I am) and they have a full scale 1947 sub you can get a tour of. The tank museum was even better with just about everything you could imagine in there.
Mike
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|