Ok but there is just one problem with that. The collation of the formula field is set so that it is case sensitive (latin1_general_cs ) and the collation of the name field isnt. According to the mysql documention the use of multiple character sets within a single table is supported however all columns in a FULLTEXT index must use the same character set and collation.
So how would you get around the above if i want to set up a fulltext index on both the formula and the name fields?
I would but I require only the formula field to be case sensistive becuase uppercase and lower case of the same formula denotes different chemical compounds. I dont need the name field to be case sensistive because it is irrelevant.
Also is it true that mysql will not use and index on a like operator using the % ?
I have seen post #2. But I seek further clarification as to where you got this information from? There is nothing in the MySql documention which mentions using a % with a LIKE operator search negates the index - or at least I cant finding anything to this effect.
would you mind forwarding a link to where you got this information from?
Full text indexes are optimised to work around words and sentences and may have difficulty making any sense out of your formula field containing "CN1C=NC2=1C(=O)N(C)C(=O)N2C". You may find it quicker to search this field using a normal index. If the chemical name field is made up of discrete words then this might be a good candidate for the full text search (assuming you want to search on this).
However as guelphdad pointed out, searching for "%string%" won't work well with normal indexes either. Indexes are usually organised with B-trees that allow the database to quickly navigate to the right key item - so if you searched for "CN1C%" the database could jump to those key items beginning with C then jump to those starting with CN and quickly get to your answer. If your search string starts with % then the database wouldn't know where to jump to and would instead have to scan each item in the index (or just do a table scan) and this takes far more time. I found this link which covers some of this.
It really depends on how many chemicals you have in your database - if there aren't that many then you might find it doesn't really matter. Could you try both types of index (and perhaps no index at all) and see which is faster. It would be interesting to see the comparisons.
If things are still slow then could you create your own caching table where you might want to look for common substrings in the formula field and then extract all the chemical_ids that contain it. Then if a user searches for one of these substrings you can just get the chemical_ids directly from your cache table without any searching. Another method might be to simply cache all the user searches and the ids returned then use this cache if the search string has been used before - you may want to repopulate the matches automatically if there is a lot of change in your database.
ozzii, even if you can't find any information on what guelphdad was talking about in regards to indexes being searched, just think about it logically. From a searching algorithm point of view, in a tree structure.
If you say LIKE 'ABC%' and LIKE '%ABC%', which is a) going to be found faster & b) indexed.
Lets take ABC% as our first example: in alphanum sorting, the character 'A' is going to be near the top of our resultset (and thus probably the left hand-side our tree), thus the index can go :
IS root_node > A (yes)
IS subnode of root_node > A (yes)
Keep going left...
and so on...
In the second example our first character is in fact %, which is ANYTHING under the sun. So when we start searching we are doing:
IS root_node > UNKNOWN (it could be)
GO LEFT AND RIGHT
Is the second character of our string part an A
It could be, but it might not be.
If is is, then check for B and so on...
If it isn't , then carry onto the third character in that string, is it an A?
As you can see a LIKE ON '%ABC%' is going to be horribly inefficient, and have to search practically the whole tree to find what it's after. Potentially EVERY node could contain the answer we're looking for.
Please note: this is a simplistic example of what i'm trying to demonstrate and does not in fact represent what happens in the MySQL algorithms that perform the search.