If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > diacritic letters (czech...) and LIKE

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-17-03, 10:06
iubito iubito is offline
Registered User
 
Join Date: Nov 2003
Location: France
Posts: 5
Question diacritic letters (czech...) and LIKE

Hi, in my mysql database (4.0.15) I've a field that contain some specials (Czech, Polish...) letters. In my example, I use the word Dvořák (the famous czech classical musician), which name have a diacritic letter for R and an "acute" on the A.

I want to perform search on my database. If I look for "Dvorak" without any special letter, mysql do not return the wanted record.
My query is
PHP Code:
$query "select * from table where lower(word) LIKE '%$search%'..."
The R is to special I suppose for MySQL. If I try with another simpler word (accordéon, zampoña...) MySQL recognize the é and the ñ and if I search àcçordeón (lol) it returns my 'Accordéon' record

my $search var was strlower()ed before query.


Here are the different tries...


I tried the SOUNDEX() that seems to work
select soundex("Dvorak") => D162
select soundex("Dvořák") => D162
but I want the query return all records containing "Dvorak" or "Dvořák" in the middle of a sentence !
My query must return the following record for example : "(...) This music was composed by Dvořák (...)"


I can try one ugly method, add a field with texts without any accent and search in this field... but
  • my database is enormous
  • I didn't find a php or mysql function that transform all unknown diacritic letters (with acute, grave, umlaut, tilde...) into ascii letters ř -> r, č -> c... BIG PROBLEM
    I don't know all this foreign letters because I don't speak Romanian, Polish, Czech, Slovakian, Hungarien, Swedish...
  • If I knew all these letter, they can't be entered in a text editor


I tried with BINARY, do not work, and result was worse :'(


REGEXP :
PHP Code:
$query "Select * from table where word REGEXP $search"
do not work better


Now maybe one solution, COLLATE.
I read carefully this page : http://www.mysql.com/doc/en/Charset.html
but I always get error like this one :
PHP Code:
$query "SELECT * FROM table 
WHERE lower(word) like '%dvorak%' 
or lower(comment) like '%dvorak%' 
or _latin1 description COLLATE latin1_general_ci like '%dvorak%' 
ORDER BY word "

Quote:
#1064 - Syntax error near 'description COLLATE latin1_general_ci LIKE '%dvorak%'
ORDER BY '
If you have an idea, please help me !

Thanks in advance

P.S.: sorry if my english is not quite good :P
Reply With Quote
  #2 (permalink)  
Old 11-18-03, 08:29
ika ika is offline
Registered User
 
Join Date: Oct 2003
Location: Slovakia
Posts: 482
Re: diacritic letters (czech...) and LIKE

Quote:
Originally posted by iubito
Hi, in my mysql database (4.0.15) I've a field that contain some specials (Czech, Polish...) letters. In my example, I use the word Dvořák (the famous czech classical musician), which name have a diacritic letter for R and an "acute" on the A.

I want to perform search on my database. If I look for "Dvorak" without any special letter, mysql do not return the wanted record.
My query is
PHP Code:
$query "select * from table where lower(word) LIKE '%$search%'..."
The R is to special I suppose for MySQL. If I try with another simpler word (accordéon, zampoña...) MySQL recognize the é and the ñ and if I search àcçordeón (lol) it returns my 'Accordéon' record

my $search var was strlower()ed before query.


Here are the different tries...


I tried the SOUNDEX() that seems to work
select soundex("Dvorak") => D162
select soundex("Dvořák") => D162
but I want the query return all records containing "Dvorak" or "Dvořák" in the middle of a sentence !
My query must return the following record for example : "(...) This music was composed by Dvořák (...)"


I can try one ugly method, add a field with texts without any accent and search in this field... but
  • my database is enormous
  • I didn't find a php or mysql function that transform all unknown diacritic letters (with acute, grave, umlaut, tilde...) into ascii letters ř -> r, č -> c... BIG PROBLEM
    I don't know all this foreign letters because I don't speak Romanian, Polish, Czech, Slovakian, Hungarien, Swedish...
  • If I knew all these letter, they can't be entered in a text editor


I tried with BINARY, do not work, and result was worse :'(


REGEXP :
PHP Code:
$query "Select * from table where word REGEXP $search"
do not work better


Now maybe one solution, COLLATE.
I read carefully this page : http://www.mysql.com/doc/en/Charset.html
but I always get error like this one :
PHP Code:
$query "SELECT * FROM table 
WHERE lower(word) like '%dvorak%' 
or lower(comment) like '%dvorak%' 
or _latin1 description COLLATE latin1_general_ci like '%dvorak%' 
ORDER BY word "


If you have an idea, please help me !

Thanks in advance

P.S.: sorry if my english is not quite good :P
1. SOUNDEX() is not usefull for non english languages especially slavic languages.

2. There is two functions in PHP for code base conversion, recode() and iconv()

3. You must setup codebase when before create of database

4. Czech language have following code bases WIN-1250 and ISO-8859-2 (cp-852 is not supported by MySQL) but i think will be better when you setup the UTF-8 support (of course when you are using 4.1.x version)
Reply With Quote
  #3 (permalink)  
Old 11-18-03, 09:10
iubito iubito is offline
Registered User
 
Join Date: Nov 2003
Location: France
Posts: 5
Thanks ika for your reply

The problem is that on my computer I can install MySQL 4.1.x but I'm using this database on an hosted website I can't change MySQL.
I'll have a look on this codebase for my personal computer, but if I can find one good solution...
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On