Results 1 to 3 of 3
  1. #1
    Join Date
    Nov 2003
    Location
    France
    Posts
    5

    Question Unanswered: diacritic letters (czech...) and LIKE

    Hi, in my mysql database (4.0.15) I've a field that contain some specials (Czech, Polish...) letters. In my example, I use the word Dvořk (the famous czech classical musician), which name have a diacritic letter for R and an "acute" on the A.

    I want to perform search on my database. If I look for "Dvorak" without any special letter, mysql do not return the wanted record.
    My query is
    PHP Code:
    $query "select * from table where lower(word) LIKE '%$search%'..."
    The R is to special I suppose for MySQL. If I try with another simpler word (accordon, zampoa...) MySQL recognize the and the and if I search corden (lol) it returns my 'Accordon' record

    my $search var was strlower()ed before query.


    Here are the different tries...


    I tried the SOUNDEX() that seems to work
    select soundex("Dvorak") => D162
    select soundex("Dvořk") => D162
    but I want the query return all records containing "Dvorak" or "Dvořk" in the middle of a sentence !
    My query must return the following record for example : "(...) This music was composed by Dvořk (...)"


    I can try one ugly method, add a field with texts without any accent and search in this field... but
    • my database is enormous
    • I didn't find a php or mysql function that transform all unknown diacritic letters (with acute, grave, umlaut, tilde...) into ascii letters ř -> r, č -> c... BIG PROBLEM
      I don't know all this foreign letters because I don't speak Romanian, Polish, Czech, Slovakian, Hungarien, Swedish...
    • If I knew all these letter, they can't be entered in a text editor



    I tried with BINARY, do not work, and result was worse :'(


    REGEXP :
    PHP Code:
    $query "Select * from table where word REGEXP $search"
    do not work better


    Now maybe one solution, COLLATE.
    I read carefully this page : http://www.mysql.com/doc/en/Charset.html
    but I always get error like this one :
    PHP Code:
    $query "SELECT * FROM table 
    WHERE lower(word) like '%dvorak%' 
    or lower(comment) like '%dvorak%' 
    or _latin1 description COLLATE latin1_general_ci like '%dvorak%' 
    ORDER BY word "

    #1064 - Syntax error near 'description COLLATE latin1_general_ci LIKE '%dvorak%'
    ORDER BY '
    If you have an idea, please help me !

    Thanks in advance

    P.S.: sorry if my english is not quite good :P

  2. #2
    Join Date
    Oct 2003
    Location
    Slovakia
    Posts
    482

    Re: diacritic letters (czech...) and LIKE

    Originally posted by iubito
    Hi, in my mysql database (4.0.15) I've a field that contain some specials (Czech, Polish...) letters. In my example, I use the word Dvořk (the famous czech classical musician), which name have a diacritic letter for R and an "acute" on the A.

    I want to perform search on my database. If I look for "Dvorak" without any special letter, mysql do not return the wanted record.
    My query is
    PHP Code:
    $query "select * from table where lower(word) LIKE '%$search%'..."
    The R is to special I suppose for MySQL. If I try with another simpler word (accordon, zampoa...) MySQL recognize the and the and if I search corden (lol) it returns my 'Accordon' record

    my $search var was strlower()ed before query.


    Here are the different tries...


    I tried the SOUNDEX() that seems to work
    select soundex("Dvorak") => D162
    select soundex("Dvořk") => D162
    but I want the query return all records containing "Dvorak" or "Dvořk" in the middle of a sentence !
    My query must return the following record for example : "(...) This music was composed by Dvořk (...)"


    I can try one ugly method, add a field with texts without any accent and search in this field... but
    • my database is enormous
    • I didn't find a php or mysql function that transform all unknown diacritic letters (with acute, grave, umlaut, tilde...) into ascii letters ř -> r, č -> c... BIG PROBLEM
      I don't know all this foreign letters because I don't speak Romanian, Polish, Czech, Slovakian, Hungarien, Swedish...
    • If I knew all these letter, they can't be entered in a text editor



    I tried with BINARY, do not work, and result was worse :'(


    REGEXP :
    PHP Code:
    $query "Select * from table where word REGEXP $search"
    do not work better


    Now maybe one solution, COLLATE.
    I read carefully this page : http://www.mysql.com/doc/en/Charset.html
    but I always get error like this one :
    PHP Code:
    $query "SELECT * FROM table 
    WHERE lower(word) like '%dvorak%' 
    or lower(comment) like '%dvorak%' 
    or _latin1 description COLLATE latin1_general_ci like '%dvorak%' 
    ORDER BY word "


    If you have an idea, please help me !

    Thanks in advance

    P.S.: sorry if my english is not quite good :P
    1. SOUNDEX() is not usefull for non english languages especially slavic languages.

    2. There is two functions in PHP for code base conversion, recode() and iconv()

    3. You must setup codebase when before create of database

    4. Czech language have following code bases WIN-1250 and ISO-8859-2 (cp-852 is not supported by MySQL) but i think will be better when you setup the UTF-8 support (of course when you are using 4.1.x version)

  3. #3
    Join Date
    Nov 2003
    Location
    France
    Posts
    5
    Thanks ika for your reply

    The problem is that on my computer I can install MySQL 4.1.x but I'm using this database on an hosted website I can't change MySQL.
    I'll have a look on this codebase for my personal computer, but if I can find one good solution...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •