  1. #1
    Join Date
    Feb 2013
    Posts
    4

    Unanswered: MySQL query for non-English characters. Very urgent

    Hi all

    I have a column "details" in one table. That column stores email contents in HTML format, and the column data type is BLOB. My requirement is to search for and find any email content that contains non-English characters, i.e. foreign languages.

    What is the query for this? Please help, it's urgent.

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Sorry, do you mean:
    find any email that isn't in English,
    or
    look for a specific term or phrase in the email which may not be English,
    or
    find any email that isn't in an ISO Latin character set,
    or
    look for a specific term or phrase in the email which may not be in the ISO Latin character set?

    MySQL can store data in any one of a number of languages (well, in reality, character sets). Most Western European languages are covered by the standard code page; these include English, French, German, Italian, Spanish, Swedish and so on. As they (largely) use the same letters, all derived from Latin, they can share the same or similar code pages.
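    To make the "ISO Latin character set" distinction concrete, here is a minimal sketch in Python (the language is an assumption; the thread itself only shows SQL) that reports whether already-decoded text fits in plain ASCII, in ISO-8859-1 (Latin-1), or needs characters outside both. As the post says, this tells you about the character set, not the language: French and German text both land in the "latin-1" bucket. The function name `charset_class` is hypothetical.

```python
def charset_class(text: str) -> str:
    """Return the narrowest of 'ascii', 'latin-1' or 'other' that can encode text.

    'latin-1' means ISO-8859-1, the ISO Latin character set covering most
    Western European languages (French, German, Spanish, Swedish, ...).
    """
    for name in ("ascii", "latin-1"):
        try:
            text.encode(name)  # raises UnicodeEncodeError if a character doesn't fit
            return name
        except UnicodeEncodeError:
            continue
    return "other"


print(charset_class("Hello, world"))       # ascii
print(charset_class("Grüße aus München"))  # latin-1
print(charset_class("उपयोगकर्ता नाम"))      # other
```

    One caveat: ISO-8859-1 lacks some characters common in Western European email, such as the euro sign and typographic quotes, so such text would be reported as "other".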

    Short of searching for specific words, I cannot see any way of determining whether a specific email is in English, French, German, American, Scandinavian or whatever.

    However, if you want to search for phrases, then it shouldn't matter what the original language is if you are using, say, the LIKE clause:
    where EMAIL_CONTENTS like '%उपयोगकर्ता नाम%'  -- Hindi for "user name"
    Although if you are using MySQL, it may be worthwhile looking up full-text search (FULLTEXT indexes queried with MATCH ... AGAINST).
    I'd rather be riding on the Tiger 800 or the Norton
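    A BLOB column is compared as raw bytes, so a LIKE pattern only matches if it is encoded the same way the email was stored. The Python sketch below (hypothetical function and sample data; it ignores LIKE's `%`/`_` wildcards and just tests for a substring) shows the same word producing different bytes in UTF-8 and Latin-1:

```python
def contains_phrase(blob: bytes, phrase: str, encoding: str = "utf-8") -> bool:
    """Byte-wise substring test, roughly what LIKE '%phrase%' does on a BLOB."""
    return phrase.encode(encoding) in blob


stored = "<p>Café ouvert</p>".encode("utf-8")  # email body stored as UTF-8 bytes

print("é".encode("utf-8"), "é".encode("latin-1"))  # b'\xc3\xa9' b'\xe9'
print(contains_phrase(stored, "Café"))             # True
print(contains_phrase(stored, "Café", "latin-1"))  # False: different bytes for é
```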

  3. #3
    Join Date
    Feb 2013
    Posts
    4
    I'm looking for any email content that contains non-English words.

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    So do you want to identify emails that are not in English,
    or do you want to look for specific words?

  5. #5
    Join Date
    Feb 2013
    Posts
    4
    I want to identify emails that are not in English. One important thing: all the emails are HTML content, i.e. the email contents are stored with HTML tags.
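    Because the emails are stored as HTML, checking the raw bytes is misleading in two ways: markup and CSS add plenty of ASCII noise, and entities such as `&eacute;` or `&#x939;` spell non-English characters using only ASCII bytes. A minimal sketch using Python's standard `html.parser` module (the class and function names are hypothetical) that pulls out the visible text with entities decoded and `<script>`/`<style>` contents skipped:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML email, skipping <script> and <style>."""

    def __init__(self):
        # convert_charrefs=True turns &eacute; / &#x939; into real characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html_source):
    """Return the visible text of html_source with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


print(visible_text('<p style="color:red">Caf&eacute; &#x939;&#x93F;</p><style>p{}</style>'))
```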

  6. #6
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Before trying to work out a solution with code:

    How would you go about identifying manually what language an email is in?
    What are the things that would help you assess what language is being used?

    Is there something in the headers or the HTML?
    Are there certain keywords?

    Is this something that has to be done manually (i.e. somebody needs to review the email and identify the language)?
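    On the "certain keywords" question: English prose is dense with short function words (the, and, of, to, ...), so a low share of them hints that an email is in another language. A rough Python sketch of that stopword heuristic; the word list and the 0.1 threshold are arbitrary assumptions rather than tuned values, and very short emails will give unreliable results:

```python
import re

# Hypothetical, hand-picked sample of very common English function words.
ENGLISH_STOPWORDS = {
    "the", "and", "of", "to", "a", "in", "is", "it", "you", "that",
    "for", "on", "with", "as", "are", "this", "be", "at", "have", "i",
    "we", "your", "will", "not", "or", "from", "by",
}


def english_stopword_ratio(text):
    """Fraction of the words in text that appear in ENGLISH_STOPWORDS."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters (Unicode-aware)
    if not words:
        return 0.0
    return sum(w in ENGLISH_STOPWORDS for w in words) / len(words)


def probably_english(text, threshold=0.1):
    """Heuristic guess: True when enough of the words are English function words."""
    return english_stopword_ratio(text) >= threshold


print(probably_english("Please find the report attached and let me know if you have any questions."))  # True
print(probably_english("Veuillez trouver le rapport ci-joint et dites-moi si vous avez des questions."))  # False
```

    The input should be the visible text of the email rather than raw HTML; otherwise tag and attribute names skew the ratio.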

  7. #7
    Join Date
    Feb 2013
    Posts
    4
    Thanks Healdhem for the quick response.

    In my case, I have 51,000 records of email content. Out of those 51,000 records, I need to filter only the emails with non-English characters; there may be 100 or more. Once I have filtered those records, I will manually identify the languages using Google Translate.
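    For the stated goal (narrow 51,000 records down to the few that contain non-English characters, then review them by hand), a pre-filter can flag any email whose visible text contains a letter outside plain ASCII. A minimal Python sketch, assuming the rows have already been fetched from MySQL as (id, details) pairs and that the BLOB holds UTF-8 bytes; `flag_non_ascii`, `visible_chars` and the sample rows are hypothetical. Note that the default mode also flags accented Western European text (é, ü, ß), which, as the earlier posts point out, is not necessarily "non-English"; `latin1_ok=True` restricts the flag to letters outside Latin-1.

```python
import html
import re

# Rough regex-based HTML cleanup (not a full parser): drop <script>/<style> blocks, then any tag.
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")


def visible_chars(blob):
    """Decode a BLOB and return its text with tags removed and entities decoded."""
    # Assumption: the BLOB holds UTF-8; invalid bytes become U+FFFD, which is not a letter.
    text = blob.decode("utf-8", errors="replace")
    text = SCRIPT_STYLE_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    return html.unescape(text)  # e.g. "&eacute;" -> "é", "&nbsp;" -> non-breaking space


def flag_non_ascii(rows, latin1_ok=False):
    """Return the ids of (id, blob) rows whose visible text has a letter beyond ASCII.

    With latin1_ok=True, letters in ISO-8859-1 (é, ü, ß, ...) are tolerated and
    only letters outside Latin-1 (Devanagari, Cyrillic, CJK, ...) cause a flag.
    Only letters count, so &nbsp;, curly quotes or the euro sign never flag a row.
    """
    limit = 0xFF if latin1_ok else 0x7F
    flagged = []
    for row_id, blob in rows:
        if any(ch.isalpha() and ord(ch) > limit for ch in visible_chars(blob)):
            flagged.append(row_id)
    return flagged


# Hypothetical sample rows, shaped like (id, details) pairs fetched from the table.
rows = [
    (1, b"<p>Hello,&nbsp;see you tomorrow.</p>"),
    (2, "<p>Caf&eacute; ouvert</p>".encode("utf-8")),
    (3, "<p>उपयोगकर्ता नाम</p>".encode("utf-8")),
    (4, "<style>p{content:'é'}</style><p>Plain text</p>".encode("utf-8")),
]
print(flag_non_ascii(rows))                  # [2, 3]
print(flag_non_ascii(rows, latin1_ok=True))  # [3]
```

    If you would rather do a first pass in SQL, comparing `LENGTH(details)` (bytes) with `CHAR_LENGTH(CONVERT(details USING utf8mb4))` (characters) finds rows containing any multi-byte character, again assuming the BLOB holds valid UTF-8. Unlike the sketch, that check cannot see characters written as HTML entities, and it also counts non-letters such as a literal non-breaking space or curly quote.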
