Results 1 to 5 of 5
  1. #1
    Join Date
    Aug 2010
    Posts
    3

    Unanswered: question about speed

    please help me out,

    i have a database with a table.
    id, foreignid, varchar, varchar

    it contains contact info
    so first varchar is "firstname" or "lastname" the sedcond one is the value, so like "Johnny" or "Depp" or other info such as birthdate.

    The table contains 100.000 entries.

    My client prefers fuzzy search, but the speed is not fast enough.
    I have 2 queries, 1 with fuzzy search (soundex and levenshtien) and one with fulltext and like. See queries below.

    Query 1 (advanced one) takes 0.2964 avg with result and 0.2422 without result.
    Query 2 takes around 0.108 avg with or without result.

    My question is, is this fast already or could this be much faster.
    The client wants to check in batches of 10.000 names, so you understand my problem, if I dont want to timeout the script. (ps it's result is webbased)

    oh, and an index has been made to that column.
    either way, both will take too long if it was 10.000 times.

    Query 1:
    (SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND soundex(prop_value)=soundex('Depp'))
    UNION
    (SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND MATCH(prop_value) AGAINST ('Depp'))
    UNION
    (SELECT s.* FROM (SELECT levenshtein(prop_value, 'Depp') AS dist, sanction_id, prop_type, prop_value FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='') s WHERE dist < 2)

    UNION
    (SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND soundex(prop_value)=soundex('Johnny Depp'))
    UNION
    (SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND MATCH(prop_value) AGAINST ('Johnny Depp'))
    UNION
    (SELECT s.* FROM (SELECT levenshtein(prop_value, 'Johnny Depp') AS dist, sanction_id, prop_type, prop_value FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='') s WHERE dist < 6)

    UNION
    (SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND soundex(prop_value)=soundex('Depp Johnny'))
    UNION
    (SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND MATCH(prop_value) AGAINST ('Depp Johnny'))
    UNION
    (SELECT s.* FROM (SELECT levenshtein(prop_value, 'Depp Johnny') AS dist, sanction_id, prop_type, prop_value FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='') s WHERE dist < 6)

    ORDER BY s_id






    Query 2:

    (SELECT * FROM ci_sanctions_properties WHERE prop_type='WHOLENAME' AND prop_value LIKE '%johnny%%depp%' OR prop_type='WHOLENAME' AND MATCH(prop_value) AGAINST ('+johnny +depp' IN BOOLEAN MODE))
    UNION
    (SELECT * FROM ci_sanctions_properties WHERE prop_type='WHOLENAME' AND prop_value LIKE '%depp%%johnny%' OR prop_type='WHOLENAME' AND MATCH(prop_value) AGAINST ('+depp +johnny' IN BOOLEAN MODE))
    ORDER BY sanction_id

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    with all those UNION's around you are surprised you have time out or speed problems?

    the only way of finding out if its fast enough to to ask the user what is acceptable, then run your benchmarks, then do some testing based on volumes. given modern networks I doubt 0.2 0.3 seconds is an issue bearing in mind the latency introduced by network traffic. however Im pretty certain that is the db grows substantially you will have a problem with response times, and you will have an impact on any other users accessing that server

    personally I think the route cause of your problem, assuming you have a problem, is the EAV model you are using. had you normalised the db and had the names in the same row you would have dramatically reduced the load on the server and I suspect made it faster to return relevant data to the user.
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    Aug 2010
    Posts
    3

    so what would your solution be?

    Thank you very much for your reply, fast as well.

    I am just a web programmer and normally do'n work as much with these kind of database queries, more in general selecting and inserting and of a much lower volume.

    Could you point me in the right direction in as how you would attack such a thing?

    The reason for my union statements is, that we have a generated wholename and we would like to match on first - last and last-first.

    Possibly also with some fuzzy (soundex levenshtein)

    the table containing the data is like this:

    CREATE TABLE IF NOT EXISTS `ci_properties` (
    `prop_id` int(11) NOT NULL AUTO_INCREMENT,
    `p_id` int(11) NOT NULL,
    `prop_type` varchar(64) NOT NULL,
    `prop_value` varchar(255) NOT NULL,
    PRIMARY KEY (`prop_id`),
    FULLTEXT KEY `prop_value` (`prop_value`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=107683 ;

    I have no idea on how to normalize it than this.
    match a name in this table-row and preferably fuzzy.

    I think the problem is not in the queries but just in the fact that 10.000 times is just not a good way to go on a webserver, so I need to make something based on batch processing or something.

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    I think the problem is your data model is using EAV, not a normalised db
    to most people in the relational db world the EAV model is pants.

    your "query 2" is starting back into the relational world, but is still based on EAV

    just looking at Query2 I think you could refine it further
    from
    (SELECT * FROM ci_sanctions_properties WHERE prop_type='WHOLENAME' AND prop_value LIKE '%johnny%%depp%' OR prop_type='WHOLENAME' AND MATCH(prop_value) AGAINST ('+johnny +depp' IN BOOLEAN MODE))
    UNION
    (SELECT * FROM ci_sanctions_properties WHERE prop_type='WHOLENAME' AND prop_value LIKE '%depp%%johnny%' OR prop_type='WHOLENAME' AND MATCH(prop_value) AGAINST ('+depp +johnny' IN BOOLEAN MODE))
    ORDER BY sanction_id
    to

    SELECT * FROM ci_sanctions_properties
    WHERE
    prop_type='WHOLENAME' AND (((prop_value LIKE '%johnny%%depp%' OR MATCH(prop_value) AGAINST ('+johnny +depp' IN BOOLEAN MODE))
    or
    (prop_value LIKE '%depp%%johnny%' AND MATCH(prop_value) AGAINST ('+depp +johnny' IN BOOLEAN MODE)))
    ORDER BY sanction_id
    however if your DB was normalised (ie all the attributes of the person are stored in one row then the SQL becomes easier.

    welcome to the evil world of the EAV model. in SQL Server Programming
    I'd rather be riding on the Tiger 800 or the Norton

  5. #5
    Join Date
    Aug 2010
    Posts
    3

    thanks again

    thanks again for your answer,

    I tried what you said and then even simplified it to:

    SELECT * FROM ci_properties WHERE prop_type = 'WHOLENAME' AND (prop_value LIKE '%johnny%depp%' OR MATCH (prop_value) AGAINST ('+johnny +depp' IN BOOLEAN MODE) OR prop_value LIKE '%depp%johnny%')

    as I realized that the MATCH AGAINST would already match so no need to have it twice. Your change shaved of half the time, but I am still not fast enough.

    it took 0.0676 to do this result, only 10.000 times would get to around 10 minutes.

    btw ALL information is in the same row.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •