If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > question about speed

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-05-10, 06:05
renevdkooi renevdkooi is offline
Registered User
 
Join Date: Aug 2010
Posts: 3
question about speed

please help me out,

i have a database with a table.
id, foreignid, varchar, varchar

it contains contact info
so first varchar is "firstname" or "lastname" the sedcond one is the value, so like "Johnny" or "Depp" or other info such as birthdate.

The table contains 100.000 entries.

My client prefers fuzzy search, but the speed is not fast enough.
I have 2 queries, 1 with fuzzy search (soundex and levenshtien) and one with fulltext and like. See queries below.

Query 1 (advanced one) takes 0.2964 avg with result and 0.2422 without result.
Query 2 takes around 0.108 avg with or without result.

My question is, is this fast already or could this be much faster.
The client wants to check in batches of 10.000 names, so you understand my problem, if I dont want to timeout the script. (ps it's result is webbased)

oh, and an index has been made to that column.
either way, both will take too long if it was 10.000 times.

Query 1:
(SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND soundex(prop_value)=soundex('Depp'))
UNION
(SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND MATCH(prop_value) AGAINST ('Depp'))
UNION
(SELECT s.* FROM (SELECT levenshtein(prop_value, 'Depp') AS dist, sanction_id, prop_type, prop_value FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='') s WHERE dist < 2)

UNION
(SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND soundex(prop_value)=soundex('Johnny Depp'))
UNION
(SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND MATCH(prop_value) AGAINST ('Johnny Depp'))
UNION
(SELECT s.* FROM (SELECT levenshtein(prop_value, 'Johnny Depp') AS dist, sanction_id, prop_type, prop_value FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='') s WHERE dist < 6)

UNION
(SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND soundex(prop_value)=soundex('Depp Johnny'))
UNION
(SELECT * FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='' AND MATCH(prop_value) AGAINST ('Depp Johnny'))
UNION
(SELECT s.* FROM (SELECT levenshtein(prop_value, 'Depp Johnny') AS dist, sanction_id, prop_type, prop_value FROM ci_properties WHERE prop_type='WHOLENAME' AND prop_value!='') s WHERE dist < 6)

ORDER BY s_id






Query 2:

(SELECT * FROM ci_sanctions_properties WHERE prop_type='WHOLENAME' AND prop_value LIKE '%johnny%%depp%' OR prop_type='WHOLENAME' AND MATCH(prop_value) AGAINST ('+johnny +depp' IN BOOLEAN MODE))
UNION
(SELECT * FROM ci_sanctions_properties WHERE prop_type='WHOLENAME' AND prop_value LIKE '%depp%%johnny%' OR prop_type='WHOLENAME' AND MATCH(prop_value) AGAINST ('+depp +johnny' IN BOOLEAN MODE))
ORDER BY sanction_id
Reply With Quote
  #2 (permalink)  
Old 08-05-10, 07:24
healdem healdem is online now
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
with all those UNION's around you are surprised you have time out or speed problems?

the only way of finding out if its fast enough to to ask the user what is acceptable, then run your benchmarks, then do some testing based on volumes. given modern networks I doubt 0.2 0.3 seconds is an issue bearing in mind the latency introduced by network traffic. however Im pretty certain that is the db grows substantially you will have a problem with response times, and you will have an impact on any other users accessing that server

personally I think the route cause of your problem, assuming you have a problem, is the EAV model you are using. had you normalised the db and had the names in the same row you would have dramatically reduced the load on the server and I suspect made it faster to return relevant data to the user.
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
Reply With Quote
  #3 (permalink)  
Old 08-05-10, 07:36
renevdkooi renevdkooi is offline
Registered User
 
Join Date: Aug 2010
Posts: 3
so what would your solution be?

Thank you very much for your reply, fast as well.

I am just a web programmer and normally do'n work as much with these kind of database queries, more in general selecting and inserting and of a much lower volume.

Could you point me in the right direction in as how you would attack such a thing?

The reason for my union statements is, that we have a generated wholename and we would like to match on first - last and last-first.

Possibly also with some fuzzy (soundex levenshtein)

the table containing the data is like this:

CREATE TABLE IF NOT EXISTS `ci_properties` (
`prop_id` int(11) NOT NULL AUTO_INCREMENT,
`p_id` int(11) NOT NULL,
`prop_type` varchar(64) NOT NULL,
`prop_value` varchar(255) NOT NULL,
PRIMARY KEY (`prop_id`),
FULLTEXT KEY `prop_value` (`prop_value`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=107683 ;

I have no idea on how to normalize it than this.
match a name in this table-row and preferably fuzzy.

I think the problem is not in the queries but just in the fact that 10.000 times is just not a good way to go on a webserver, so I need to make something based on batch processing or something.
Reply With Quote
  #4 (permalink)  
Old 08-05-10, 10:21
healdem healdem is online now
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
I think the problem is your data model is using EAV, not a normalised db
to most people in the relational db world the EAV model is pants.

your "query 2" is starting back into the relational world, but is still based on EAV

just looking at Query2 I think you could refine it further
from
Quote:
(SELECT * FROM ci_sanctions_properties WHERE prop_type='WHOLENAME' AND prop_value LIKE '%johnny%%depp%' OR prop_type='WHOLENAME' AND MATCH(prop_value) AGAINST ('+johnny +depp' IN BOOLEAN MODE))
UNION
(SELECT * FROM ci_sanctions_properties WHERE prop_type='WHOLENAME' AND prop_value LIKE '%depp%%johnny%' OR prop_type='WHOLENAME' AND MATCH(prop_value) AGAINST ('+depp +johnny' IN BOOLEAN MODE))
ORDER BY sanction_id
to

Quote:
SELECT * FROM ci_sanctions_properties
WHERE
prop_type='WHOLENAME' AND (((prop_value LIKE '%johnny%%depp%' OR MATCH(prop_value) AGAINST ('+johnny +depp' IN BOOLEAN MODE))
or
(prop_value LIKE '%depp%%johnny%' AND MATCH(prop_value) AGAINST ('+depp +johnny' IN BOOLEAN MODE)))
ORDER BY sanction_id
however if your DB was normalised (ie all the attributes of the person are stored in one row then the SQL becomes easier.

welcome to the evil world of the EAV model. in SQL Server Programming
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
Reply With Quote
  #5 (permalink)  
Old 08-05-10, 21:24
renevdkooi renevdkooi is offline
Registered User
 
Join Date: Aug 2010
Posts: 3
thanks again

thanks again for your answer,

I tried what you said and then even simplified it to:

SELECT * FROM ci_properties WHERE prop_type = 'WHOLENAME' AND (prop_value LIKE '%johnny%depp%' OR MATCH (prop_value) AGAINST ('+johnny +depp' IN BOOLEAN MODE) OR prop_value LIKE '%depp%johnny%')

as I realized that the MATCH AGAINST would already match so no need to have it twice. Your change shaved of half the time, but I am still not fast enough.

it took 0.0676 to do this result, only 10.000 times would get to around 10 minutes.

btw ALL information is in the same row.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On