Welcome to the dBforums forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact support.

If you prefer not to see double-underlined words and corresponding ads, place your cursor
here for ContentLink opt out.

Go Back  dBforums > Database Server Software > Oracle > Query against a table with name-value pairs

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-08-07, 17:33
ng75 ng75 is offline
Registered User
 
Join Date: Oct 2007
Posts: 5
Query against a table with name-value pairs

We have a table that has name value pairs. We went with this design since this data is dynamic and having columns for this data was not possible (these are customer defined fields). Now due to large volume, our queries against them are getting slower. Our queries are search queries like ( Select item_id where (Field = 'f1' AND VALUE = 'x') AND (FIELD = 'f2' AND VALUE= 'y') OR (FIELD ='f3' AND VALUE='z') Any ideas of how to solve such an issue.

The returned values can be > 50000 items for which these fields have these values... Is there some solution to optimize this? Could we optimize the condition tree. we have conditions e.g. (A AND B AND C) , (A OR B) AND C, Is there a way we can optimize this condition tree to maybe evaluate intelligently

Last edited by ng75 : 10-08-07 at 17:52.
Reply With Quote
  #2 (permalink)  
Old 10-08-07, 18:08
anacedent anacedent is offline
Registered User
 
Join Date: Aug 2003
Location: Where the Surf Meets the Turf @Del Mar, CA
Posts: 3,569
Is there some solution to optimize this?
Redesign into 3rd normal form.
You are reaping what you've sowed.
You've implemented a non-scalable application & reality is slower & slower response times as the amount of data increases.
__________________
You can lead some folks to knowledge, but you can not make them think.
The average person thinks he's above average!
Reply With Quote
  #3 (permalink)  
Old 10-09-07, 06:35
mike_bike_kite mike_bike_kite is online now
Registered User
 
Join Date: Jun 2007
Location: London
Posts: 955
This type of design is called an EAV design and there are a lot of potential problems for the unwary - not least when you data set grows beyond a certain size. I'll assume you have table structure like:
Code:
create PARAMS table ( ID FIELD VALUE )
First make sure you have indexes like:
Code:
create index idx1 on PARAMS ( VALUE, FIELD, ID ) and create index idx2 on PARAMS ( ID, FIELD, VALUE )
Then the SQL to do your (A AND B AND C) example would be:
Code:
select ID from PARAMS p1 where VALUE = 'z' and FIELD = 'f3' and exists( select 1 from PARAMS p2 where p2.ID = p1.ID VALUE = 'y' and FIELD = 'f2' ) and exists( select 1 from PARAMS p2 where p2.ID = p1.ID VALUE = 'x' and FIELD = 'f1' )
You also give some code to do this as ( Select item_id where (Field = 'f1' AND VALUE = 'x') AND (FIELD = 'f2' AND VALUE= 'y') OR (FIELD ='f3' AND VALUE='z') but this obviously won't work as you to join to the table multiple times for multiple fields.
For your (A OR B) AND C query I'd use would be (C here is f3=z):
Code:
select ID from PARAMS p1 where VALUE = 'z' and FIELD = 'f3' and ( exists( select 1 from PARAMS p2 where p2.ID = p1.ID VALUE = 'y' and FIELD = 'f2' ) or exists( select 1 from PARAMS p2 where p2.ID = p1.ID VALUE = 'x' and FIELD = 'f1' ) )

The important thing is to make sure the initial part of the query narrows down the search ie you don't want where FIELD='GENDER' and VALUE='MALE' but FIELD='NAME' and VALUE='Bealzebub'.

Can you try the above queries with some typical data and post some response times - if you're still having issues then could you detail the indexes you have the table definition and then supply a typical query that is running slow.

I'd also look at writing a function that can return the VALUE for and FIELD for a given ID - this will make your code much easier to read ie
select my_get_param_func( ID, FIELD )

Mike
Reply With Quote
  #4 (permalink)  
Old 10-10-07, 01:57
ng75 ng75 is offline
Registered User
 
Join Date: Oct 2007
Posts: 5
reply to mike

Mike,
Thanks for your detailed reply. The query that you put in ur response is exactly what we do. Now that you understand the problem let me give you some more data points. The conditions are defined by our end users so it could be anything. so really it could be (A AND B AND C) or (D OR (A AND B)).

Lets use the sample params table you came up with. We are interested in getting the ID from the params table for which certain criteria is valid. Now the superset of the ID are in a different table. Lets call it the master table. Our queries look like

select ID FROM master m
and exists (select 1 from params p1
where FIELD = 'f1'
AND value = 'x1'
AND p1.ID = m.ID)
and exists (select 1 from params p2
where FIELD = 'f2'
AND value = 'x2'
AND p2.ID = m.ID))

These queries are not hand written but generated from the conditions. If the master table has too many records (> 1 million), the exists really is slow. So we went this approach

select id from master
intersect
select id from params
where FIELD = 'f1'
AND value = 'x1'
intersect
select id from params
where FIELD = 'f2'
AND value = 'x2'

This gave better performance than the 1st one.. but still takes time for large volume..
Reply With Quote
  #5 (permalink)  
Old 10-10-07, 06:07
mike_bike_kite mike_bike_kite is online now
Registered User
 
Join Date: Jun 2007
Location: London
Posts: 955
I need to know
  • what indexes you have - this part is very important
  • could you also show me the results of
    Code:
    select FIELD,count(*) from PARAMS group by FIELD
  • it would also be good to find out how many unique IDs you have on the system
  • it would also help if you could also show the table definition
Indexes
Assuming an index on (VALUE,FIELD,ID) then it should be quick to pull the initial FIELD='f3' and VALUE='z' this will give it a bunch of IDs to play with to test the other field/value pairs If you have an index (preferably clustered) on (ID,FIELD,VALUE) then the server should find the other fields immediately to.

As a nice to have it often helps to have a function that can return a given VALUE for a supplied ID and FIELD - this makes the coding easier. I believe in Oracle you can have functional indexes based on these.
Spread of values
Try and get an example query from the database and see how many different VALUEs there are for each FIELD and see if a query runs faster by making the initial query on the FIELD with the most values. If the initial query isn't very selective then it may opt not to use an index at all.
Code:
ie FIELD='NAME' and VALUE='ODIN' rather than FIELD='GENDER' and VALUE='MAN'
50k records?
This is a lot of records to pull from an EAV based system - they really work better with much smaller subsets (ie <1000 but preferably <100). Is that all the data in the system? Why would you want a report with 50k rows in? If there's no way round this could you extract all the data into a normal (non EAV) table on a daily basis and then do these large reports from this table.
Your example
Why do you need to pull from master at all in your examples above? doesn't this mean you get a table scan of master? Couldn't you make the query far faster with the following code, also if it is faster, get it to pick the most restrictive criteria first (see ODIN vs MAN)
Code:
select id params p1 where FIELD = 'f1' AND value = 'x1' And exists( select 1 from params where p2.id = p1.id AND FIELD = 'f2' AND value = 'x2' )
Background questions
  • Did the performance slowly degrade over time or did it suddenly go bad?
  • What is it your database stores?
  • Has there been any major changes to the system - new entity types, new fields, new indexes etc.
Mike

EDIT simple changes
Please note I haven't done Oracle for years but was asked to have a look at this because I've had some experience of EAV designs

Last edited by mike_bike_kite : 10-10-07 at 16:23.
Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On