Results 1 to 5 of 5
  1. #1
    Join Date
    Oct 2007
    Posts
    5

    Unanswered: Query against a table with name-value pairs

    We have a table that has name value pairs. We went with this design since this data is dynamic and having columns for this data was not possible (these are customer defined fields). Now due to large volume, our queries against them are getting slower. Our queries are search queries like ( Select item_id where (Field = 'f1' AND VALUE = 'x') AND (FIELD = 'f2' AND VALUE= 'y') OR (FIELD ='f3' AND VALUE='z') Any ideas of how to solve such an issue.

    The returned values can be > 50000 items for which these fields have these values... Is there some solution to optimize this? Could we optimize the condition tree. we have conditions e.g. (A AND B AND C) , (A OR B) AND C, Is there a way we can optimize this condition tree to maybe evaluate intelligently
    Last edited by ng75; 10-08-07 at 17:52.

  2. #2
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    Is there some solution to optimize this?
    Redesign into 3rd normal form.
    You are reaping what you've sowed.
    You've implemented a non-scalable application & reality is slower & slower response times as the amount of data increases.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    This type of design is called an EAV design and there are a lot of potential problems for the unwary - not least when you data set grows beyond a certain size. I'll assume you have table structure like:
    Code:
    create PARAMS table (
    	ID
    	FIELD
    	VALUE
    )

    First make sure you have indexes like:
    Code:
    create index idx1 on PARAMS ( VALUE, FIELD, ID )
    and
    create index idx2 on PARAMS ( ID, FIELD, VALUE )
    Then the SQL to do your (A AND B AND C) example would be:
    Code:
    select	ID
    from	PARAMS p1
    where 	VALUE = 'z'
    	and FIELD = 'f3'
    	and exists(
    		select	1
    		from	PARAMS p2
    		where	p2.ID = p1.ID
    			VALUE = 'y'
    			and FIELD = 'f2' )
    	and exists(
    		select	1
    		from	PARAMS p2
    		where	p2.ID = p1.ID
    			VALUE = 'x'
    			and FIELD = 'f1' )
    You also give some code to do this as ( Select item_id where (Field = 'f1' AND VALUE = 'x') AND (FIELD = 'f2' AND VALUE= 'y') OR (FIELD ='f3' AND VALUE='z') but this obviously won't work as you to join to the table multiple times for multiple fields.

    For your (A OR B) AND C query I'd use would be (C here is f3=z):

    Code:
    select	ID
    from	PARAMS p1
    where 	VALUE = 'z'
    	and FIELD = 'f3'
    	and (
    		exists(
    			select	1
    			from	PARAMS p2
    			where	p2.ID = p1.ID
    				VALUE = 'y'
    				and FIELD = 'f2' )
    		or exists(
    			select	1
    			from	PARAMS p2
    			where	p2.ID = p1.ID
    				VALUE = 'x'
    				and FIELD = 'f1' ) )


    The important thing is to make sure the initial part of the query narrows down the search ie you don't want where FIELD='GENDER' and VALUE='MALE' but FIELD='NAME' and VALUE='Bealzebub'.

    Can you try the above queries with some typical data and post some response times - if you're still having issues then could you detail the indexes you have the table definition and then supply a typical query that is running slow.

    I'd also look at writing a function that can return the VALUE for and FIELD for a given ID - this will make your code much easier to read ie
    select my_get_param_func( ID, FIELD )

    Mike

  4. #4
    Join Date
    Oct 2007
    Posts
    5

    reply to mike

    Mike,
    Thanks for your detailed reply. The query that you put in ur response is exactly what we do. Now that you understand the problem let me give you some more data points. The conditions are defined by our end users so it could be anything. so really it could be (A AND B AND C) or (D OR (A AND B)).

    Lets use the sample params table you came up with. We are interested in getting the ID from the params table for which certain criteria is valid. Now the superset of the ID are in a different table. Lets call it the master table. Our queries look like

    select ID FROM master m
    and exists (select 1 from params p1
    where FIELD = 'f1'
    AND value = 'x1'
    AND p1.ID = m.ID)
    and exists (select 1 from params p2
    where FIELD = 'f2'
    AND value = 'x2'
    AND p2.ID = m.ID))

    These queries are not hand written but generated from the conditions. If the master table has too many records (> 1 million), the exists really is slow. So we went this approach

    select id from master
    intersect
    select id from params
    where FIELD = 'f1'
    AND value = 'x1'
    intersect
    select id from params
    where FIELD = 'f2'
    AND value = 'x2'

    This gave better performance than the 1st one.. but still takes time for large volume..

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I need to know
    • what indexes you have - this part is very important
    • could you also show me the results of
      Code:
      select FIELD,count(*) from PARAMS group by FIELD
    • it would also be good to find out how many unique IDs you have on the system
    • it would also help if you could also show the table definition
    Indexes
    Assuming an index on (VALUE,FIELD,ID) then it should be quick to pull the initial FIELD='f3' and VALUE='z' this will give it a bunch of IDs to play with to test the other field/value pairs If you have an index (preferably clustered) on (ID,FIELD,VALUE) then the server should find the other fields immediately to.

    As a nice to have it often helps to have a function that can return a given VALUE for a supplied ID and FIELD - this makes the coding easier. I believe in Oracle you can have functional indexes based on these.

    Spread of values
    Try and get an example query from the database and see how many different VALUEs there are for each FIELD and see if a query runs faster by making the initial query on the FIELD with the most values. If the initial query isn't very selective then it may opt not to use an index at all.
    Code:
    ie FIELD='NAME' and VALUE='ODIN'
    rather than FIELD='GENDER' and VALUE='MAN'

    50k records?
    This is a lot of records to pull from an EAV based system - they really work better with much smaller subsets (ie <1000 but preferably <100). Is that all the data in the system? Why would you want a report with 50k rows in? If there's no way round this could you extract all the data into a normal (non EAV) table on a daily basis and then do these large reports from this table.

    Your example
    Why do you need to pull from master at all in your examples above? doesn't this mean you get a table scan of master? Couldn't you make the query far faster with the following code, also if it is faster, get it to pick the most restrictive criteria first (see ODIN vs MAN)
    Code:
    select id params p1
    where FIELD = 'f1'
             AND value = 'x1'
             And exists( 
                      select 1 
                      from    params
                      where  p2.id = p1.id
                                AND FIELD = 'f2'
                                AND value = 'x2' )

    Background questions
    • Did the performance slowly degrade over time or did it suddenly go bad?
    • What is it your database stores?
    • Has there been any major changes to the system - new entity types, new fields, new indexes etc.

    Mike

    EDIT simple changes
    Please note I haven't done Oracle for years but was asked to have a look at this because I've had some experience of EAV designs
    Last edited by mike_bike_kite; 10-10-07 at 16:23.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •