Results 1 to 2 of 2
  1. #1
    Join Date
    Nov 2009

    Unanswered: [SOLVED] Convert correlated subquery to join?

    I have a query at the moment that uses a correlated subquery. The scenario is I have a table of item changes over time, basically ID, priceChange and a calculated row number. Some items may not change in a particular month and so I have a situation where I end up with a table like
    ID    priceChange    Month       rn
    1     100      March 2010         1
    1     null     April  2010        2
    1     null     May 2010           3
    1     null     June 2010          4
    1     200      July 2010          5
    2     520      March 2010         1 
    2     null     April  2010        2
    2     null     May 2010           3
    2     300      June 2010          4
    I want to fill in the null amount for each item by selecting the previous "real" item price for that particular item. Hence I want to end up with

    ID    priceChange    Month       rn
    1     100      March 2010        1
    1     100     April  2010        2
    1     100     May 2010           3
    1     100     June 2010          4
    1     200      July 2010         5
    2     520      March 2010        1 
    2     520     April  2010        2
    2     520     May 2010           3
    2     300      June 2010         4
    To do this, I do a join based on the item id being the same and select the record with largest rn value that has same ID.

    coalesce(a.priceChange, b.priceChange) as priceChange
    FROM items a, items b
    where = AND b.rn = (   
        select max(rn) from items where id = and rn <= a.rn
    It works fine, just a bit slow, due to the correlated subquery. I have seen some examples of how these can be replaced with JOINs but I cant get it to fit my problem

    Can anyone assist?


    Last edited by FLANDERS; 09-17-10 at 05:14.

  2. #2
    Join Date
    Nov 2009
    OK I think I have this sorted now. The below query has gone from 17 minutes to 22 seconds on my machine:
    -- Fills in the blanks of the items table by joining each item change record
    -- to the previous valid record for the same item, to allow retrival of last "real" price change
    select * from (	
    	coalesce(a.priceChange, b.priceChange) as priceChange,
    ROW_NUMBER() OVER (PARTITION BY, a.aMonth ORDER BY b.rn desc) as rn1  	
    	FROM items a, items b
    	where = AND 	-- Only want records that relate to same item
    	b.rn <= a.rn and    -- Previous or same (ensures we get items with no change at all) records only 
    	b.priceChange IS NOT NULL   -- Previous record must be a "real" record
    ) tt where rn1 = 1 -- Get the latest "real" record that relates to same item. Latest is determined by the
    -- row number column being 1 (since we ordered the partition DESC)
    The correlated subquery has been replaced by the use of the analytical ROW_NUMBER() function. I do a self join to get all items before the "current" item, where there is a valid priceChange. The use of <= ensures that if there is no price change at all for a particular item, it will be joined to itself.
    As part of the column selection, I assign each record a row number by partitioning the result set on the item id and month, ordering it so that the latest change record will be given a value of 1. This makes it possible to select the latest value using rn1 = 1 rather than selecting the max value.

    There has been a huge performance increase using this strategy. It again confirms that correlated subqueries should be looked at if there is a query performance issue.

    Hopefully someone else will benefit from this in the future.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts