Results 1 to 7 of 7
  1. #1
    Join Date
    Jul 2009
    Posts
    8

    Question Unanswered: Best x of y Records

    Hello All,

    I'm trying to find out how to average the best X of Y records in a table for each member. Here's what I have for columns:

    Member_ID
    Year
    Month
    MonthlySpend (summary value for month)

    Basically, I'm trying to find out the average monthly spend for each member's best 4 of the last 6 months (the best months will vary by member of course). I thought this would be simple (and maybe it is) but I am having difficulty figuring this out. Any advice is greatly appreciated. I'm using MS SQL 2000.

    Thanks,
    Dustin

  2. #2
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    This would be a lot easier, if you were at even SQL 2005. SQL 2005 introduces windowing functions and CTEs that make this relatively easy. I suppose there is no chance of an upgrade?

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    There are lots of things that you need to think about, so I'll give you a really short demo to play with... Note that you MUST understand the limitations of the solution when dealing with "ties" for it to make any sense to business people at all.
    Code:
    DROP TABLE #foo
    GO
    CREATE TABLE #foo (
       id		INT
    ,  y		INT
    ,  m		INT
    ,  MonthlySpend	INT
       )
       
    INSERT INTO #foo
       SELECT 1, 2000,  1, 1000 UNION
       SELECT 1, 2000,  2, 2000 UNION
       SELECT 1, 2000,  3, 3000 UNION
       SELECT 1, 2000,  4, 4000 UNION
       SELECT 1, 2000,  5, 5000 UNION
       SELECT 1, 2000,  6, 6000 UNION
       SELECT 2, 2000,  1, 1000 UNION
       SELECT 2, 2000,  2, 2000 UNION
       SELECT 2, 2000,  3, 3000 UNION
       SELECT 2, 2000,  4, 3000 UNION
       SELECT 2, 2000,  5, 5000 UNION
       SELECT 2, 2000,  6, 6000 UNION
       SELECT 3, 2000,  1, 1000 UNION
       SELECT 3, 2000,  2, 1000 UNION
       SELECT 3, 2000,  3, 1000 UNION
       SELECT 3, 2000,  4, 1000 UNION
       SELECT 3, 2000,  5, 1000 UNION
       SELECT 3, 2000,  6, 5000
    
    SELECT *
       FROM (SELECT b.id, b.y, b.m, b.MonthlySpend
    ,     (SELECT COUNT(*)
             FROM #foo AS C
             WHERE  c.id = b.id
                AND c.MonthlySpend <= b.MonthlySpend) AS myRank
          FROM #foo AS b) AS a
       WHERE a.myRank <= 4
    
    SELECT AVG(MonthlySpend)
       FROM (SELECT b.id, b.y, b.m, b.MonthlySpend
    ,     (SELECT COUNT(*)
             FROM #foo AS C
             WHERE  c.id = b.id
                AND c.MonthlySpend <= b.MonthlySpend) AS myRank
          FROM #foo AS b) AS a
       WHERE a.myRank <= 4
    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  4. #4
    Join Date
    Jul 2009
    Posts
    8
    @MCrowley - an upgrade isn't possible at this time.

    @Pat

    I modified the query for my table and ran it. The ranks came back with the lowest values for MonthlySpend having the lowest rank. I need the highest values to have the lowest ranks so it seems I missed something. Also, since I need to only look at the most recent six months how can I exclude older data?

    Thanks for the replies guys,
    Dustin

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    You need to tweak one line in each WHERE clause. You absolutely MUST undertand the code to get any value from it, because some of the assumptions made are necessary because you can't upgrade to a new newer version of SQL, but those assumptions are not pretty! Experiment with the code so you understand it, otherwise it isn't safe to use.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  6. #6
    Join Date
    Jul 2009
    Posts
    8
    Pat,

    Is one of the assumptions that a value will exist for each month and that all values for each month will be different? If I run the following

    SELECT A.Meta_ID, AVG(MonthlyTotalTrips) AS AvgMonthlyTrips
    INTO #tmp_AvgMonthlyTrips
    FROM (SELECT B.Meta_ID, B.Year, B.Month, B.MonthlyTotalTrips,
    (
    SELECT COUNT(*)
    FROM #tmp_StatMonth C (NOLOCK)
    WHERE C.Meta_ID = B.Meta_ID
    AND C.MonthlyTotalTrips >= B.MonthlyTotalTrips
    ) AS TripRank
    FROM #tmp_StatMonth B (NOLOCK)
    ) AS A
    WHERE A.TripRank <= 4
    GROUP BY A.Meta_ID

    I get accurate results for everyone that has at least 6 months with a different trip count for each month. If two months have the same number of trips those months (and subsequent lower ranked months) are excluded from the results. If the two highest months have the same number of trips counts a NULL value is returned.

    Aside from that, the code works great.

    Thanks Again,
    Dustin

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Odds are good that if you understand the code well enough to know which two lines need to be tweaked and how to tweak them, then it is probably safe for you to use this work around. The N-way tie in ranking values is the big issue, and understanding how that tie affects the results when it comes at the beginning/middle/end of the group is very important.

    I think that you're probably Ok to use this now, as long as you keep the "gotchas" in mind when you provide your results to business people. They grasp these problems well if you explain the problem for them, you just have to make it clear if/when/how the data has been skewed so they can accomodate that skew.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •