If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > Microsoft SQL Server > Best x of y Records

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-30-11, 14:21
dlg dlg is offline
Registered User
 
Join Date: Jul 2009
Posts: 8
Question Best x of y Records

Hello All,

I'm trying to find out how to average the best X of Y records in a table for each member. Here's what I have for columns:

Member_ID
Year
Month
MonthlySpend (summary value for month)

Basically, I'm trying to find out the average monthly spend for each member's best 4 of the last 6 months (the best months will vary by member of course). I thought this would be simple (and maybe it is) but I am having difficulty figuring this out. Any advice is greatly appreciated. I'm using MS SQL 2000.

Thanks,
Dustin
Reply With Quote
  #2 (permalink)  
Old 11-30-11, 14:45
MCrowley MCrowley is offline
Wage drone 24601
 
Join Date: Jan 2003
Location: Massachusetts
Posts: 4,899
This would be a lot easier, if you were at even SQL 2005. SQL 2005 introduces windowing functions and CTEs that make this relatively easy. I suppose there is no chance of an upgrade?
Reply With Quote
  #3 (permalink)  
Old 11-30-11, 15:21
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
There are lots of things that you need to think about, so I'll give you a really short demo to play with... Note that you MUST understand the limitations of the solution when dealing with "ties" for it to make any sense to business people at all.
Code:
DROP TABLE #foo
GO
CREATE TABLE #foo (
   id		INT
,  y		INT
,  m		INT
,  MonthlySpend	INT
   )
   
INSERT INTO #foo
   SELECT 1, 2000,  1, 1000 UNION
   SELECT 1, 2000,  2, 2000 UNION
   SELECT 1, 2000,  3, 3000 UNION
   SELECT 1, 2000,  4, 4000 UNION
   SELECT 1, 2000,  5, 5000 UNION
   SELECT 1, 2000,  6, 6000 UNION
   SELECT 2, 2000,  1, 1000 UNION
   SELECT 2, 2000,  2, 2000 UNION
   SELECT 2, 2000,  3, 3000 UNION
   SELECT 2, 2000,  4, 3000 UNION
   SELECT 2, 2000,  5, 5000 UNION
   SELECT 2, 2000,  6, 6000 UNION
   SELECT 3, 2000,  1, 1000 UNION
   SELECT 3, 2000,  2, 1000 UNION
   SELECT 3, 2000,  3, 1000 UNION
   SELECT 3, 2000,  4, 1000 UNION
   SELECT 3, 2000,  5, 1000 UNION
   SELECT 3, 2000,  6, 5000

SELECT *
   FROM (SELECT b.id, b.y, b.m, b.MonthlySpend
,     (SELECT COUNT(*)
         FROM #foo AS C
         WHERE  c.id = b.id
            AND c.MonthlySpend <= b.MonthlySpend) AS myRank
      FROM #foo AS b) AS a
   WHERE a.myRank <= 4

SELECT AVG(MonthlySpend)
   FROM (SELECT b.id, b.y, b.m, b.MonthlySpend
,     (SELECT COUNT(*)
         FROM #foo AS C
         WHERE  c.id = b.id
            AND c.MonthlySpend <= b.MonthlySpend) AS myRank
      FROM #foo AS b) AS a
   WHERE a.myRank <= 4
-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
  #4 (permalink)  
Old 11-30-11, 17:30
dlg dlg is offline
Registered User
 
Join Date: Jul 2009
Posts: 8
@MCrowley - an upgrade isn't possible at this time.

@Pat

I modified the query for my table and ran it. The ranks came back with the lowest values for MonthlySpend having the lowest rank. I need the highest values to have the lowest ranks so it seems I missed something. Also, since I need to only look at the most recent six months how can I exclude older data?

Thanks for the replies guys,
Dustin
Reply With Quote
  #5 (permalink)  
Old 11-30-11, 17:44
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
You need to tweak one line in each WHERE clause. You absolutely MUST undertand the code to get any value from it, because some of the assumptions made are necessary because you can't upgrade to a new newer version of SQL, but those assumptions are not pretty! Experiment with the code so you understand it, otherwise it isn't safe to use.

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
  #6 (permalink)  
Old 12-05-11, 17:50
dlg dlg is offline
Registered User
 
Join Date: Jul 2009
Posts: 8
Pat,

Is one of the assumptions that a value will exist for each month and that all values for each month will be different? If I run the following

SELECT A.Meta_ID, AVG(MonthlyTotalTrips) AS AvgMonthlyTrips
INTO #tmp_AvgMonthlyTrips
FROM (SELECT B.Meta_ID, B.Year, B.Month, B.MonthlyTotalTrips,
(
SELECT COUNT(*)
FROM #tmp_StatMonth C (NOLOCK)
WHERE C.Meta_ID = B.Meta_ID
AND C.MonthlyTotalTrips >= B.MonthlyTotalTrips
) AS TripRank
FROM #tmp_StatMonth B (NOLOCK)
) AS A
WHERE A.TripRank <= 4
GROUP BY A.Meta_ID

I get accurate results for everyone that has at least 6 months with a different trip count for each month. If two months have the same number of trips those months (and subsequent lower ranked months) are excluded from the results. If the two highest months have the same number of trips counts a NULL value is returned.

Aside from that, the code works great.

Thanks Again,
Dustin
Reply With Quote
  #7 (permalink)  
Old 12-06-11, 12:16
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
Odds are good that if you understand the code well enough to know which two lines need to be tweaked and how to tweak them, then it is probably safe for you to use this work around. The N-way tie in ranking values is the big issue, and understanding how that tie affects the results when it comes at the beginning/middle/end of the group is very important.

I think that you're probably Ok to use this now, as long as you keep the "gotchas" in mind when you provide your results to business people. They grasp these problems well if you explain the problem for them, you just have to make it clear if/when/how the data has been skewed so they can accomodate that skew.

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On