| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

11-30-11, 14:21
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 8
|
|
Best x of y Records
|
|
Hello All,
I'm trying to find out how to average the best X of Y records in a table for each member. Here's what I have for columns:
Member_ID
Year
Month
MonthlySpend (summary value for month)
Basically, I'm trying to find out the average monthly spend for each member's best 4 of the last 6 months (the best months will vary by member of course). I thought this would be simple (and maybe it is) but I am having difficulty figuring this out. Any advice is greatly appreciated. I'm using MS SQL 2000.
Thanks,
Dustin
|
|

11-30-11, 14:45
|
|
Wage drone 24601
|
|
Join Date: Jan 2003
Location: Massachusetts
Posts: 4,899
|
|
This would be a lot easier, if you were at even SQL 2005. SQL 2005 introduces windowing functions and CTEs that make this relatively easy. I suppose there is no chance of an upgrade?
|
|

11-30-11, 15:21
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
|
|
|
|
There are lots of things that you need to think about, so I'll give you a really short demo to play with... Note that you MUST understand the limitations of the solution when dealing with "ties" for it to make any sense to business people at all.
Code:
DROP TABLE #foo
GO
CREATE TABLE #foo (
id INT
, y INT
, m INT
, MonthlySpend INT
)
INSERT INTO #foo
SELECT 1, 2000, 1, 1000 UNION
SELECT 1, 2000, 2, 2000 UNION
SELECT 1, 2000, 3, 3000 UNION
SELECT 1, 2000, 4, 4000 UNION
SELECT 1, 2000, 5, 5000 UNION
SELECT 1, 2000, 6, 6000 UNION
SELECT 2, 2000, 1, 1000 UNION
SELECT 2, 2000, 2, 2000 UNION
SELECT 2, 2000, 3, 3000 UNION
SELECT 2, 2000, 4, 3000 UNION
SELECT 2, 2000, 5, 5000 UNION
SELECT 2, 2000, 6, 6000 UNION
SELECT 3, 2000, 1, 1000 UNION
SELECT 3, 2000, 2, 1000 UNION
SELECT 3, 2000, 3, 1000 UNION
SELECT 3, 2000, 4, 1000 UNION
SELECT 3, 2000, 5, 1000 UNION
SELECT 3, 2000, 6, 5000
SELECT *
FROM (SELECT b.id, b.y, b.m, b.MonthlySpend
, (SELECT COUNT(*)
FROM #foo AS C
WHERE c.id = b.id
AND c.MonthlySpend <= b.MonthlySpend) AS myRank
FROM #foo AS b) AS a
WHERE a.myRank <= 4
SELECT AVG(MonthlySpend)
FROM (SELECT b.id, b.y, b.m, b.MonthlySpend
, (SELECT COUNT(*)
FROM #foo AS C
WHERE c.id = b.id
AND c.MonthlySpend <= b.MonthlySpend) AS myRank
FROM #foo AS b) AS a
WHERE a.myRank <= 4
-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
|
|

11-30-11, 17:30
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 8
|
|
@MCrowley - an upgrade isn't possible at this time.
@Pat
I modified the query for my table and ran it. The ranks came back with the lowest values for MonthlySpend having the lowest rank. I need the highest values to have the lowest ranks so it seems I missed something. Also, since I need to only look at the most recent six months how can I exclude older data?
Thanks for the replies guys,
Dustin
|
|

11-30-11, 17:44
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
|
|
You need to tweak one line in each WHERE clause. You absolutely MUST undertand the code to get any value from it, because some of the assumptions made are necessary because you can't upgrade to a new newer version of SQL, but those assumptions are not pretty! Experiment with the code so you understand it, otherwise it isn't safe to use.
-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
|
|

12-05-11, 17:50
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 8
|
|
Pat,
Is one of the assumptions that a value will exist for each month and that all values for each month will be different? If I run the following
SELECT A.Meta_ID, AVG(MonthlyTotalTrips) AS AvgMonthlyTrips
INTO #tmp_AvgMonthlyTrips
FROM (SELECT B.Meta_ID, B.Year, B.Month, B.MonthlyTotalTrips,
(
SELECT COUNT(*)
FROM #tmp_StatMonth C (NOLOCK)
WHERE C.Meta_ID = B.Meta_ID
AND C.MonthlyTotalTrips >= B.MonthlyTotalTrips
) AS TripRank
FROM #tmp_StatMonth B (NOLOCK)
) AS A
WHERE A.TripRank <= 4
GROUP BY A.Meta_ID
I get accurate results for everyone that has at least 6 months with a different trip count for each month. If two months have the same number of trips those months (and subsequent lower ranked months) are excluded from the results. If the two highest months have the same number of trips counts a NULL value is returned.
Aside from that, the code works great.
Thanks Again,
Dustin
|
|

12-06-11, 12:16
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
|
|
Odds are good that if you understand the code well enough to know which two lines need to be tweaked and how to tweak them, then it is probably safe for you to use this work around. The N-way tie in ranking values is the big issue, and understanding how that tie affects the results when it comes at the beginning/middle/end of the group is very important.
I think that you're probably Ok to use this now, as long as you keep the "gotchas" in mind when you provide your results to business people. They grasp these problems well if you explain the problem for them, you just have to make it clear if/when/how the data has been skewed so they can accomodate that skew.
-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|