Quote:
|
This will indeed run faster in the rare scenarios where Process dates are unique for each cust_id.
|
Unh? A sort is O(n * log2(n)) while a MAX(n) is O(n). The uniqueness of the date will not change that. Also, how do you know that duplicate dates are common from the specs? Why not assume that this process is daily?
In fact, the MAX() has another advantage, if we can fold the “process_date_max = process_date” into the scan. Make it easy with integers. Do the first scan, and since you have only one row, it is the max so far:
Value Value_max_so_far
100 100 ◀ keep
110
100
99
Step two, we now have a new greatest value, so replace it. You could now delete all the rows that are strictly less the new current max value, but let us keep scanning and dropping as we go.
Value Value_max_so_far
=================
100 100 drop
110 110 ◀ keep
100
99
Value Value_max_so_far
==================
100 100
110 110 keep
100 110 ◀ drop
99
Value Value_max_so_far
==================
100 100 drop
110 110 keep
100 110 drop
99 110 ◀ drop
If Conner is at the Cactuss meetings tonight and tomorrow, I will ask how MS does it.
Quote:
|
But I don't see how that method is extensible to secondary sorting. For instance, if a customer has two records [sic: rows are not records] with identical process dates and the business rule is to use the last one entered (the one with the highest sequential "cust_something_seq"), can that be rolled into your existing logic, or would you have to revert back to the "ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY processed_date DESC, cust_something_seq DESC) AS row_nbr" method?
|
Yes, if you change the specs and create new columns, you might need a new query

And I would probably go with the ROW_NUMBER(). However, one trick I have seen was to use a computed column. The students took tests that had three parts that carried different weights that gave a final score. Skeleton DDL:
CREATE TABLE Tests
(student_id CHAR (10) NOT NULL,
test_nbr SMALLINT NOT NULL,
PRIMARY KEY (student_id, test_nbr),
x_score INTEGER NOT NULL,
y_score INTEGER NOT NULL,
z_score INTEGER NOT NULL,
grand_score COMPUTED AS (10*x_score + 15*y_score + z_score)
);
Since the computed score was being used for lots of stuff and the formula is simple, materialization made sense. It was waiting in the data, not computed on the fly. But combining mixed data types might not be such a good idea.
WITH X
AS
(SELECT student_id, grand_score,
MAX(grand_score)
OVER (PARTITION BY student_id) AS grand_score_max
FROM Tests)
SELECT student_id, grand_score
FROM X
WHERE grand_score_max = grand_score;