Quote:
Originally Posted by nbtet
yes, this is working and exactly what i want. Just for one thing, if you notice the red mark inside the data, this should not be included i guess since it is obvious that there is no duplicate for this user id. why this is happening actually? the rest is just fine.
|
It's because of my sloppy usage of the "partition by" expression.
There are three rows for that user_id, the other two have a different hour but the same "minute interval". As the partition is evaluated over the whole result, these are take into account.
There are two solutions to this: either include the full date (without minutes) in the partition like this:
count(*) over (partition by user_id, to_char(t_stamp, 'yyyymmddhh24'), extract(minute from t_stamp)::int / 15) as num_dupes
or (which is probably more efficient) push the where condition that limits the time frame into the inner select in order to make the partition only work over the selected time frame:
Code:
select * from
(
select user_id,
t_stamp,
count(*) over (partition by user_id, extract(minute from t_stamp)::int / 15) as num_dupes
from click_log
where t_stamp between date_trunc('hour',timestamp '2010-05-18 13:44:00' - interval '1 hour')
and date_trunc('hour', timestamp '2010-05-18 13:44:00')
) t
where num_dupes > 1;
Of course both things can be combined, to make the query more robust.