Results 1 to 5 of 5
  1. #1
    Join Date
    Jan 2013
    Posts
    354
    Provided Answers: 1

    Unanswered: Relational Division with temporal twist

    The set up is that you have a table that captures a series of events. There is a select group of ten items I want to look for, but only when they occur in a certain time frame. They have to belong to the same user within say five minutes of each other.

    CREATE TABLE Events
    (item_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    event_timestamp DATETIME2(0) DEFAULT CURRENT_TIMESTAMP NOT NULL,
    PRIMARY KEY (item_id, user_id, event_timestamp) );

    This is a kind of relational division. First set up your divisor (the ten items) in a table:

    CREATE TABLE Divisor
    (item_id INTEGER NOT NULL PRIMARY KEY);
    INSERT INTO Divisor VALUES (1), (2), .., (10);

    The basic relational division is easy when you have only one (user_id, item_id) in the dividend (Events) table and just want to find who has all ten without regard to how long it took them:

    SELECT E1.user_id
    FROM (SELECT DISTINCT E1.item_id, E1.user_id - clear timestamp
    FROM Events) AS E1,
    Divisor AS D1
    WHERE E1.item_id = D1.item_id
    GROUP BY E1.user_id
    HAVING COUNT(E1.item_id) = (SELECT COUNT(item_id) FROM Divisor);

    The question is how that timestamp works. Are these five-minute timeslots that start and end at fixed points in time? ('2013-05-25 00:00:00', '2013-05-25 00:04:59') etc? Or is it any five minute frame that I can drop on the data?

    The first is fairly easy; the second spec is a bitch. Anyone want to try it?

  2. #2
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    I'll give it a try.

    First I chop the events in 5 minute time frames.

    Each event can start a time frame, when
    - the item is included in Divisor. An event on an item we are not interested in is useless.

    The "next" event must
    - have the same user as the start event,
    - its timestamp must be on or within 5 minutes after the start event's timestamp ("on" to include the start event itself)
    - the item must be included in Divisor. An event on an item we are not interested in is useless.

    Once we have those time frames, we only keep those in which all items we are interested in occur (the 10 different items in Divisor).

    Code:
    --The set up is that you have a table that captures a series of events. 
    --There is a select group of ten items I want to look for, but only when 
    -- they occur in a certain time frame. 
    --They have to belong to the same user within say five minutes of each other.
    
    CREATE TABLE Events(
    	item_id		INTEGER	NOT NULL,
    	user_id		INTEGER	NOT NULL,
    	event_timestamp	DATETIME2(0)	DEFAULT CURRENT_TIMESTAMP NOT NULL,
    	PRIMARY KEY (item_id, user_id, event_timestamp) 
    );
    
    INSERT INTO Events (item_id, user_id, event_timestamp) VALUES
    (1, 1, '2013-05-27 11:39:00.000'),
    (2, 1, '2013-05-27 11:39:10.000'),
    (3, 1, '2013-05-27 11:39:20.000'),
    (4, 1, '2013-05-27 11:39:30.000'),
    (5, 1, '2013-05-27 11:39:40.000'),
    (6, 1, '2013-05-27 11:39:50.000'),
    (7, 1, '2013-05-27 11:40:00.000'),
    (8, 1, '2013-05-27 11:40:10.000'),
    (9, 1, '2013-05-27 11:40:20.000'),
    (10, 1, '2013-05-27 11:40:30.000'),
    (11, 1, '2013-05-27 11:40:40.000'),
    
    (1, 2, '2013-05-27 11:39:00.000'),
    (2, 2, '2013-05-27 11:39:10.000'),
    (3, 2, '2013-05-27 11:39:20.000'),
    (4, 2, '2013-05-27 11:39:30.000'),
    (5, 2, '2013-05-27 11:39:40.000'),
    (6, 2, '2013-05-27 11:39:50.000'),
    (7, 2, '2013-05-27 11:40:00.000'),
    (11, 2, '2013-05-27 11:40:40.000'),
    
    (1, 3,  '2013-05-27 11:40:00.000'),
    (1, 3,  '2013-05-27 11:40:10.000'),
    (2, 3,  '2013-05-27 11:40:30.000'),
    (2, 3,  '2013-05-27 11:40:31.000'),
    (3, 3,  '2013-05-27 11:41:00.000'),
    (3, 3,  '2013-05-27 11:41:01.000'),
    (4, 3,  '2013-05-27 11:42:30.000'),
    (4, 3,  '2013-05-27 11:42:31.000'),
    (5, 3,  '2013-05-27 11:43:00.000'),
    (5, 3,  '2013-05-27 11:43:01.000'),
    (6, 3,  '2013-05-27 11:43:30.000'),
    (6, 3,  '2013-05-27 11:43:31.000'),
    (7, 3,  '2013-05-27 11:44:00.000'),
    (7, 3,  '2013-05-27 11:44:01.000'),
    (8, 3,  '2013-05-27 11:44:30.000'),
    (8, 3,  '2013-05-27 11:44:31.000'),
    (9, 3,  '2013-05-27 11:44:35.000'),
    (9, 3,  '2013-05-27 11:44:36.000'),
    (10,3,  '2013-05-27 11:44:40.000'),
    (10,3,  '2013-05-27 11:44:41.000'),
    (11, 3, '2013-05-27 11:44:37.000')
    
    
    CREATE TABLE Divisor(
    	item_id INTEGER NOT NULL PRIMARY KEY
    );
    
    INSERT INTO Divisor VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
    
    --The question is how that timestamp works. Are these five-minute timeslots 
    -- that start and end at fixed points in time? ('2013-05-25 00:00:00', 
    -- '2013-05-25 00:04:59') etc? Or is it any five minute frame that I can 
    -- drop on the data?
    -- The first is fairly easy; the second spec is a bitch. Anyone want to 
    -- try it?
    GO
    WITH FiveMinFrame AS(
    -- Get all item/user/ts following a certain start record within a 5 
    -- minute frame. The start record starts the 5 minute frame, the start 
    -- record itself is also included in the time frame
    SELECT StartE.item_id as Start_Item_id, StartE.user_id, 
    	StartE.event_timestamp as Start_timestamp, 
    	nextE.item_id as next_item_id, nextE.event_timestamp as next_timestamp
    FROM Events as StartE
    	INNER JOIN Divisor as SD ON	
    -- only allow events to start a 5 minute frame when the Item is in Divisor 
    --(items we are interested in)
    		StartE.item_id = SD.item_id
    	INNER JOIN Events as nextE ON
    		StartE.user_id = nextE.user_id AND
    		nextE.event_timestamp BETWEEN StartE.event_timestamp AND 
    				DATEADD(mi, 5, StartE.event_timestamp)
    	INNER JOIN Divisor as ED ON	
    -- only allow events to occur in the 5 minute 
    -- frame when the Item is in Divisor (items we are interested in)
    		nextE.item_id = ED.item_id
    ),
    AllDivisorsIncluded as
    (SELECT E1.user_id, E1.Start_timestamp
    FROM (SELECT DISTINCT user_id, Start_timestamp, next_item_id
    	FROM FiveMinFrame
    	) AS E1
    GROUP BY E1.user_id, E1.Start_timestamp
    HAVING COUNT(E1.next_item_id) = (SELECT COUNT(item_id) FROM Divisor)
    )
    SELECT user_id, Start_timestamp
    FROM AllDivisorsIncluded
    
    
    DROP TABLE Events
    DROP TABLE Divisor
    Last edited by Wim; 05-27-13 at 10:28.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  3. #3
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Hi Celko,

    I was expecting some kind response on my solution from you ... (feeling ignored )

    The question is how that timestamp works. Are these five-minute timeslots that start and end at fixed points in time? ('2013-05-25 00:00:00', '2013-05-25 00:04:59') etc? Or is it any five minute frame that I can drop on the data?

    The first is fairly easy; the second spec is a bitch. Anyone want to try it?
    I found the second pretty easy to solve once I had sketched the problem on paper to determine what I needed. I hardly ever need date/time arithmetic, so I'm not fluent at it. How would you solve it if the 5 minute timeslots were predefined fixed points in time?
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  4. #4
    Join Date
    Jan 2013
    Posts
    354
    Provided Answers: 1

    sorry, I got tied up ..

    Sorry I got tied up with finished a book manuscript, patching a roof, my brother's death and other things. Give me a chance to get back to this.

  5. #5
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    I'm sorry about your brother.

    My tongue-in-cheek post is totally misplaced in that context. My apologies.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •