If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Data Access, Manipulation & Batch Languages > ANSI SQL > Max() on JOIN subquery with extra columns

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-01-08, 15:13
mikebrown mikebrown is offline
Registered User
 
Join Date: Oct 2006
Posts: 9
Question Max() on JOIN subquery with extra columns

Locations : Events = 1 : many
Locations 381 rows
Events 397 rows

I am trying to join the Locations and Events tables in order to get the most recent maintenance event for each location. The only differences below exist in the outer portion of the query with the Max() function (the "sub" is identical in each). The first query returns all the correct records that I want (the most recent event or a null event for each location in division 11, 381 rows). However, it does not give me the event_id that is associated with the event cmpltd_dt so I have no way of tying the location to the event. The second query adds the maint_evnt_id to the query but doing so breaks the aggregate function results we had correct in the first query.

Code:
SELECT  sub.loc_id, Max(sub.cmpltd_dt) as MAXcmpltd_dt
FROM 
(
SELECT locations.loc_id, events.event_id, events.cmpltd_dt 
FROM locations 
LEFT JOIN (SELECT * FROM events WHERE event_cd=701 AND div_nbr=11) events 
ON locations.loc_id = events.loc_id 
WHERE locations.div_nbr=11 
ORDER BY events.cmpltd_dt ASC, locations.loc_id ASC
) 
sub 
GROUP BY sub.loc_id
Code:
SELECT  sub.loc_id, Max(sub.cmpltd_dt) as MAXcmpltd_dt, sub.event_id
FROM 
(
SELECT locations.loc_id, events.event_id, events.cmpltd_dt 
FROM locations 
LEFT JOIN (SELECT * FROM events WHERE event_cd=701 AND div_nbr=11) events 
ON locations.loc_id = events.loc_id 
WHERE locations.div_nbr=11 
ORDER BY events.cmpltd_dt ASC, locations.loc_id ASC
) 
sub 
GROUP BY sub.loc_id , sub.event_id
Anyone able to help will join r937 on the Whiteboard of Fame in conference room C-011.
Reply With Quote
  #2 (permalink)  
Old 10-01-08, 16:18
Pat Phelan Pat Phelan is online now
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,595
Code:
SELECT l.loc_id, e.event_id, e.cmpltd_dt
   FROM locations AS l
   JOIN events AS e
      ON (e.loc_id = l.loc_id)
   WHERE  11 = l.div_nbr
      AND e.cmpltd_dt = (SELECT Max(z.cmpltd_dt)
         FROM events AS z
         WHERE  z.loc_id = l.loc_id)
   ORDER BY e.cmpltd_dt ASC, l.loc_id ASC
-PatP
Reply With Quote
  #3 (permalink)  
Old 10-01-08, 17:17
mikebrown mikebrown is offline
Registered User
 
Join Date: Oct 2006
Posts: 9
Unfortunately, no dice.
The date field is not unique enough to pull from the event table (some events have the same date).
Also, this doesn't return a row for locations with no event.

So far, the closest I can get is:
Code:
SELECT  sub.loc_id, Max(sub.cmpltd_dt) as MAXcmpltd_dt, Max(sub.event_id) as event_id
FROM 
(
SELECT locations.loc_id, events.event_id, events.cmpltd_dt 
FROM locations 
LEFT JOIN (SELECT * FROM events WHERE event_cd=701 AND div_nbr=11) events 
ON locations.loc_id = events.loc_id 
WHERE locations.div_nbr=11 
ORDER BY events.cmpltd_dt ASC, locations.loc_id ASC
) 
sub 
GROUP BY sub.loc_id
This works 99% of the time because the event_id is a sort of a timestamp-derived key and 99% of the time the event is created at a time that matches the cmpltd_dt field so it sorts the same, but there are situations that break it so it is no good.
Reply With Quote
  #4 (permalink)  
Old 10-01-08, 17:44
Pat Phelan Pat Phelan is online now
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,595
Code:
SELECT l.loc_id, e.event_id, e.cmpltd_dt
   FROM locations AS l
   LEFT JOIN (SELECT *
      FROM events AS z1
      WHERE  701 = z1.event_code
         AND z1.cmpltd_dt = (SELECT Max(z.cmpltd_dt)
            FROM events AS z2
            WHERE  z2.loc_id = z1.loc_id
               AND 701 = z2.event_code)) AS e
      ON (e.loc_id = l.loc_id)
   WHERE  11 = l.div_nbr
   ORDER BY e.cmpltd_dt ASC, l.loc_id ASC
I take it that these aren't all maintenance events then, and that 701 indicates a maintenance event.

-PatP
Reply With Quote
  #5 (permalink)  
Old 10-02-08, 10:00
mikebrown mikebrown is offline
Registered User
 
Join Date: Oct 2006
Posts: 9
Yeah, sorry about that. Must have lopped it off when trying to make the SQL readable. Of the 381 Locations, 164 don't have a maintenance event so should return null in the join. The rest of the locations of 1-3 maintenance events that I am trying to get the latest of with Max().
Reply With Quote
  #6 (permalink)  
Old 10-02-08, 17:15
mikebrown mikebrown is offline
Registered User
 
Join Date: Oct 2006
Posts: 9
Simplified

Maybe this will be easier... Here is a representation of the data that I have. All Loc's are represented and some will have events associated with them but some will not. I want to query this data and return 1 row for each Loc (regardless of events), and if it has an event, give me the latest one (if it doesn't give me a null). Like this...
Code:
loc_id	event_dt	event_id
100	7/9/2008	1
100	7/15/2008	2
101	null		null
102	7/9/2008	3
103	7/9/2008	4
103	7/22/2008	5
104	7/9/2008	6
104	7/13/2008	7
104	7/28/2008	8
105	null		null
106	7/9/2008	9
** magical query stuff happens **

Code:
loc_id	event_dt	event_id
100	7/15/2008	2
101	null		null
102	7/9/2008	3
103	7/22/2008	5
104	7/28/2008	8
105	null		null
106	7/9/2008	9
We have one row for each Loc and either a null record or the latest event. Max() doesn't do it because it factors the event_id. Any ideas?
Reply With Quote
  #7 (permalink)  
Old 10-03-08, 13:44
Pat Phelan Pat Phelan is online now
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,595
So, did my SQL from post #4 solve the problem, or does it still need more work? I think that it is complete and correct, but you've posted twice since then so I'm not sure.

-PatP
Reply With Quote
  #8 (permalink)  
Old 10-03-08, 14:12
mikebrown mikebrown is offline
Registered User
 
Join Date: Oct 2006
Posts: 9
Wow, I am an idiot. I thought you were quoting my snippet from earlier.

Just ran your query and it is almost dead on. I get 386 rows instead of the 381 because there are five "701" events done the same day as another at the same location. Even if I can't get these double-ups removed your SQL definitely seems to give me correct results. Do you see where I can through in a DISTINCT or something to be sure I get just the 381?
Reply With Quote
  #9 (permalink)  
Old 10-03-08, 16:00
Pat Phelan Pat Phelan is online now
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,595
Yes, DISTINCT will do that.

Just as an observation, you REALLY need to think about normalizing that bundle of fun. There are a few "issues" in what you've posted so far, and I think what I can see from here is only the tip of the iceberg... Guaranteed, no doubt in my mind, it is only a matter of time before it comes to bite you.

-PatP
Reply With Quote
  #10 (permalink)  
Old 10-03-08, 16:59
mikebrown mikebrown is offline
Registered User
 
Join Date: Oct 2006
Posts: 9
Is it not normalized?

Location table
loc_id(pk) a_bunch_of_location_properties…

Events table
loc_id(fk) event_id(pk) event_dt event_cd(fk to code table) other_event_properties…

The only think that I thought breaks the normalization was that I put a copy of the location division number in the event table because for processing just events it became a lot of trouble always having to join locations. In fact, originally to get an event division number you had to join it to the locations table, then locations to the city table, and city to the county table that determines the division... seemed like a lot of overhead to simply give me all the locations in a specific division (which is what all my queries require since each user is division specific).

I really appreciate your help. The DBA's here refuse to look at SQL and our developers are engineering specific with hardly any DB experience so we get lost quite easily.
Reply With Quote
  #11 (permalink)  
Old 10-04-08, 06:44
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
First you should try to get a clean design in 3NF, i.e. avoid duplicated information as much as possible. Once you are there, you can tune your system to give you the desired performance. Denormalizing things is a valid approach during performance tuning, but it should be the last resort because it comes with a lot of problems and impact on your application design.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On