Page 1 of 2 12 LastLast
Results 1 to 15 of 28
  1. #1
    Join Date
    Aug 2003
    Posts
    32

    Unanswered: Selecting parents record with last child record

    I am trying to write a query that will show each parent record from a table with one child record (the one with the highest date then id) from another table.

    Example:

    create table sales_person (
    id int,
    name varchar(50)
    )

    insert into sales_person values (1, 'fred')
    insert into sales_person values (2, 'sally')

    create table sales (
    id int,
    date datetime,
    sales_person_id int,
    value money
    )

    insert into sales values (1, '2001-01-01', 1, 100)
    insert into sales values (2, '2001-01-02', 1, 200)
    insert into sales values (3, '2001-01-03', 2, 300)
    insert into sales values (4, '2001-01-04', 2, 400)

    select [some query to create the following result]

    sales_person.id | sales_person.name | sales.id | sales.date | sales.value
    -----------------+-------------------+----------+------------+-------------
    1 | fred | 2 | 2001-01-02 | 200
    2 | sally | 4 | 2001-01-04 | 400

    Up until now I've only been able to achive this with a stored procedure but feel that there must be an easier way using a qurey.

    Can anyone shed some light on this for me?

    Thank you,

    Stacey.

  2. #2
    Join Date
    Aug 2003
    Posts
    1
    I think you could do it by doing a query of group by + top 1 order by desc.

    Too concise?

  3. #3
    Join Date
    Aug 2003
    Posts
    32
    Originally posted by macdaddie0
    I think you could do it by doing a query of group by + top 1 order by desc.

    Too concise?
    Thanks for the reply.

    Yes it is a little. I started going down that path but I wasn't able to form a query that would work.

    I would really appreciated it if you could fill that out a little more.

    Stacey.

  4. #4
    Join Date
    Aug 2003
    Location
    Delft, The Netherlands (EU)
    Posts
    447
    Originally posted by stacey_richards
    Thanks for the reply.

    Yes it is a little. I started going down that path but I wasn't able to form a query that would work.

    I would really appreciated it if you could fill that out a little more.

    Stacey.

    Hi,

    This will help you:

    SELECT P.id, P.name, S.id, S.date, S.value
    FROM Sales S INNER JOIN Sales_Person P ON S.Sales_Person_id = P.ID
    WHERE Exists (SELECT * FROM
    (SELECT max(ID) MID, Sales_Person_ID PID FROM Sales GROUP BY Sales_Person_ID) G
    WHERE G.MID = S.ID AND G.PID = P.ID)
    Make everything as simple as possible, but not simpler! - A. Einstein
    DB Problems? DB Explorer, BTrieve Re-engineering, DB Conversions & ETL? Conversion Tool

  5. #5
    Join Date
    Aug 2003
    Posts
    32

    Smile

    Originally posted by DoktorBlue
    Hi,

    This will help you:

    SELECT P.id, P.name, S.id, S.date, S.value
    FROM Sales S INNER JOIN Sales_Person P ON S.Sales_Person_id = P.ID
    WHERE Exists (SELECT * FROM
    (SELECT max(ID) MID, Sales_Person_ID PID FROM Sales GROUP BY Sales_Person_ID) G
    WHERE G.MID = S.ID AND G.PID = P.ID)
    Thank you so much, just tried it out and it seems to do the trick. Great stuff!

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    doktorblue, i haven't tried yours, it is fairly complex, and it does not appear to do anything to test the latest date

    stacey, you said "the one with the highest date then id" -- the highest id doesn't make sense, because all the child records will have the same parent id, so within the set of child rows, just pick the highest date

    here's the solution, it uses a correlated subquery:
    Code:
    select p.id
         , p.name
         , s.date
         , s.value
      from sales_person p
    inner
      join sales s
        on s.sales_person_id = p.id
     where s.date =
           ( select max(date)
               from sales
              where sales_person_id = p.id )
    tip: never use a reserved word (e.g. date) for a column name

    rudy
    http://r937.com/
    Last edited by r937; 08-12-03 at 08:22.

  7. #7
    Join Date
    Aug 2003
    Location
    Delft, The Netherlands (EU)
    Posts
    447
    Originally posted by r937
    doktorblue, i haven't tried yours, it is fairly complex, and it does not appear to do anything to test the latest date

    stacey, you said "the one with the highest date then id" -- the highest id doesn't make sense, because all the child records will have the same parent id, so within the set of child rows, just pick the highest date

    here's the solution, it uses a correlated subquery:
    Code:
    select p.id
         , p.name
         , s.date
         , s.value
      from sales_person p
    inner
      join sales s
        on s.sales_person_id = p.id
     where s.date =
           ( select max(date)
               from sales
              where sales_person_id = p.id )
    tip: never use a reserved word (e.g. date) for a column name

    rudy
    http://r937.com/
    Indeed, I determined the highest ID; but my code can be modified easily to use the date:

    SELECT P.id, P.name, S.id, S.date, S.value
    FROM Sales S INNER JOIN Sales_Person P ON S.Sales_Person_id = P.ID
    WHERE Exists (SELECT * FROM
    (SELECT max(Date) MDate, Sales_Person_ID PID FROM Sales GROUP BY Sales_Person_ID) G
    WHERE G.MDate = S.Date AND G.PID = P.ID)

    This should do the same a Rudy's code; however Stacey, be aware that using the date may result in multiple records if you have the same date multiple times for the same person.
    Make everything as simple as possible, but not simpler! - A. Einstein
    DB Problems? DB Explorer, BTrieve Re-engineering, DB Conversions & ETL? Conversion Tool

  8. #8
    Join Date
    Aug 2003
    Posts
    32
    Originally posted by r937
    doktorblue, i haven't tried yours, it is fairly complex, and it does not appear to do anything to test the latest date

    stacey, you said "the one with the highest date then id" -- the highest id doesn't make sense, because all the child records will have the same parent id, so within the set of child rows, just pick the highest date

    here's the solution, it uses a correlated subquery:
    Code:
    select p.id
         , p.name
         , s.date
         , s.value
      from sales_person p
    inner
      join sales s
     where s.date =
           ( select max(date)
               from sales
              where sales_person_id = p.id )
    rudy
    http://r937.com/
    Thanks Rudy.

    I did notice that the previous example didn't pick the highest date. I thought that I would work through it, find out how it worked and maybe I could have modified it accordingly.

    Your solution does seem to be a lot less complex, however, and does address the highest date. I completed your join (...from sales_person p inner join sales s on s.sales_person_id = p.id where...) and it seemed to do the trick.

    It did fall over when I added another sales record:

    insert into sales values (5, '2001-01-02', 1, '500')

    This record is the third sale for fred but is on the same date as fred's second sale.

    Executing your solution produces three records, two for fred on 2001-01-02 and one for sally on 2001-01-04, but my aim is to get one record for fred and one for sally. This is why I wanted to be able to fall back to the highest id. I should have said the highest sales.date then sales.id.

    Thanks again, Rudy, I'll see if I can work out what I need now that you've got me on the right track.

    Stacey.

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    multiple sales on the same date would not be a "problem" if your date column actually had a time component, obviously you must be stripping off the time component when adding a sales record -- i wouldn't do that, but perhaps you cannot change the application design now

    in any case, relying on an incrementing id to distinguish the "latest" within a group of sales all on the same day is not guaranteed to work at all times (e.g. what if sales are added in batches -- admittedly a contrived example, but it does happen in some applications -- or what if earlier sales are later added after later ones)

    nevertheless if you want it to pick the highest id within a group of sales all with the same date, that's possible, let me know and i'll try to work up the solution when i get back home from work


    rudy

  10. #10
    Join Date
    Aug 2003
    Posts
    32
    Originally posted by DoktorBlue
    This should do the same a Rudy's code; however Stacey, be aware that using the date may result in multiple records if you have the same date multiple times for the same person.
    Yeap, that's why I wanted to get the highest sales.date then sales.id to determine the one sales record to join to each sales_person record.

    Thanks for the updated query DoktorBlue. I'll work through it see if I can eliminate those multiple rows for the same person.

    Stacey.

  11. #11
    Join Date
    Aug 2003
    Posts
    32
    Originally posted by r937
    multiple sales on the same date would not be a "problem" if your date column actually had a time component, obviously you must be stripping off the time component when adding a sales record -- i wouldn't do that, but perhaps you cannot change the application design now

    in any case, relying on an incrementing id to distinguish the "latest" within a group of sales all on the same day is not guaranteed to work at all times (e.g. what if sales are added in batches -- admittedly a contrived example, but it does happen in some applications -- or what if earlier sales are later added after later ones)

    nevertheless if you want it to pick the highest id within a group of sales all with the same date, that's possible, let me know and i'll try to work up the solution when i get back home from work


    rudy
    Thanks for your comments.

    The actual application does store the time with the date, but when the data is sent to the MS SQL database from multiple branch Paradox databases (in batches) the time gets rounded to seconds. As a result there are in fact multiple child records with exactly the same date and time. In a situation where this occurs, it has been deemed acceptable to select the record with the highest id.

    I would appreciate a working solution, thank you.

    Stacey.

  12. #12
    Join Date
    Aug 2003
    Posts
    32
    Thanks Rudy and DoktorBlue for the time you put into this. I think I've managed to get what I wanted! I made another sample database that more closely resembles what I actually need in the target application and with a few minor addtions to your contributions.. hey presto!

    use master

    create database sample

    use sample

    create table product (
    product_id int,
    product_name varchar(50)
    )

    insert into product values (1, 'product one')
    insert into product values (2, 'product two')

    create table branch (
    branch_id int,
    branch_name varchar(50)
    )

    insert into branch values (1, 'branch one')
    insert into branch values (2, 'branch two')

    create table trans (
    trans_id int,
    entry_date datetime,
    branch_id int,
    product_id int,
    quantity int,
    sale money,
    cost_of_sale money
    )

    insert into trans values (1, '2001-01-01', 1, 1, 5, 50, 20)
    insert into trans values (2, '2001-01-02', 1, 2, 6, 45, 38)
    insert into trans values (3, '2001-01-02', 1, 1, 3, 29, 12)
    insert into trans values (4, '2001-01-01', 2, 2, 12, 98, 34)
    insert into trans values (5, '2001-01-02', 2, 2, 10, 70, 65)
    insert into trans values (6, '2001-02-01', 1, 2, 34, 182, 96)

    select
    t1.trans_id,
    t1.entry_date,
    b.branch_name,
    p.product_name,
    t1.quantity,
    t1.sale,
    t1.cost_of_sale
    from
    trans t1 inner join
    branch b on t1.branch_id = b.branch_id inner join
    product p on t1.product_id = p.product_id
    where
    t1.trans_id = (
    select
    max(t2.trans_id)
    from
    trans t2
    where
    t2.entry_date = (
    select
    max(t3.entry_date)
    from
    trans t3
    where
    t2.product_id = t3.product_id
    ) and t1.product_id = t2.product_id
    )


    Thanks again to you both.

    Stacey.

  13. #13
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    hey, a doubly nested correlated subquery

    neat

    that is well beyond "sql 101" level


  14. #14
    Join Date
    Aug 2003
    Location
    Delft, The Netherlands (EU)
    Posts
    447
    Originally posted by r937
    hey, a doubly nested correlated subquery

    neat

    that is well beyond "sql 101" level


    ..... but slow! For just some records, it isn't a problem. For much records (> 100.000), you will experience the difference between Rudy's execution plan, and mine. Rudy's subquery will be executed for every sales record, while my subquery determines just the needed sales records first (in one execution), and will be used to filter just the required sales records.

    If you don't believe me, Stacey, and if you have a considerable table extents, I'd suggest to try both variants. Both should return the same result, but my solution should work in a fraction of time.

    In order to work optimal, I strongly prefer to select the highest sales ID. It's unique (no multiple records), it has an unique index (great for the performance), and it is small. The only question is whether the ID is really correlated with the time, and if not, I'm wondering whether it really matters.
    Make everything as simple as possible, but not simpler! - A. Einstein
    DB Problems? DB Explorer, BTrieve Re-engineering, DB Conversions & ETL? Conversion Tool

  15. #15
    Join Date
    Aug 2003
    Posts
    32
    Originally posted by DoktorBlue
    ..... but slow! For just some records, it isn't a problem. For much records (> 100.000), you will experience the difference between Rudy's execution plan, and mine. Rudy's subquery will be executed for every sales record, while my subquery determines just the needed sales records first (in one execution), and will be used to filter just the required sales records.

    If you don't believe me, Stacey, and if you have a considerable table extents, I'd suggest to try both variants. Both should return the same result, but my solution should work in a fraction of time.

    In order to work optimal, I strongly prefer to select the highest sales ID. It's unique (no multiple records), it has an unique index (great for the performance), and it is small. The only question is whether the ID is really correlated with the time, and if not, I'm wondering whether it really matters.
    I tidied up the code to work with our application today. I needed to get the last cost price of every product for every month sold during the year for a group of branches (required a few extra items in the where part), in addition I needed to find the last cost price of every product for every month sold by a comparison branch. I think there's over a million records that I needed to scan for that year (I'd have to check tommorrow when I'm back at work to be sure) and the execution time was about two and a half minutes. That's a little faster than what my original stored producdure was producing and the code is now a lot easier to understand and maintain (I was originally using a cursor, selecting in to variables, etc). I will expand on your example tomorrow for my specific requirements, and let you know how much faster it is if you like. Because data is added to the database in batches branched "dial in" and upload their data) it's not feasable to assume that the order of IDs is the same as the order of dates for the group of branches.

    Thanks for your comments,

    Stacey.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •