Results 1 to 11 of 11
  1. #1
    Join Date
    Feb 2005
    Posts
    20

    Unanswered: Calculate median of difference in days between records

    I have a table of sample data

    Samples(sample_no, sample_date..)

    I have no idea how to do the following in sql server or if its even possible:

    1. Calculate the difference in days between all samples.
    2. Select the median result

    Any trick to get this done would be really helpful

    thanks,

    DB

  2. #2
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    1. you can start by reading about datediff.

    2. median. that is the value in the physical middle, right?. I would create a temp tale using the IDENTITY function and a order by on the column in question and then do the math on the new identity value.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    2. Create a query. Cross join table to itself. Deduct your field from itself (T1.Field1 - T2.Field2). Apply the Sign function to the result and sum it. The median is the one that comes up 0. If there are an even numbers of values then there will be two medians (1 and -1) so you need to decide if you want the higher or lower one.

    DATEDIFF will help you with the above too.

    HTH

    EDIT - the above allows you to use Groupings so you can get medians by group too.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    Feb 2005
    Posts
    20
    Is there any chance I could get an example in sql code of what you are describing... IM a little confused...

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Well, first, do you mean the difference in days between all samples, or just the difference between a sample and the immediately preceding sample? One, requires a cross-join, and the other does not, so the devil is in the details.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by pootle flump
    EDIT - the above allows you to use Groupings so you can get medians by group too.
    Oh yeah - but that would require a join by the grouping column(s). Forgot that little detail
    Testimonial:
    pootle flump
    ur codings are working excelent.

  7. #7
    Join Date
    Feb 2005
    Posts
    20
    here is some sample data:

    Sample_date sample_no
    4/29/1962 0:00 1962706086
    5/1/1962 0:00 1962704968
    5/3/1962 0:00 1962704891
    5/4/1962 0:00 1962722248
    5/15/1962 0:00 1962706641
    5/18/1962 0:00 1962722249
    5/18/1962 0:00 1962704969
    5/22/1962 0:00 1962706344
    6/5/1962 0:00 1962722690
    6/7/1962 0:00 1962722250
    6/8/1962 0:00 1962708068
    6/9/1962 0:00 1962722842
    6/12/1962 0:00 1962708069
    6/12/1962 0:00 1962722928
    6/13/1962 0:00 1962722691
    6/13/1962 0:00 1962706345
    6/13/1962 0:00 1962706427
    6/16/1962 0:00 1962706561
    6/16/1962 0:00 1962704970
    6/16/1962 0:00 1962704971
    6/16/1962 0:00 1962705794
    6/17/1962 0:00 1962708070
    6/17/1962 0:00 1962708071
    6/17/1962 0:00 1962722929
    6/17/1962 0:00 1962722930
    6/18/1962 0:00 1962722692
    6/18/1962 0:00 1962722693
    6/18/1962 0:00 1962722694
    6/18/1962 0:00 1962722695
    6/18/1962 0:00 1962722696
    6/18/1962 0:00 1962722697
    6/18/1962 0:00 1962722698
    6/18/1962 0:00 1962722843
    6/23/1962 0:00 1962706562
    6/23/1962 0:00 1962722251
    6/23/1962 0:00 1962722252
    6/29/1962 0:00 1962704892
    6/29/1962 0:00 1962704893
    7/1/1962 0:00 1962722931
    7/2/1962 0:00 1962708072
    7/2/1962 0:00 1962722844
    7/4/1962 0:00 1962708073
    7/6/1962 0:00 1962722932
    7/7/1962 0:00 1962705881
    7/7/1962 0:00 1962722253
    7/11/1962 0:00 1962722933
    7/15/1962 0:00 1962722254
    7/18/1962 0:00 1962708074
    7/18/1962 0:00 1962708075
    7/18/1962 0:00 1962722934
    7/22/1962 0:00 1962722255
    7/22/1962 0:00 1962704894
    7/25/1962 0:00 1962722935
    7/27/1962 0:00 1962708076
    7/27/1962 0:00 1962708077
    7/28/1962 0:00 1962704895
    7/28/1962 0:00 1962722256
    8/1/1962 0:00 1962708078
    8/1/1962 0:00 1962708079
    8/1/1962 0:00 1962722699
    8/3/1962 0:00 1962722936
    8/3/1962 0:00 1962722937

    When you look at the first 2 rows:
    4/29/1962 0:00 1962706086
    5/1/1962 0:00 1962704968

    you see there is a difference of 3 Days.

    When you look at row 2 and 3
    5/1/1962 0:00 1962704968
    5/3/1962 0:00 1962704891

    you see there is a difference of 2 Days.

    This difference can be from 1 - infinate days.

    calculate all the differences between all the consecutive days in this pattern then get the median value (the number in the MIDDLE of the set of given numbers)
    ie if you have 9 values
    1. 12
    2. 25
    3. 44
    4. 55
    5. 89
    6. 122
    7. 155
    8. 325
    9. 8000

    the median is 89. I dont want to have to calculate median manually like this, its just way to much processing although im not sure if its built into sql server.

    so for each station there should be a single value comming out and:
    if num between 1 - 7 then "Monthly"
    if num between 7 - 14 then "Bi-Weekly"
    if num between 14 - 30 then "Monthly"
    if num between 30 - 360 then "Yearly"

    ... or something similar

    To make matters worse this is only a small fraction of a larger "Station Level" query that will be including this query somehow. Groups of samples belong to different "Stations" which I have to seperate by. For example, all the data you see here is for 1 station. There are about 300 stations being returned in the higher level view that this query will belong to. For kicks I will include it:

    you can see that Sampling_Frequency is there as "UNKNOWN" cause i dont know how to do it. also if anyone wants to suggest how i optimize this query feel free to add suggestions:


    SELECT DISTINCT stat.station_number AS Site_ID, stat.station_number AS Site_Number, stat.station_name AS Site_Name, stat.station_description AS Site_Description,
    stat.creation_date AS Site_Activation_Date, dbo.station_coordinates.lat_dec_deg AS Latitude,
    dbo.station_coordinates.lat_dec_deg AS North_bounding_Latitude, dbo.station_coordinates.lat_dec_deg AS South_bounding_Latitude,
    dbo.station_coordinates.lon_dec_deg AS Longitude, dbo.station_coordinates.lon_dec_deg AS West_bounding_Longitude,
    dbo.station_coordinates.lon_dec_deg AS East_bounding_Longitude, 'N/A' AS Site_Photograph_link, 'N/A' AS Local_site_influences_link,
    'http://map.ns.ec.gc.ca/NA' AS Collection_level_metadata_link, 'http://map.ns.ec.gc.ca/envirodat/' AS Other_info_link,
    'Environment Canada, Atlantic Region, Environmental Conservation Branch' AS Organization_name, '5068512398' AS Contact_Phone_number,
    '5068516608' AS Contact_Fax_number, 'PO BOX 23005' AS Contact_street_address, 'Moncton' AS Contact_city, 'NB' AS Contact_province,
    'E1A 6S8' AS Contact_postal_code, 'cathy.cormier@ec.gc.ca' AS Contact_e_mail, 'http://map.ec.gc.ca/envirodat' AS Organization_web_site,
    '0800-1600' AS Contact_hours_of_service, 'N/A' AS Contact_other_instructions, dbo.projects.project_no, dbo.projects.project_name,

    (SELECT DISTINCT wshed
    FROM envguest.stations_watersheds
    WHERE station = stat.station_number) AS watershed,


    case active_indicator WHEN 'Y' Then NULL else (select max(sample_date) from samples where station_no = stat.Station_number) END as Site_Final_Sampling_Date,

    CASE active_indicator WHEN 'Y' THEN 'Active' ELSE 'Inactive' END AS Current_Status,

    'UNKNOWN' as Jurisdiction, 'UNKNOWN' as Sampling_Frequency


    FROM dbo.station_coordinates INNER JOIN
    dbo.station stat ON dbo.station_coordinates.station_number = stat.station_number INNER JOIN
    dbo.samples ON stat.station_number = dbo.samples.station_no INNER JOIN
    dbo.projects ON dbo.samples.project_no = dbo.projects.project_no

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Hi

    A few quickies:

    Seen this posted a few times and now it is my turn (ahem):

    Order has no meaning in a relational database.

    This means that the apparent order of any rows in a table is arbitary - they are in fact descrete elements.

    I presume that you want to order the dates in ascending order and get the difference in days between successive dates?

    Also (not being picky - just to understand):
    Quote Originally Posted by dbenoit64
    calculate all the differences between all the consecutive days in this pattern then get the median value (the number in the MIDDLE of the set of given numbers)
    ie if you have 9 values
    1. 12
    2. 25
    3. 44
    4. 55
    5. 89
    6. 122
    7. 155
    8. 325
    9. 8000
    If there were nine dates there should only be eight numbers returned right?
    Finally - I notice the above return is ordered ascending. Is this a requirement? You aren't looking for the difference in days between the two median dates but the median of the differences?

    BTW - are you a surfing French man?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Lost the server for a bit so had a play. Below is based on assumption that my assumptions were correct

    Decided not to use Thrasys Identity column - however you would be forgiven for thinking I have coded this with obsucation in mind. Not true - I just wanted to create a single query solution
    Code:
    IF EXISTS(SELECT * FROM sysobjects O WHERE O.name = 'MedianT' AND O.xtype = 'U') BEGIN
        DROP TABLE MedianT
    End
     
    CREATE TABLE MedianT (TheDate DATETIME)
     
    INSERT INTO MedianT (TheDate)
    SELECT '4/29/1962' UNION
    SELECT '5/1/1962' UNION
    SELECT '5/3/1962' UNION
    SELECT '5/4/1962' UNION
    SELECT '5/8/1962' UNION
    SELECT '5/10/1962' UNION
    SELECT '5/13/1962' UNION
    SELECT '5/14/1962' UNION
    SELECT '5/29/1962' UNION
    SELECT '5/29/1963' UNION
    SELECT '5/29/1967' UNION
    SELECT '5/29/1969' UNION
    SELECT '5/29/1974' UNION
    SELECT '5/29/1980' UNION
    SELECT '5/29/2000'
     
    SELECT Da.TheDifference AS TheMedian
    FROM (SELECT DISTINCT TheDifference FROM (SELECT DATEDIFF(D, A.TheDate, MIN(B.TheDate)) AS TheDifference
            FROM MedianT A, MedianT B
            WHERE A.TheDate < B.TheDate
            GROUP BY A.TheDate) Daa) Da,
            (SELECT DATEDIFF(D, A.TheDate, MIN(B.TheDate)) AS TheDifference
            FROM MedianT A, MedianT B
            WHERE A.TheDate < B.TheDate
            GROUP BY A.TheDate) Db
    GROUP BY Da.TheDifference
    HAVING sum(SIGN(Da.TheDifference - Db.TheDifference)) IN (0, -1)
    Testimonial:
    pootle flump
    ur codings are working excelent.

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    BTW - the IN(0, -1) bit means:
    If there are an odd number of vlaues then there is a natural median -> the result being 0. If there are an even number of values then there are two candidates for median. The -1 selects the lower of the two. Change to 1 if you want the higher.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  11. #11
    Join Date
    Feb 2005
    Posts
    20
    That looks really good. Hopefully it will work on a dataset of 10,000's of records. Havent had a chance to implement it yet but will today.
    BTW. Im not french, but a decendant of French west coast Newfoundlanders who jumped ship from france in the 1800's and I dont surf. hah

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •