Results 1 to 6 of 6
  1. #1
    Join Date
    Feb 2012
    Posts
    6

    Unanswered: SQL server, alculating median values on a column in a table

    Hi there,

    I need to calculate a median on a column in a table.

    The code I have is:
    Code:
     Select gender,
    CASE
    when gender = 'F' then 'Female'
    when gender = 'M' then 'Male'
    else 'Unknown'
    end as test,
    datediff(day, [admit_date], getdate()) as 'datediffcal',
    from [tbl_record]
    How do I calculate the median on the datediffcal column?

    It doesn't matter if the resultset only shows the median result. So if the output shows:

    median
    15

    that's fine. Minimally, I need the median value.

    Is anybody able to help me on this?

    Thank you

  2. #2
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Try this:
    Code:
    SELECT R.gender, 
    	AVG(1.0 * datediff(day, R.admit_date, getdate())) AS medianValue
    FROM (SELECT gender, 
    		admit_date,
    		ROW_NUMBER() OVER(PARTITION BY gender ORDER BY admit_date) AS rowno
    		FROM #DaTable
    	) AS R
    	INNER JOIN (SELECT gender, 
    		1 + count(*) as N
    		FROM #DaTable
    		GROUP BY gender
    		) G ON 
    		R.gender = G.gender AND 
    		R.rowNo BETWEEN N/2 AND N/2+N%2
    GROUP BY R.gender
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  3. #3
    Join Date
    Feb 2012
    Posts
    6

    regarding the median

    Hi,

    Thank you.

    Because I am relatively new to this, would you mind explaining to me what the
    Code:
    ROW_NUMBER() OVER(PARTITION BY gender ORDER BY admit_date)
    do?

    How does the average() function come into play here?

    Can you explain why we need an inner join here?

    And what does
    Code:
    N/2 AND N/2+N%2
    really do? Is that % sign in there? Is that mod?

    Sorry. I'm just really learning the basics of the basics....appreciate your time with me on this matter.

  4. #4
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    How does the average() function come into play here?
    I found this clear definition of Median. Read it, and read it again, because this is in essence what the query does.
    Median is a statistical calculation that's commonly used for analytical purposes to determine the middle value(s) in a distribution. Median is calculated differently depending on whether the input set of values contains an odd or even number of elements. When you have an odd number of elements and you sort the values in chronological order, median is the middle value. For example, if you have this set of values {30, 10, 40} and you sort them in chronological order, the median is 30 (middle value of {10, 30, 40}). In an even number of elements (also sorted in chronological order), the median is the average of the two middle values. For example, given the set of values {30, 10, 40, 10}, the median is 20 (average of 10 and 30).
    And what does

    N/2 AND N/2+N%2

    really do? Is that % sign in there? Is that mod?
    Yes, % is MOD. N/2 AND N/2+N%2 are needed to get the middle value/two middle values.

    Let's see what each part of the query does:
    Code:
    (SELECT gender, 
    	admit_date,
    	ROW_NUMBER() OVER(PARTITION BY gender ORDER BY admit_date) AS rowno
    	FROM #DaTable
    	) AS R
    First we divide the data of the table in groups, based on the gender. Per gender, we give each record a sequential number, starting from 1 till N (N = number of records of that specific gender). Per gender, we first order the records by admit_date before we assign a rownumber to them. That is what
    ROW_NUMBER() OVER(PARTITION BY gender ORDER BY admit_date)
    does: partition, divide, group, ... all the records by gender, and per gender value, order all the records by the admit_date and assign an incremental number to them, starting from 1.

    Code:
    (SELECT gender, 
    	1 + count(*) as N
    	FROM #DaTable
    	GROUP BY gender
    	) G
    Calculate N, the number of records per gender value.
    - Add 1 to N, so when we have an odd number of records, we end up with an even result and
    R.rowNo BETWEEN N/2 AND N/2+N%2
    for an even number of N, like 10, will give us
    R.rowNo BETWEEN 10/2 AND 10/2+0
    =
    R.rowNo BETWEEN 5 AND 5
    returning exactly one record, the middle one.

    - Add 1 to N, so when we have an even number of records, we end up with an odd result and
    R.rowNo BETWEEN N/2 AND N/2+N%2
    for an odd number of N, like 11, will give us
    R.rowNo BETWEEN 11/2 AND 11/2 + 1
    =
    R.rowNo BETWEEN 5 AND 6
    returning the two middle records.

    Code:
    SELECT R.gender,
    	AVG(1.0 * datediff(day, R.admit_date, getdate())) AS medianValue
    ...
    INNER JOIN (...
    	) G ON 
    	R.gender = G.gender AND 
    	R.rowNo BETWEEN N/2 AND N/2+N%2
    Here everything comes together: for a given value of the gender, get all the records (one or two, depending on the odd/even number of records) in the middle of the ordered set of records.
    The median is the average value of the middle record(s).
    - With one middle record, the average = the value of the middle record itself.
    - With two middle records, the average of those two records.

    I don't know if my explanation helped in any way or made it more confusing. Basically it just does what the definition of Median states.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Purely as a side note, the Median function is provided in SQL Server 2012.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  6. #6
    Join Date
    Feb 2012
    Posts
    6

    regarding the median

    Thank you for your detailed explanation Wim..I really appreciate your answer...so detailed.

    Could you explain why we need to use an inner join to join the two sets?

    median is the average value of the middle records. So does the INNER JOIN essentially join the two sets of middle records based on the equal rownum condition?

    Will this condition always satisfy? I am trying to picture it in my head so:

    Code:
    R.gender = G.gender
    Thanks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •