# Thread: SQL server, alculating median values on a column in a table

1. Registered User
Join Date
Feb 2012
Posts
6

## Unanswered: SQL server, alculating median values on a column in a table

Hi there,

I need to calculate a median on a column in a table.

The code I have is:
Code:
``` Select gender,
CASE
when gender = 'F' then 'Female'
when gender = 'M' then 'Male'
else 'Unknown'
end as test,
datediff(day, [admit_date], getdate()) as 'datediffcal',
from [tbl_record]```
How do I calculate the median on the datediffcal column?

It doesn't matter if the resultset only shows the median result. So if the output shows:

median
15

that's fine. Minimally, I need the median value.

Is anybody able to help me on this?

Thank you

2. Registered User
Join Date
Nov 2004
Posts
1,428
Try this:
Code:
```SELECT R.gender,
AVG(1.0 * datediff(day, R.admit_date, getdate())) AS medianValue
FROM (SELECT gender,
ROW_NUMBER() OVER(PARTITION BY gender ORDER BY admit_date) AS rowno
FROM #DaTable
) AS R
INNER JOIN (SELECT gender,
1 + count(*) as N
FROM #DaTable
GROUP BY gender
) G ON
R.gender = G.gender AND
R.rowNo BETWEEN N/2 AND N/2+N%2
GROUP BY R.gender```

3. Registered User
Join Date
Feb 2012
Posts
6

## regarding the median

Hi,

Thank you.

Because I am relatively new to this, would you mind explaining to me what the
Code:
`ROW_NUMBER() OVER(PARTITION BY gender ORDER BY admit_date)`
do?

How does the average() function come into play here?

Can you explain why we need an inner join here?

And what does
Code:
`N/2 AND N/2+N%2`
really do? Is that % sign in there? Is that mod?

Sorry. I'm just really learning the basics of the basics....appreciate your time with me on this matter.

4. Registered User
Join Date
Nov 2004
Posts
1,428
How does the average() function come into play here?
I found this clear definition of Median. Read it, and read it again, because this is in essence what the query does.
Median is a statistical calculation that's commonly used for analytical purposes to determine the middle value(s) in a distribution. Median is calculated differently depending on whether the input set of values contains an odd or even number of elements. When you have an odd number of elements and you sort the values in chronological order, median is the middle value. For example, if you have this set of values {30, 10, 40} and you sort them in chronological order, the median is 30 (middle value of {10, 30, 40}). In an even number of elements (also sorted in chronological order), the median is the average of the two middle values. For example, given the set of values {30, 10, 40, 10}, the median is 20 (average of 10 and 30).
And what does

N/2 AND N/2+N%2

really do? Is that % sign in there? Is that mod?
Yes, % is MOD. N/2 AND N/2+N%2 are needed to get the middle value/two middle values.

Let's see what each part of the query does:
Code:
```(SELECT gender,
ROW_NUMBER() OVER(PARTITION BY gender ORDER BY admit_date) AS rowno
FROM #DaTable
) AS R```
First we divide the data of the table in groups, based on the gender. Per gender, we give each record a sequential number, starting from 1 till N (N = number of records of that specific gender). Per gender, we first order the records by admit_date before we assign a rownumber to them. That is what
ROW_NUMBER() OVER(PARTITION BY gender ORDER BY admit_date)
does: partition, divide, group, ... all the records by gender, and per gender value, order all the records by the admit_date and assign an incremental number to them, starting from 1.

Code:
```(SELECT gender,
1 + count(*) as N
FROM #DaTable
GROUP BY gender
) G```
Calculate N, the number of records per gender value.
- Add 1 to N, so when we have an odd number of records, we end up with an even result and
R.rowNo BETWEEN N/2 AND N/2+N%2
for an even number of N, like 10, will give us
R.rowNo BETWEEN 10/2 AND 10/2+0
=
R.rowNo BETWEEN 5 AND 5
returning exactly one record, the middle one.

- Add 1 to N, so when we have an even number of records, we end up with an odd result and
R.rowNo BETWEEN N/2 AND N/2+N%2
for an odd number of N, like 11, will give us
R.rowNo BETWEEN 11/2 AND 11/2 + 1
=
R.rowNo BETWEEN 5 AND 6
returning the two middle records.

Code:
```SELECT R.gender,
AVG(1.0 * datediff(day, R.admit_date, getdate())) AS medianValue
...
INNER JOIN (...
) G ON
R.gender = G.gender AND
R.rowNo BETWEEN N/2 AND N/2+N%2```
Here everything comes together: for a given value of the gender, get all the records (one or two, depending on the odd/even number of records) in the middle of the ordered set of records.
The median is the average value of the middle record(s).
- With one middle record, the average = the value of the middle record itself.
- With two middle records, the average of those two records.

I don't know if my explanation helped in any way or made it more confusing. Basically it just does what the definition of Median states.

5. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
Purely as a side note, the Median function is provided in SQL Server 2012.

-PatP

6. Registered User
Join Date
Feb 2012
Posts
6

## regarding the median

Thank you for your detailed explanation Wim..I really appreciate your answer...so detailed.

Could you explain why we need to use an inner join to join the two sets?

median is the average value of the middle records. So does the INNER JOIN essentially join the two sets of middle records based on the equal rownum condition?

Will this condition always satisfy? I am trying to picture it in my head so:

Code:
`R.gender = G.gender`
Thanks

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•