Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Dec 2005
    Location
    Texas
    Posts
    100

    Unanswered: How to find the median?

    I've noticed that SQL Server (and other DBMSs I've looked at) doesn't seem to have a built-in function for finding the median of a range of numbers.

    Gack!

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    gack is right

    see this thread from way back in the archives
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Your required solution may not be quite as complicated as the one linked to. That particular poster needed the median of some not-yet-calculated-values so these values are calculated in derived tables.

    If you simply have a table with several values that you want to find the median of then the SQL is simpler.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Median does not have a set based solution. You have to iterate in order to find the median (order matters to compute a median). Engines that process a row at a time do medians easily, engines that process sets have a real problem with it.

    -PatP

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    okay, which is it? "does not have a set based solution" or "have a real problem with it"?

    what about this type of solution:
    Code:
    SELECT 
       CASE WHEN COUNT(*)%2=1 
            THEN x.Hours 
            ELSE (x.Hours+MIN(CASE WHEN y.Hours>x.Hours 
                                   THEN y.Hours 
                              END))/2.0 
       END median
    FROM BulbLife x, BulbLife y
    GROUP BY x.Hours
    HAVING 
       SUM(CASE WHEN y.Hours <= x.Hours 
          THEN 1 ELSE 0 END)>=(count(*)+1)/2 AND
       SUM(CASE WHEN y.Hours >= x.Hours 
          THEN 1 ELSE 0 END)>=(count(*)/2)+1
    
    -- from Transact-SQL Cookbook Chapter 8  Statistics in SQL
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Or:

    Code:
    SELECT A.TheValue AS TheMedian
    FROM (SELECT DISTINCT TheValue FROM MyTable) A, MyTable B
    GROUP BY A.TheValue 
    HAVING sum(SIGN(A.TheValue - B.TheValue )) IN (0, -1)
    Testimonial:
    pootle flump
    ur codings are working excelent.

  7. #7
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Beg your pardon - blummin' thing has failed.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    The Median depends on order in every case, and non-deterministic rows for data with an even number of elements. Because Median depends on element order, there can't be any set based solution, because sets have no order.

    I never said that you can't solve it with SQL, that's a very different claim. There are several ways to do that. You still can't solve it declaratively, and you can't solve it with sets.

    -PatP

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    sets may have no order, but order can still be imposed with predicates

    i ask you, are you familiar with the query to rank result rows? it's a theta self-join, and what it does is count the rows that are greater (or lesser) than, based on the value of a column, then adds 1 to get the rank

    same thing with partitioning a single set into two -- those values that are higher, and those values that are lower

    and those are sets

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Pat Phelan
    there can't be any set based solution
    what do you call the solution in post #5, please?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Provided Answers: 11
    On the surface, I would call it a performance problem ;-). How many rows in BulbLife? Never knew you could declare a "join" in the having clause. Or does the optimizer see it that way?

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by MCrowley
    On the surface, I would call it a performance problem ;-).
    okay, i see the smiley, so i won't respond the way i would've if it weren't there, which would've been something along the lines of "oh, so for challenging queries, you dump the data and do it in excel, eh?"

    btw, there is no join in the HAVING clause (if you are referring to post #5)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by r937
    what do you call the solution in post #5, please?
    I'd call that a Transact-SQL solution.

    -PatP

  14. #14
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    And here I thought you came up with this on your own...

    Looks like a must have book though...
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  15. #15
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Pat Phelan
    I'd call that a Transact-SQL solution.
    you are a slippery eel today, but squirm all you want, buddy, i think i gotcha

    what part of it makes it Transact-SQL? MIN? CASE? COUNT?

    looks to me like it's all SQL

    or are you referring to the fact that the solution doesn't use FLOOR?

    and what about my questions regarding sets?

    is that or is that not a set-based solution?

    eel, i say

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •