Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2002-12-07 : 14:58:00
|
I recently had to use my own little median technique again on a report here at work, and had posted it before, but wasn't sure if anyone had seen it. I have read Celko's and others techniques for generating a median and haven't seen one more efficent.Does anyone have a better way they can think of? I think this bad boy is pretty short & efficient. First, if you want to return the middle number or the higher one next to the middle if there is an even number:SELECT x.Value AS median FROM Vals xCROSS JOIN Vals y GROUP BY x.Value HAVING SUM(SIGN(x.Value-y.Value)) IN (1,0) Change the " IN (1,0)" to "IN (-1,0)" to get the lower value if there is an even # of values. Basically, we are saying compare each number to all possible numbers, and add up values of 1,0 or -1 depending if the first number is less, equal or higher than the second. The number that returns 0 is right in the middle ... If there is no middle, a -1 or 1 is returned. There will never be a 0 and (-1 or 1) at the same time returned. To get the financial median (avg of the 2 values middle values if there is an even number), you need to encapsulate the results of the above into a subquery, allow for not just (-1,0) but all three (-1,0,1) and then take the AVG of the values returned. That is, SELECT Avg(Median) as Median FROM ( SELECT x.Value AS median FROM Vals xCROSS JOIN Vals y GROUP BY x.Value HAVING SUM(SIGN(x.Value-y.Value)) IN (1,0,-1) ) AIf there is an even number of values, the lower and higher middle ones are averaged. If there is an odd number, only the middle value is returned and averaged (which of course has no effect).Most other techniques used several COUNT(*) subqueries which this one avoids.Critique and enjoy!- Jeff |
|
SamC
White Water Yakist
3467 Posts |
Posted - 2002-12-08 : 09:20:43
|
Hi Jeff,Way cool. I can see other applications to find the Nth item in a list (SUM(SIGN) = N) , or every other item (SUM(SIGN) EVEN OR ODD). I don't understand the requirement (or application) for an average that's composed of the average of the two middle values. Why not the more common average of all values? Maybe I misunderstood the result? If you have a good reason, I'd like to hear it.Sam |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-12-08 : 09:30:27
|
That's the definition of the financial median. For an even number of rows, the financial median is the average of the two middle values. The statistical median is either middle value, depending on which one you decide to choose, but it has to be an actual value in the set. |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2002-12-08 : 15:07:11
|
Yep, those are the reasons!We always use the financial median (avg of two middle ones) here at work.Oh, and I almost forgot, the nicest thing about my technique is that it's very easy to get a median for groups of records:SELECT Company, Avg(Median) as Median FROM ( SELECT x.Company, x.Value AS median FROM Vals x INNER JOIN Vals y ON x.Company = y.CompanyGROUP BY x.Company, x.Value HAVING SUM(SIGN(x.Value-y.Value)) IN (1,0,-1) ) A The other techniques make doing this somewhat complicated.- JeffEdited by - jsmith8858 on 12/08/2002 15:07:59 |
|
|
Andraax
Aged Yak Warrior
790 Posts |
Posted - 2002-12-08 : 17:00:21
|
Nice one Jeff! Very useful! |
|
|
Frank Kalis
Constraint Violating Yak Guru
413 Posts |
Posted - 2004-03-05 : 17:24:58
|
Surely a little bit late to reply, but here it goes.There are different definitions of the words 'financial' and 'statistical' median. While one folks argue that in both cases it is the mean of the two middle values, some other folks say that it has to be a concrete value not an abstract. Those folks take always the LOWER of the both middle value. However, which way you go isn't important at all. With a little help from the proprietary TOP clause both can be very easily calculated.Here is the version that takes the mean of both middle values:IF OBJECT_ID('median') IS NOT NULL DROP TABLE medianGOCREATE TABLE median(col1 INT)GO INSERT INTO median (col1) VALUES (1)INSERT INTO median (col1) VALUES (2)INSERT INTO median (col1) VALUES (3)INSERT INTO median (col1) VALUES (3)INSERT INTO median (col1) VALUES (4)INSERT INTO median (col1) VALUES (7)INSERT INTO median (col1) VALUES (8)INSERT INTO median (col1) VALUES (9)SELECT ((SELECT MIN(Col1) FROM (SELECT TOP 50 PERCENT Col1 FROM Median ORDER BY Col1 DESC) a)+(SELECT MAX(Col1) FROM (SELECT TOP 50 PERCENT Col1 FROM Median ORDER BY Col1) a))/2. AS MedianMedian ------------------ 3.500000(1 row(s) affected) And here's the version where the lower of both middle values is taken:SELECT MAX(Col1) AS MedianFROM(SELECT TOP 50 PERCENT Col1 FROM Median ORDER BY Col1) aMedian ----------- 3(1 row(s) affected) While the median as a measure of central tendency is not so teribbly useful at all, this code should also work with quartiles or whatever fancy there is. Both methods work with even and odd numbers of rows.--Frank |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2004-03-10 : 14:06:19
|
Wow, just saw this...sure I'll run in to that now that we're converting everything to OFA...It's going to be painful...Thanks for the code Jeff...Book mark it!Brett8-) |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-07-19 : 16:05:48
|
For SQL Server 2005, use thisasc desc diff 1 4 3 -- even number of datapoints2 3 13 2 -14 1 -31 3 2 -- odd number of datapoints2 2 03 1 -2select avg(val) as median from(select row_number() over (order by val asc) as sort_asc , row_number() over (order by val desc) as sort_desc , val from table1) as t where sort_asc - sort_desc between -1 and 1 Peter LarssonHelsingborg, Sweden |
|
|
|
|
|
|
|