Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Subsets(Mean, Median, Mode)

Author  Topic 

GhantaBro
Posting Yak Master

215 Posts

Posted - 2011-03-09 : 12:52:51
Guys I need to break a data-set into around 900 subsets and then calculate mean, median, mode for each set... I also need to give a frequency distribution for each subsets... Is there a way to do through t-sql?

Here is example of my data:

ProductCode RefCode Amount
---------------------------
abc 143 59000.89
aaa 277 7000
abc 143 50000.23
abc 333 25000

So in my data if I group by ProductCode and Refcode, I have around 900 subsets... Now for each of those subsets I have calculate mean, median, mode and have frequency distribution for each subsets separately.. I am will using Amount for the calculations...

In the example above abc 143 is one subset, aaa 277 is another and abc 333 is third subset... But data will have several records on each subsets... Please let me know if I do not make much sense here... Thanks in advance guys for the inputs.

MIK_2008
Master Smack Fu Yak Hacker

1054 Posts

Posted - 2011-03-09 : 13:37:54
at least i dont remember now as how to calculate Mean, Median and Mode. Last time i studied them was quite long ago. Any how the following will at-least put you onto the right direction

Select ProductCode,RefCode,<Mean Calculation> as mean, <Median Calculation> as median, <Mode Calculation> as Mode
From TableName
<Where Condition if need to restrict the data set any way>
<Group by ProductCode,RefCode> -- in case if the calculations are using group functions e.g. avg, sum etc

However if you come up with at least one sample dataset of your data as mentioned above and the desired output as per sample dataset, then someone or I might help you in constructing the query to extract the desired output. (I would also require the details/formula for Mean, Median and Mode....)

Cheers
MIK
Go to Top of Page

GhantaBro
Posting Yak Master

215 Posts

Posted - 2011-03-09 : 14:20:03
quote:
Originally posted by MIK_2008

at least i dont remember now as how to calculate Mean, Median and Mode. Last time i studied them was quite long ago. Any how the following will at-least put you onto the right direction

Select ProductCode,RefCode,<Mean Calculation> as mean, <Median Calculation> as median, <Mode Calculation> as Mode
From TableName
<Where Condition if need to restrict the data set any way>
<Group by ProductCode,RefCode> -- in case if the calculations are using group functions e.g. avg, sum etc

However if you come up with at least one sample dataset of your data as mentioned above and the desired output as per sample dataset, then someone or I might help you in constructing the query to extract the desired output. (I would also require the details/formula for Mean, Median and Mode....)

Cheers
MIK




Thanks Mik we can use your approach only for Mean which we can do with AVG function in SQL.
Go to Top of Page

GhantaBro
Posting Yak Master

215 Posts

Posted - 2011-03-09 : 15:21:23
This does it...

-- Test Data
CREATE TABLE #T
(Prod CHAR(3),
Ref INT,
Amount FLOAT) ;

INSERT INTO #T
(Prod, Ref, Amount)
VALUES ('abc', 143, 59000.89),
('aaa', 277, 7000),
('abc', 143, 50000.23),
('abc', 333, 25000) ;

-- Mean

SELECT Prod,
Ref,
AVG(Amount) AS AvgAmount
FROM #T
GROUP BY Prod,
Ref ;

-- Median

SELECT DISTINCT
Prod,
Ref,
(TopHalfAmount + BottomHalfAmount) / 2
FROM #T AS T1
CROSS APPLY (SELECT TOP 1
Amount AS TopHalfAmount
FROM (SELECT TOP 50 PERCENT
Amount
FROM #T AS T2
WHERE T2.Prod = T1.Prod
AND T2.Ref = T1.Ref
ORDER BY Amount) AS TopHalfSub
ORDER BY Amount DESC) TopHalf
CROSS APPLY (SELECT TOP 1
Amount AS BottomHalfAmount
FROM (SELECT TOP 50 PERCENT
Amount
FROM #T AS T3
WHERE T3.Prod = T1.Prod
AND T3.Ref = T1.Ref
ORDER BY Amount DESC) AS BottomHalfSub
ORDER BY Amount) BottomHalf

-- Mode

SELECT DISTINCT
Prod,
Ref,
ModeAmount
FROM #T AS T1
CROSS APPLY (SELECT TOP 1
Amount AS ModeAmount
FROM #T AS T2
WHERE T2.Prod = T1.Prod
AND T2.Ref = T1.Ref
GROUP BY Amount
ORDER BY COUNT(*) DESC) AS ModeSub ;


Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-03-09 : 16:52:41
Easier Median here http://weblogs.sqlteam.com/peterl/archive/2009/09/16/Median-and-weighted-median.aspx



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
   

- Advertisement -