Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Perentile

Author  Topic 

Nikhil341
Starting Member

9 Posts

Posted - 2012-01-11 : 16:09:35
Hi,
Can you please help me how to write the percentile function in SQL?

I have like 8 Web transactions and the response time for each one is 1,3,0.5,4,0.7,4,1 sec's.

Now i need to calculate 85th percentile of the response time.

Thanks.

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-01-12 : 09:41:41
I was trying to do this using the NTILE function (available in SQL 2005 and above), but I was unsuccessful. Maybe someone who is more familiar with it would be able to suggest a simple way of using that function.

In the meantime, here is my long way of calculating the percentile. This would differ from what you would see if you were to do the same in Excel (and my humble opinion is that this, and not Excel is the correct way of doing it). For an interesting read on the nuances, see here: http://www.coventry.ac.uk/ec/~nhunt/pottel.pdf

So my long and convoluted way of calculating the percentile:
CREATE TABLE #tmp(response_time DECIMAL(18,4));
INSERT INTO #tmp VALUES (1),(3),(0.5),(4),(0.7),(4),(1);


DECLARE @p DECIMAL(18,4) = 0.85;
DECLARE @pN1 DECIMAL(18,4), @k1 DECIMAL(18,4), @kN DECIMAL(18,4);

SELECT
@pN1 = @p*(COUNT(*)+1),
@k1 = MIN(response_time),
@kn = MAX(response_time)
FROM
#tmp;

WITH cte AS
(
SELECT *,ROW_NUMBER() OVER (ORDER BY response_time) AS RN
FROM #tmp
)
SELECT
ISNULL(MAX(CASE WHEN RN <= @pN1 THEN response_time END),@k1)
+
(
ISNULL(MIN(CASE WHEN RN > @pN1 THEN response_time END),@kn)
-
ISNULL(MAX(CASE WHEN RN <= @pN1 THEN response_time END),@k1)
)*(@pn1-FLOOR(@pn1))
FROM
cte;
Go to Top of Page

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-12 : 10:46:44
I suspect peso will have an algorithm somewhere for this. He always does.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-12 : 10:49:35
In fact, as you calculate the percentile based only on the preceding rows, I *know* he has a method that can be applied.
However, I still haven't quite digested it to the point where I can use it.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-01-12 : 11:02:54
I saw different approaches to percentile calculations discussed in this posting: http://stackoverflow.com/questions/79688/calculating-percentile-rankings-in-ms-sql But I am not sure if they work accurately for small sample sizes.

Where is Peso when we REALLY need him!?!
Go to Top of Page

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-12 : 11:11:17
he is here:
@SwePeso N 56°04'39.26 E 12°55'05.63

according to his twitter feed.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-01-12 : 12:09:58
Here's a sample dataset which we could play with:

IF OBJECT_ID('tempdb..#PercentileData') IS NOT NULL DROP TABLE #PercentileData
GO

CREATE TABLE #PercentileData (
[SeriesID] INT
, [Value] DECIMAL(9,2)
)
GO

-- Populate PercentileData 65536 row(s)
; WITH series ([n])
AS
(
SELECT ROW_NUMBER() OVER ( ORDER BY a.[n] )
FROM
(VALUES (1),(1),(1), (1)) AS a (n)
CROSS JOIN (VALUES (1),(1),(1), (1)) AS b (n)
CROSS JOIN (VALUES (1),(1),(1), (1)) AS c (n)
CROSS JOIN (VALUES (1),(1),(1), (1)) AS d (n)
)
INSERT #PercentileData ([SeriesID], [Value])
SELECT
s.[n]
, vals.[val]
FROM
series AS s
CROSS APPLY (
SELECT
ABS(CAST(HASHBYTES('MD2', CAST(s.[n] AS VARBINARY(16))) AS TINYINT) / CAST(s.[n] AS FLOAT) * (n.[n]))
FROM
series AS n
)
AS vals ([val])

SELECT * FROM #PercentileData


Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2012-01-12 : 12:20:30
I got a different answer so it is probably wrong)
this is using sunitabeck's example table with the OP's data:

CREATE TABLE #tmp(response_time DECIMAL(18,4));
INSERT INTO #tmp VALUES (1),(3),(0.5),(4),(0.7),(4),(1);

select max(response_time)
from (--assign an order to all the response times
select response_time, ROW_NUMBER() over (order by response_time) rn
from #tmp
) d
where rn <
(--find the first row number below which 85% of the values fall)
select case
when ceiling(COUNT(*) * .85) = COUNT(*) * .85 then ceiling(COUNT(*) *.85) + 1
else CEILING(count(*) * .85)
end
from #tmp
)

OUTPUT:
3.0000


Be One with the Optimizer
TG
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-01-12 : 13:15:43
The differences in our calculations may be due to:

a) the choice of whether to report the value from the samples that is closest to the percentile or whether to interpolate

b) the small sample size.

If you look at the samples and the distribution, it looks like this:
response_time	Rank	PercentileLocation
0.5000 1 0.000000000000
0.7000 2 16.666666666666
1.0000 3 33.333333333333
1.0000 4 50.000000000000
3.0000 5 66.666666666666
4.0000 6 83.333333333333
4.0000 7 100.000000000000
So it would seem like the 85th percentile is 4.0 whether or not you interpolate.

-- code used to generate the data above
CREATE TABLE #tmp(response_time DECIMAL(18,4));
INSERT INTO #tmp VALUES (1),(3),(0.5),(4),(0.7),(4),(1);

SELECT
response_time,
ROW_NUMBER() OVER(ORDER BY response_time) AS [Rank],
100.0*(ROW_NUMBER() OVER(ORDER BY response_time)-1)/(COUNT(*) OVER()-1) AS PercentileLocation
FROM
#tmp;

DROP TABLE #tmp;

Go to Top of Page
   

- Advertisement -