Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Finding modes by group

Author  Topic 

Aleph_0
Yak Posting Veteran

79 Posts

Posted - 2011-07-22 : 18:50:15
Hello, I want to know the best way to calculate modes when grouping within a field. Let's say I have this table:

emp_id amount
-----------------
4 12
4 12
4 14
7 10
7 10
7 11
7 14
7 14

I'd like the result to look like:

emp_id median
-----------------
4 12
7 10
7 14

Here's the code I came up with:

SELECT
a.emp_id
, a.amount
-- , a.freq
FROM
( --Order amounts from highest to lowest frequency
SELECT
t.emp_id
, t.amount
-- , [freq] = Count(*)
, [rowNum] = RANK() OVER (PARTITION BY t.emp_id ORDER BY count(*) DESC)
FROM MyTable t
GROUP BY
t.exp_group
, t.amount
-- ORDER BY
-- t.emp_id
-- , Count(*) DESC
) AS a
WHERE 1=1
AND a.rowNum = 1

Is there a better way to do this? Maybe one without a subquery? I tried to find a way using SELECT TOP 1 but I haven't figured out how to make it return the TOP 1 of each group.

Thanks!

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2011-07-23 : 11:35:25
>> Hello, I want to know the best way to calculate modes when grouping within a field. Let's say I have this table: <<

No, this is NOT a table! It has no key and can never have a key because of redundant duplicate rows. What you have is a deck of punch cards written in SQL. Your whole mindset is still back in punch card. I remember when we put the comma in front of each line and one line per card; this made re-arranging the deck easier. That was 1960; programmers do not do this in 2011.

Her is a valid table:

CREATE TABLE Foobar
(emp_id INTEGER NOT NULL,
foobar_amt INTEGER NOT NULL,
foobar_freq INTEGER NOT NULL,
PRIMARY KEY (emp_id, foobar_amt));


INSERT INTO Foobar
VALUES (4, 12, 2),
(4, 14, 1),
(7, 10, 2),
(7, 11, 1),
(7, 14, 2);

>> I'd like the result to look like: ..<<

SELECT F1.emp_id, F1.foobar_freq AS mode
FROM Foobar AS F1
WHERE F1.foobar_freq
= (SELECT MAX (F2.foobar_freq)
FROM Foobar AS F2
WHERE F1.emp_id = F2.emp_id);

When you get the DDL right, the DML is usually simple.

>> Here's the code I came up with: <<

Nope. Your first thought is to build a sequential file by sorting in SQL – the deck of punch cards again! Think in sets instead. Not only is this the wrong way to do SQL, the code was bad, too :)

Get a copy of THINKING IN SETS, it might help.

--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Go to Top of Page

Aleph_0
Yak Posting Veteran

79 Posts

Posted - 2011-07-25 : 12:47:23
quote:
Originally posted by jcelko
Your whole mindset is still back in punch card.

I have no idea what you're talking about. I'm not a programmer nor am I old. Maybe I should rephrase:

Let's say I have access to this "object" on a database that someone else set up and has existed before I was hired that, unfortunately for some people, actually has some duplicate, identical rows and there's nothing I can do about it. Let's also say that duplicates actually have meaning and thus cannot be removed (and I don't have priveleges to alter said "object").

quote:
SELECT F1.emp_id, F1.foobar_freq AS mode
FROM Foobar AS F1
WHERE F1.foobar_freq
= (SELECT MAX (F2.foobar_freq)
FROM Foobar AS F2
WHERE F1.emp_id = F2.emp_id);

I had no idea you could do that! I thought subqueries had to be self-contained, independent queries. Thanks for the eye-opener.

quote:
Not only is this the wrong way to do SQL, the code was bad, too :)

Please stop assuming beginners are idiots and merit subhuman treatment. We're beginners, for crying out loud, and we're trying to learn. Ending with cute little smileys doesn't hide your arrogant, self-righteous elitism. Not only is this the wrong way to treat people, your attempt to hide it was bad, too :) You see what I did there?

That being said, I actually do want to learn from you and anyone else who will offer their time and advice in a decent and respectful manner (you know, like mature adults). But please, take a vacation or something before answering any more posts from beginners. Your replies are sometimes so thick with condescension it's hard to find anything informative.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-25 : 13:12:31
Just take it with a grain of salt (or bucket if necessary)... I don't think he's going to change.

I'm not sure if you got your question answered or not...

It seems that the code you came up with should function fine, so why the interest in changing it? Is it not performing well? Or does it just not look pretty?

You can look at exec times and scan counts by using:

Set Statistics IO On
Set Statistics time On

...code...

Set Statistics time Off
Set Statistics IO Off





Corey

I Has Returned!!
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-25 : 13:24:49
quote:
Originally posted by jcelko

...

Her is a valid table:

CREATE TABLE Foobar
(emp_id INTEGER NOT NULL,
foobar_amt INTEGER NOT NULL,
foobar_freq INTEGER NOT NULL,
PRIMARY KEY (emp_id, foobar_amt));


INSERT INTO Foobar
VALUES (4, 12, 2),
(4, 14, 1),
(7, 10, 2),
(7, 11, 1),
(7, 14, 2);

>> I'd like the result to look like: ..<<

SELECT F1.emp_id, F1.foobar_freq AS mode
FROM Foobar AS F1
WHERE F1.foobar_freq
= (SELECT MAX (F2.foobar_freq)
FROM Foobar AS F2
WHERE F1.emp_id = F2.emp_id);

When you get the DDL right, the DML is usually simple.

...




This is the most retard thing I have seen you post... You modified the OP's source data format and then provided an 'easy' solution to get the desired results. Of course it's easy... if you cheat.

Why would you assume that the OP has control over the source? Maybe the Db is not of his creation, and he is stuck working with it (very common). Even if this is his design and it is grossly incomplete, you still didn't answer his question. It's like someone asking for directions, and only giving them the 2nd half of the trip... why bother answering in the first place?

I don't have any issue with you explaining why a particular design isn't a good idea, or even why the nomenclature is inaccurate... but your attitude is just tiresome.

Find somewhere else to haunt.

Corey

I Has Returned!!
Go to Top of Page

Aleph_0
Yak Posting Veteran

79 Posts

Posted - 2011-07-25 : 13:50:13
quote:
Originally posted by Seventhnight

It seems that the code you came up with should function fine, so why the interest in changing it? Is it not performing well? Or does it just not look pretty?

You can look at exec times and scan counts by using:

Set Statistics IO On
Set Statistics time On

...code...

Set Statistics time Off
Set Statistics IO Off


Thanks, I didn't know about Statistics IO and time. The code I wrote does work fine, but it felt like the wrong approach. The math guy in me is always looking for the elegant solution....

And yeah, jcelko didn't really answer my question, but I did learn a new trick. It's not my database and I can't change a thing to it, so creating a new table is out of the question. Also, I've found out the hard way that having more than a couple levels of subqueries can slow things down a lot (I'm working with tables that have millions of rows each).

But thanks for the tip about exec times and scan counts!
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-25 : 13:54:43
You can also work to speed stuff up by using (and/or creating) proper indexes.

Table variables or temp tables are sometimes helpful to simplify a multi-step process. And, if you are in 2005 or greater, you can use CTEs as well.

It isn't simple to say which one to use when... but if something is performing badly, they are definitely options to investigate.

Corey

I Has Returned!!
Go to Top of Page
   

- Advertisement -