Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Aleph_0
Yak Posting Veteran
79 Posts |
Posted - 2011-07-22 : 18:50:15
|
Hello, I want to know the best way to calculate modes when grouping within a field. Let's say I have this table:emp_id amount-----------------4 124 124 147 107 107 117 147 14 I'd like the result to look like:emp_id median-----------------4 127 107 14 Here's the code I came up with:SELECT a.emp_id , a.amount-- , a.freqFROM( --Order amounts from highest to lowest frequency SELECT t.emp_id , t.amount-- , [freq] = Count(*) , [rowNum] = RANK() OVER (PARTITION BY t.emp_id ORDER BY count(*) DESC) FROM MyTable t GROUP BY t.exp_group , t.amount-- ORDER BY-- t.emp_id-- , Count(*) DESC) AS aWHERE 1=1 AND a.rowNum = 1 Is there a better way to do this? Maybe one without a subquery? I tried to find a way using SELECT TOP 1 but I haven't figured out how to make it return the TOP 1 of each group.Thanks! |
|
|
jcelko
Esteemed SQL Purist
547 Posts |
Posted - 2011-07-23 : 11:35:25
|
| >> Hello, I want to know the best way to calculate modes when grouping within a field. Let's say I have this table: <<No, this is NOT a table! It has no key and can never have a key because of redundant duplicate rows. What you have is a deck of punch cards written in SQL. Your whole mindset is still back in punch card. I remember when we put the comma in front of each line and one line per card; this made re-arranging the deck easier. That was 1960; programmers do not do this in 2011. Her is a valid table: CREATE TABLE Foobar(emp_id INTEGER NOT NULL, foobar_amt INTEGER NOT NULL, foobar_freq INTEGER NOT NULL, PRIMARY KEY (emp_id, foobar_amt));INSERT INTO FoobarVALUES (4, 12, 2), (4, 14, 1), (7, 10, 2), (7, 11, 1), (7, 14, 2);>> I'd like the result to look like: ..<<SELECT F1.emp_id, F1.foobar_freq AS mode FROM Foobar AS F1 WHERE F1.foobar_freq = (SELECT MAX (F2.foobar_freq) FROM Foobar AS F2 WHERE F1.emp_id = F2.emp_id);When you get the DDL right, the DML is usually simple. >> Here's the code I came up with: <<Nope. Your first thought is to build a sequential file by sorting in SQL – the deck of punch cards again! Think in sets instead. Not only is this the wrong way to do SQL, the code was bad, too :) Get a copy of THINKING IN SETS, it might help.--CELKO--Books in Celko Series for Morgan-Kaufmann PublishingAnalytics and OLAP in SQLData and Databases: Concepts in Practice Data, Measurements and Standards in SQLSQL for SmartiesSQL Programming Style SQL Puzzles and Answers Thinking in SetsTrees and Hierarchies in SQL |
 |
|
|
Aleph_0
Yak Posting Veteran
79 Posts |
Posted - 2011-07-25 : 12:47:23
|
quote: Originally posted by jcelkoYour whole mindset is still back in punch card.
I have no idea what you're talking about. I'm not a programmer nor am I old. Maybe I should rephrase:Let's say I have access to this "object" on a database that someone else set up and has existed before I was hired that, unfortunately for some people, actually has some duplicate, identical rows and there's nothing I can do about it. Let's also say that duplicates actually have meaning and thus cannot be removed (and I don't have priveleges to alter said "object").quote: SELECT F1.emp_id, F1.foobar_freq AS modeFROM Foobar AS F1WHERE F1.foobar_freq = (SELECT MAX (F2.foobar_freq)FROM Foobar AS F2WHERE F1.emp_id = F2.emp_id);
I had no idea you could do that! I thought subqueries had to be self-contained, independent queries. Thanks for the eye-opener.quote: Not only is this the wrong way to do SQL, the code was bad, too :)
Please stop assuming beginners are idiots and merit subhuman treatment. We're beginners, for crying out loud, and we're trying to learn. Ending with cute little smileys doesn't hide your arrogant, self-righteous elitism. Not only is this the wrong way to treat people, your attempt to hide it was bad, too :) You see what I did there?That being said, I actually do want to learn from you and anyone else who will offer their time and advice in a decent and respectful manner (you know, like mature adults). But please, take a vacation or something before answering any more posts from beginners. Your replies are sometimes so thick with condescension it's hard to find anything informative. |
 |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-07-25 : 13:12:31
|
Just take it with a grain of salt (or bucket if necessary)... I don't think he's going to change.I'm not sure if you got your question answered or not... It seems that the code you came up with should function fine, so why the interest in changing it? Is it not performing well? Or does it just not look pretty?You can look at exec times and scan counts by using:Set Statistics IO On Set Statistics time On...code...Set Statistics time OffSet Statistics IO OffCorey I Has Returned!! |
 |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-07-25 : 13:24:49
|
quote: Originally posted by jcelko...Her is a valid table: CREATE TABLE Foobar(emp_id INTEGER NOT NULL, foobar_amt INTEGER NOT NULL, foobar_freq INTEGER NOT NULL, PRIMARY KEY (emp_id, foobar_amt));INSERT INTO FoobarVALUES (4, 12, 2), (4, 14, 1), (7, 10, 2), (7, 11, 1), (7, 14, 2);>> I'd like the result to look like: ..<<SELECT F1.emp_id, F1.foobar_freq AS mode FROM Foobar AS F1 WHERE F1.foobar_freq = (SELECT MAX (F2.foobar_freq) FROM Foobar AS F2 WHERE F1.emp_id = F2.emp_id);When you get the DDL right, the DML is usually simple. ...
This is the most retard thing I have seen you post... You modified the OP's source data format and then provided an 'easy' solution to get the desired results. Of course it's easy... if you cheat.Why would you assume that the OP has control over the source? Maybe the Db is not of his creation, and he is stuck working with it (very common). Even if this is his design and it is grossly incomplete, you still didn't answer his question. It's like someone asking for directions, and only giving them the 2nd half of the trip... why bother answering in the first place?I don't have any issue with you explaining why a particular design isn't a good idea, or even why the nomenclature is inaccurate... but your attitude is just tiresome. Find somewhere else to haunt.Corey I Has Returned!! |
 |
|
|
Aleph_0
Yak Posting Veteran
79 Posts |
Posted - 2011-07-25 : 13:50:13
|
quote: Originally posted by Seventhnight It seems that the code you came up with should function fine, so why the interest in changing it? Is it not performing well? Or does it just not look pretty?You can look at exec times and scan counts by using:Set Statistics IO On Set Statistics time On...code...Set Statistics time OffSet Statistics IO Off
Thanks, I didn't know about Statistics IO and time. The code I wrote does work fine, but it felt like the wrong approach. The math guy in me is always looking for the elegant solution....And yeah, jcelko didn't really answer my question, but I did learn a new trick. It's not my database and I can't change a thing to it, so creating a new table is out of the question. Also, I've found out the hard way that having more than a couple levels of subqueries can slow things down a lot (I'm working with tables that have millions of rows each).But thanks for the tip about exec times and scan counts! |
 |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-07-25 : 13:54:43
|
You can also work to speed stuff up by using (and/or creating) proper indexes.Table variables or temp tables are sometimes helpful to simplify a multi-step process. And, if you are in 2005 or greater, you can use CTEs as well.It isn't simple to say which one to use when... but if something is performing badly, they are definitely options to investigate.Corey I Has Returned!! |
 |
|
|
|
|
|
|
|