Can someone explain SELECT statement aggregates? - SQL Server Forums

Home | Weblogs | Forums

Site Sponsored By: SQLDSC - SQL Server Desired State Configuration

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Can someone explain SELECT statement aggregates?

Author

Topic

meef
Posting Yak Master

113 Posts

Posted - 2011-06-09 : 15:37:00

I really don't understand why you are forced to either group by or use an aggregate for every single field if you only need one. Example:

SELECT	bol_number, u.ref_num, pro_number, client_zip, client_city, customer_name, customer_state, tb.carrier_name, 
bol_status, p.delivered_date, customer_zip, ship_date, p.weight, io, p.date_last_updated,
convert(date,curr_est_deliver_date,101) as [est_delivery], convert(date,appt_date,101) as [appt_date]

If I want a COUNT on bol_number I have to either group every single field or do a MAX/SUM/FIRST etc. on them. What is the reason for this?

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2011-06-09 : 15:50:03

Because SQL Server needs to know how you want to aggregate your data.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-06-09 : 16:07:12

You can also try (SQL 2005+):

SELECT count(bol_number) over (partition by 1) as count_bol_number,
bol_number, u.ref_num, pro_number, client_zip, client_city, customer_name, customer_state, tb.carrier_name,
bol_status, p.delivered_date, customer_zip, ship_date, p.weight, io, p.date_last_updated,
convert(date,curr_est_deliver_date,101) as [est_delivery], convert(date,appt_date,101) as [appt_date]

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2011-06-11 : 13:00:11

>> I really don't understand why you are forced to either GROUP BY or use an aggregate for every single field [sic] if you only need one. <<

Your first problem is that you do not know that columns are nothing at all like fields. The next problem is that you are still using CONVERT(); the ANSI/ISO Standard is CAST(). CONVERT is an old Sybase coder museum for formatting dates in the database. Since formatting data in the database is a total violation of the most basic principles of RDBMS, good programmers never use it.

Today, if we have legacy data with the old DATETIME data type, we get the date part with CAST(curr_est_deliver_date AS DATE); This will use 3 bytes for the DATE data type. Then do an ALTER TABLE As fast as you can. You should never be doing this casting again.

Also, do not keep audit data in the table being audited; you can go to jail.

Now, let me answer your question with the stuff you should have learned in your first SQL class:

Here is how a SELECT works in SQL ... at least in theory. Real products will optimize things, but the code has to produce the same results.

a) Effectively materialize the CTEs in the optional WITH clause. CTE's come into existence in the order they are declared so only backward references are allowed. A CTE can be recursive. Think of them AS VIEWs that exist only in the scope of the query. In practice, if they are used once then they are implemented AS an in-line macro.

b) Start in the FROM clause and build a working table from all of the joins, unions, intersections, and whatever other table constructors are there. The <table expression> AS <correlation name> option allows you give a name to this working table which you then have to use for the rest of the containing query. Ther are UNIONB, INTERSECT and EXCEPT set construtors, LATERAL tables, table-valued function and all kinds of things happening in here.

c) Go to the WHERE clause and remove rows that do not pass criteria; that is, that do not test to TRUE (i.e. reject UNKNOWN and FALSE). The WHERE clause is applied to the working set in the FROM clause.

d) Go to the optional GROUP BY clause, partition the original table into groups and reduce each grouping to a *single* row, replacing the original working table with the new grouped table. The rows of a grouped table must be only group characteristics: (1) a grouping column (2) a statistic about the group (i.e. aggregate functions) (3) a function or constant(4) an expression made up of only those three items. The original table no longer exists and you cannot reference anything in it (this was an error in early Sybase products).

e) Go to the optional HAVING clause and apply it against the grouped working table; if there was no GROUP BY clause, treat the entire table AS one grouP.

f) Go to the SELECT clause and construct the expressions in the list. This means that the scalar subqueries, function calls and expressions in the SELECT are done after all the other clauses are done. The AS operator can also give names to expressions in the SELECT list. These new names come into existence all at once, but after the WHERE clause, GROUP BY clause and HAVING clause have been executed; you cannot use them in the SELECT list or the WHERE clause for that reason.

If there is a SELECT DISTINCT, then redundant duplicate rows are removed. For purposes of defining a duplicate row, NULLs are treated AS matching (just like in the GROUP BY).

g) Nested query expressions follow the usual scoping rules you would expect from a block structured language like C, Pascal, Algol, etc. Namely, the innermost queries can reference columns and tables in the queries in which they are contained.

h) The ORDER BY clause is part of a cursor, not a query. The result set is passed to the cursor, which can only see the names in the SELECT clause list, and the sorting is done there. The ORDER BY clause cannot have expression in it, or references to other columns because the result set has been converted into a sequential file structure and that is what is being sorted.

AS you can see, things happen "all at once" in SQL, not "from left to right" AS they would in a sequential file/procedural language model. In those languages, these two statements produce different results:
READ (a, b, c) FROM File_X;
READ (c, a, b) FROM File_X;

while these two statements return the same data:

SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;

Think about what a confused mess this statement is in the SQL model.

SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;

That is why such nonsense is illegal syntax.

Now is here is a way around this with the new window clause:

SELECT bol_number, COUNT(bol_number) OVER () AS bol_nbr_cnt,
u.ref_num, pro_number, client_zip, client_city, customer_name,
customer_state, TB.carrier_name, bol_status, P.delivery_date,
customer_zip, ship_date, P.weight, io,
CAST(curr_est_deliver_date AS DATE) AS est_delivery,
CAST(appt_date AS DATE) AS appt_date
FROM ????
WHERE ??;

Of course the count will be repeated over and over in every row in the table.

--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources