Aggregating Correlated Sub-QueriesBy Jeff Smith on 23 October 2007 | Tags: Queries Every now and then you may attempt to calculate an aggregate function -- such as SUM() -- on a correlated subquery, only to encounter the following error: Msg 130, Level 15, State 1, Line 24 Cannot perform an aggregate function on an expression containing an aggregate or a subquery. This can be a tricky situation to resolve depending on how your query is structured and how complicated it is, but here's a couple of ideas to help you out. Summing up a correlated SUM()Let's start by looking at a typical example. Suppose you have written the following SELECT: select Company.Name, Company.Region, (select sum(Amount) from Orders where Orders.CompanyID = Company.CompanyID) as Total from Company Notice that we have used a correlated sub-query to return the total Order Amount per Company. Working from that, let's say that you instead would like to the return the total Order Amount by Region. Typically, you would remove Company from the SELECT list, add GROUP BY Region to the end, and wrap what you'd like to total in in a SUM() aggregate function: select Company.Region, sum(select sum(Amount) from Orders where Orders.CompanyID = Company.CompanyID) as Total from Company group by Company.Region Unfortunately, that is not a valid SQL statement, since you are trying to directly SUM() a correlated sub-query (which also happens to have a SUM() in it already). This is not allowed in SQL Server; if you try to execute that, you'll be greeted with the error mentioned above. There are two ways to fix this: Option 1: Use a dervied tablePerhaps the easiest solution is to simply wrap your original SELECT in a derived table, and then select from that derived table and do the grouping in the outer select: select Region, sum(Total) as Total from (select Company.Name, Company.Region, (select sum(Amount) from Orders where Orders.CompanyID = Company.CompanyID) as Total from Company ) x group by Region This is logically equivalent to what we tried to do earlier, except it is now a valid SQL statement and it will return the correct results without an error. This is a very quick and easy way to solve this situation in general, though it is not always the optimal solution in terms of performance and readability. Option 2: Rewrite your SELECT without a Correlated SubqueryOften the best solution is to rewrite your SELECT without using a correlated sub-query at all. In our example, we could write the original as: select Company.Name, Company.Region, sum(Orders.Amount) as Total from Company left outer Orders on Orders.CompanyID = Company.CompanyID group by Company.Name, Company.Region Once the SELECT is re-written, we can now easily adjust that to return only totals by Region simply by removing Company.Name from the SELECT and GROUP BY clauses: select Company.Region, sum(Orders.Amount) as Total from Company left outer Orders on Orders.CompanyID = Company.CompanyID group by Company.Region This is usually a good way to solve the issue because the end result is very efficient and well-structured, so it is easier to understand and maintain. Nesting Mixed Aggregate FunctionsThe previous example was fairly easy to rewrite because all we were using was a SUM() aggregate; it is quite easy just to adjust the grouping to get the totals that you need. However, what happens if you'd like to calculate the SUM() of the MAX() Order Amount per company? That is, let's say you have this: select Company.Name, Company.Region, max(orders.Amount) as MaxOrderAmount from Company left outer join Orders on Orders.CompanyID = Company.CompanyID group by Company.Name, Company.Region That returns each Company along with their Region and their top Order Amount. Now, let's say we'd like to summarize that by Region, so that we are returning the SUM() of each customer's maximum Order Amount per Region. We can try to remove Company.Name from the GROUP BY and the SELECT, and add a SUM() around the MAX() expression like this: select Company.Region, sum(max(orders.Amount)) as TotalMaxOrderAmount from Company left outer join Orders on Orders.CompanyID = Company.CompanyID group by Company.Region ... but that will result in an error, since an aggregate function cannot be contained within another aggregate function. It also doesn't make much logical sense, anyway: We have a MAX() function nested within the SUM() function, but we never indicate anywhere what the MAX() is for. To solve this, just wrap the original SELECT in a derived table, GROUP BY Region and calculate the SUM() of the MaxOrderAmount: select Region, sum(MaxOrderAmount) as TotalCustomerMaxOrders from ( select Company.Name, Company.Region, max(orders.Amount) as MaxOrderAmount from Company left outer join Orders on Orders.CompanyID = Company.CompanyID group by Company.Name, Company.Region ) x Now it is clear that we are first summarizing the Orders by Company, and then summarizing those results by Region. The derived table allows us to break down the problem into smaller, simpler parts and to calculate one aggregate function within another. A More Complicated SituationAnother example that I've seen recently is replacing a hard-coded list of values in an IN() clause with values selected from a table. For example, suppose you'd like to return a grand total of all items in a table along with a subtotal of just a select few; you can use a SUM(CASE ...) expression like this: select sum(Amount) as Total, sum(case when Code in ('a','b','d') then Amount else 0 end) as SubTotal from yourtable Now, what if that list of Codes actually comes from another table? That is, instead of hard-coding when Code in ('a','b','d') we'd like it to be when Code in (select Code from SubTotalCodes) If we try to make that change in our previous SQL statement: select sum(Amount) as Total, sum(case when Code in (select Code from SubTotalCodes) then Amount else 0 end) as SubTotal from SomeTable we get that dreaded error once again, because we are trying to apply a SUM() on an expression containing a sub-query. This one seems a bit more tricky to solve! We can handle this by removing the SUM() aggregate functions completely, wrapping that in a derived table, and then totaling it up: select sum(Total) as Total, sum(SubTotal) as Total from ( select Amount as Total, case when Code in (select Code from SubTotalCodes) then Amount else 0 end as SubTotal from SomeTable ) x We also could rewrite the previous statement using JOINS instead of an IN() clause like this: select sum(Amount) as Total, sum(case when SubTotalCodes.Code is null then 0 else Amount end) as SubTotal from SomeTable left outer join SubTotalCodes on SubTotalCodes.Code = SomeTable.Code We'd have to be sure the primary key of SubTotalCodes is on the "Code" column, otherwise the JOIN may cause duplicate values to be aggregated in our results. ConclusionResolving this situation can sometimes be tricky, but it can always be done. If you rewrite your query using standard JOINS and/or derived tables, it will not only fix the error, but it will also result in a clearer and more structured SELECT statement.
|
- Advertisement - |