Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Count vs Exists (T-SQL)

Author  Topic 

skataben
Starting Member

4 Posts

Posted - 2011-08-10 : 21:22:45
I have a database that keeps track of a hierarchical relationship between items. Each item has an ID field that is the primary key. Each item also has a parentID field that maps to the ID field of its parent.

Currently, I am able to retrieve a set of items while also retrieving the number of children it has. I use a query similar to this:

USE [MY_DATABASE]
SELECT [ID], [Prop1], [Prop2]
,(SELECT COUNT(*)
FROM [MyTable] AS [B] WITH (NOLOCK)
WHERE [B].[ParentID] = [A].[ID]
) AS [Child_Count]
FROM [MyTable] AS [A] WITH (NOLOCK)
WHERE [A].[Prop1] = 'something'


This works great for small result sets, but is incredibly slow for larger ones. This is understandable, having a correlated subquery that does a full table scan.

I actually don't need to know the number of children an item has. All I need to know is if it has children or not. True/false or 0/1 will do. I've tried a few different things but nothing performs as well as the above statement.

Statements like this are worse performers:
USE [MY_DATABASE]
SELECT [ID], [Prop1], [Prop2]
,(CASE
WHEN EXISTS
(SELECT top 1 *
FROM [MyTable] AS [B] WITH (NOLOCK)
WHERE [B].[ParentID] = [A].[ID])
THEN 1
ELSE 0
END) AS [Child_Count]
FROM [MyTable] AS [A] WITH (NOLOCK)
WHERE [A].[Prop1] = 'something'


And this:
USE [MY_DATABASE]
SELECT [ID], [Prop1], [Prop2]
,(SELECT COUNT(*) FROM
(SELECT TOP 1 *
FROM [MyTable] AS [B] WITH (NOLOCK)
WHERE [B].[ParentID] = [A].[ID]
) AS [dummy]
) AS [Child_Count]
FROM [MyTable] AS [A] WITH (NOLOCK)
WHERE [A].[Prop1] = 'something'


Both the previous statements are significantly slower than the original query. This is unintuitive for me. Can someone help me find a fast way to determine if an item has children or not?

I'm using SQL Server 2005 Service Pack 2 (SP2) (v9.0.3042).

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2011-08-11 : 01:06:55
If you're using exists, I don't think you'd need to use "top". You should be able to say, when exists (select 1 from...)

But have you tried this next query? I use this a lot to find similar types of information and usually performs very well for large record sets:

select [ID], [Prop1], [Prop2]
From [MyTable] A
Left Outer Join [MyTable] B ON A.[ID]=B.[ParentId]

Where A.[Prop1]='something' and B.[ParentId] is null


Re-read the criteria...my query filters anything out that does NOT have child records...or that's what I'm trying to achieve with it. I think you'd want:

select [ID], [Prop1], [Prop2],
case when B.[ParentId] is null then 0 else 1 end as 'Has Children'
From [MyTable] A
Left Outer Join [MyTable] B ON A.[ID]=B.[ParentId]

Where A.[Prop1]='something'
Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2011-08-11 : 02:17:42
How big is the dataset you're working on and what is the index situation? Have you looked at the execution plan? EXISTS should in general be faster that the COUNT in a subquery like this because it only has to scan for one occurrence while the count needs to scan all occurrences.

Have you considered using hierarchyid (http://msdn.microsoft.com/en-us/library/bb677270.aspx) in this table? It lets you keep track of the full tree structure, move nodes around and you can also track the depth level of each node...I'm not sure it will give you any performance gain but it can improve stuff in terms of manageability. I think proper indexing will give you the greatest benefit in terms of performance...

- Lumbago
My blog-> http://thefirstsql.com/2011/07/08/how-to-find-gaps-in-identity-columns-at-the-speed-of-light/
Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2011-08-11 : 02:28:37
Are the ID and ParentID columns indexed at all? If the table structure is as "simple" as this I'd probably have ID as a clustered primary key and ParentID as a nonclustered index. No table-scanning would be necessary regardless if you're using a subquery with count or exists, or if you're using left outer join.

- Lumbago
My blog-> http://thefirstsql.com/2011/07/08/how-to-find-gaps-in-identity-columns-at-the-speed-of-light/
Go to Top of Page

skataben
Starting Member

4 Posts

Posted - 2011-08-13 : 16:51:59
Thanks for all the feedback.

@flambaster
Your idea worked great for me. Doing a left outer join greatly improved the performance. Thanks.

@Lumbago
The hierarchyid won't work for this case because I can't add new columns the tables. Seems like a good way to go about it though.

The indexing idea might work. The table structure is indeed not as simple as my example, but I may be able to index it. Will have to read some more on how to do that. Thanks.


Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2011-08-14 : 02:55:11
@Skataben

Cool! Like I said, I use that a lot for troubleshooting. If you want to only return rows where "TABLE B's" criteria weren't met, just state B...ID is null. Works wonders for keeping Dispatcher's work straight! :)
Go to Top of Page
   

- Advertisement -