IN versus Exists - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

IN versus Exists

Author

Topic

Sachin.Nand

2937 Posts

Posted - 2010-08-04 : 07:23:10

One of my seniors said that from SQL 2005 the optimiser is now quiet smart enough to create the same complie plan & understand that in short IN & Exists is one & the same thing.
Is it right???

Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:26:28

IN containing a NULL value is not the same as EXISTS though, is it?

(or is that only NOT IN with a NULL value perhaps?)

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:31:03

Yeah, just NOT IN :



SET NOCOUNT ON
DECLARE	@TableA TABLE
(
	A_Int int
)

INSERT INTO @TableA
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT NULL

DECLARE	@TableB TABLE
(
	B_Int int
)

INSERT INTO @TableB
SELECT 1 UNION ALL
SELECT 5  UNION ALL
SELECT NULL

SELECT	B_Int AS [Exists]
FROM	@TableB
WHERE	EXISTS
	(
		SELECT	*
		FROM	@TableA
		WHERE	A_Int = B_Int
	)

SELECT	B_Int AS [In]
FROM	@TableB
WHERE	B_Int IN
	(
		SELECT	A_Int
		FROM	@TableA
	)

SELECT	B_Int AS [Not Exists]
FROM	@TableB
WHERE	NOT EXISTS
	(
		SELECT	*
		FROM	@TableA
		WHERE	A_Int = B_Int
	)

SELECT	B_Int AS [Not In]	-- Different result to NOT EXISTS
FROM	@TableB
WHERE	B_Int NOT IN
	(
		SELECT	A_Int
		FROM	@TableA
	)
SET NOCOUNT OFF

Edit: Added NULL example row to @TableB

EXISTS / NOT EXISTS symmetrically mutually exclusive, not so for IN / NOT IN

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-08-04 : 07:33:36

My experience is that IN() from SQL Server 2005 results in very bad performance if IN() has a select with many records (can't say this time what exactly I mean with "many records", sorry).
In SQL Server 2000 it wasn't a problem and since we have gone to 2005 there was a big problem in some queries with performance.

We changed then that IN() to EXISTS() [NOT IN() to NOT EXISTS()] and the performance was fine again

No, you're never too old to Yak'n'Roll if you're too young to die.

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:35:28

Exists:


  |--Nested Loops(Left Semi Join, WHERE:([A_Int]=[B_Int]))
       |--Table Scan(OBJECT:(@TableB))
       |--Table Scan(OBJECT:(@TableA))

In:


  |--Nested Loops(Left Semi Join, WHERE:([B_Int]=[A_Int]))
       |--Table Scan(OBJECT:(@TableB))
       |--Table Scan(OBJECT:(@TableA))

NOT Exists:


  |--Nested Loops(Left Anti Semi Join, WHERE:([A_Int]=[B_Int]))
       |--Table Scan(OBJECT:(@TableB))
       |--Table Scan(OBJECT:(@TableA))

NOT In:


  |--Nested Loops(Left Anti Semi Join, WHERE:([B_Int] IS NULL OR [A_Int] IS NULL OR [B_Int]=[A_Int]))
       |--Table Scan(OBJECT:(@TableB))
       |--Table Scan(OBJECT:(@TableA))

So looks like IN is doing the same as EXISTS with extra "NOT IN Compatibility" thrown in for the last example

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:39:12

Oh bother! My tests were performed with SQL 2008 ...

Put at the start:

SET SHOWPLAN_TEXT ON
GO

and at the end:

SET SHOWPLAN_TEXT OFF
GO

to compare query plans for other SQL versions

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:40:07

quote:
Originally posted by webfred

(can't say this time what exactly I mean with "many records", sorry).

We avoid using IN with more than 100 records in the list - using a JOIN whenever possible (e.g. no duplicates) otherwise an EXISTS

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-04 : 07:53:59

As always, "it depends".
What is in the IN list? Fixed values? A query, with or without predicate?

N 56°04'39.26"
E 12°55'05.63"

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-08-04 : 08:02:09

Ok, so I have made a test in 2005 and also in a 2005 with mode 80.
The plans are identical.
But I know I have not dreamed that!
So it really depends like Peso said.
If only I could find such an old query that we have changed in the past but I don't know and don't have the time to search...

Maybe later I can get some light on it.

No, you're never too old to Yak'n'Roll if you're too young to die.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-04 : 08:06:08

http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

--
Gail Shaw
SQL Server MVP

Sachin.Nand

2937 Posts

Posted - 2010-08-04 : 08:26:16

Ok.
Let me give an example.How will this two queries perform??
Assume Table1 has a 10000 records & the matching records in Table2 are 5000 records with no null values in either of them.


1) select id from table1 t1 where id in(select id from table2 t2 where t1.id=t2.id)
and 
2)select id from table1 t1 where exists(select id from table2 t2 where t1.id=t2.id)

According to my senior the optimiser is smart enough to understand that both the queries are same.

Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-04 : 14:02:29

quote:
Originally posted by Idera

Ok.
Let me give an example.How will this two queries perform??

Why don't you run them and find out? Look at the exec plan, look at the query statistics, see what differences there are, if any.

Or look at the two blog posts I linked to right above where I did those tests, then take my code and run it yourself

quote:
According to my senior the optimiser is smart enough to understand that both the queries are same.

If your senior is teaching you to write IN like that, please ask him to read the blog posts I wrote as well.
The IN should not have a WHERE clause, it'd not a correlated subquery. Only the EXISTS needs a where clause correlating it to the outer query.

--
Gail Shaw
SQL Server MVP

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-08-04 : 14:44:43

Perhaps try a JOIN?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-04 : 16:08:23

Adding a predicate can make a huge difference in a query using IN, especially a NOT IN query.

Not long ago I suggested this approach to a bad performing query (obfuscated below). See the metrics.
The underlying idea for the query is to fetch the records where only KeyA, or only KeyB, has matches in the auxiliary table. If both are matches, the record should not be returned.

-- Original query
SELECT	s.*
FROM	dbo.theSample AS s
WHERE	(
		s.KeyA IN (SELECT v.theKey FROM dbo.theVerification AS v)
		AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v)
	)
        OR (
		s.KeyB IN (SELECT v.theKey FROM dbo.theVerification AS v)
		AND s.KeyA NOT IN (SELECT v.theKey FROM dbo.theVerification AS v)
	)

-- Peso
SELECT	s.*
FROM	dbo.theSample AS s
WHERE	(
		s.KeyA IN (SELECT v.theKey FROM dbo.theVerification AS v)
		AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyB)
	)
	OR (
		s.KeyB IN (SELECT v.theKey FROM dbo.theVerification AS v)
		AND s.KeyA NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyA)
	)


Metrics from SQL Profiler

            CPU  Duration    Reads
Original  21481      1400  4012657
Peso          0         1       79


Metrics from STATISTICS IO (theSample has 1,000,000 records, and theVerification has 10 records)

-- Original query (17 rows affected)
Table 'theSample'.       Scan count  1, logical reads    11 522.
Table 'theVerification'. Scan count  0, logical reads 4 000 010.

-- Peso           (17 rows affected)
Table 'theSample'.       Scan count 20, logical reads        75.
Table 'theVerification'. Scan count  2, logical reads         4.

N 56°04'39.26"
E 12°55'05.63"

Sachin.Nand

2937 Posts

Posted - 2010-08-05 : 00:55:11

quote:
Originally posted by GilaMonster

quote:
Originally posted by Idera

Ok.
Let me give an example.How will this two queries perform??

Why don't you run them and find out? Look at the exec plan, look at the query statistics, see what differences there are, if any.

Or look at the two blog posts I linked to right above where I did those tests, then take my code and run it yourself

quote:
According to my senior the optimiser is smart enough to understand that both the queries are same.

If your senior is teaching you to write IN like that, please ask him to read the blog posts I wrote as well.
The IN should not have a WHERE clause, it'd not a correlated subquery. Only the EXISTS needs a where clause correlating it to the outer query.

--
Gail Shaw
SQL Server MVP

Thanks.

I tested it on my workstation.To my surpise "IN" is performing significantly little better than "Exist" ,also both the queries are returning the same records.
I haven't added any indexes though.

Here are the Metrics

IN
50000 row(s) affected


#tbl1 

   logical reads 169
   physical reads 0
   Scan count 1

#tbl2 

   logical reads 85
   physical reads 0
   Scan count 1

CPU Time 109ms


Exists
50000 row(s) affected

#tbl1 

   logical reads 169
   physical reads 0
   Scan count 1

#tbl2 

   logical reads 85
   physical reads 0
   Scan count 1

CPU Time 203ms

And here is the code with what I tested with



create table #tbl1 (id int identity)
insert  #tbl1 default values
while SCOPE_IDENTITY()<100000
insert  #tbl1 default values


create table #tbl2(id int identity)
insert  #tbl2 default values
while SCOPE_IDENTITY()<50000
insert  #tbl2 default values


select id from #tbl1 t1 where id in(select id from #tbl2 t2 where t1.id=t2.id)
select id from #tbl1 t1 where exists(select id from #tbl2 t2 where t1.id=t2.id)


drop table #tbl1,#tbl2

Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-05 : 01:43:44

quote:
Originally posted by Peso

Adding a predicate can make a huge difference in a query using IN, especially a NOT IN query.

Interesting. That needs further investigation (and maybe a blog post).

Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.

--
Gail Shaw
SQL Server MVP

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-05 : 01:45:40

quote:
Originally posted by IderaI tested it on my workstation.To my surpise "IN" is performing significantly little better than "Exist" ,also both the queries are returning the same records.

Run your tests multiple times and average the results. Small variations in execution and CPU time are normal. If over multiple executions one is notably faster (statistically significant), then you can draw conclusions.

--
Gail Shaw
SQL Server MVP

Sachin.Nand

2937 Posts

Posted - 2010-08-05 : 02:25:28

Just dont know.I ran it multiple times "IN" still performs marginally better than "Exists".That's atleast what the Statistics show.
But if I change my IN query to this


select id from #tbl1 t1 where id in(select id from #tbl2 t2 )

Exists perform marginally better than IN

Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-05 : 03:00:34

quote:
Originally posted by GilaMonster

Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.

--
Gail Shaw
SQL Server MVP

Good point, didn't think of that.
So I checked the repro script and here it is.

CREATE TABLE dbo.theSample
( KeyA int NOT NULL, KeyB int NOT NULL, slask nvarchar(100) NOT NULL  )
GO
CREATE TABLE dbo.theVerification (theKey INT PRIMARY KEY NOT NULL)

N 56°04'39.26"
E 12°55'05.63"

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-08-05 : 03:57:40

Now I am really confused about this part of Peso's statement:
AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyB)

Say s.KeyB is 4711.
Then "WHERE v.theKey <> 4711(s.KeyB)".
Then NOT IN will be always true!
Then I don't need that part of the query.

Or what am I missing.
Maybe my last night was too short?

No, you're never too old to Yak'n'Roll if you're too young to die.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-05 : 04:16:31

quote:
Originally posted by Peso

quote:
Originally posted by GilaMonster

Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.
Good point, didn't think of that.
So I checked the repro script and here it is.

CREATE TABLE dbo.theSample
( KeyA int NOT NULL, KeyB int NOT NULL, slask nvarchar(100) NOT NULL )
GO
CREATE TABLE dbo.theVerification (theKey INT PRIMARY KEY NOT NULL)

More and more interesting. I'll investigate in detail, when I next have a few hours I can spend playing with SQL (not this month unfortunately)

--
Gail Shaw
SQL Server MVP

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources