Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 IN versus Exists

Author  Topic 

Sachin.Nand

2937 Posts

Posted - 2010-08-04 : 07:23:10
One of my seniors said that from SQL 2005 the optimiser is now quiet smart enough to create the same complie plan & understand that in short IN & Exists is one & the same thing.
Is it right???


Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:26:28
IN containing a NULL value is not the same as EXISTS though, is it?

(or is that only NOT IN with a NULL value perhaps?)
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:31:03
Yeah, just NOT IN :


SET NOCOUNT ON
DECLARE @TableA TABLE
(
A_Int int
)

INSERT INTO @TableA
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT NULL

DECLARE @TableB TABLE
(
B_Int int
)

INSERT INTO @TableB
SELECT 1 UNION ALL
SELECT 5 UNION ALL
SELECT NULL


SELECT B_Int AS [Exists]
FROM @TableB
WHERE EXISTS
(
SELECT *
FROM @TableA
WHERE A_Int = B_Int
)

SELECT B_Int AS [In]
FROM @TableB
WHERE B_Int IN
(
SELECT A_Int
FROM @TableA
)

SELECT B_Int AS [Not Exists]
FROM @TableB
WHERE NOT EXISTS
(
SELECT *
FROM @TableA
WHERE A_Int = B_Int
)

SELECT B_Int AS [Not In] -- Different result to NOT EXISTS
FROM @TableB
WHERE B_Int NOT IN
(
SELECT A_Int
FROM @TableA
)
SET NOCOUNT OFF


Edit: Added NULL example row to @TableB

EXISTS / NOT EXISTS symmetrically mutually exclusive, not so for IN / NOT IN
Go to Top of Page

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-08-04 : 07:33:36
My experience is that IN() from SQL Server 2005 results in very bad performance if IN() has a select with many records (can't say this time what exactly I mean with "many records", sorry).
In SQL Server 2000 it wasn't a problem and since we have gone to 2005 there was a big problem in some queries with performance.

We changed then that IN() to EXISTS() [NOT IN() to NOT EXISTS()] and the performance was fine again


No, you're never too old to Yak'n'Roll if you're too young to die.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:35:28
Exists:

|--Nested Loops(Left Semi Join, WHERE:([A_Int]=[B_Int]))
|--Table Scan(OBJECT:(@TableB))
|--Table Scan(OBJECT:(@TableA))


In:

|--Nested Loops(Left Semi Join, WHERE:([B_Int]=[A_Int]))
|--Table Scan(OBJECT:(@TableB))
|--Table Scan(OBJECT:(@TableA))





NOT Exists:

|--Nested Loops(Left Anti Semi Join, WHERE:([A_Int]=[B_Int]))
|--Table Scan(OBJECT:(@TableB))
|--Table Scan(OBJECT:(@TableA))


NOT In:

|--Nested Loops(Left Anti Semi Join, WHERE:([B_Int] IS NULL OR [A_Int] IS NULL OR [B_Int]=[A_Int]))
|--Table Scan(OBJECT:(@TableB))
|--Table Scan(OBJECT:(@TableA))


So looks like IN is doing the same as EXISTS with extra "NOT IN Compatibility" thrown in for the last example
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:39:12
Oh bother! My tests were performed with SQL 2008 ...

Put at the start:

SET SHOWPLAN_TEXT ON
GO

and at the end:

SET SHOWPLAN_TEXT OFF
GO

to compare query plans for other SQL versions
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-08-04 : 07:40:07
quote:
Originally posted by webfred

(can't say this time what exactly I mean with "many records", sorry).


We avoid using IN with more than 100 records in the list - using a JOIN whenever possible (e.g. no duplicates) otherwise an EXISTS
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-04 : 07:53:59
As always, "it depends".
What is in the IN list? Fixed values? A query, with or without predicate?



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-08-04 : 08:02:09
Ok, so I have made a test in 2005 and also in a 2005 with mode 80.
The plans are identical.
But I know I have not dreamed that!
So it really depends like Peso said.
If only I could find such an old query that we have changed in the past but I don't know and don't have the time to search...

Maybe later I can get some light on it.


No, you're never too old to Yak'n'Roll if you're too young to die.
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-04 : 08:06:08
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

--
Gail Shaw
SQL Server MVP
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2010-08-04 : 08:26:16
Ok.
Let me give an example.How will this two queries perform??
Assume Table1 has a 10000 records & the matching records in Table2 are 5000 records with no null values in either of them.

1) select id from table1 t1 where id in(select id from table2 t2 where t1.id=t2.id)
and
2)select id from table1 t1 where exists(select id from table2 t2 where t1.id=t2.id)

According to my senior the optimiser is smart enough to understand that both the queries are same.


Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-04 : 14:02:29
quote:
Originally posted by Idera

Ok.
Let me give an example.How will this two queries perform??


Why don't you run them and find out? Look at the exec plan, look at the query statistics, see what differences there are, if any.

Or look at the two blog posts I linked to right above where I did those tests, then take my code and run it yourself

quote:
According to my senior the optimiser is smart enough to understand that both the queries are same.


If your senior is teaching you to write IN like that, please ask him to read the blog posts I wrote as well.
The IN should not have a WHERE clause, it'd not a correlated subquery. Only the EXISTS needs a where clause correlating it to the outer query.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-08-04 : 14:44:43
Perhaps try a JOIN?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-04 : 16:08:23
Adding a predicate can make a huge difference in a query using IN, especially a NOT IN query.

Not long ago I suggested this approach to a bad performing query (obfuscated below). See the metrics.
The underlying idea for the query is to fetch the records where only KeyA, or only KeyB, has matches in the auxiliary table. If both are matches, the record should not be returned.
-- Original query
SELECT s.*
FROM dbo.theSample AS s
WHERE (
s.KeyA IN (SELECT v.theKey FROM dbo.theVerification AS v)
AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v)
)
OR (
s.KeyB IN (SELECT v.theKey FROM dbo.theVerification AS v)
AND s.KeyA NOT IN (SELECT v.theKey FROM dbo.theVerification AS v)
)

-- Peso
SELECT s.*
FROM dbo.theSample AS s
WHERE (
s.KeyA IN (SELECT v.theKey FROM dbo.theVerification AS v)
AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyB)
)
OR (
s.KeyB IN (SELECT v.theKey FROM dbo.theVerification AS v)
AND s.KeyA NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyA)
)


Metrics from SQL Profiler

CPU Duration Reads
Original 21481 1400 4012657
Peso 0 1 79


Metrics from STATISTICS IO (theSample has 1,000,000 records, and theVerification has 10 records)

-- Original query (17 rows affected)
Table 'theSample'. Scan count 1, logical reads 11 522.
Table 'theVerification'. Scan count 0, logical reads 4 000 010.

-- Peso (17 rows affected)
Table 'theSample'. Scan count 20, logical reads 75.
Table 'theVerification'. Scan count 2, logical reads 4.


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2010-08-05 : 00:55:11
quote:
Originally posted by GilaMonster

quote:
Originally posted by Idera

Ok.
Let me give an example.How will this two queries perform??


Why don't you run them and find out? Look at the exec plan, look at the query statistics, see what differences there are, if any.

Or look at the two blog posts I linked to right above where I did those tests, then take my code and run it yourself

quote:
According to my senior the optimiser is smart enough to understand that both the queries are same.


If your senior is teaching you to write IN like that, please ask him to read the blog posts I wrote as well.
The IN should not have a WHERE clause, it'd not a correlated subquery. Only the EXISTS needs a where clause correlating it to the outer query.

--
Gail Shaw
SQL Server MVP



Thanks.

I tested it on my workstation.To my surpise "IN" is performing significantly little better than "Exist" ,also both the queries are returning the same records.
I haven't added any indexes though.

Here are the Metrics

IN
50000 row(s) affected

#tbl1

logical reads 169
physical reads 0
Scan count 1

#tbl2

logical reads 85
physical reads 0
Scan count 1

CPU Time 109ms


Exists
50000 row(s) affected

#tbl1

logical reads 169
physical reads 0
Scan count 1

#tbl2

logical reads 85
physical reads 0
Scan count 1

CPU Time 203ms



And here is the code with what I tested with



create table #tbl1 (id int identity)
insert #tbl1 default values
while SCOPE_IDENTITY()<100000
insert #tbl1 default values


create table #tbl2(id int identity)
insert #tbl2 default values
while SCOPE_IDENTITY()<50000
insert #tbl2 default values


select id from #tbl1 t1 where id in(select id from #tbl2 t2 where t1.id=t2.id)
select id from #tbl1 t1 where exists(select id from #tbl2 t2 where t1.id=t2.id)


drop table #tbl1,#tbl2



Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-05 : 01:43:44
quote:
Originally posted by Peso

Adding a predicate can make a huge difference in a query using IN, especially a NOT IN query.


Interesting. That needs further investigation (and maybe a blog post).

Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-05 : 01:45:40
quote:
Originally posted by IderaI tested it on my workstation.To my surpise "IN" is performing significantly little better than "Exist" ,also both the queries are returning the same records.


Run your tests multiple times and average the results. Small variations in execution and CPU time are normal. If over multiple executions one is notably faster (statistically significant), then you can draw conclusions.

--
Gail Shaw
SQL Server MVP
Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2010-08-05 : 02:25:28
Just dont know.I ran it multiple times "IN" still performs marginally better than "Exists".That's atleast what the Statistics show.
But if I change my IN query to this

select id from #tbl1 t1 where id in(select id from #tbl2 t2 )


Exists perform marginally better than IN


Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless.

PBUH
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-05 : 03:00:34
quote:
Originally posted by GilaMonster

Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.

--
Gail Shaw
SQL Server MVP
Good point, didn't think of that.
So I checked the repro script and here it is.

CREATE TABLE dbo.theSample
( KeyA int NOT NULL, KeyB int NOT NULL, slask nvarchar(100) NOT NULL )
GO
CREATE TABLE dbo.theVerification (theKey INT PRIMARY KEY NOT NULL)



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2010-08-05 : 03:57:40
Now I am really confused about this part of Peso's statement:
AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyB)

Say s.KeyB is 4711.
Then "WHERE v.theKey <> 4711(s.KeyB)".
Then NOT IN will be always true!
Then I don't need that part of the query.

Or what am I missing.
Maybe my last night was too short?


No, you're never too old to Yak'n'Roll if you're too young to die.
Go to Top of Page

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2010-08-05 : 04:16:31
quote:
Originally posted by Peso

quote:
Originally posted by GilaMonster

Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.
Good point, didn't think of that.
So I checked the repro script and here it is.

CREATE TABLE dbo.theSample
( KeyA int NOT NULL, KeyB int NOT NULL, slask nvarchar(100) NOT NULL )
GO
CREATE TABLE dbo.theVerification (theKey INT PRIMARY KEY NOT NULL)



More and more interesting. I'll investigate in detail, when I next have a few hours I can spend playing with SQL (not this month unfortunately)

--
Gail Shaw
SQL Server MVP
Go to Top of Page
    Next Page

- Advertisement -