Author |
Topic |
Sachin.Nand
2937 Posts |
Posted - 2010-08-04 : 07:23:10
|
One of my seniors said that from SQL 2005 the optimiser is now quiet smart enough to create the same complie plan & understand that in short IN & Exists is one & the same thing.Is it right???Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless. PBUH |
|
Kristen
Test
22859 Posts |
Posted - 2010-08-04 : 07:26:28
|
IN containing a NULL value is not the same as EXISTS though, is it?(or is that only NOT IN with a NULL value perhaps?) |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-08-04 : 07:31:03
|
Yeah, just NOT IN :SET NOCOUNT ONDECLARE @TableA TABLE( A_Int int)INSERT INTO @TableASELECT 1 UNION ALLSELECT 2 UNION ALLSELECT 3 UNION ALLSELECT NULLDECLARE @TableB TABLE( B_Int int)INSERT INTO @TableBSELECT 1 UNION ALLSELECT 5 UNION ALLSELECT NULLSELECT B_Int AS [Exists]FROM @TableBWHERE EXISTS ( SELECT * FROM @TableA WHERE A_Int = B_Int )SELECT B_Int AS [In]FROM @TableBWHERE B_Int IN ( SELECT A_Int FROM @TableA )SELECT B_Int AS [Not Exists]FROM @TableBWHERE NOT EXISTS ( SELECT * FROM @TableA WHERE A_Int = B_Int )SELECT B_Int AS [Not In] -- Different result to NOT EXISTSFROM @TableBWHERE B_Int NOT IN ( SELECT A_Int FROM @TableA )SET NOCOUNT OFF Edit: Added NULL example row to @TableBEXISTS / NOT EXISTS symmetrically mutually exclusive, not so for IN / NOT IN |
 |
|
webfred
Master Smack Fu Yak Hacker
8781 Posts |
Posted - 2010-08-04 : 07:33:36
|
My experience is that IN() from SQL Server 2005 results in very bad performance if IN() has a select with many records (can't say this time what exactly I mean with "many records", sorry). In SQL Server 2000 it wasn't a problem and since we have gone to 2005 there was a big problem in some queries with performance.We changed then that IN() to EXISTS() [NOT IN() to NOT EXISTS()] and the performance was fine again  No, you're never too old to Yak'n'Roll if you're too young to die. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-08-04 : 07:35:28
|
Exists: |--Nested Loops(Left Semi Join, WHERE:([A_Int]=[B_Int])) |--Table Scan(OBJECT:(@TableB)) |--Table Scan(OBJECT:(@TableA)) In: |--Nested Loops(Left Semi Join, WHERE:([B_Int]=[A_Int])) |--Table Scan(OBJECT:(@TableB)) |--Table Scan(OBJECT:(@TableA)) NOT Exists: |--Nested Loops(Left Anti Semi Join, WHERE:([A_Int]=[B_Int])) |--Table Scan(OBJECT:(@TableB)) |--Table Scan(OBJECT:(@TableA)) NOT In: |--Nested Loops(Left Anti Semi Join, WHERE:([B_Int] IS NULL OR [A_Int] IS NULL OR [B_Int]=[A_Int])) |--Table Scan(OBJECT:(@TableB)) |--Table Scan(OBJECT:(@TableA)) So looks like IN is doing the same as EXISTS with extra "NOT IN Compatibility" thrown in for the last example |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-08-04 : 07:39:12
|
Oh bother! My tests were performed with SQL 2008 ...Put at the start:SET SHOWPLAN_TEXT ONGOand at the end:SET SHOWPLAN_TEXT OFFGOto compare query plans for other SQL versions |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-08-04 : 07:40:07
|
quote: Originally posted by webfred (can't say this time what exactly I mean with "many records", sorry).
We avoid using IN with more than 100 records in the list - using a JOIN whenever possible (e.g. no duplicates) otherwise an EXISTS |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2010-08-04 : 07:53:59
|
As always, "it depends".What is in the IN list? Fixed values? A query, with or without predicate? N 56°04'39.26"E 12°55'05.63" |
 |
|
webfred
Master Smack Fu Yak Hacker
8781 Posts |
Posted - 2010-08-04 : 08:02:09
|
Ok, so I have made a test in 2005 and also in a 2005 with mode 80.The plans are identical.But I know I have not dreamed that!So it really depends like Peso said.If only I could find such an old query that we have changed in the past but I don't know and don't have the time to search...Maybe later I can get some light on it. No, you're never too old to Yak'n'Roll if you're too young to die. |
 |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
|
Sachin.Nand
2937 Posts |
Posted - 2010-08-04 : 08:26:16
|
Ok.Let me give an example.How will this two queries perform??Assume Table1 has a 10000 records & the matching records in Table2 are 5000 records with no null values in either of them.1) select id from table1 t1 where id in(select id from table2 t2 where t1.id=t2.id)and 2)select id from table1 t1 where exists(select id from table2 t2 where t1.id=t2.id) According to my senior the optimiser is smart enough to understand that both the queries are same.Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless. PBUH |
 |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-08-04 : 14:02:29
|
quote: Originally posted by Idera Ok.Let me give an example.How will this two queries perform??
Why don't you run them and find out? Look at the exec plan, look at the query statistics, see what differences there are, if any.Or look at the two blog posts I linked to right above where I did those tests, then take my code and run it yourselfquote: According to my senior the optimiser is smart enough to understand that both the queries are same.
If your senior is teaching you to write IN like that, please ask him to read the blog posts I wrote as well. The IN should not have a WHERE clause, it'd not a correlated subquery. Only the EXISTS needs a where clause correlating it to the outer query.--Gail ShawSQL Server MVP |
 |
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2010-08-04 : 14:44:43
|
Perhaps try a JOIN? |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2010-08-04 : 16:08:23
|
Adding a predicate can make a huge difference in a query using IN, especially a NOT IN query.Not long ago I suggested this approach to a bad performing query (obfuscated below). See the metrics.The underlying idea for the query is to fetch the records where only KeyA, or only KeyB, has matches in the auxiliary table. If both are matches, the record should not be returned.-- Original querySELECT s.*FROM dbo.theSample AS sWHERE ( s.KeyA IN (SELECT v.theKey FROM dbo.theVerification AS v) AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v) ) OR ( s.KeyB IN (SELECT v.theKey FROM dbo.theVerification AS v) AND s.KeyA NOT IN (SELECT v.theKey FROM dbo.theVerification AS v) )-- PesoSELECT s.*FROM dbo.theSample AS sWHERE ( s.KeyA IN (SELECT v.theKey FROM dbo.theVerification AS v) AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyB) ) OR ( s.KeyB IN (SELECT v.theKey FROM dbo.theVerification AS v) AND s.KeyA NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyA) )Metrics from SQL Profiler CPU Duration ReadsOriginal 21481 1400 4012657Peso 0 1 79Metrics from STATISTICS IO (theSample has 1,000,000 records, and theVerification has 10 records)-- Original query (17 rows affected)Table 'theSample'. Scan count 1, logical reads 11 522.Table 'theVerification'. Scan count 0, logical reads 4 000 010.-- Peso (17 rows affected)Table 'theSample'. Scan count 20, logical reads 75.Table 'theVerification'. Scan count 2, logical reads 4. N 56°04'39.26"E 12°55'05.63" |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2010-08-05 : 00:55:11
|
quote: Originally posted by GilaMonster
quote: Originally posted by Idera Ok.Let me give an example.How will this two queries perform??
Why don't you run them and find out? Look at the exec plan, look at the query statistics, see what differences there are, if any.Or look at the two blog posts I linked to right above where I did those tests, then take my code and run it yourselfquote: According to my senior the optimiser is smart enough to understand that both the queries are same.
If your senior is teaching you to write IN like that, please ask him to read the blog posts I wrote as well. The IN should not have a WHERE clause, it'd not a correlated subquery. Only the EXISTS needs a where clause correlating it to the outer query.--Gail ShawSQL Server MVP Thanks.I tested it on my workstation.To my surpise "IN" is performing significantly little better than "Exist" ,also both the queries are returning the same records.I haven't added any indexes though.Here are the MetricsIN50000 row(s) affected#tbl1 logical reads 169 physical reads 0 Scan count 1#tbl2 logical reads 85 physical reads 0 Scan count 1CPU Time 109msExists50000 row(s) affected#tbl1 logical reads 169 physical reads 0 Scan count 1#tbl2 logical reads 85 physical reads 0 Scan count 1CPU Time 203ms And here is the code with what I tested withcreate table #tbl1 (id int identity)insert #tbl1 default valueswhile SCOPE_IDENTITY()<100000insert #tbl1 default valuescreate table #tbl2(id int identity)insert #tbl2 default valueswhile SCOPE_IDENTITY()<50000insert #tbl2 default valuesselect id from #tbl1 t1 where id in(select id from #tbl2 t2 where t1.id=t2.id)select id from #tbl1 t1 where exists(select id from #tbl2 t2 where t1.id=t2.id)drop table #tbl1,#tbl2 Limitations live only in our minds. But if we use our imaginations, our possibilities become limitless. PBUH |
 |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-08-05 : 01:43:44
|
quote: Originally posted by Peso Adding a predicate can make a huge difference in a query using IN, especially a NOT IN query.
Interesting. That needs further investigation (and maybe a blog post).Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.--Gail ShawSQL Server MVP |
 |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-08-05 : 01:45:40
|
quote: Originally posted by IderaI tested it on my workstation.To my surpise "IN" is performing significantly little better than "Exist" ,also both the queries are returning the same records.
Run your tests multiple times and average the results. Small variations in execution and CPU time are normal. If over multiple executions one is notably faster (statistically significant), then you can draw conclusions.--Gail ShawSQL Server MVP |
 |
|
Sachin.Nand
2937 Posts |
Posted - 2010-08-05 : 02:25:28
|
Just dont know.I ran it multiple times "IN" still performs marginally better than "Exists".That's atleast what the Statistics show.But if I change my IN query to thisselect id from #tbl1 t1 where id in(select id from #tbl2 t2 ) Exists perform marginally better than INLimitations live only in our minds. But if we use our imaginations, our possibilities become limitless. PBUH |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2010-08-05 : 03:00:34
|
quote: Originally posted by GilaMonster Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.--Gail ShawSQL Server MVP
Good point, didn't think of that.So I checked the repro script and here it is.CREATE TABLE dbo.theSample( KeyA int NOT NULL, KeyB int NOT NULL, slask nvarchar(100) NOT NULL )GOCREATE TABLE dbo.theVerification (theKey INT PRIMARY KEY NOT NULL) N 56°04'39.26"E 12°55'05.63" |
 |
|
webfred
Master Smack Fu Yak Hacker
8781 Posts |
Posted - 2010-08-05 : 03:57:40
|
Now I am really confused about this part of Peso's statement:AND s.KeyB NOT IN (SELECT v.theKey FROM dbo.theVerification AS v WHERE v.theKey <> s.KeyB)Say s.KeyB is 4711.Then "WHERE v.theKey <> 4711(s.KeyB)".Then NOT IN will be always true!Then I don't need that part of the query.Or what am I missing.Maybe my last night was too short? No, you're never too old to Yak'n'Roll if you're too young to die. |
 |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-08-05 : 04:16:31
|
quote: Originally posted by Peso
quote: Originally posted by GilaMonster Could it be to do with the NULLability of the column? NOT IN performs terribly when the column is nullable, regardless of whether or not there are nulls in.
Good point, didn't think of that.So I checked the repro script and here it is.CREATE TABLE dbo.theSample( KeyA int NOT NULL, KeyB int NOT NULL, slask nvarchar(100) NOT NULL )GOCREATE TABLE dbo.theVerification (theKey INT PRIMARY KEY NOT NULL)
More and more interesting. I'll investigate in detail, when I next have a few hours I can spend playing with SQL (not this month unfortunately)--Gail ShawSQL Server MVP |
 |
|
Next Page
|