Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Trying to avoid functions in Where Clause

Author  Topic 

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-27 : 04:09:38
Folks,

After much research and advice, I'm trying to rewrite a lot of my queries because many of them have functions in the where clause and I realize this causes performance issues. Before I set up the testing that I've done, I want to point out that the datediff function I'm running looks a bit redundant. The program that runs on the database I use has all the dates in int datatype (yyyymmdd) and I can't change it. So the work around is that all the dates need to be converted to an acceptable format for datediff (I realize I've converted to char, but it works better in the environment we use for a few reasons I can clarify later)

Anyway, out of the 4 tests I've run, I'm not really understanding the performance difference when I look at the estimated execution plans. If anyone has any pointers as to which of the 4 test queries looks optimal, I'd appreciate it (and "why" if at all possible) I'm thinking the one with the CTE is probably the best performer, but that's also the least practical for the environment (again, will need to clarify that later)


Here's some sample data using a table variable:

Declare @CDate datetime
set @CDate=GETDATE() --Run on 11-27-10

Declare @BadDateFormatTable Table (DateInt Int)
insert into @BadDateFormatTable (DateInt)
Values (20101119);
insert into @BadDateFormatTable (DateInt)
Values (20101120);
insert into @BadDateFormatTable (DateInt)
Values (20101121);
insert into @BadDateFormatTable (DateInt)
Values (20101122);
insert into @BadDateFormatTable (DateInt)
Values (20101123);
insert into @BadDateFormatTable (DateInt)
Values (20101124);
insert into @BadDateFormatTable (DateInt)
Values (20101125);
insert into @BadDateFormatTable (DateInt)
Values (20101126);
insert into @BadDateFormatTable (DateInt)
Values (20101127);

--Test1 : Use results of subquery in order to get the product of datediff function

Select A.BadDate
From (
Select CONVERT(char(8), dateint,112) as BadDate,
DATEDIFF(dd, convert(char(8), dateint,112), @cdate) as DDiff
From @BadDateFormatTable) A
Where A.DDiff between 1 and 3;

--Test2 : Use the DateDiff function in derived table and join using the datediff to return the desired days

Select A.DateInt
From @BadDateFormatTable A
Join (Select Dateint, DATEDIFF(dd, convert(char(8), dateint,112),@cdate) as DDiff
From @BadDateFormatTable) B On A.DateInt=B.DateInt and B.DDiff between 1 and 3;

--Test3 : Use DateDiff Function in derived table and filter using the calculated ddiff column in where clause

Select A.DateInt
From @BadDateFormatTable A
Join (Select Dateint, DATEDIFF(dd, convert(char(8), dateint,112),@cdate) as DDiff
From @BadDateFormatTable) B on A.DateInt=B.DateInt
Where B.DDiff between 1 and 3;

--Test4 : Use CTE to produce datediff

With DateCte (DateInt, DDiff)
As (Select Dateint, DATEDIFF(dd, convert(char(8), dateint,112),@cdate) as DDiff
From @BadDateFormatTable)

Select DateInt
From DateCte
Where DDiff between 1 and 3;

--Numbers 1 and 3 seem to be just about the same query after reviewing
(edit: Punctuation)

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-11-27 : 04:12:51
did you try looking into execution times and plan for each?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-27 : 04:27:39
Hi Vis,

I'm not really understanding what the execution plans are saying; I've read a few columns on execution plans, but it's hard for me to understand as I don't know what "good" looks like.

This is what I got when I set the statistics time on:

Test 1 cpu 0ms elapsed 147ms
Test 2 cpu 16ms elapsed 166ms
Test 3 cpu 31ms elapsed 162ms
Test 4 cpu 0ms elapsed 148ms

edit: because the original query has so little data and there were literally no times responding from statistics on, I compared to the actual table and set the "between" searches for "Between 1 and 5000" rather than 1 and 3. That's where the stats times came from. Sorry if confusing.
Go to Top of Page

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2010-11-27 : 06:07:04
Your test isn't a good model. The main *reason* that you don't want to use functions over columns in the WHERE clause is that it stops you using an index on that column.

My advice:

1) build a test table (a permanent one) with a large selection of data.
2) Put indexes on that table that mirror your production setup
3) start testing.

statistics time on is not what you should be looking at first. The metrics you want to look at are:

1) Execution plan - "show actual execution plan" in the query tooltip. you want to look for "scans" in the execution plan on columns that you have an index on that you want to use. If you were using the index you would see "seeks"
2) Statistics IO - with a lot of test data you should get some meaningful data on reads / writes.

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-11-27 : 07:13:26
Do you have to use DATEDIFF?

Given that your dates are in yyyymmdd and thus sort in chronological order, can you just do

WHERE DateInt >= @MyStartDateInt AND DateInt < @MyStopDateInt

Failing that I think you need a "cheaper" conversion to DATETIME datatype than going via CHAR. And all arithmetic formula will be quicker
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-27 : 12:42:26
Kirsten and Charlie, thanks for the feedback. I'll try to answer each of you guys separately:

Charlie: I did run this against the "actual" permanent table; there are approximately 7k records which is where the resulting testing times came from. I figured it was easier to show using a short example with a table variable. As for the execution plan, yes, I realize that's a great tool, I just don't get it. I am having a hard time interpreting the results.

Kirsten: In the example I gave, yes, I do need to use datediff because it's the foundation for a report in another interface where I can't use user defined parameters. It runs automatically against getdate() each week. But the larger question is that I'm trying to avoid functions in where clause in a large amount of queries. This was just one example.

I guess I was hoping to find a better approach (ie, wrapping it in a CTE as a "select" and then putting the resulting computed column into the where clause etc) I have queries that use many other functions in the Where clause and I just wanted to see if putting it into a different place in the query would have any positive performance effect.

Thanks for your help...any further feedback is very greatly appreciated!
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-11-27 : 15:05:05
"It runs automatically against getdate() each week"

Use GetDate to calculate a Min / Max YYYYMMDD integer value and use those as the Range for your DateInt column?

If not clear ask me to clarify pls
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-27 : 15:52:18
Hi Kristen,

If you don't mind clarifying, that would be great. What would the Min/Max be used against? Do you have an example?
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-11-27 : 16:11:29
"I do need to use datediff because it's the foundation for a report in another interface where I can't use user defined parameters. It runs automatically against getdate() each week."

So ... you want, say, the previous 7 days and perhaps you are using DATEDIFF to calculate if DateInt is within 7 days of GetDate()?

WHERE DATEDIFF(Day, DateInt, GetDate()) < 7

something like that?

Instead you could do

DECLARE @dtStart int,
@dtStop int

SELECT @dtStart = CONVERT(int, CONVERT(char(8), DATEADD(Day, -7, GetDate()), 112)),
@dtStop = CONVERT(int, CONVERT(char(8), GetDate(), 112))

SELECT Col1, Col2, ...
FROM MyTable
WHERE DateInt >= @dtStart
AND DateInt <= @dtStop
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-27 : 16:38:03
Oh, very cool! Do you see any harm in running a bit differently? I can't declare variable in the reports I need to publish, so I need to work around using derived tables (can use CTE's in one of the environments, but not in all, so derived tables are my fall back on a lot of this) :


select A.dateint
From @BadDateFormatTable A
Join (
SELECT StartDt = CONVERT(int, CONVERT(char(8), DATEADD(Day, -7, GetDate()), 112)),
EndDt = CONVERT(int, CONVERT(char(8), GetDate(), 112)) ) b on A.DateInt between b.StartDt and b.EndDt

Also, as I've noted, I have a ton of other types of functions that I need to try to pull out of the where clause. Do you think this approach is generally going to work to optimize?

For example, if I have to get the datepart for a query:

Select Something
From mytable
Where datepart(dw, IntDateCol)=7

to return data only for Saturdays. I'm thinking after looking at your example that I could do something like in a simplified format:

Declare @date int
Declare @dw int
Select @date = 20101127

select @dw=convert(int, datepart(dw, convert(char(8), @date,112)))

select @date
where @dw=7

(again, I'll need to convert this to a derived table so I can use it)
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2010-11-28 : 02:36:16
which reporting tool you're using? why cant you use variables? why not then wrap everything into a procedure and call it from report?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-28 : 02:50:40
Hi Vis,

The software we use has a user interface similar to Query Analyzer, but it's limited in its functions. All statements must start with SELECT.

I could create a procedure but I'll be losing access to SQL Server for this application soon and thus will not be able to do anything more with it. I'm trying to confine all my query writing in this environment to things I"ll be able to work with as our MIS department is taking access away but they will not be providing any support either.

Bottom line is that everything has to be wrapped up in a select statement of some sort. I'm lucky that the sister product which is very similar at least allows Common Table Expressions; however, this one does not.

Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-11-28 : 03:34:27
" I need to work around using derived tables"

I doubt you even need a derived table, this should be fine:

WHERE DateInt >= CONVERT(int, CONVERT(char(8), DATEADD(Day, -7, GetDate()), 112))

I am pretty sure that SQL will only evaluate the right-hand-side once and then use it as a CONSTANT in the query plan.

"The software we use has a user interface similar to Query Analyzer, but it's limited in its functions. All statements must start with SELECT"

How do purchasing decisions for such things ever get passed? The wining-and-dining route I expect
Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-28 : 03:46:00
Hey Kristen,

couple things...that sounds great...I didn't realize that the function only eats at performance if it's on the "left side" of the operator. So that's good to know.
As for the purchasing decision, believe it or not, the software we use is actually very good for what it does. I have a love/hate relationship with the company (and we purchased it long before I was in my current position). The issue really is that the company is enormous, has swallowed up tons of smaller companies and has really had issues with keeping code consistent (tons and tons of legacy columns that cause problems etc) Anyway, they were running into security issues years ago with users mucking up their databases with the query writer, so they limited it to select statements. Technically, the on-site technical admin person can use SQL server with all its bells and whistles. Unfortunately, that's more of an internal thing, the municipality I work for is basically shutting anything they can down without providing the help we need to do our jobs. So I have to do lots and lots of workarounds :)

Thanks for your help!
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-11-28 : 14:00:43
"I didn't realize that the function only eats at performance if it's on the "left side" of the operator"

You have probably got it, but its not "left side", it is whether the function acts on a column in the tables - i.e. something that varies from row to row. Anything that is a Constant, for the duration of the query, should be only calculated once (unless you manage to fool the Query Planner into not realising that it won't change!)

If you do

WHERE MyColumn > 'xxx'

then SQL can any available Index on MyColumn

Likewise if you do

WHERE MyColumn > 'xxx' + 'yyy'
or
WHERE 'xxx' + 'yyy' < MyColumn

but if you do

WHERE SomeFunction(MyColumn) > 'xxx'

then indexes are off the menu! (SQL may be smart enough to use an index in some cases, I don't know, but I avoid this scenario)
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2010-11-28 : 14:39:07
non sargable

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx


Want to help yourself?

http://msdn.microsoft.com/en-us/library/ms130214.aspx

http://weblogs.sqlteam.com/brettk/

http://brettkaiser.blogspot.com/


Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-28 : 15:57:08
Ok Kristen, I think I got it. So, going back to your earlier example:

WHERE DateInt >= CONVERT(int, CONVERT(char(8), DATEADD(Day, -7, GetDate()), 112))
, if I were to change getdate to a particular column in the table such as:
WHERE DateInt >= CONVERT(int, CONVERT(char(8), DATEADD(Day, -7, SomeDateField), 112))

Then it would likely have the same effect of negating SQL Server's ability to run an index against that column? (the somedatefield)

Also, it's looking like I probably have to figure out a different way to run searches using functions. For example:


Where Datepart(dw, mydatefield)=2

My question is, if I take the function out of the where clause and put it somewhere else like in the select statement of a derived table, then use the end result for the where search, is that getting around the where clause limitation?

EX:

Select Datepart(dw, t1.datefield), Col2
From Table1 T1
JOIN (Select Datepart(dw, DateField) DF2
From Table2) T2 on T2.DF2=Datepart(dw, t1.datefield)

Is putting the function into the ON clause the same as putting a function into the WHere clause? LIkewise:

Select Datepart(dw, t1.datefield), Col2
From Table1 T1
CROSS JOIN (Select Datepart(dw, DateField) DF2
From Table2) T2
WHERE T2.DF2=7

Is the fact that I'm using the product of the datepart function in the derived table in where clause of the whole query the same effect?

Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-11-29 : 02:15:56
"non sargable"

That's the word, thanks Brett!

"Where Datepart(dw, mydatefield)=2"

What period of time is this over? If it is 4 weeks, say, you could just test against 4 specific date values

Where mydatefield IN (20101129, 20101122, 20101115, , 20101108)

(each of which you could calculate from GetDate() and should be treated as Constant values in the query)

"Is putting the function into the ON clause the same as putting a function into the WHere clause?"

Yes, SQL won't use an INDEX for the JOIN in that circumstance

But you are on the right track, if you can get the actual dates for the day-of-week into a derived table (or just an "IN" as I mentioned above) that should help.

Go to Top of Page

flamblaster
Constraint Violating Yak Guru

384 Posts

Posted - 2010-11-29 : 02:47:38
Kristen, thanks for sticking with me...this is helping a lot! Ok, so the date range would probably be impractical for the majority of the things I'm trying to run to use an "IN" clause. I have to run reports to see "every Monday (or other day of week)" over a 2 - 3 year period sometimes to look for trends etc.

So probably something like I mentioned earlier such as:

Select Datepart(dw, t1.datefield), Col2
From Table1 T1
CROSS JOIN (Select Datepart(dw, DateField) DF2
From Table2) T2
WHERE T2.DF2=7

would be the best route for the environment I work in. I've also seem some things on using functions in cross apply in order to get around the performance issues, or am I mucking that up?

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-11-29 : 04:22:18
quote:
Originally posted by flamblaster

The program that runs on the database I use has all the dates in int datatype (yyyymmdd) and I can't change it.

1. Fire the designer who decided for this all-in-the-world-reasons approach
2. Add a calculated column with this piece of code
ALTER TABLE	BadDateFormatTable
ADD MyDate AS (CAST(CAST(DateInt AS CHAR(8)) AS DATETIME))

3. Now, with a correct datatype, you can do whatever calculations you want.

For more information about DATEDIFF, see here http://www.sqlteam.com/article/datediff-function-demystified


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-11-29 : 07:04:09
Is that calculated column maintained as data in the row then? I thought it was just calculated at "use"??

But, yeah, there is no excuse for storing a date in a non-date column, that's clear to see from the problems it is causing for flamblaster - let alone all the others I can think of.

I wonder if there is any invalid yyyymmdd values in that INT column?

" I have to run reports to see "every Monday (or other day of week)" over a 2 - 3 year period sometimes"

You could use a Tally Table to generate those in the derived table. I haven't got time at the mo to explain that, hopefully Google or someone else can. Basically you need a sequence of numbers 0, 7, 14, ... which you add to your report starting date to get the day-of-week values, and then you can JOIN that to your yyyymmdd INT column

Changing the crappy database design would be better though ...
Go to Top of Page
    Next Page

- Advertisement -