Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Optimization strategy?!

Author  Topic 

jordanbk
Starting Member

1 Post

Posted - 2010-09-24 : 17:14:14
Hi folks,

I'm trying to optimize a query written by a previous developer, against a huge table (not my design), and I need some advice.

The table has over 5 million rows, and is around 1200 bytes wide. The query looks at 13 of this table's columns (in either the WHERE clause or SELECT list). However, instead of posting all that code & schema, I will distill this down to a conceptual example.

Let's say you have a poorly-designed table called Widgets, with the following columns: ID, Attrib_A thru Attrib_Z, plus Widget_Date. Keep in mind that Widget_Date can be changed at any time, so it's not a good candidate for a clustered index.

Let's say that the table has 5 million widget rows. And someone wrote a giant query against this table, joining against other tables, but most importantly doing something like:

SELECT w.Attrib_A, w.Attrib_B, w.Attrib_C, w.Attrib_D
FROM Widgets w
JOIN OtherTable o ON (o.ID = w.Attrib_E AND o.Misc = w.Attrib_F)
WHERE w.Attrib_G = 1234
AND w.Attrib_H = @SprocParam1
AND (w.Attrib_I = 'abcd' OR w.Attrib_J = @SprocParam2)
AND w.Attrib_K = @SprocParam3
AND w.Attrib_L is not NULL
AND w.Attrib_M = @SprocParam4
AND w.Attrib_N = @SprocParam5
AND (w.WidgetDate >= @StartDate AND w.WidgetDate <= @EndDate)

So after some research, I find out that the most pertinent columns are WidgetDate, Attrib_K and Attrib_M. That is to say, these columns will narrow down the results more than the other ones. If I were to query on only those 3 columns, I would get a reasonably small number of rows for most use cases (let's say 3000 rows). The other columns will narrow my results slightly further, and are necessary parts of the query, but make less of a difference in the number of rows (once I've narrowed using the first three).

I also notice that there are no composite or covering indices on the Widgets table. There are individual column indexes (e.g. nonclustered index for ID), and no clustered index.

So, my idea is to query on the 3 most pertinent columns, put those IDs into a temp table, then change the main query to inner-join against the temp table when it retrieves the rest of the columns (or uses them in the WHERE clause). Something along these lines:

CREATE NONCLUSTERED INDEX idxWidgetSearch ON Widgets (WidgetDate, Attrib_K, Attrib_M, ID) -- not sure if the order matters?

CREATE TABLE #tmpIDs (ID int)

-- Find the Widget IDs with specified date, K and M values

INSERT #tmpIDs
SELECT w.ID
FROM Widgets
WHERE (w.WidgetDate >= @StartDate AND w.WidgetDate <= @EndDate)
AND w.Attrib_K = @SprocParam3
AND w.Attrib_M = @SprocParam4

-- Join against temp table to retrieve/compare further information on above widgets

SELECT w.Attrib_A, w.Attrib_B, w.Attrib_C, w.Attrib_D
FROM #tmpIDs tmp
JOIN Widgets w ON (w.ID = tmp.ID)

JOIN OtherTable o ON (o.ID = w.Attrib_E AND o.Misc = w.Attrib_F)
WHERE w.Attrib_G = 1234
AND w.Attrib_H = @SprocParam1
AND (w.Attrib_I = 'abcd' OR w.Attrib_J = @SprocParam2)
-- AND w.Attrib_K = @SprocParam3 -- don't need this anymore; see above
AND w.Attrib_L is not NULL
-- AND w.Attrib_M = @SprocParam4 -- don't need this anymore; see above
AND w.Attrib_N = @SprocParam5
-- AND (w.WidgetDate >= @StartDate AND w.WidgetDate <= @EndDate) -- don't need anymore

So now that I've explained the issue and my proposed solution, here are my questions...

In the temp table version of my code, will SQL Server limit the rows it reads from Widgets to only those which are also in #tmpIDs BEFORE it tries to look for all the other parameters in the WHERE clause? (Attrib_G, Attrib_H, etc.) In other words, will it do any sort of table/index scans against a greater set than #tmpIDs, costing me more query run time? Or should the inner join on #tmpIDs prevent that?

Is the above a good strategy for optimizing this query? Or is there another way that I'm not thinking of?

Would it make any sense to create a huge covering index for columns Attrib_A thru Attrib_N, plus WidgetDate? (So far I'm avoiding this because it seems ridiculous.)

I hope this example is clear enough. Any thoughts are welcome.

Thanks for your time,
Jordan

p.s. Difficulty: I don't have the luxury to create/modify table structures or normalize the data at this point.
   

- Advertisement -