BaggaDonuts
Yak Posting Veteran
52 Posts |
Posted - 2007-10-22 : 18:47:30
|
I'm using the Database Engine Tuning Advisor to do some performance evaluation on my database. I have one particular table that will potentially have a couple million rows, but that may not happen for a few months to a year from now. When I run the advisor with 1.5 million rows in the table, it recommends that I add an index across all columns in the table (a covering index) for an estimated 91% improvement. When I run it with 1,500 rows in it, it recommends that I add an index on 2 key columns for an 8% improvement. How should I reconcile this? Can I have both, and what does that do to performance?
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-10-22 : 19:05:41
|
You shouldn't even worry about a table with 1,500 rows in it. The optimizer will often find it faster to do a table scan/clustered index scan. Also, an index that covers all columns in a table sounds like a bad idea. To figure out what you should do, load up a test database that contains the number of rows that you expect in one year, then play with the indexes and queries to see what works best.

Can you post the more frequent SELECT statements that will run against this table, so that we can make an educated guess at what the indexes should be? You can then compare our suggestions to the advisor's suggestions.

As a side note, I almost never use the advisor as it typically does not produce realistic results. We use a performance environment to figure out what to do. In that environment we have tools that can load data and simulate production access.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
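One rough way to do that kind of load, sketched here with made-up table and column names (dbo.TestTable, SomeKey, SomeValue are placeholders - point this at a copy of the real table instead):

-- Sketch only: generate ~1.5 million dummy rows for load testing.
SET NOCOUNT ON;

;WITH
n1 AS (SELECT 1 AS n UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
       UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),  -- 10 rows
n2 AS (SELECT 1 AS n FROM n1 AS a CROSS JOIN n1 AS b),           -- 100 rows
n3 AS (SELECT 1 AS n FROM n2 AS a CROSS JOIN n2 AS b),           -- 10,000 rows
nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
         FROM n3 AS a CROSS JOIN n2 AS b CROSS JOIN n1 AS c)     -- up to 10,000,000 rows
INSERT INTO dbo.TestTable (SomeKey, SomeValue)
SELECT rn % 50000,        -- spread the key values so selectivity looks realistic
       'filler row'
FROM nums
WHERE rn <= 1500000;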
 |
|
|
BaggaDonuts
Yak Posting Veteran
52 Posts |
Posted - 2007-10-22 : 19:20:10
|
Thanks Tara. It was much faster after adding the recommended index with 1.5 million rows in the table. It's not quite all columns, as you'll see below. Here's some more info (forgive the length here):

Structure of the table that will contain the large number of rows:

CREATE TABLE [dbo].[t_AtomVersion](
	[AtomVersionId] [bigint] IDENTITY(1,1) NOT NULL,
	[AtomId] [int] NOT NULL,
	[StructureVersion] [int] NOT NULL,
	[ContentVersion] [int] NOT NULL,
	[CreatedDate] [datetime] NOT NULL CONSTRAINT [DF_t_AtomVersion_CreatedDate] DEFAULT (getdate()),
	[AtomVersionStatusId] [smallint] NOT NULL,
	[VersionDescription] [varchar](256) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[AssemblySize] [bigint] NOT NULL CONSTRAINT [DF_t_AtomVersion_FileSize] DEFAULT ((0)),
	CONSTRAINT [PK_t_AtomVersion] PRIMARY KEY CLUSTERED ([AtomVersionId] ASC) WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY],
	CONSTRAINT [UC_t_AtomVersion] UNIQUE NONCLUSTERED ([AtomId] ASC, [StructureVersion] ASC, [ContentVersion] ASC) WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
USE [Packaging]
GO
ALTER TABLE [dbo].[t_AtomVersion] WITH CHECK ADD CONSTRAINT [FK_t_AtomVersion_t_Atom] FOREIGN KEY([AtomId])
REFERENCES [dbo].[t_Atom] ([AtomId])
GO
ALTER TABLE [dbo].[t_AtomVersion] WITH CHECK ADD CONSTRAINT [FK_t_AtomVersion_t_AtomVersionStatus] FOREIGN KEY([AtomVersionStatusId])
REFERENCES [dbo].[t_AtomVersionStatus] ([AtomVersionStatusId])
GO

Index recommended by the tuning advisor:

CREATE NONCLUSTERED INDEX [IX_t_AtomVersion_CoveringIndex] ON [dbo].[t_AtomVersion]
(
	[AtomId] ASC,
	[AtomVersionId] ASC,
	[StructureVersion] ASC,
	[AtomVersionStatusId] ASC,
	[AssemblySize] ASC
)
INCLUDE ([ContentVersion])
WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
GO

Finally, here's a snippet of the stored procedure I'll be running QUITE FREQUENTLY. There may be much better ways of tweaking performance on this query, but that's a separate issue, I guess.

select A.AtomId, AT.AtomTypeKey, SQ.StructureVersion, max(CV.ContentVersion) as ContentVersion, SQ.AssemblySize
from dbo.t_AtomVersion CV with (nolock)
inner join
(
	select AP.AtomId, SV.StructureVersion, SV.AssemblySize
	from @atomPriority AP
	inner join
	(
		select AtomId, AssemblySize, max(StructureVersion) as StructureVersion
		from dbo.t_AtomVersion AV with (nolock)
		inner join dbo.t_AtomVersionStatus SVS with (nolock) on SVS.AtomVersionStatusId = AV.AtomVersionStatusId
		where SVS.Status = 'release'
		group by AtomId, AssemblySize
	) SV on AP.AtomId = SV.AtomId
) SQ on SQ.AtomId = CV.AtomId and SQ.StructureVersion = CV.StructureVersion
inner join dbo.t_AtomVersionStatus CVS with (nolock) on CVS.AtomVersionStatusId = CV.AtomVersionStatusId
inner join dbo.t_Atom A with (nolock) on A.AtomId = SQ.AtomId
inner join dbo.t_AtomType AT with (nolock) on AT.AtomTypeId = A.AtomTypeId
where CVS.Status = 'release'
group by A.AtomId, SQ.StructureVersion, AT.AtomTypeKey, SQ.AssemblySize
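For what it's worth, one simple way to sanity-check the advisor's 91% estimate is to measure the query directly rather than trusting the percentage. A minimal sketch:

-- Sketch: measure the real cost of the SELECT above with and without the
-- advisor's index, instead of relying on its estimated improvement figure.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the SELECT from the stored procedure here, once before creating
-- IX_t_AtomVersion_CoveringIndex and once after, and compare logical reads
-- on t_AtomVersion plus CPU/elapsed time in the Messages tab

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;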
 |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-10-22 : 19:23:53
|
Wow. I just don't have the time to work on such a large query without getting paid for it. I just wasn't expecting it to be this large or complex.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
 |
|
|
BaggaDonuts
Yak Posting Veteran
52 Posts |
Posted - 2007-10-22 : 19:30:01
|
Haha, yeah, I figured that was going to be the response before I posted it :) Probably one of the more complex problems I've worked on. Thanks anyway.
 |
|
|
BaggaDonuts
Yak Posting Veteran
52 Posts |
Posted - 2007-10-22 : 19:36:24
|
To close this out: after loading the test database with 1.5 million records and adding the suggested index, the query went from taking 4 seconds to run to less than a second. I'll probably just go with the recommended index for the biggest improvement.
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-23 : 02:47:04
|
What's with the nested sub-select? (I haven't read it carefully.) Is that just to match the MAX value in each group? Because if so, there are better ways of doing that in SQL 2005.

And all those NOLOCKs are a REALLY bad idea (or are due to really bad design) on a production system.

Kristen
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-23 : 03:05:26
|
Why is NOLOCK bad on a production system? I think it depends on the usage of the system, right?

Anyway, try this kind of SQL Server 2005 query:

SELECT AtomId, AssemblySize, StructureVersion, ContentVersion
FROM (
	SELECT av.AtomId,
	       av.AssemblySize,
	       av.StructureVersion,
	       av.ContentVersion,
	       ROW_NUMBER() OVER (PARTITION BY av.AtomID, av.AssemblySize ORDER BY av.StructureVersion DESC) AS StructureVersionRecID,
	       ROW_NUMBER() OVER (PARTITION BY av.AtomID, at.AtomTypeKey ORDER BY av.ContentVersion DESC) AS ContentVersionRecID
	FROM dbo.t_AtomVersion AS av WITH (NOLOCK)
	INNER JOIN @AtomPriority AS ap ON ap.AtomID = av.AtomID
	INNER JOIN dbo.t_AtomVersionStatus AS svs WITH (NOLOCK) ON svs.AtomVersionStatusId = av.AtomVersionStatusId
	INNER JOIN dbo.t_Atom AS a WITH (NOLOCK) ON a.AtomId = av.AtomId
	INNER JOIN dbo.t_AtomType AS at WITH (NOLOCK) ON at.AtomTypeId = a.AtomTypeId
	WHERE svs.Status = 'Release'
	  AND ap.StructureVersion = av.StructureVersion
) AS d
WHERE StructureVersionRecID = 1
  AND ContentVersionRecID = 1

E 12°55'05.25"N 56°04'39.16"
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-23 : 03:19:23
|
I am just raising the question because I am currently working with a database that has about 4,800 simultaneous users. All upserts are made between 1 am and 3 am with TABLOCK and no users logged on. Most tables have about 4-5 million records, and there are about 200 tables. Adding NOLOCK really made a huge difference for us.

E 12°55'05.25"N 56°04'39.16"
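For readers who haven't seen that pattern, the nightly load described above would look roughly like this. This is only a sketch; dbo.StagingTable, dbo.TargetTable and their columns are invented for the example, and it assumes no users are connected during the load window:

-- Rough sketch of a nightly upsert taking a table lock up front.
BEGIN TRANSACTION;

UPDATE t
SET    t.SomeValue = s.SomeValue
FROM   dbo.TargetTable AS t WITH (TABLOCK)
INNER JOIN dbo.StagingTable AS s ON s.SomeKey = t.SomeKey;

INSERT INTO dbo.TargetTable WITH (TABLOCK) (SomeKey, SomeValue)
SELECT s.SomeKey, s.SomeValue
FROM   dbo.StagingTable AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.SomeKey = s.SomeKey);

COMMIT TRANSACTION;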
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-23 : 04:25:45
|
"Why is NOLOCK bad on a production system?I think it depends on the usage of system, right?"Oh for sure you can design it in, but that has never been the case in the systems I have come across and I expect it is as rare as Hen's Teeth in the questions that are asked here.The problem is with reading dirty data, so information is displayed to the user which may not be correct, and which they then act on. Maybe click a link and get an error. They report the error, its unreproducible, no one can explain what happened.or they sell $1,000,000 of stock - that they didn't actually hold.or the index does a page-split and the query fails with error. Again almost impossible to reproduce (and that's assuming that the DEVs actually know what's going on )."All upserts are made during 1 am to 3 am with tablock and no users logged on."Sounds like the database should be READ ONLY the rest of the time, or "set transaction isolation level read uncommitted"I did a search for you , I don't see anyone debunking the "never use" rule, only "relaxing the rule in very specific circumstances":http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=24119http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=77064http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=23386http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=35132http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70297http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=84906#312986Kristen |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-23 : 05:25:28
|
Read-only would have been cool, but statistics are not updated when the database is in read-only mode.

E 12°55'05.25"N 56°04'39.16"
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-23 : 05:29:05
|
If there are NO updates (by users during their use of the app) then I reckon you need

set transaction isolation level read uncommitted

or similar. Or put that data in a separate database (i.e. all the "read only" data) and access that database "uncommitted".

Kristen
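As a sketch of what that suggestion looks like in practice (the procedure name is made up; the table and columns are the ones from earlier in the thread):

-- One way to apply the suggestion: set the isolation level once at the top of
-- a read-only procedure instead of sprinkling NOLOCK hints on every table.
CREATE PROCEDURE dbo.usp_GetReleaseVersions   -- hypothetical name, for illustration
AS
BEGIN
    SET NOCOUNT ON;
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT AtomId, StructureVersion, ContentVersion
    FROM   dbo.t_AtomVersion;   -- no per-table WITH (NOLOCK) needed now
END
GO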
 |
|
|
pootle_flump
1064 Posts |
Posted - 2007-10-25 : 11:11:17
|
The covering index looks right given the query. However, you should also test the impact this has on inserts, updates and deletes - this covering index covers most of the table columns and so should have a bit of an impact on these operations.
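A rough way to run that write-side test on a copy of the database (sketch only; the batch to time is whatever insert/update/delete workload is representative for the table):

-- Sketch: compare the cost of a data-change batch with the covering index
-- disabled vs. enabled. Run on a test copy, not production.
SET STATISTICS TIME ON;

ALTER INDEX [IX_t_AtomVersion_CoveringIndex] ON [dbo].[t_AtomVersion] DISABLE;
-- ... run a representative batch of INSERT/UPDATE/DELETE statements here ...

ALTER INDEX [IX_t_AtomVersion_CoveringIndex] ON [dbo].[t_AtomVersion] REBUILD;
-- ... run the same batch again and compare the CPU/elapsed time output ...

SET STATISTICS TIME OFF;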
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-26 : 05:53:32
|
| "this covering index covers most of the table columns and so should have a bit of an impact on these operations"But its only going to be considered based on the first key column, thus only when that is suitable and sufficiently "selective", isn't it?Kristen |
 |
|
|
pootle_flump
1064 Posts |
Posted - 2007-10-26 : 06:28:49
|
The operations I meant were update, delete, insert. Since the index is not that much smaller than the table, he can expect a pretty big hit on data changes. It looks OK to me as far as the query itself goes.

For covering indexes the selectivity of the first column is not (always) such a big issue. For this query there were a couple of candidate columns for the leftmost position (AtomVersionStatusId, AtomId). AtomId is probably the most sensible, but I could imagine the other working OK too, depending on the data.
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-26 : 06:33:05
|
| "For covering indexes the selectivity of the first column is not (always) such a big issue"I was meaning to clarify that first the index has to be used before it can cover anything!There is sometimes a mistaken belief that "putting everything in an index" means that it is useful for all scenarios, rather than having carefully considered indexes for various specific jobs, and then overloading those indexes, as appropriate, so that they better cover the query(s) they assist with.Kristen |
 |
|
|
pootle_flump
1064 Posts |
Posted - 2007-10-26 : 08:52:08
|
Ah, I see, and completely agree.

I think it would use the covering index no matter how selective the first column is, though, so long as it covers the query. The first column would just affect how well it could use the index. Whatever the selectivity, a nonclustered index scan beats a clustered index scan (... almost always).
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-26 : 12:46:23
|
I don't know well enough to be confident, but my thinking is that it first picks an index that solves the query (most cheaply) and then, if that index has covering columns, they are used elsewhere in the query if useful.

I'll do a test when I have a moment. A simple test might be to rearrange that index so that something "less attractive" is the first key, and see if it is still used.

Kristen
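For that experiment, something along these lines would do. This is just the suggested test written out; the index name is made up, and the "less attractive" choice of leading column is an assumption:

-- Experiment: same columns as the advisor's index but with a deliberately poor
-- leading key, to see whether the optimizer still picks it for the query.
CREATE NONCLUSTERED INDEX [IX_t_AtomVersion_CoveringIndex_Test]
ON [dbo].[t_AtomVersion]
(
	[AssemblySize] ASC,        -- "less attractive" leading column
	[AtomId] ASC,
	[AtomVersionId] ASC,
	[StructureVersion] ASC,
	[AtomVersionStatusId] ASC
)
INCLUDE ([ContentVersion]);
GO
-- Then check the actual execution plan (Ctrl+M in Management Studio) to see
-- whether the query still uses this index or falls back to the clustered index.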
 |
|
|
|