Author |
Topic |
MariusD
Starting Member
7 Posts |
Posted - 2012-06-13 : 13:17:06
|
I am trying to troubleshoot erratic query run times on this server. I started by noticing that some sp run between < 1 minute and 4-5 hours at a time. Then, I noticed CXPACKET waits. MAXDOP was set to 2, cost threshold for parallelism set to 5. This is a 4 processor, NUMA based server, running both OLAP and OLTP databases (but no high volume transaction databases). I changed the threshold to 25 and ever since all sp's did behave nicely. However, I am still experiencing queries generated by Cognos which get stuck for hours waiting for CXPACKET. Top ten wait statistics, show CXPACKET waits accounting for 40% overall, everything else < 1% but bear in mind, those numbers are mostly for the old setting, maxdop = 2 and thershold = 5. ( 5 days with new settings and 6 months with old).1) CXPACKET waits are 40% of all waits, seems like that is a problem?2) Should I start by setting parallelism settings to best practices values, that is, maxdop=4 and threshold = 5 (or should I leave it at 25, since it works for teh sp's)?Any suggestions are appreciated.. thanks!!! |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
jackv
Master Smack Fu Yak Hacker
2179 Posts |
Posted - 2012-06-13 : 14:10:34
|
CXPPACKET is normal in a multi core environment. The tkizer observation is a good one re: OLTP\DSS . You've switched the maxdop which has solved some problems , but maybe caused others. Try: 1) identify which queries are running in parallelism and why?Are they frequent?2)OLTP normally works better with lower parallelism setting, consider lowering parallelism - having first identified \ adjsuted (?) some queriesJack Vamvas--------------------http://www.sqlserver-dba.com |
|
|
MariusD
Starting Member
7 Posts |
Posted - 2012-06-13 : 14:41:17
|
quote: Originally posted by tkizer Why do you have both OLAP and OLTP databases on the same server? Generally they do better with different MAXDOP settings.Good question... That's how it was set up long before I started here (3 months ago). I wouldn't say that MAXDOP=4 is the best practice here. Do you have an article that shows that?Here's a reply by Jonathan Kehayias to an article of Dave Pinal, where Dave was suggesting setting MAXDOP to 2 for a mixed environment and cost of threshold to 25.http://blog.sqlauthority.com/2011/02/06/sql-server-cxpacket-parallelism-usual-solution-wait-type-day-6-of-28/reply:CXPACKET waits aren’t necessarily bad, or even a sign of a problem, they are normal and expected if queries are executing using parallelism. I wouldn’t recommend that CXPACKET alone ever be considered a reason to reduce or change the configuration of the ‘max degree of parallelism’ configuration option. Most newer servers are going to be NUMA based systems, and the best practice recommendation for this configuration option is to set it to the number of processor cores in a single NUMA node. Generally speaking this best practice is not followed, and most recommendations show changing the ‘max degree of parallelism’ configuration option to 1 or 2 whenever the topic of CXPACKET comes up. My recommendation, especially on newer NUMA based systems is always to first set it following best practices to the number of physical cores in a single NUMA node and then monitor. You can find the number of schedulers in each NUMA node by querying the online_scheduler_count from sys.dm_os_nodes. Even for data warehouse environments, this should be set initially following this practice unless testing has shown that leaving it at 0 is actually best, which is possible.After making the change monitor not just wait types, but also for how the system is performing. Look for queries that run under parallelism and test them manually using different levels of DOP using the OPTION(MAXDOP n) query hint to see if reducing parallelism actually improves or harms performance. You might find that reducing it for one query improves performance while the rest of the workload shows a performance decrease from that same tested reduction. In that case putting the query hint in, either as a plan guide for the individual query, or by changing the code if you have access, would yield better returns.Raising the cost of threshold to 25 has definitely helped, I don't have any more problems with sp's, only once in a while a Cognos-generated query hangs for hours in a CXPACKET wait. I guess I'll wait to see what teh wait statitics show tomorrow, I have just reset them this morning.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog
|
|
|
MariusD
Starting Member
7 Posts |
Posted - 2012-06-13 : 14:44:02
|
quote: Originally posted by jackv CXPPACKET is normal in a multi core environment. The tkizer observation is a good one re: OLTP\DSS . You've switched the maxdop which has solved some problems , but maybe caused others. No, maxdop is the same, I raised the cost of threshld to 25 and it has helped, but it has not solved the problem entirely.Try: 1) identify which queries are running in parallelism and why?Are they frequent?No, they are not frequent any more, I only caught one yesterday. Identifying them is almost impossible, as those are Cognos-generated (users drag and drop columns in Cognos)2)OLTP normally works better with lower parallelism setting, consider lowering parallelism - having first identified \ adjsuted (?) some queriesJack Vamvas--------------------http://www.sqlserver-dba.com
|
|
|
jackv
Master Smack Fu Yak Hacker
2179 Posts |
Posted - 2012-06-14 : 01:37:30
|
Marius , analyse the cached plans for queries using parallelismJack Vamvas--------------------http://www.sqlserver-dba.com |
|
|
MariusD
Starting Member
7 Posts |
Posted - 2012-06-14 : 11:43:07
|
quote: Originally posted by jackv Marius , analyse the cached plans for queries using parallelismJack Vamvas--------------------http://www.sqlserver-dba.com
Yes, thank you, you are right, but this is very hard to do with Cognos generated queries.I ran the wait statistics query again today and CXPACKET waits amount to 40% of all waits, once again. Since I have not seen anything hung for hours waiting for CXPACKET, I'm starting to think this is not the problem. I will continue to test different queries with different options (hints) for parallelism. Thank you! |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
MariusD
Starting Member
7 Posts |
Posted - 2012-06-15 : 11:11:17
|
No, I didn't, but I wasn't looking. It only happened once since I raised the threshold. I saw this Cognos query stuck waiting for CXPACKET for hours. So you are saying it could be that the thread it was waiting for was blocked by something else? In this case, Activity monitor would still show a CXPACKET wait and not a block by something else? I was looking at Activity monitor, at some point in time I believe this was the only process blocked by itself (that's what "blocked by" was showing). So I thought this a parallelism issue and didn't look any further. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
MariusD
Starting Member
7 Posts |
Posted - 2012-06-16 : 09:33:24
|
I can't be sure about that, I watched it for about 1.5 hours and it did not change. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
jackv
Master Smack Fu Yak Hacker
2179 Posts |
Posted - 2012-06-17 : 02:06:57
|
MariusD - When analysing the statements - try to isolate just the Cognos statements. There must be a way via logon , or application nam eto isolate and identify on a SQL Server Trace.Jack Vamvas--------------------http://www.sqlserver-dba.com |
|
|
MariusD
Starting Member
7 Posts |
Posted - 2012-06-18 : 10:29:37
|
Yes, you are both right - I should try and track the Cognos queries only, I believe I can do that via user code. I am working on setting up a monitoring routine right now. Unfortunately, I cannot separate the 2 databases - the one that's constantly updated is too small to justify a separate server, and everything would have to be re-written for the data warehouse data imports/ updates (SSIS + Framework, about 25 packlages). Today it's been > week with the new threshold setting and other than that 1 query, everything has worked very well (roughly 800 SSIS steps daily, we track all execution times). I've seen other Cognos queries waiting for CXPACKET, but for a reasonable amount of time (minutes). Thanks again for your help! |
|
|
jackv
Master Smack Fu Yak Hacker
2179 Posts |
Posted - 2012-06-18 : 10:44:21
|
Your welcomeJack Vamvas--------------------http://www.sqlserver-dba.com |
|
|
|