Author |
Topic |
fairbro
Starting Member
4 Posts |
Posted - 2010-09-08 : 05:29:56
|
Hi,

I've been having some freezing issues on SQL Server 2005 SP3, and the error I'm currently trying to troubleshoot is "Login failed for user 'NT AUTHORITY\SYSTEM'. [CLIENT:<local machine>]". I'm running w2k8 server x64 on a ProLiant DL360 G5.

I have tried switching the account SQL uses and switching back again to Network Service to get rid of this error, which appears to be SQL using an account with incorrect credentials (a mismatch between NT AUTHORITY\SYSTEM and what SQL is expecting; see http://blogs.msdn.com/b/sql_protocols/archive/2006/02/21/536201.aspx).

The server has approximately 8 DBs mirrored onto it, is SAN attached, and was experiencing freezing. I checked the hardware: no issue. I re-applied SQL 2005 SP3. I enabled tracing, which showed errors related to performance. Deadlocks were noted, so I enabled DBCC TRACEON (1204, 1222), and some stack dumps can be seen. I have upgraded to w2k8 SP1 (and re-applied SQL 2005 SP3).

However, the only error in the log files is the one previously mentioned: "Login failed for user 'NT AUTHORITY\SYSTEM'. [CLIENT:<local machine>]". I have been using SQLIO to test throughput to the SAN in case the latency or errors are related to this, and have also been using Iometer.

My next step is to try to fix the "Login failed" error. If that doesn't help, I'm going to re-install SQL. If that doesn't help either, I'm probably going to have to get new switches.

Can anyone help me with this issue?

Thanks,
Willie |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2010-09-08 : 11:42:48
|
The error you are seeing is not related to your freezing issues. That error just means that an attempt was made to connect to SQL Server, but SQL Server did not receive an account to authenticate. You can get that error when you attempt to query a linked server using Windows authentication and Kerberos isn't in place. We use SQL authentication for linked servers as a result. We also see that error when MOM (a Microsoft monitoring package) doesn't have access to SQL Server and is attempting to log in. We resolved the error by giving it access.

Could you explain "freezing"? Is the server locked up? Is performance just dreadful? What does PerfMon show for CPU, etc.? How about SQL Profiler?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/ |
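Since the fix described above was to find out which process was attempting the login and then grant it access, a tally of the failures by account and client can help narrow down the caller. The following is a minimal Python sketch, not a definitive tool: the message shape is taken from the error quoted in this thread, while the sample lines (timestamps, the second account, the client address) and the function name `count_login_failures` are invented for illustration.

```python
import re
from collections import Counter

# Message shape taken from the error quoted in the thread. The optional
# whitespace after "CLIENT:" is an assumption so both spellings match.
LOGIN_FAILED = re.compile(
    r"Login failed for user '(?P<user>[^']+)'\.\s*\[CLIENT:\s*(?P<client>[^\]]+)\]"
)


def count_login_failures(lines):
    """Count failed logins per (user, client) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOGIN_FAILED.search(line)
        if match:
            key = (match.group("user"), match.group("client").strip())
            counts[key] += 1
    return counts


# Hypothetical sample lines; only the message text mirrors the thread.
SAMPLE = [
    r"2010-09-08 05:01:12 Logon Login failed for user 'NT AUTHORITY\SYSTEM'. [CLIENT: <local machine>]",
    r"2010-09-08 05:06:12 Logon Login failed for user 'NT AUTHORITY\SYSTEM'. [CLIENT:<local machine>]",
    r"2010-09-08 05:07:40 Logon Login failed for user 'EXAMPLE\monitor_svc'. [CLIENT: 192.0.2.10]",
    r"2010-09-08 05:08:00 spid12s An unrelated message.",
]

if __name__ == "__main__":
    for (user, client), n in count_login_failures(SAMPLE).most_common():
        print(f"{n:3d}  {user}  from {client}")
```

A failure that repeats at a regular interval from the local machine would point towards a scheduled local service, such as a monitoring agent, rather than an interactive user.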
|
|
fairbro
Starting Member
4 Posts |
Posted - 2010-09-09 : 05:07:40
|
The server has plenty of RAM (16 GB, max memory 14 GB) and ample CPU available. If that login error can be ignored, then the problem is most probably congestion between the server and the SAN, which is connected via iSCSI through a Cisco 3750 that is set up to run at 100Mb.

I was reading that the SQL average disk read queue length or current disk queue length should not be greater than 5. The queue lengths under a number of different scenarios I ran through with SQLIO were much higher than that, which was a cause for concern. I also monitored bandwidth consumption inside the storage (HP MSA 2020, over 3 disks), which seemed fine. However, network usage over the switch didn't seem fine: on a 100 Mb link it appears to peak on a number of occasions (i.e. running at > 7.2Mb) whilst the higher-end tests were running.

My boss suggested a test: generate a 400 Gb file, then use a SQL insert statement and monitor current and average disk queue length while doing a full index rebuild and then a backup. What would you consider a valid test that may point towards the bottleneck? And you don't think that un-installing and re-installing SQL will help? Have you ever seen a misconfigured mirror/witness setup trigger freezing?
See dump of SQLIO results below (M:\ is the SAN Operational Database LUN).

C:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s120 -dM -o1 -frandom -b32 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, 14318180 counts per second
2 threads writing for 120 secs to file M:Testfile.dat
        using 32KB random IOs
        enabling multiple I/Os per thread with 1 outstanding
        buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: M:Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 239.14
MBs/sec: 7.47
latency metrics:
Min_Latency(ms): 4
Avg_Latency(ms): 7
Max_Latency(ms): 169
histogram:
ms:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:   0  0  0  0  0  0  7 28 45 18  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0

C:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s120 -dM -o2 -frandom -b32 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, 14318180 counts per second
2 threads writing for 120 secs to file M:Testfile.dat
        using 32KB random IOs
        enabling multiple I/Os per thread with 2 outstanding
        buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: M:Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 295.20
MBs/sec: 9.22
latency metrics:
Min_Latency(ms): 5
Avg_Latency(ms): 13
Max_Latency(ms): 1991
histogram:
ms:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:   0  0  0  0  0  0  0  0  0  2 15 23 24 15  6  1  2  1  2  1  0  1  2  1  2

C:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s120 -dM -o4 -frandom -b32 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, 14318180 counts per second
2 threads writing for 120 secs to file M:Testfile.dat
        using 32KB random IOs
        enabling multiple I/Os per thread with 4 outstanding
        buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: M:Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 87.63
MBs/sec: 2.73
latency metrics:
Min_Latency(ms): 9
Avg_Latency(ms): 90
Max_Latency(ms): 1515
histogram:
ms:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 99

C:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s120 -dM -o8 -frandom -b32 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, 14318180 counts per second
2 threads writing for 120 secs to file M:Testfile.dat
        using 32KB random IOs
        enabling multiple I/Os per thread with 8 outstanding
        buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: M:Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 85.36
MBs/sec: 2.66
latency metrics:
Min_Latency(ms): 8
Avg_Latency(ms): 186
Max_Latency(ms): 10670
histogram:
ms:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 100

Thanks,
Willie |
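The four SQLIO runs in this post can be cross-checked with a little arithmetic. The Python sketch below is only a rough sanity check under stated assumptions: that the quoted "100Mb" link means 100 Mbit/s, that SQLIO's MB is 2^20 bytes (which matches IOs/sec times 32 KB), and that iSCSI, TCP and Ethernet overhead can be ignored. It applies Little's law (requests in flight ≈ IOs/sec × average latency) and converts the reported throughput into a share of the link. The names `RUNS` and `analyse` are illustrative.

```python
# Figures copied from the four SQLIO runs in this post
# (2 threads, 32 KB random writes).
# Each tuple: (outstanding I/Os per thread, IOs/sec, MBs/sec, avg latency ms)
RUNS = [
    (1, 239.14, 7.47, 7),
    (2, 295.20, 9.22, 13),
    (4, 87.63, 2.73, 90),
    (8, 85.36, 2.66, 186),
]
THREADS = 2
BLOCK_KB = 32
LINK_MBIT = 100  # assumption: the "100Mb" switch speed is 100 Mbit/s


def analyse(runs=RUNS):
    """Return (in_flight, littles_law, mb_s_recomputed, link_percent) per run."""
    rows = []
    for per_thread, iops, mb_s, avg_ms in runs:
        in_flight = THREADS * per_thread
        # Little's law: requests in flight ~= throughput * time per request.
        littles_law = iops * avg_ms / 1000.0
        # Throughput recomputed from IOPS and block size, as a sanity check.
        mb_s_recomputed = iops * BLOCK_KB / 1024.0
        # Payload only, treating 1 MB as 2**20 bytes; protocol overhead ignored.
        link_percent = mb_s * 2**20 * 8 / 1e6 / LINK_MBIT * 100
        rows.append((in_flight, littles_law, mb_s_recomputed, link_percent))
    return rows


if __name__ == "__main__":
    for row, run in zip(analyse(), RUNS):
        in_flight, littles_law, mb_check, link_percent = row
        print(f"in flight {in_flight:2d}  Little's law {littles_law:5.2f}  "
              f"MB/s {run[2]:.2f} (recomputed {mb_check:.2f})  "
              f"link use {link_percent:4.1f}%")
```

Under those assumptions the Little's law estimate lands close to the configured number of outstanding I/Os in every run, so the figures are internally consistent. Throughput peaks at 9.22 MB/s, roughly 77% of a 100 Mbit/s link before overhead, and then falls while latency climbs. That pattern is consistent with saturation somewhere on the path to the SAN, although it does not by itself show whether the switch or the storage is the limit.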
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2010-09-09 : 11:28:47
|
You haven't explained what you mean by freezing yet. Uninstalling is almost never a solution.

Have you run SQL Profiler to check for long-running queries and high reads? If by freezing you mean the CPU is pegged, then it's likely you are missing indexes. Missing indexes can cause huge IO issues. Or if by freezing you mean the server locks up, then I'd suggest opening a case with Microsoft, as you'll need to get a memory dump generated when it occurs and then have it analyzed by them. |
|
|
fairbro
Starting Member
4 Posts |
Posted - 2010-09-13 : 05:58:02
|
Hi Tara,

That's really helpful, thanks. When I said freezing, I meant that the SQL engine became unresponsive. CPU remained fine and there is memory available, so it seems more likely (to me) that it is IO related to the disks on the SAN (which contain the DBs). The servers themselves are pretty new and under little stress.

For example, trying to create backups of the DBs using Backup Exec (which tried to read all the DBs quickly for a snapshot) caused them to freeze: the backups would fail, SQL would become unresponsive or crash, and the only way to get it back would be to restart SQL Server. That's why I started trying to troubleshoot with DBCC traces, which revealed some stack dumps.

I haven't used SQL Profiler yet. Can you provide some details if you think it would help troubleshoot this issue?

Thanks,
Willie |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2010-09-13 : 10:58:27
|
If it's an IO issue, you'd see a perf problem in Performance Monitor. What does avg disk io/sec for reads and writes show? It should be under 12ms.

For SQL Profiler, I'd look for long-running queries and high reads.

If you are seeing stack dumps, then you are likely encountering a big issue that Microsoft will need to help with. What CU are you running for SP3? I'd get on the latest before calling MS, as stack dumps can be SQL bugs. |
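The "under 12ms" guideline is a latency threshold, so the PerfMon counters it presumably refers to are the ones reported in time units, Avg. Disk sec/Read and Avg. Disk sec/Write. As a rough stand-in until those counters are captured, the latency histograms from the SQLIO runs posted earlier in this thread can be held against the same threshold. The Python sketch below assumes that histogram bucket n covers latencies from n up to n+1 ms, and that SQLIO write latency is comparable to what PerfMon would report; neither is confirmed in the thread. The names `HISTOGRAMS` and `percent_under` are illustrative.

```python
# Latency histograms copied from the four SQLIO runs posted earlier in the
# thread, keyed by the -o (outstanding I/Os per thread) value. Index n holds
# the percentage of I/Os in the "n ms" bucket; the last entry is "24+".
HISTOGRAMS = {
    1: [0] * 6 + [7, 28, 45, 18, 1] + [0] * 14,
    2: [0] * 9 + [2, 15, 23, 24, 15, 6, 1, 2, 1, 2, 1, 0, 1, 2, 1, 2],
    4: [0] * 24 + [99],
    8: [0] * 24 + [100],
}

THRESHOLD_MS = 12  # the "under 12ms" guideline from the post above


def percent_under(histogram, threshold_ms=THRESHOLD_MS):
    """Percentage of I/Os in buckets strictly below threshold_ms."""
    return sum(histogram[:threshold_ms])


if __name__ == "__main__":
    for outstanding, histogram in HISTOGRAMS.items():
        print(f"-o{outstanding}: {percent_under(histogram)}% of I/Os under "
              f"{THRESHOLD_MS} ms")
```

The histogram rows do not always sum to exactly 100, presumably because the tool rounds each bucket. On these figures almost every I/O meets the threshold at one outstanding I/O per thread, well under half do at two, and none do at four or eight.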
|
|
fairbro
Starting Member
4 Posts |
Posted - 2010-09-13 : 11:21:36
|
Hi Tara,

What do you mean by CU for SP3? I'll check io/sec and see what times they come in under whilst running some SQLIO tests or running a backup. I had been checking disk queue length, and both current disk queue length and average disk queue length were > 100 at some points.

Thanks,
Willie |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
|