Author |
Topic |
sghokie
Starting Member
6 Posts |
Posted - 2015-01-03 : 22:49:12
|
Okay, I am at the end of my rope on this issue.

I have 2 SQL servers: one is the main DB and the other has SSRS on it along with a few other DBs. Developers have built a web application that serves us reports in various parts of the display back screens. When SSRS reports are initiated on the SSRS server, it connects to the main server for data retrieval.

What happens is that after running for approximately 3 to 4 weeks, the server running SSRS stops authenticating with the domain controller. It used to take longer, but now that there are more users of the application and reporting services, the time between crashes has gone from 6 weeks down to 3 weeks or so.

The servers run Windows Server 2008 R2 and SQL Server 2008 R2 (currently SP2; the same problem happened under SP1). The main DB server is a physical machine while the SSRS server is a VM. I have been experiencing this error for over a year. I have 4 environments (dev, test, debug, prod) and all 4 experience the same problem. Prod hits it much sooner than the others. The server hardware has not changed in quite some time, and we never had a problem until we started using SSRS.

Once the 5719 error condition has occurred, the only remedy is to reboot the VM. The server is basically unusable in this state, as no further authentication can take place with the DC, so no reports can be executed and no SQL transactions can be initiated. From what I can deduce, something isn't releasing a socket or port. If I reboot the server periodically, the problem does not occur until it has been running for a while.

I already opened a MS Tech Support ticket on this issue. They are stumped. They initially had me add a registry setting to the TCPIP section for TcpTimedWaitDelay, but it didn't make a difference.

I have tried a lot of different things over the last year+, such as forcing Kerberos to authenticate over TCP and installing service packs; nothing so far has made any difference.

I would really appreciate any thoughts or insight.

Thanks,
Steve

From the Event Viewer log:

Error 5719
This computer was not able to set up a secure session with a domain controller in domain DOMAIN due to the following: The RPC server is unavailable. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain. |
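For anyone wanting to test the "something isn't releasing a socket or port" theory, here is a minimal sketch that tallies TCP connection states from saved `netstat -ano` output. It assumes the usual Windows column layout (protocol, local address, remote address, state, PID); the sample rows and addresses are made up for illustration and are not from the server in this thread.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP rows per connection state in `netstat -ano` style text.

    Assumes rows shaped like: TCP <local> <remote> <state> <pid>.
    Header lines and UDP rows (which have no state column) are skipped.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            counts[fields[3]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; replace with the text captured from the server.
    sample = """
      Proto  Local Address      Foreign Address    State        PID
      TCP    10.0.0.5:49152     10.0.0.1:445       ESTABLISHED  4
      TCP    10.0.0.5:49153     10.0.0.1:135       TIME_WAIT    0
      TCP    10.0.0.5:49154     10.0.0.1:135       TIME_WAIT    0
      UDP    0.0.0.0:123        *:*                             1020
    """
    for state, n in count_tcp_states(sample).most_common():
        print(state, n)
```

If the TIME_WAIT or ESTABLISHED totals keep climbing between captures taken days apart, that would support the port theory; a flat count points elsewhere.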
|
sz1
Aged Yak Warrior
555 Posts |
Posted - 2015-01-05 : 05:52:23
|
We had similar issues with IPs being reused by different devices: the IP in question was picked up by a second device after it was re-released. In that case we had to release/renew, which fixed the problem computers, and make sure in future that the pool of IPs was not re-released while an address was still registered to a device. Check that all IPs are correct for the computer names. It will work OK for a time, but when a second device tries to use the IP you will get an authentication error.

We are the creators of our own reality! |
|
|
sghokie
Starting Member
6 Posts |
Posted - 2015-01-05 : 09:59:22
|
Hi, thanks for the reply. On this server network all of the servers are on static NAT IP addresses; we are not doing DHCP leases. |
|
|
sz1
Aged Yak Warrior
555 Posts |
Posted - 2015-01-05 : 11:14:07
|
That rules that out then. Are there any periodic changes taking place on the server? We noticed that when changing configs etc. we always had to restart the RPC service. You mentioned a system restart sorts out the connections; have you tried restarting the services instead when it fails? Also, have you looked at your DCOM settings? Other than that, is there plenty of memory?

Also, if Transmission Control Protocol/Internet Protocol (TCP/IP) is the protocol used between the client and server programs, you can use an Lmhosts file to eliminate Windows Internet Naming Service (WINS) as a possible contributor to the problem.

We are the creators of our own reality! |
|
|
sghokie
Starting Member
6 Posts |
Posted - 2015-01-05 : 13:04:28
|
Nothing really seems to work once I get the 5719 error. The domain admin here says that on the DC it looks like the computer has lost its domain account. He has tried adding it back on the DC side; as I recall that might fix it, but only for a few hours. I can't say for sure, though.

One odd thing is that for a period of time you can still RDP in and log in without issue, but if the machine has been in the error state for longer than 4 or 5 hours, the only way to deal with it is to log into the VMware vCenter console and issue a reset. |
|
|
sghokie
Starting Member
6 Posts |
Posted - 2015-01-05 : 14:02:16
|
I have been spending a lot of time thinking about this. It just occurred to me that the server in question is not running SQL Server on the default port, but on a different one. I'm not sure if that would make a difference. My thought is that maybe at some point Windows tries to use that port for something else, but can't. I am not sure why the server was installed with SQL Server on a different port; that predates my time. |
|
|
sz1
Aged Yak Warrior
555 Posts |
Posted - 2015-01-06 : 04:55:09
|
If he's having to add it back again, that's not good. Have you tried removing it from the domain, adding it to a workgroup, then joining it back to the domain again? If he removes it from AD directly, I would then add it to a workgroup and back to the domain to create a new entry. Sounds DNS related.

Also, I'm assuming that SSRS uses port 80, so opening port 80 and port 1433 is required. Ask the network guy to investigate whether anything else is using the ports.

We are the creators of our own reality! |
|
|
sghokie
Starting Member
6 Posts |
Posted - 2015-01-08 : 15:36:57
|
I think I found the problem. A developer here wrote a Windows service app to push reports out. That app is opening handles and not releasing them. I have been running a perf trace on that process and its handle usage is going up by about 1000 handles per day. I restarted the server about 7 days ago and it was up to over 6000 handles on the 7th day. I restarted that service and its handles went down to 300. |
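For anyone logging handle counts the same way, a rough sketch of how the growth rate could be estimated from the samples. It fits a least-squares line through (timestamp, handle count) pairs and returns the slope in handles per day. The function name and the sample figures are illustrative assumptions, only loosely modelled on the numbers above; they are not the actual trace data.

```python
from datetime import datetime, timedelta


def handles_per_day(samples):
    """Least-squares slope of handle count over time, in handles per day.

    `samples` is a list of (datetime, handle_count) pairs, e.g. exported
    from a performance counter log.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(t - t0).total_seconds() / 86400.0 for t, _ in samples]
    ys = [float(h) for _, h in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical daily readings: steady growth from a baseline after restart.
    start = datetime(2015, 1, 1)
    readings = [(start + timedelta(days=d), 300 + 1000 * d) for d in range(8)]
    print(round(handles_per_day(readings)), "handles per day")
```

A slope that stays well above zero across several days of uptime is the sign of a leak; a process that is merely busy tends to level off.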
|
|
sz1
Aged Yak Warrior
555 Posts |
Posted - 2015-01-09 : 07:52:58
|
Good stuff mate.

We are the creators of our own reality! |
|
|
sghokie
Starting Member
6 Posts |
Posted - 2015-01-23 : 08:37:13
|
Just putting an update in for anyone else that might be facing the same problem. I think I tracked down the issue to these 3 things.

1. The domain account that SQL Server was running as did not have permissions on the domain controller to register the SPN of the SQL Server, which was preventing Kerberos authentication.
2. There was a service running from Dell called EqualLogic. I found this process was severely leaking handles. Our network guy uninstalled that program.
3. The other program that was written in house stopped leaking handles on its own after fixing problem 1 above. I am not exactly sure why, but it did.

I have been logging and monitoring handles for a full week now since the last server reboot, and the total number of handles in use is not far off from where it started. The server will be rebooted again today for patching by the network admin team. After that I will continue to monitor it for the next month or two. Previously we would have crashed between 20 and 45 days. |
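On point 1, for anyone checking their own setup: SQL Server SPNs for TCP connections take the form `MSSQLSvc/<host>:<port>`, and they are commonly registered for both the fully qualified and the short host name. The sketch below only builds those strings and the matching `setspn -S` command lines so they can be reviewed before a domain admin runs them. The host name, port, and account are placeholders, not the values from this environment.

```python
def sql_server_spns(host_fqdn, port):
    """Return the TCP SPN strings for a SQL Server instance on `port`.

    Produces one SPN for the fully qualified name and one for the short
    (first label) host name.
    """
    short_name = host_fqdn.split(".")[0]
    return [
        f"MSSQLSvc/{host_fqdn}:{port}",
        f"MSSQLSvc/{short_name}:{port}",
    ]


def setspn_commands(spns, service_account):
    """Build `setspn -S` command lines (add SPN after a duplicate check)."""
    return [f"setspn -S {spn} {service_account}" for spn in spns]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    spns = sql_server_spns("sqlhost.example.local", 14330)
    for cmd in setspn_commands(spns, r"EXAMPLE\sqlservice"):
        print(cmd)
```

Since the server in this thread runs SQL Server on a non-default port, the port in the SPN has to match whatever the instance actually listens on.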
|
|
|