| Author |
Topic |
|
vdavid70
Yak Posting Veteran
67 Posts |
Posted - 2005-04-13 : 09:04:40
|
| Hi All, In Microsoft SQL Server 2000, is it possible for a node to switch from an Active/Active configuration to an Active/Passive one? If so, what could cause this, and what is the possible solution? Regards, V David |
|
|
bakerjon
Posting Yak Master
145 Posts |
Posted - 2005-04-13 : 10:45:56
|
| V David, Not sure what you mean by this. Do you mean that you have 2 instances of SQL Server in a cluster and both are now running on a single physical node? If that is the case, there are many reasons why a fail-over would occur, including an OS failure, a hardware failure, a reboot of a node, or an admin with a happy mouse finger :) Please provide more details. Thanks, Jon. Now I know, and knowing is half the battle! http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=48013 |
 |
|
|
vdavid70
Yak Posting Veteran
67 Posts |
Posted - 2005-04-13 : 11:24:13
|
| Thanks for the reply, Jon. What I mean is that the nodes are now sharing resources between them. Node1 now has the cluster resources running on it, while node2 has the SQL Server resources running on it. Please help, because I am getting really scared now. I am very new to cluster services and understand very little about how they work. To add to this, SQL Server is physically installed on both nodes and seems to be working fine, so I am not sure whether they are both running on a single physical node. Regards, V David |
 |
|
|
bakerjon
Posting Yak Master
145 Posts |
Posted - 2005-04-13 : 12:33:08
|
| If the cluster resources and SQL resources are running on one node, that is fine. You should see no performance degradation. I have had that configuration many times without issue. Active/Active SQL clustering means that there are multiple instances of SQL Server (1 default clustered instance, at least one named clustered instance) running in the cluster, and the instances are running on different nodes. If you do not have multiple SQL Server instances, you do not have Active/Active, but rather Active/Passive (though I don't like this terminology due to the confusion it causes). Check out this link for more info on clustering: http://www.sql-server-performance.com/clustering_resources.asp With every SQL cluster configuration, the SQL binaries will be installed on all servers in the cluster. If you have multiple instances, multiple sets of binaries will be installed. However, only one server will run a given set of binaries at a time. Can you verify how many SQL Server instances you have? That will help a bit. Jon. Now I know, and knowing is half the battle! http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=48013 |
 |
|
|
vdavid70
Yak Posting Veteran
67 Posts |
Posted - 2005-04-13 : 13:15:51
|
| I think we have a multiple setup, with the first node and SQL Server name called crm1 and the second node and SQL Server instance called crm2; both are configured with a virtual name of CRM. Initially crm2 was handling all the active resources on the node. When it started having problems, it switched over to crm1. Since then I think it has switched from one node to the other up to 10 times. While this happens, a lot of cluster resources, such as the SQL Server service and agent, fail and restart themselves after a while. Now, finally, the nodes are sharing the resources, with the cluster groups and resources on crm1 and the SQL Server groups and resources on crm2. I hope this explains the situation a bit more. Thanks for your effort. |
 |
|
|
bakerjon
Posting Yak Master
145 Posts |
|
|
vdavid70
Yak Posting Veteran
67 Posts |
Posted - 2005-04-14 : 05:27:47
|
| You are right. To explain the setup a bit further: I think we have a 2-node active/passive setup, with the first node and SQL Server name called crm1 and the second node and SQL Server instance called crm2; both are configured with a virtual name of CRM. crm1 and crm2 share logical drives F and G and a quorum drive E. The data files are installed on the logical drives, while the system files are installed separately on each node's local drives. I hope this explains the situation a bit more. Initially crm2 was handling all the active resources on the node. When it started having problems, it switched over to crm1. Since then I think it has switched from one node to the other up to 10 times. While this happens, a lot of cluster resources, such as the SQL Server service and agent, fail and restart themselves after a while. As of this morning, all the resources and groups are back on CRM2, the original primary server. But just before that, the nodes unexpectedly went offline and online, coming up with the same errors as before. The Windows Event Viewer shows a series of errors saying: The node failed to join the cluster, error code 1717. Cluster services suffered an unexpected error, fatal error at line 2166 of source module d:/nt/private/cluster/service/dm/dmlogs. Error code 5. Secondly, the node crm1 cannot access any of the logical drives or the quorum drive. Is this because all the resources are now on the second node (CRM2)? How do you suggest I tackle this problem, and what are the steps to put things right? How do I check that the cluster hardware and services, as well as the network connections, are working well? At this point, would it be wise to take the crm1 node out of the cluster and add it back? And what do I need to make sure is right before doing this? |
 |
|
|
bakerjon
Posting Yak Master
145 Posts |
Posted - 2005-04-14 : 10:18:34
|
| To answer your question about accessing drives that are owned by the active node from the passive node: you are correct. Only one node can access the drives at a time. You won't even be able to see the drives in Explorer (unless RDP is haywire :) ). The larger question is "Why does the cluster fail over without intervention?" There could be a myriad of reasons for this. You mention that the "nodes have unexpectedly gone on and offline." By this do you mean that the servers are rebooting, or do they just drop cluster resources? You also mention that "Cluster resources such as sqlserver services and agent fail and restart it self after a while." Does this force a fail-over, or is it within the failure threshold? A place you might want to look for more information is the cluster log, C:\windows\cluster\cluster.log. There will be a bunch of junk in there, but you might find some entries that will point you in the general direction. Another thing you want to check is the network connections, specifically the heartbeat. I've seen some pretty strange things with clusters if the heartbeat isn't working properly (or is mysteriously absent/removed). The other thing I would check is that the disk connections are stable. If the server is having a hard time writing to the disk, you might see some other errors like "Write timeout failure" or "Error accessing disk resource...". HTH, Jon. Now I know, and knowing is half the battle! http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=48013 |
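Since the cluster log contains "a bunch of junk", one way to narrow it down is to filter it for lines that mention likely trouble. The following is only a minimal sketch: the keyword list, the function names `find_suspect_lines` and `scan_log_file`, and the sample lines are all assumptions made for illustration, and the sample lines are not the real cluster.log format.

```python
from typing import Iterable, List, Tuple

# Hypothetical keyword list; adjust it to whatever shows up in your own log.
KEYWORDS = ("error", "failed", "timeout", "heartbeat")


def find_suspect_lines(lines: Iterable[str],
                       keywords: Tuple[str, ...] = KEYWORDS) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing any keyword.

    Matching is a case-insensitive substring test, nothing smarter.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path: str,
                  keywords: Tuple[str, ...] = KEYWORDS) -> List[Tuple[int, str]]:
    """Apply find_suspect_lines to a file on disk.

    The encoding of the log is not known here, so undecodable bytes
    are replaced rather than raising an exception.
    """
    with open(path, "r", errors="replace") as handle:
        return find_suspect_lines(handle, keywords)


if __name__ == "__main__":
    # Made-up sample lines, only to show the filter working.
    sample = [
        "resource online",
        "Write timeout failure on disk",
        "node joined",
        "Error accessing disk resource",
    ]
    for number, text in find_suspect_lines(sample):
        print(number, text)
```

To run it against the real file, call `scan_log_file(r"C:\windows\cluster\cluster.log")` on the node in question and read the surrounding lines of each hit by hand; the filter only points to places worth looking at.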
 |
|
|
vdavid70
Yak Posting Veteran
67 Posts |
Posted - 2005-04-14 : 11:19:06
|
| No, the servers don't reboot; I guess they just drop cluster resources. When you try to bring it back online, it affects the second node. In effect both servers go offline, and after about 3 seconds they come back online. To answer your question about SQL Server and the agent failing and restarting themselves, I think it is within the threshold. Thanks for all this. I am sure by the time you finish with me I will be a guru in clustering. |
 |
|
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2005-04-14 : 11:35:39
|
| Active/active and active/passive are really the same thing. Active/active means that you have your cluster groups running on both nodes of your 2-node cluster. For example, you might have the following groups in Cluster Admin: 1. Cluster group (contains cluster IP, cluster name and quorum disk resource) 2. MSDTC group (contains MSDTC IP, MSDTC name, MSDTC disk, MSDTC resource) 3. SQL Group 1 (contains SQL IP, SQL name, SQL Server service, SQL Server agent, SQL Fulltext) 4. SQL Group 2 (contains SQL IP, SQL name, SQL Server service, SQL Server agent, SQL Fulltext) Running all 4 of these groups on your Node A box means you have an active/passive installation. Running groups 1-2 on Node A and groups 3-4 on Node B means you have active/active. In this case it would probably make sense to put group 3 on Node A and group 4 on Node B, since they will be using the most resources. Anyway, the bigger problem you are facing is why you are seeing failovers occur. You need to check your system and application event logs to see if you are having a hardware problem that is causing your groups to move from one node to the other. Look at the cluster log (as suggested earlier), and search microsoft.com for articles on how to set up your cluster to ensure that you have all the config correct. There is a very good TechNet article on how to properly set up the heartbeat Ethernet connection; I have seen many problems occur because this article was not followed verbatim. -ec |
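The distinction described in the post above can be sketched as a small check on which node owns each group. This is an illustration only: the function name `describe_layout` and the dictionary layout are assumptions, and it follows the placement-based description in this post rather than the instance-count definition given earlier in the thread.

```python
from typing import Dict


def describe_layout(group_owners: Dict[str, str]) -> str:
    """Label a group-to-node placement the way the post describes it.

    group_owners maps a cluster group name to the node currently owning it.
    All groups on one node  -> "active/passive"
    Groups spread over more -> "active/active"
    """
    nodes_in_use = set(group_owners.values())
    if not nodes_in_use:
        return "no groups"
    if len(nodes_in_use) == 1:
        return "active/passive"
    return "active/active"


if __name__ == "__main__":
    # Group names taken from the example list in the post; node names as used there.
    everything_on_a = {
        "Cluster group": "Node A",
        "MSDTC group": "Node A",
        "SQL Group 1": "Node A",
        "SQL Group 2": "Node A",
    }
    split = dict(everything_on_a, **{"SQL Group 1": "Node B", "SQL Group 2": "Node B"})
    print(describe_layout(everything_on_a))
    print(describe_layout(split))
```

Under this reading, a failover that moves groups between nodes changes the label without changing the installation, which is why the two terms are described as really the same thing.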
 |
|
|
|