SQL 2000 Cluster Services


vdavid70
Yak Posting Veteran

67 Posts

Posted - 2005-04-13 : 09:04:40
Hi All,


In Microsoft SQL Server 2000, is it possible for a node to switch from an Active/Active configuration to an Active/Passive one? If so, what could cause this, and what is the possible solution?

Regards,
V David

bakerjon
Posting Yak Master

145 Posts

Posted - 2005-04-13 : 10:45:56
V David,
Not sure what you mean by this. Do you mean that you have 2 instances of SQL Server in a cluster and both are now running on a single physical node?

If that is the case, there are many reasons why a fail-over would occur, including an OS failure, hardware failure, a reboot of a node, or an admin with a happy mouse finger :)

Please provide more details

Thanks

Jon

Now I know, and knowing is half the battle!
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=48013

vdavid70
Yak Posting Veteran

67 Posts

Posted - 2005-04-13 : 11:24:13
Thanks for the reply, Jon.

What I mean is that the nodes are now sharing resources between them. Node1 now has the cluster resources running on it, while node2 has the SQL Server resources running on it. Please help, because I am getting really scared now. I am very new to cluster services and understand very little about how they work.

In addition, SQL Server is physically installed on both nodes and seems to be working fine, so I am not sure whether both are running on a single physical node.
Regards,
V David

bakerjon
Posting Yak Master

145 Posts

Posted - 2005-04-13 : 12:33:08
If the cluster resources and SQL resources are running on one node, that is fine. You should see no performance degradation. I have had that configuration many times without issue.

Active/Active SQL Clustering means that there are multiple instances of SQL Server (1 default clustered instance, at least one named clustered instance) running in the cluster, and the instances are running on different nodes. If you do not have multiple SQL Server instances, you do not have Active/Active, but rather Active/Passive (though I don't like this terminology due to the confusion it causes).

Check out this link for more info on clustering.
http://www.sql-server-performance.com/clustering_resources.asp

With every SQL cluster configuration, the SQL binaries will be installed on all servers in the cluster. If you have multiple instances, multiple sets of binaries will be installed. However, only one server will run a given set of binaries at a time.

Can you verify how many SQL Server instances you have? That will help a bit.

Jon






vdavid70
Yak Posting Veteran

67 Posts

Posted - 2005-04-13 : 13:15:51
I think we have a multiple setup, with the first node and SQL Server name called crm1 and the second node and SQL Server instance called crm2; both are configured with a virtual name of CRM. Initially crm2 was handling all the active resources on the node. When it started having problems, it switched over to crm1. Since then I think it has switched from one node to the other up to 10 times. While this happens, a lot of cluster resources, such as the SQL Server service and agent, fail and restart themselves after a while.

Finally, the nodes are now sharing the resources, with the cluster groups and resources on crm1 and the SQL Server groups and resources on crm2.

I hope this explains the situation a bit more.


Thanks for your effort.

bakerjon
Posting Yak Master

145 Posts

Posted - 2005-04-13 : 13:49:38
If you have 1 virtual name, you have 1 clustered instance. Don't worry. All is well.

Jon



vdavid70
Yak Posting Veteran

67 Posts

Posted - 2005-04-14 : 05:27:47
You are right. To explain the setup a bit further: I think we have a 2-node active/passive setup, with the first node and SQL Server name called crm1 and the second node and SQL Server instance called crm2; both are configured with a virtual name of CRM. crm1 and crm2 share logical drives F and G and a quorum drive E. The data files are installed on the logical drives, while the system files are installed separately on each node's local drives.
I hope this explains the situation a bit more.

Initially crm2 was handling all the active resources on the node. When it started having problems, it switched over to crm1. Since then I think it has switched from one node to the other up to 10 times. While this happens, a lot of cluster resources, such as the SQL Server service and agent, fail and restart themselves after a while.

As of this morning, all the resources and groups are back on CRM2, the original primary server. But just before that, the nodes unexpectedly went offline and came back online,
producing the same errors as before, such as:

The Windows Event Viewer shows a series of errors saying: The node failed to join the cluster, error code 1717.

Cluster services suffered an unexpected error , fatal error at line
2166 of source module d:/nt/private/cluster/service/dm/dmlogs.Error code 5.

Secondly, the node crm1 cannot access any of the logical drives or the quorum drive. Is this because all the resources are now on the second node (CRM2)?

How do you suggest I tackle this problem, and what steps should I take to put things right? How do I check that the cluster hardware and services, as well as the network connections, are working well? At this point, would it be wise to take the crm1 node out of the cluster and add it back? And what do I need to make sure is right before doing this?

bakerjon
Posting Yak Master

145 Posts

Posted - 2005-04-14 : 10:18:34
To answer your question about accessing drives that are owned by the active node from the passive node, you are correct. Only one node can access the drives at a time. You won't even be able to see the drives in Explorer (unless RDP is haywire :) )

The larger question is "Why does the cluster fail over without intervention?" There could be a myriad of reasons for this. You mention that the "nodes have unexpectedly gone on and offline." By this, do you mean that the servers are rebooting, or that they just drop cluster resources? You also mention that "Cluster resources such as sqlserver services and agent fail and restart it self after a while." Does this force a fail-over, or is it within the failure threshold?
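To make the "failure threshold" question concrete, here is a minimal Python sketch of a restart-threshold policy: a failed resource is restarted in place until it has failed too many times within a time window, at which point the whole group fails over. This is an illustration only. The class name, the defaults of 3 restarts in 900 seconds, and the exact comparison are assumptions, not the cluster service's actual implementation.

```python
from collections import deque


class RestartPolicy:
    """Hypothetical model of a resource restart threshold (not the real cluster code)."""

    def __init__(self, threshold=3, period_seconds=900):
        # Assumed meaning: up to `threshold` restarts are allowed within `period_seconds`.
        self.threshold = threshold
        self.period_seconds = period_seconds
        self._failures = deque()

    def record_failure(self, timestamp):
        """Record a failure at `timestamp` (in seconds) and return the action to take."""
        self._failures.append(timestamp)
        # Forget failures that fall outside the sliding time window.
        while timestamp - self._failures[0] > self.period_seconds:
            self._failures.popleft()
        # More failures than allowed restarts within the window: move the group.
        if len(self._failures) > self.threshold:
            return "failover"
        return "restart"
```

Under these assumptions, three failures in quick succession each return "restart" and the fourth returns "failover", while failures spaced further apart than the period never trigger a fail-over.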

A place you might want to look for more information is the cluster log, C:\windows\cluster\cluster.log. There will be a bunch of junk in there, but you might find some entries that will point you in the general direction. Another thing you want to check is the network connections, specifically the heartbeat. I've seen some pretty strange things with clusters if the heartbeat isn't working properly (or is mysteriously absent/removed). The other thing I would check is that the disk connections are stable. If the server is having a hard time writing to the disk, you might see some other errors like "Write timeout failure" or "Error accessing disk resource...".
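Since the cluster log is noisy, a small script can help pull out the lines worth reading first. The following is a rough Python sketch, not a parser for the real log format: the keyword list and both function names are assumptions chosen for illustration, and the log is treated as plain text.

```python
import re

# Hypothetical keywords; adjust them to the messages that appear in your own log.
KEYWORDS = ("error", "failed", "timeout", "offline")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log(path, keywords=KEYWORDS):
    """Open a text log at `path` and return its suspect lines."""
    # errors="replace" keeps the scan going if the file contains undecodable bytes.
    with open(path, "r", errors="replace") as handle:
        return find_suspect_lines(handle, keywords)
```

On a cluster node you might call scan_log(r"C:\windows\cluster\cluster.log") and print the returned pairs, then read the surrounding entries in the log itself for context.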

HTH

Jon




vdavid70
Yak Posting Veteran

67 Posts

Posted - 2005-04-14 : 11:19:06
No, the servers don't reboot; I guess they just drop cluster resources. When you try to bring it back online, it affects the second node. In effect, both servers go offline, and after about 3 seconds they come back online. To answer your question about SQL Server and the agent failing and restarting themselves, I think it is within the threshold.

Thanks for all this. I am sure that by the time you finish with me I will be a guru in clustering.

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2005-04-14 : 11:35:39
Active/active and active/passive are really the same thing. Active/active means that you have your cluster groups running on both nodes of your 2-node cluster.

For example, you might have the following groups in cluster admin:

1. Cluster group (contains cluster ip, cluster name and quorum disk resource)
2. MSDTC group (contains MSDTC ip, MSDTC name, MSDTC disk, MSDTC resource)
3. SQL Group 1 (contains SQL ip, SQL name, SQL Server service, SQL Server agent, SQL Fulltext)
4. SQL Group 2 (contains SQL ip, SQL name, SQL Server service, SQL Server agent, SQL Fulltext)

Running all 4 of these groups on your Node A box means you have an active/passive installation. Running groups 1-2 on Node A and groups 3-4 on Node B means you have active/active. In this case it would probably make sense to put group 3 on Node A and group 4 on Node B since they will be using the most resources.
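The definition in the paragraph above can be written as a short Python sketch: if every group is owned by the same node, the placement is active/passive, otherwise it is active/active. The function name and the dictionary layout are assumptions for illustration, and this follows the group-placement definition in this post rather than the instance-count definition given earlier in the thread.

```python
def classify_placement(group_owner):
    """Classify a cluster by where its groups run.

    `group_owner` maps a group name to the node that currently owns it.
    """
    owners = set(group_owner.values())
    if not owners:
        raise ValueError("no groups given")
    # One owning node means everything runs on a single box.
    return "active/passive" if len(owners) == 1 else "active/active"


# Placement from the post above: groups 1-2 on Node A, groups 3-4 on Node B.
split = {
    "Cluster group": "Node A",
    "MSDTC group": "Node A",
    "SQL Group 1": "Node B",
    "SQL Group 2": "Node B",
}
print(classify_placement(split))  # active/active
```

Moving all four groups to Node A in the dictionary would make the same call return "active/passive", which is why a fail-over alone can change which label applies.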

Anyway, the bigger problem you are facing is why you are seeing failovers occur. You need to check your system and application event logs to see if you are having a hardware problem that is causing your groups to move from one node to the other. Look at the cluster log (as suggested earlier), and search microsoft.com for articles on how to set up your cluster to ensure that you have all the config correct. There is a very good TechNet article on how to properly set up the heartbeat Ethernet connection; I have seen many problems occur because this article was not followed verbatim.



-ec
