
 SQL cluster connection timeouts


AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2003-07-02 : 07:17:09
Philip writes "Hi there,
I wondered if anyone can help me start troubleshooting our problems. I'm troubleshooting a solution that someone else set up, so I'm looking for things that may have been missed or disregarded in the configuration.

Basically we've got a highly available web solution: 2 web machines (WLBS, IBM Netfinity) and 2 SQL machines (IBM Netfinity) in an active-passive failover cluster, with a private heartbeat subnet for the SQL machines and a public subnet to the web front end.

All machines pull data from a public external RAID storage unit (IBM EXP200), i.e. both the databases and the web files are on the same RAID 1 share (not the quorum).

The SQL machines are both domain controllers and DNS servers. The first DC on the network is not ordinarily the active SQL machine in the cluster.

The web machines are load balanced with WLBS and run ASP sites with some COM transactions. The COM objects are registered separately on each web server, and the COM user is the 'mts' user in the domain. Sites are monitored using the health monitor component from Operations Manager.

First, you should know there is an issue which I'm about to resolve: the 1st SQL machine (the 1st DC on the network) was never set up to connect to a public time server, so the Windows Time service produces warnings on the 2nd SQL machine. It seems to me this may have ramifications for the whole network's resources, given the dependency of AD's Kerberos v5 authentication on the time service.

Anyway the symptoms are as follows:
We're running several ASP sites on the web servers, but we get sporadic connection timeouts to SQL, which obviously freezes all sites with dynamic content on their default pages.

We're using ADO connections to SQL, e.g. Set objConnection = Server.CreateObject("ADODB.Connection"), and the clustered SQL db is referenced by name (not IP) in the global.asa of each site.

Entries are occasionally written to the SQL logs of the clustered db, but not every time a connection times out, which happens 3 or 4 times a day.
Typical recent SQL log entries were as follows:

SQL logs all of the startup details and
Using dynamic lock allocation. [2500] Lock Blocks, [5000] Lock Owner Blocks etc.,
then, while attempting to initialize the Distributed Transaction Coordinator, it logs a server error:
Failed to obtain TransactionDispenserInterface: Result Code = 0x8004d01b
then it tries to start up the master db,
then
Recovery is checkpointing database 'master' (1)

Other errors include:
Failed to obtain TransactionDispenserInterface: Result Code = 0x8004d01b

Source: spid51
Using 'xpsqlbot.dll' version '2000.80.194' to execute extended stored procedure 'xp_qv'.
Source: spid86
Using 'xpstar.dll' version '2000.80.760' to execute extended stored procedure 'sp_MSgetversion'.

However, as I said, SQL does not log enough of these events to match the number of timeouts we're suffering.

Is there anything within the web to sql cluster connections that I should be checking, or is it likely to be solely a sql issue?

Basically all sites can be returned to working order once both web machines are rebooted sequentially. Similarly if failover occurs in the sql cluster (which takes 30 seconds to 1 minute to naturally allow transactions to be rolled back and services started on the failover node), both webs need a reboot to recognise the change in control of the db.
"

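Since the SQL logs do not record every timeout, one way to gather evidence from the web side is to probe the cluster name from each web server on a schedule and note when name resolution or the TCP connect fails. Below is a minimal sketch of such a probe, not a definitive diagnostic. The host name you pass in is whatever name global.asa uses (not shown in this thread), and port 1433 is only the default for a default SQL Server instance, so treat both as assumptions.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port=1433, timeout=5.0):
    """Resolve `host`, try one TCP connect, and return a dict describing the outcome.

    `port=1433` is an assumption (default SQL Server instance port).
    """
    started = time.monotonic()
    result = {
        "time": datetime.now(timezone.utc).isoformat(),
        "host": host,
        "port": port,
        "address": None,
        "ok": False,
        "error": None,
    }
    try:
        # Name resolution is done separately so a DNS failure is distinguishable
        # from a refused or timed-out connection.
        result["address"] = socket.gethostbyname(host)
        with socket.create_connection((result["address"], port), timeout=timeout):
            result["ok"] = True
    except OSError as exc:  # covers DNS errors, refusals, and timeouts
        result["error"] = str(exc)
    result["elapsed_s"] = round(time.monotonic() - started, 3)
    return result
```

Calling probe() with the cluster's network name every minute or so and appending the result to a file gives timestamps that can be lined up against the ASP timeouts and the SQL log entries. A failure with no address points at DNS, while a failure with an address points at the network path or the SQL service.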
tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2003-07-02 : 11:45:27
You should run SQL Profiler to determine what is occurring.

Why are your database servers domain controllers and DNS servers? Database servers should be dedicated to databases. They should be member servers of a domain and do nothing else but databases.

This part doesn't make sense:
"Similarly if failover occurs in the sql cluster (which takes 30 seconds to 1 minute to naturally allow transactions to be rolled back and services started on the failover node), both webs need a reboot to recognise the change in control of the db."

The web servers should not need to be rebooted to recognize the change in control of the database. What does the connection string look like?

Tara
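
On the connection string question: the thread does not show the actual string, so the following is only a sketch of the general idea, written in Python for illustration rather than in the ASP/VBScript the sites use. It builds an OLE DB style connection string that points at the cluster's virtual server name with an explicit connect timeout, and retries opening a connection across the 30 to 60 second failover window instead of relying on a reboot. The names SQLVIRTUAL and SiteDb in the usage note, and the helpers build_connection_string and open_with_retry, are hypothetical.

```python
import time


def build_connection_string(server, database, connect_timeout=15):
    """Build a SQLOLEDB connection string for the given server name.

    `server` should be the cluster's virtual server name, not a node name.
    """
    return (
        "Provider=SQLOLEDB;"
        f"Data Source={server};"
        f"Initial Catalog={database};"
        "Integrated Security=SSPI;"
        f"Connect Timeout={connect_timeout};"
    )


def open_with_retry(open_connection, attempts=6, delay_s=15.0, sleep=time.sleep):
    """Call `open_connection()` until it succeeds or `attempts` is exhausted.

    With the defaults, up to 5 waits of 15 s each (75 s) separate the tries,
    which is an assumed margin over the 30 to 60 s failover described above.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return open_connection()
        except Exception as exc:  # a real caller would catch the driver's error type
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)
    raise last_error
```

For example, build_connection_string("SQLVIRTUAL", "SiteDb") yields a string whose Data Source is the virtual name, and open_with_retry would wrap whatever function actually opens the connection. Whether stale pooled connections are the real cause here is not established by the thread.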