Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Replication (2005)
 Replication related issues

Author  Topic 

vicferrada
Starting Member

11 Posts

Posted - 2007-10-15 : 18:02:22
Over the past few days, we've been experiencing some very disturbing issues that may be directly related to our prod database (publisher) replicating to the reports database (subscriber). Both machines run SQL Server 2005 unpatched (I know, I know).

When I run replication monitor, everything seems fine until I go to the Undistributed Commands page and I see a HUGE number there, currently 2 million and growing. I check the net for solutions and I find reference to setting SubscriptionStreams to a number other than default (4 cpus on each machine).

Problem one is that there is no way I can add this (that I could find anyway) to the Distribution agent through its properties.

Problem two is when I try running Activity Monitor on the subscription DB or exec sp_lock, both seem to time out or hang. This leads me to believe something is blocked in the distributor or the report db?

Any suggestions would be greatly appreciated.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2007-10-15 : 18:09:09
Is the log reader job running?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

vicferrada
Starting Member

11 Posts

Posted - 2007-10-15 : 18:31:32
Yes, snapshot and log reader agents running, all jobs on the subscriber server (transactional, pull subscription) succeeded last run outcome, and both replication-related jobs running.

Everything looks normal...
Go to Top of Page

dinakar
Master Smack Fu Yak Hacker

2507 Posts

Posted - 2007-10-15 : 19:05:28
Try stopping the distribution job and restarting it. We've seen that on our end. When you have multiple streams, it tries to apply batches of commands in parallel and if there is a depencency, the command fails and replication agent then switches to single stream and applies the transactions in the original order and then switches back to multiple streams. During this process, the latency builds up suddenly (depending on the load on publisher) and the distribution agent seems like its doing something but doesnt really do anything and the latency keeps on building. I restarted the distribution agent and saw the latency going down and undistributed commands going down. I've no explanation as to why the Distribution agent has to be restarted when the status shows its currently executing.

Dinakar Nethi
************************
Life is short. Enjoy it.
************************
http://weblogs.sqlteam.com/dinakar/
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2007-10-15 : 19:06:14
Have you checked the subscriber to see if any data is being replicated to it?

With 2 million undistributed commands, you should have a fairly large or filled up transaction log file. Can you give the stats on that?

Are there any failed jobs on the distributor? Does Replication Monitor show everything as green?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

vicferrada
Starting Member

11 Posts

Posted - 2007-10-15 : 19:22:13
I'll see about restarting the distribution agent. Also, transactions are being shipped, from log reader agent, X transactions with X commands delivered at a min ago. Publisher to Distributor, same msg. Distributor to Subscriber shows Delivering transactions with action time set to a minute ago.

Transaction logs aren't filled up on either server, 108gb free on the main db logs drive, 80 gb free on the reports server logs drive.
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2007-10-15 : 19:27:01
Your drive might have plenty of space, but how much space is available inside the LDF file and how big is the LDF file?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

vicferrada
Starting Member

11 Posts

Posted - 2007-10-15 : 19:51:30
On Main, the ldf is 20gb with 19gb free.
On Report, the ldf is 5gb with a little over 4 free at the moment.

Both DBs are Full recovery, with full nightly backups. main has differential backups every 3 hours and transaction log backups every hour. reporting has transaction log bacups every 2 hours.

Vic
Go to Top of Page

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-10-15 : 22:39:39
Snapshot agent shouldn't run all the time. Did you see any error in distribution agent?
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2007-10-16 : 01:37:06
That's unusual to have so many undistributed commands and not have a more filled up LDF file on the publishing database. The data gets stored there until the rows are replicated. We've run out of space on the publishing database before since replication was breaking. It just kept filling up the transaction log. Backups don't matter in this case since it won't backup these undistributed commands.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Go to Top of Page

vicferrada
Starting Member

11 Posts

Posted - 2007-10-16 : 10:12:10
Hi, I believe it was the distribution agent that was not running correctly. Possibly in conjunction with the disk activity also. I waited until activity was reduced to a minimum on production and reset the agent, didn't see anything change at the moment so I went to bed (couldn't get any worse, could it?).

This morning I checked and there are 0 undistributed commands.

Yes a head scratcher. Anyway, I'm pushing for new hardware with lots of drives (the amount of data is getting up there) and proper service pack applications on all database servers. Hopefully, I can avoid more sleepless nights...

Thanks for your suggestions :)
Go to Top of Page
   

- Advertisement -