Author |
Topic |
vicferrada
Starting Member
11 Posts |
Posted - 2007-10-15 : 18:02:22
|
Over the past few days, we've been experiencing some very disturbing issues that may be directly related to our prod database (publisher) replicating to the reports database (subscriber). Both machines run SQL Server 2005 unpatched (I know, I know).When I run replication monitor, everything seems fine until I go to the Undistributed Commands page and I see a HUGE number there, currently 2 million and growing. I check the net for solutions and I find reference to setting SubscriptionStreams to a number other than default (4 cpus on each machine).Problem one is that there is no way I can add this (that I could find anyway) to the Distribution agent through its properties.Problem two is when I try running Activity Monitor on the subscription DB or exec sp_lock, both seem to time out or hang. This leads me to believe something is blocked in the distributor or the report db?Any suggestions would be greatly appreciated. |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-10-15 : 18:09:09
|
Is the log reader job running?Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/ |
|
|
vicferrada
Starting Member
11 Posts |
Posted - 2007-10-15 : 18:31:32
|
Yes, snapshot and log reader agents running, all jobs on the subscriber server (transactional, pull subscription) succeeded last run outcome, and both replication-related jobs running.Everything looks normal... |
|
|
dinakar
Master Smack Fu Yak Hacker
2507 Posts |
Posted - 2007-10-15 : 19:05:28
|
Try stopping the distribution job and restarting it. We've seen that on our end. When you have multiple streams, it tries to apply batches of commands in parallel and if there is a depencency, the command fails and replication agent then switches to single stream and applies the transactions in the original order and then switches back to multiple streams. During this process, the latency builds up suddenly (depending on the load on publisher) and the distribution agent seems like its doing something but doesnt really do anything and the latency keeps on building. I restarted the distribution agent and saw the latency going down and undistributed commands going down. I've no explanation as to why the Distribution agent has to be restarted when the status shows its currently executing.Dinakar Nethi************************Life is short. Enjoy it.************************http://weblogs.sqlteam.com/dinakar/ |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-10-15 : 19:06:14
|
Have you checked the subscriber to see if any data is being replicated to it? With 2 million undistributed commands, you should have a fairly large or filled up transaction log file. Can you give the stats on that?Are there any failed jobs on the distributor? Does Replication Monitor show everything as green?Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/ |
|
|
vicferrada
Starting Member
11 Posts |
Posted - 2007-10-15 : 19:22:13
|
I'll see about restarting the distribution agent. Also, transactions are being shipped, from log reader agent, X transactions with X commands delivered at a min ago. Publisher to Distributor, same msg. Distributor to Subscriber shows Delivering transactions with action time set to a minute ago.Transaction logs aren't filled up on either server, 108gb free on the main db logs drive, 80 gb free on the reports server logs drive. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-10-15 : 19:27:01
|
Your drive might have plenty of space, but how much space is available inside the LDF file and how big is the LDF file?Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/ |
|
|
vicferrada
Starting Member
11 Posts |
Posted - 2007-10-15 : 19:51:30
|
On Main, the ldf is 20gb with 19gb free.On Report, the ldf is 5gb with a little over 4 free at the moment.Both DBs are Full recovery, with full nightly backups. main has differential backups every 3 hours and transaction log backups every hour. reporting has transaction log bacups every 2 hours.Vic |
|
|
rmiao
Master Smack Fu Yak Hacker
7266 Posts |
Posted - 2007-10-15 : 22:39:39
|
Snapshot agent shouldn't run all the time. Did you see any error in distribution agent? |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2007-10-16 : 01:37:06
|
That's unusual to have so many undistributed commands and not have a more filled up LDF file on the publishing database. The data gets stored there until the rows are replicated. We've run out of space on the publishing database before since replication was breaking. It just kept filling up the transaction log. Backups don't matter in this case since it won't backup these undistributed commands.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/ |
|
|
vicferrada
Starting Member
11 Posts |
Posted - 2007-10-16 : 10:12:10
|
Hi, I believe it was the distribution agent that was not running correctly. Possibly in conjunction with the disk activity also. I waited until activity was reduced to a minimum on production and reset the agent, didn't see anything change at the moment so I went to bed (couldn't get any worse, could it?).This morning I checked and there are 0 undistributed commands.Yes a head scratcher. Anyway, I'm pushing for new hardware with lots of drives (the amount of data is getting up there) and proper service pack applications on all database servers. Hopefully, I can avoid more sleepless nights...Thanks for your suggestions :) |
|
|
|