Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
agdavidson
Starting Member
17 Posts |
Posted - 2010-01-18 : 16:30:10
|
Recently, we reduced the number of tables that we replicate, to the point where a separate distribution server was no longer necessary. We reconfigured our publishers to act as their own distributors, repurposed the distribution server. After the fact, replication itself is running OK; however, we've started to see job failures when the underlying SP utilizes a cursor. The jobs fail with this error: “Could not complete cursor operation because the table schema changed after the cursor was declared,” despite the fact that no schema changes are being made during job execution. I've found though, that we ONLY get these failures on the hour, when our snapshot agents run. If we move them to run five minutes after the hour, all is well. Any ideas why the snapshot agents are clashing with the cursor SPs? I haven't found anything on google, and while I could reschedule the affected jobs or disable the snapshot agents (I only need to run them manually whenever I've made a schema change to a published table), I don't like this band-aid approach. Please help! Thanks :) |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-01-18 : 16:57:05
|
is this transactional replication? if so, why are you running snapshots?by the way, would need to see the SPs in question to give more specific help. |
|
|
agdavidson
Starting Member
17 Posts |
Posted - 2010-01-18 : 17:30:53
|
Yes, it's transactional replication... sorry should have mentioned that. I inherited this replication topology, and to be honest, thought that's just how they are supposed to run by default... 1x/hour, don't really do any work unless there are new subscriptions to intialize. Is this typically not the case? If so, maybe turning off the snapshot 1x/hour schedules would be the way to go. ?And here's an example of one of the SPs that's failing when it runs concurrently with the snapshot agents...DECLARE @dt_start datetime, @dt_end datetime, @check int, @dt_today datetime, @pk_customer intSET NOCOUNT ON-- Set start and end times to include seven days ago and truncate the timeSET @dt_today = Convert(char(10), GetDate(), 101)SET @dt_start = Convert(char(10), (GetDate() - 21), 101)SET @dt_end = Convert(char(10), (GetDate() - 20), 101)-- Get all mailboxes that have not been closed.DECLARE account_cst CURSOR FAST_FORWARD FOR SELECT pk_customer FROM mailbox m (NOLOCK) JOIN customer c ON c.pk_customer = m.fk_customer WHERE m.dt_on BETWEEN @dt_start AND @dt_end AND m.dt_off IS NULL AND m.fk_product not in(66, 59, 91) and (c.fk_company not in (255, 293))-- Now open the cursor and get the first accountOPEN account_cstFETCH account_cst INTO @pk_customer-- Loop through the accounts from the cursor and enter an auto cst if one is not alread thereWHILE (@@FETCH_STATUS = 0) -- there is a row to get BEGIN -- Check to make sure account does not already has an open cst on the topic SELECT @check = COUNT(*) FROM customer_tracking ct JOIN customer c ON c.pk_customer = ct.fk_customer WHERE c.pk_customer = @pk_customer AND ct.fk_Customer_Tracking_Descriptions = 1985 AND ct.c_status = 'Open' IF @check = 0 -- If there is not an open cst for a 30 day customer call then enter one BEGIN exec mt_auto_cst @pk_customer, 1985, 'Open',@dt_today, 'Please courtesy call customer to verify they received their setup pack and make sure all is well with their system.' , 'Specific','SYSA' END -- get the next account form the cursor list FETCH account_cst INTO @pk_customer END-- Clean up the cursorCLOSE account_cstDEALLOCATE account_cst |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-01-18 : 21:32:58
|
ok, first, the only time you should ever snapshot is if your subscriber comes out of synch with the publisher (replication fails).i don't know what you're doing in the stored proc "mt_auto_cst" but there may be something in there.you should NEVER use nolock hint when updating a replicated table. this is practically guarenteed at some point to cause replication errors (maybe why your predecessor set up the hourly snapshots!). for that matter, you should have a very good reason before using nolock in any instance.i'd like to have a look at the sp you're executing inside the cursor, but fixing the other two may solve your problem. |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-01-18 : 21:44:08
|
More I think about it, the snapshot is causing it.1. Snapshot agent takes locks on the published articles.2. Snapshot agent copies schema of all published articles3. Data pages in memory may not match the pages on disk. Thus the error4. Nolock hint and subsequent update is contributing to #3 (and may be the sole cause of #3) |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
agdavidson
Starting Member
17 Posts |
Posted - 2010-01-19 : 09:43:06
|
Russel and Tara-Thank you both! I probably will disable the snapshot agents per the consensus you came to, but I want to answer your question about sp_auto_cst, Russel, and get clarification on one of your points of feedback before I proceed.As to sp_auto_cst… I know you weren’t able to tell w/o being able to read thru it, but I don’t believe that it’s at play here. It’s a pretty basic insert wrapper, without anything that I think would interfere. I’ve attached the code below so you can review.Now on to your comment about NOLOCK… I agree that it’s bad to use since it could allow dirty reads, etc (again, not something I wrote/implemented), but I’m confused as to how it is extra bad to use on a replicated table like this. All we’re pulling into the cursor is a PK, and even if the table data changed after the cursor PKs were loaded into memory, it shouldn’t cause the “schema change occurred after cursor declared” error, should it? I mean, yeah I might reference a PK that’s since been deleted or what not, but there’s been no actual change to the table’s structure that would constitute a schema modification, right? I’m a junior DBA and may be totally missing something here, so go easy on me if that’s the case :o) --sp_auto_cstDECLARE @c_cst_number char(10) -- Used to hold the cst number,@pk_customer int,@pk_customer_tracking_descriptions int,@i_tracking_number int--Set the action dateif @dt_action IS NULL --if no action date was passed set @dt_action = GetDate() + 3 --then default to 3 days from now--Get customer PKselect @pk_customer = pk_customer from CUSTOMER (NOLOCK)where i_customer_account = Convert(int,@c_customer_account) --Get CTD PKselect top 1 @pk_customer_tracking_descriptions = pk_customer_tracking_descriptionsfrom CUSTOMER_TRACKING_DESCRIPTIONS (NOLOCK)where c_description = @c_cst_call_typeif @pk_customer_tracking_descriptions is null begin raiserror('That descriptor for the auto cst is not found',16,-1) select @c_cst_call_type return end--Get the new cst numberexec sp_new_cst_number @c_cst_number OUTPUTset @i_tracking_number = Convert(int,@c_cst_number)exec mt_Add_Customer_Tracking @pk_customer ,@pk_customer_tracking_descriptions ,@i_tracking_number ,@c_open_closed ,5 ,@c_comment ,'' ,'' ,0 ,@dt_action ,@c_employee ,0 ,@c_open_closed ,'' |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-01-19 : 11:01:07
|
no, you're right. nolock won't cause that error. it can cause "Row not found at the subscriber" and "Duplicate Primary Key" errors. |
|
|
|
|
|
|
|