Snapshot Agents Causing Job Failures

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Administration (2000)

Snapshot Agents Causing Job Failures

Author

Topic

agdavidson
Starting Member

17 Posts

Posted - 2010-01-18 : 16:30:10

Recently, we reduced the number of tables that we replicate, to the point where a separate distribution server was no longer necessary. We reconfigured our publishers to act as their own distributors, repurposed the distribution server.

After the fact, replication itself is running OK; however, we've started to see job failures when the underlying SP utilizes a cursor. The jobs fail with this error: “Could not complete cursor operation because the table schema changed after the cursor was declared,” despite the fact that no schema changes are being made during job execution. I've found though, that we ONLY get these failures on the hour, when our snapshot agents run. If we move them to run five minutes after the hour, all is well. Any ideas why the snapshot agents are clashing with the cursor SPs?

I haven't found anything on google, and while I could reschedule the affected jobs or disable the snapshot agents (I only need to run them manually whenever I've made a schema change to a published table), I don't like this band-aid approach. Please help! Thanks :)

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-01-18 : 16:57:05

is this transactional replication? if so, why are you running snapshots?

by the way, would need to see the SPs in question to give more specific help.

agdavidson
Starting Member

17 Posts

Posted - 2010-01-18 : 17:30:53

Yes, it's transactional replication... sorry should have mentioned that. I inherited this replication topology, and to be honest, thought that's just how they are supposed to run by default... 1x/hour, don't really do any work unless there are new subscriptions to intialize. Is this typically not the case? If so, maybe turning off the snapshot 1x/hour schedules would be the way to go. ?

And here's an example of one of the SPs that's failing when it runs concurrently with the snapshot agents...

DECLARE
@dt_start datetime,
@dt_end datetime,
@check int,
@dt_today datetime,
@pk_customer int

SET NOCOUNT ON

-- Set start and end times to include seven days ago and truncate the time
SET @dt_today = Convert(char(10), GetDate(), 101)

SET @dt_start = Convert(char(10), (GetDate() - 21), 101)
SET @dt_end = Convert(char(10), (GetDate() - 20), 101)

-- Get all mailboxes that have not been closed.
DECLARE account_cst CURSOR FAST_FORWARD FOR
SELECT pk_customer
FROM mailbox m (NOLOCK)
JOIN customer c ON c.pk_customer = m.fk_customer
WHERE m.dt_on BETWEEN @dt_start AND @dt_end
AND m.dt_off IS NULL
AND m.fk_product not in(66, 59, 91)
and (c.fk_company not in (255, 293))

-- Now open the cursor and get the first account
OPEN account_cst

FETCH account_cst
INTO @pk_customer

-- Loop through the accounts from the cursor and enter an auto cst if one is not alread there
WHILE (@@FETCH_STATUS = 0) -- there is a row to get
BEGIN

-- Check to make sure account does not already has an open cst on the topic
SELECT @check = COUNT(*)
FROM customer_tracking ct
JOIN customer c ON c.pk_customer = ct.fk_customer
WHERE c.pk_customer = @pk_customer
AND ct.fk_Customer_Tracking_Descriptions = 1985
AND ct.c_status = 'Open'

IF @check = 0
-- If there is not an open cst for a 30 day customer call then enter one
BEGIN
exec mt_auto_cst @pk_customer, 1985, 'Open',@dt_today,
'Please courtesy call customer to verify they received their setup pack and make sure all is well with their system.' ,
'Specific','SYSA'
END

-- get the next account form the cursor list
FETCH account_cst
INTO @pk_customer
END

-- Clean up the cursor
CLOSE account_cst
DEALLOCATE account_cst

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-01-18 : 21:32:58

ok, first, the only time you should ever snapshot is if your subscriber comes out of synch with the publisher (replication fails).

i don't know what you're doing in the stored proc "mt_auto_cst" but there may be something in there.

you should NEVER use nolock hint when updating a replicated table. this is practically guarenteed at some point to cause replication errors (maybe why your predecessor set up the hourly snapshots!). for that matter, you should have a very good reason before using nolock in any instance.

i'd like to have a look at the sp you're executing inside the cursor, but fixing the other two may solve your problem.

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-01-18 : 21:44:08

More I think about it, the snapshot is causing it.

1. Snapshot agent takes locks on the published articles.
2. Snapshot agent copies schema of all published articles
3. Data pages in memory may not match the pages on disk. Thus the error
4. Nolock hint and subsequent update is contributing to #3 (and may be the sole cause of #3)

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-01-18 : 21:48:48

We disable the snapshot job.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

"Let's begin with the premise that everything you've done up until this point is wrong."

agdavidson
Starting Member

17 Posts

Posted - 2010-01-19 : 09:43:06

Russel and Tara-

Thank you both! I probably will disable the snapshot agents per the consensus you came to, but I want to answer your question about sp_auto_cst, Russel, and get clarification on one of your points of feedback before I proceed.

As to sp_auto_cst… I know you weren’t able to tell w/o being able to read thru it, but I don’t believe that it’s at play here. It’s a pretty basic insert wrapper, without anything that I think would interfere. I’ve attached the code below so you can review.

Now on to your comment about NOLOCK… I agree that it’s bad to use since it could allow dirty reads, etc (again, not something I wrote/implemented), but I’m confused as to how it is extra bad to use on a replicated table like this. All we’re pulling into the cursor is a PK, and even if the table data changed after the cursor PKs were loaded into memory, it shouldn’t cause the “schema change occurred after cursor declared” error, should it? I mean, yeah I might reference a PK that’s since been deleted or what not, but there’s been no actual change to the table’s structure that would constitute a schema modification, right? I’m a junior DBA and may be totally missing something here, so go easy on me if that’s the case :o)

--sp_auto_cst
DECLARE
@c_cst_number char(10) -- Used to hold the cst number
,@pk_customer int
,@pk_customer_tracking_descriptions int
,@i_tracking_number int

--Set the action date
if @dt_action IS NULL --if no action date was passed
set @dt_action = GetDate() + 3 --then default to 3 days from now

--Get customer PK
select @pk_customer = pk_customer
from CUSTOMER (NOLOCK)
where i_customer_account = Convert(int,@c_customer_account)

--Get CTD PK
select top 1 @pk_customer_tracking_descriptions = pk_customer_tracking_descriptions
from CUSTOMER_TRACKING_DESCRIPTIONS (NOLOCK)
where c_description = @c_cst_call_type

if @pk_customer_tracking_descriptions is null
begin
raiserror('That descriptor for the auto cst is not found',16,-1)
select @c_cst_call_type
return
end

--Get the new cst number
exec sp_new_cst_number
@c_cst_number OUTPUT

set @i_tracking_number = Convert(int,@c_cst_number)

exec mt_Add_Customer_Tracking
@pk_customer
,@pk_customer_tracking_descriptions
,@i_tracking_number
,@c_open_closed
,5
,@c_comment
,''
,''
,0
,@dt_action
,@c_employee
,0
,@c_open_closed
,''

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-01-19 : 11:01:07

no, you're right. nolock won't cause that error. it can cause "Row not found at the subscriber" and "Duplicate Primary Key" errors.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources