Distributed Transaction sometimes failing

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

New to SQL Server Programming

Distributed Transaction sometimes failing

Author

Topic

beatlejuice95
Starting Member

2 Posts

Posted - 2011-12-15 : 12:25:05

Hi all.

I need some help figuring out a problem with a linked server job that is failing intermittently. First off, I will copy in the log output I created from the job:

Msg 7391, Sev 16: The operation could not be performed because the OLE DB provider 'SQLOLEDB' was unable to begin a distributed transaction. [SQLSTATE 42000]
Msg 7312, Sev 16: [SQLSTATE 01000]
Msg 7300, Sev 16: OLE DB error trace [OLE/DB Provider 'SQLOLEDB' ITransactionJoin::JoinTransaction returned 0x8004d00a]. [SQLSTATE 01000] Job 'myjobname' : Step 1, 'stepname' : Began Executing <datetime>

I've google'd the error, and come up with lots of hits, but most are suggesting things I've already tried to no avail.

It's important for me to put in more detail at this time. We have two servers linked over wan, with two large db's. Server A is the source for both DB's, while Server B is the destination. There are two SQLAgent jobs setup to run at different times, which start a process to pull new records from Server A to Server B. One job is to Pull records from DB 1 and put into destination DB 1, and is set to run every 15 minutes. The other job copies data from DB 2 into destination DB 2, and is scheduled to run every 30 minutes.

The job that copies DB 2 records over almost never seems to fail, while the jobs to copy DB 1 records fails consistently during the day everyday. There is no definite pattern to the failures, and I can rule out network based on the fact the other job is running fine during the same time. Perhaps load on the source server (or maybe destination) is the key, but there is still a few more things that might rule that out.

One more thing that is curious, everytime I've tried running the code manually during a time when the job is failing, the job I run manually seems to work. Also, the job failures that report the above error seem to happen quickly, that is, if the job fails, it usually happens in less then 20 seconds.

Successful batches of these copies take at least 20 minutes and sometimes longer to complete. DB 1 is set to copy up to a max of 200 records at a time, while DB 2 is set to copy a max of 250 per batch.

The only theory I have at present is that since both jobs take so long to run, they often must overlap. Is it possible that two jobs both trying to connect to the same server to copy records over could cause a conflict that would end up showing me the error message I described above?

If anyone has any other theories, I would love to hear them as well.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources