Transaction committed - yes and no

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

SQL Server Administration (2005)

Transaction committed - yes and no

Author

Topic

johny_eu
Starting Member

12 Posts

Posted - 2010-08-19 : 18:52:51

Hi guys,

couple of days ago I've encountered disturbing issue. Transaction was successfully committed, but it was not really written to database. By that I mean there's no record in my table, I call it TransactionLogQuery. Identity seed in that table was used, I see that because next transaction which was successfully committed and written so now I have record's in TransactionLogQuery with ID's 1, 3 where's ID 2 missing.

I'm sure that it was successfully committed because there was no error in database, error in event viewer nor error in application's log. Also, I have proof from client that he successfully executed transaction, but it's not logged in my database.

Here's the short description of application and environment:

Database is MSSQL 2005 Standard with synchronized mirroring and Symantec BackupExec software. Database client's are Java-based applications which uses latest Microsoft JDBC driver.

Main stored procedure is "ProcessRequest" which looks basically like this:


begin try
  begin transaction
  set lock_timeout 20000
  update Warehouse ...
  if @id is null raiserror ...
  insert into TransactionLogQuery ...
  commit transaction
end try
begin catch
  if @@trancount > 0 
    rollback transaction
  insert into ErrorLog ...
end catch
return @response

Any idea why this happened? It troubles me that transaction was committed for sure but not written. If there was any problem regarding commit it should rollback :(

The application is in production for 1 year now and it processed more than 10 000 000 requests and this never scenario never happened.

Kind regards

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-08-19 : 19:43:16

If the transaction was committed, then it would be in the database. If it did commit and the id=2 is missing, then someone deleted it.

The only way to have the gaps is for a rollback or a delete. Well unless someone is modifying the id values, which is unlikely.

Could you post your actual code, so that we can ensure there is no transaction bug in it?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

johny_eu
Starting Member

12 Posts

Posted - 2010-08-19 : 20:47:31

Hello tkizer,

thank you for quick reply.

It's not possible that someone deleted it, I can guarantee that. Also, no one is modifying ID's in that table.

At this moment I cannot give full copy-paste of the stored procedure, but here's more information:


set nocount off
select @timeouttime = dateadd(ss, 20, getdate())
begin try
	set rowcount 1

	select @merchant_id = merchant_id
	from client 
	where name = @name

	if @merchant_id is null 
		raiserror('no such service...')

	set rowcount 0

	begin transaction

	set lock_timeout 20000

	-- retrieve one record from warehouse
	-- be careful about the serial order
	update warehouse
	set 
		@stock_id = wh.id, 
		[status] = 1
	from warehouse wh 
	inner join (
		select top (1) id
		from warehouse 
		where 
			merchant_id = @merchant_id and
			[status] = 0
		order by serial
	) w on w.id = wh.id
	where [status] = 0

	if @stock_id is null
		raiserror('no stock...')

	-- log the transaction
	insert into transactionlogquery
		(
			[name],
			merchant_id,
			stock_id,
			created
		)
	values 
		(
			@name,
			@merchant_id,
			@stock_id,
			getdate()
		)

	if @@rowcount <> 1 
		raiserror('error writing log...')

	if getdate() > @timeouttime 
	begin
		raiserror('timeout')
	end

	commit transaction
end try
begin catch
	if @@trancount > 0 
		rollback transaction
	insert into ErrorLog 
		(
			[name],
			created
		)
	values
		(
			@name,
			getdate()
		)
end catch
	set rowcount 0
	select @stock_id stock_id

Rollback also did not occurred because two client's successfully retrieved the same stock_id. Application successfully called stored procedure and executed it. There was no error in application nor in database table ErrorLog. The problem is that first request wasn't logged, but second was. Although first request took identity seed. Also rollback did not occur, the application would detect such case and do a proper log entry.

Help :(

Kind regards

quote:
Originally posted by tkizer

If the transaction was committed, then it would be in the database. If it did commit and the id=2 is missing, then someone deleted it.

The only way to have the gaps is for a rollback or a delete. Well unless someone is modifying the id values, which is unlikely.

Could you post your actual code, so that we can ensure there is no transaction bug in it?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Kristen
Test

22859 Posts

Posted - 2010-08-20 : 01:55:08

"the application would detect such case and do a proper log entry. "

Unless that was itself rolled back, or was itself in error - an error in your CATCH block would not be logged - hard to say whether that is possible, and whether such an error would show up on the client's application (e.g. as some sort of error that a user MIGHT! remember seeing)

You are dealing with something you don't expect (and if you are telling me that COMMIT failed then I'm starting from a position of 99.99999% not believing you and 0.00001% chance of a Microsoft bug!)

So you should broaden your mind and not say "X did not happen". Something must (99.99999%) have happened. User may have made a mistake in what they remember. Error Log may have failed. Someone may have hacked into the system. Someone with access may have changed something (and either has not been asked yet, or is not prepared to admit it)

"At this moment I cannot give full copy-paste of the stored procedure"#

If we are looking at some made up code that represents the actual code then more than likely we are wasting our time. The devil will be in the detail.

If you have a problem with confidentiality around the code then do Find & Replace on columns / passwords / whatever

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2010-08-20 : 05:24:35

is there a possibility some trigger "tidied up" the data after it was committed?

johny_eu
Starting Member

12 Posts

Posted - 2010-08-20 : 05:52:41

AndrewMurphy, thanks for joining. No, there's no trigger in database.

quote:
Originally posted by AndrewMurphy

is there a possibility some trigger "tidied up" the data after it was committed?

johny_eu
Starting Member

12 Posts

Posted - 2010-08-20 : 06:30:06

Kristen, thanks for the reply. Don't get me wrong, I'm not saying that this is bug in SQL Server, I'm just trying to get advice from someone who's very familiar with SQL Server product.

Log error failing is unlikely because there's two types of logging. One is in SQL catch block and another is in application. One of these logs should contain entry about it.

Ok, so, the scenario would be something like this. If identity seed was used that means that insert into TransactionLogQuery was executed and failure was beyond that, but before commit transaction. Also there must was failure in catch block which caused no-resetting of variables that are returned to application and no-inserting into ErrorLog table. Application retrieved proper response in variables...

quote:
Originally posted by Kristen

"the application would detect such case and do a proper log entry. "

Unless that was itself rolled back, or was itself in error - an error in your CATCH block would not be logged - hard to say whether that is possible, and whether such an error would show up on the client's application (e.g. as some sort of error that a user MIGHT! remember seeing)

You are dealing with something you don't expect (and if you are telling me that COMMIT failed then I'm starting from a position of 99.99999% not believing you and 0.00001% chance of a Microsoft bug!)

So you should broaden your mind and not say "X did not happen". Something must (99.99999%) have happened. User may have made a mistake in what they remember. Error Log may have failed. Someone may have hacked into the system. Someone with access may have changed something (and either has not been asked yet, or is not prepared to admit it)

"At this moment I cannot give full copy-paste of the stored procedure"#

If we are looking at some made up code that represents the actual code then more than likely we are wasting our time. The devil will be in the detail.

If you have a problem with confidentiality around the code then do Find & Replace on columns / passwords / whatever

Kristen
Test

22859 Posts

Posted - 2010-08-20 : 07:50:46

"Also there must was failure in catch block which caused no-resetting of variables that are returned to application and no-inserting into ErrorLog table"

I can't verify that as you haven't posted the code.

My code is 100% bullet-proof - like you describe yours. We have error logs at every level - all sorts of stuff to be able to reverse engineer what caused a problem (our sites are all for web customers, so we have 0% chance of asking a user "what happened")

I still mistrust them and assume we goofed up somewhere when anything goes wrong.

Clearly something has gone wrong ... but without seeing your code its not possible to give you second-pair-of-eyes for anything that might occur to me / others reading here.

You could restore to point-in-time and restore-forwards over the time in question to see what appears - tedious job, and I'm sure what your error logs are design to avoid you having to do. A Log Reader/Decoder might help with that.

I suppose if no triggers you don't have an "audit table" connected to this that would log any insert/update/delete? (benefit of that is that it is within the transaction, so harder to have a failure that does not also log in the audit table, and requires quite a lot more cover-your-tracks for a hacker)

johny_eu
Starting Member

12 Posts

Posted - 2010-08-20 : 20:36:38

This is the stored procedure:

http://pastebin.com/2EswarKZ

Yes, I'm not using triggers. I have audit log, but that's for user actions.

quote:
Originally posted by Kristen

"Also there must was failure in catch block which caused no-resetting of variables that are returned to application and no-inserting into ErrorLog table"

I can't verify that as you haven't posted the code.

My code is 100% bullet-proof - like you describe yours. We have error logs at every level - all sorts of stuff to be able to reverse engineer what caused a problem (our sites are all for web customers, so we have 0% chance of asking a user "what happened")

I still mistrust them and assume we goofed up somewhere when anything goes wrong.

Clearly something has gone wrong ... but without seeing your code its not possible to give you second-pair-of-eyes for anything that might occur to me / others reading here.

You could restore to point-in-time and restore-forwards over the time in question to see what appears - tedious job, and I'm sure what your error logs are design to avoid you having to do. A Log Reader/Decoder might help with that.

I suppose if no triggers you don't have an "audit table" connected to this that would log any insert/update/delete? (benefit of that is that it is within the transaction, so harder to have a failure that does not also log in the audit table, and requires quite a lot more cover-your-tracks for a hacker)

Kristen
Test

22859 Posts

Posted - 2010-08-21 : 03:26:38

IMHO a scary amount of use of NOLOCK The potential issues with dirty reads when data is then rolled back and page splits etc. would worry me. You are clearly guarding against getting a row that is subsequently not there, but I think the possible effects are hard to judge. You are even using NOLOCK on T14 on a record you have just inserted. Given that you have use the IF from SCOPE_IDENTITY() I presume you will only find your own record, and I also presume that any other record inserted by another user would have incremented the IDENTITY, so hard to see how it could fail - but you have no @@ROWCOUNT check on that SELECT and it suggests that perhaps NOLOCK is being used as a matter of course without explicit consideration, on every single usage, as to the potential side effects.

Have you considered READ_COMMITTED_SNAPSHOT instead?

Are you using NOLOCK in the routines where the users say they saw the record? If so they may have seen it and then it was rolled back.

Probably not relevant, and the code is consistent, but there are no checks on @@ROWCOUNT after UPDATEs to dbo.t14 - although there is a check for @v33 IS NULL which perhaps comes to the same thing (looks like it covers all three execution paths)

There is no error handling on the INSERT INTO dbo.T18 in the CATCH block. An error there might not be caught be another Sproc (that called this one) or, potentially, by your application. Might any of the dbo.T18 columns be NOT NULL and cause error if a NULL column was attempted to be inserted?

The statements in the CATCH block are all singletons, but "belt & braces" I would move the SET ROWCOUNT 0 to the top (actually, I think I would change the "SET ROWCOUNT n" statements to use "TOP n" instead so that if the SProc "bailed" it wouldn't leave the connection in that state).

Almost certainly irrelevant: you are not resetting the LOCK_TIMEOUT

Just curious why you have


IF getdate() > @timeouttime

and then impose, your own, time-out error? At this point the transaction is successful (albeit that it took longer than "allowed") - curious to know why you then choose to abort it at that point? (rather than logging a "golly-gosh this should not have happened" error, instead, for example)

Summary: Nothing obvious, some pedantic points only, its only the possible side effects of NOLOCK that might create an unexpected, and pretty much "impossible" to reproduce, scenario.

johny_eu
Starting Member

12 Posts

Posted - 2010-08-21 : 09:04:46

Wow, thanks for the critics. I'll go one-by-one.

Yes, I've implemented a lot of NOLOCK over the time. At first, there was no such option, but concurrency is very important in my application so I was 'forced' to implement it (at least I was lead to believe). I haven't considered READ_COMMITTED_SNAPSHOT, wasn't familiar with it. I just quick read article on msdn about it and it seems very interesting since that's new (since mssql 2k5) isolation level which should enhance concurrency for my type of application. I see that it makes heavy use of tempdb so I'm not sure how it will work. This is definitely worth looking it and I'll consider it in my next version. No more blocking while reading/writing and vice-versa :)

The user's use response from this sproc. Application checks if @response is 0 which is set on line 493 at pastebin, so that line was executed. Also, the variables were seen by user so cleaning them obviously did not happen, line 501 at pastebin. This is THE issue :(

INSERT INTO dbo.T18 in the CATCH block should pass because all columns are nullable.

Ok, I agree with 'set rowcount n -> select top n', I'll add this to my TODO list.

Resetting LOCK_TIMEOUT? How do you mean?

Regarding this timeout error, just doing double check. So, you also think this is not necessary? I'll add it to TODO list.

Kristen
Test

22859 Posts

Posted - 2010-08-21 : 10:33:34

"Resetting LOCK_TIMEOUT? How do you mean?"

Your setting will remain current for the remainder of the connection - unless it is changed elsewhere. Thus as you set it for local use, perhaps you should store its original value on entry to your Sproc and set it back to that when you exit. (Default would be -1 I think)

"Regarding this timeout error, just doing double check. So, you also think this is not necessary?"

Here's my take; Transaction completes successfully, BUT it takes longer than the time limit for have defined. You are rolling it back, I think you could just let it continue (it HAS worked successfully, after all) but log the fact that it took longer than expected - you are not expecting that, and maybe it is symptomatic of some other issue - lots of active users / bigger database / indexes or statistics not rebuild recently etc. which you would probably like to look into sooner-rather-than-later

johny_eu
Starting Member

12 Posts

Posted - 2010-08-21 : 17:17:33

I see, ok.

What about the main issue? ghost transaction? any idea? :-(

As I said at the beginning, transaction structure seems correct - by the book. That's why I'm confused about how this could happen or else - how can I stop it from repeating.

Kristen, tkizer, Andrew?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-08-21 : 19:33:23

My only suggestion would be to open a case with Microsoft on this if you think that your code is bulletproof, that the user didn't do something to cause it, or if a delete didn't happen. If it's a SQL bug, then only Microsoft can assist with this.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

johny_eu
Starting Member

12 Posts

Posted - 2010-08-21 : 20:39:51

I see. Is there some direct Microsoft contact for SQL?

quote:
Originally posted by tkizer

My only suggestion would be to open a case with Microsoft on this if you think that your code is bulletproof, that the user didn't do something to cause it, or if a delete didn't happen. If it's a SQL bug, then only Microsoft can assist with this.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2010-08-21 : 20:52:23

Do you have a Premier account or an MSDN license?

I believe the phone number that we use for our premier account is 800-939-3100. But that's from memory, I'd have to look it up when I am back at my work computer. I open cases with MS on a regular basis, so that number could be right.

Support isn't free.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

johny_eu
Starting Member

12 Posts

Posted - 2010-08-22 : 10:32:41

I have SPLA licence. Thanks.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources