Kristen
Test
22859 Posts |
Posted - 2013-11-25 : 18:10:24
|
Been looking at some SQL in scheduled jobs on a client's site. It is peppered with NOLOCK; the client does not currently have Read Committed Snapshot turned on, but the Vendors say that would be fine with their code.

NOLOCK worries me, as the code it appears in is business critical, and I fear that the known issues with NOLOCK (namely that it may cause an error to be raised (index entry read, but the record is deleted before it is fetched), or some records included twice, or not at all, due to an index page split) could cause erroneous data to be acted on by the business.

Question: if I turn on Read Committed Snapshot, will any code using NOLOCK in effect be immune from those side effects (because the SELECT will read from the, static??, snapshot instead of from the, dynamically changing, tables)?

If so, the Vendor won't have to remove the NOLOCK statements, and I won't have to argue with them that the cost should be theirs rather than my client's :)
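(For anyone following along: turning Read Committed Snapshot on is a one-liner. The database name below is a placeholder, and this is just a sketch of the standard mechanism, not something from the thread.)

```sql
-- Enable Read Committed Snapshot; "ClientDb" is a placeholder name.
-- The ALTER needs exclusive-ish access; WITH ROLLBACK IMMEDIATE rolls
-- back other sessions so the statement doesn't wait forever.
ALTER DATABASE ClientDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Confirm the setting afterwards:
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'ClientDb';
```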
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2013-11-25 : 18:42:20
|
If you turn on RCSI, it won't have an effect on the NOLOCK queries. RCSI applies to read committed queries.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
|
|
Kristen
Test
22859 Posts |
Posted - 2013-11-25 : 19:24:49
|
Thanks Tara. So: turn on RCSI, which will most probably solve the concurrency issue that they put NOLOCK in for, and then get them to remove NOLOCK all over the place.

Bet they say that we will have to pay to have them remove that ...
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-03 : 10:54:32
|
What I think I would like to do [before getting the Vendor to change their application] is something like this:

1) Backup database. Log all user activity for X hours.
2) Restore database. Replay the logged activity - and time it.
3) Restore database. Change to Read Committed Snapshot. Edit the logged activity to remove all NOLOCKs - is this even possible?? Replay the (edited) logged activity - and time it.

Do you reckon any of that is possible? I'm figuring I can capture the activity with SQL Profiler. I don't think they have many/any SProcs, but if they do I can edit those at #3 to remove NOLOCK. For the rest I think it's mostly/all dynamic SQL, so I would need to remove NOLOCK from the SQL Profiler captured commands - let's assume I can write a fancy GREP command to safely remove NOLOCKs without any collateral damage; is that actually going to be possible?

Thanks
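(A naive sketch of that "fancy GREP" step, with invented table/column names: strip the common spellings of the NOLOCK hint from captured batch text loaded into a work table. Plain string replacement like this is case-insensitive under the usual collations but will also hit "NOLOCK" inside string literals or comments, so eyeball the diffs before replaying.)

```sql
-- #CapturedBatches / TextData are hypothetical names for a work table
-- holding the Profiler-captured statements. Replace the longer spellings
-- first so the bare "(NOLOCK)" pass doesn't leave a dangling "WITH".
UPDATE #CapturedBatches
SET TextData =
    REPLACE(REPLACE(REPLACE(CAST(TextData AS nvarchar(max)),
        'WITH (NOLOCK)', ''),
        'WITH(NOLOCK)', ''),
        '(NOLOCK)', '');
```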
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2013-12-03 : 12:26:20
|
quote: Edit the Logged activity to remove all NOLOCK - Is this even possible??
I am not sure, but it doesn't sound like it would be worth your time. I doubt you are going to see a timing difference between NOLOCK/RCSI, but the benefit of removing NOLOCK is no dirty reads. If it were me, I would not perform this test, as the results aren't going to matter. Plus replaying a workload through Profiler is single-threaded, which is different from what the application is doing (most likely). A single-threaded workload is going to perform the same regardless of isolation level, since there aren't any concurrency issues.

You could look into using RML, but that's a huge undertaking. Every time I've looked into it, I gave up.

If you do want to test it though, create your own workload with NOLOCK, then remove NOLOCK and replay. You don't have to use their code to test the performance difference between NOLOCK/RCSI.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-03 : 12:43:11
|
Thanks Tara. Sorry, I misled you a bit about the timing. My issue isn't so much timing (although it would be nice to know that RCS is no slower) but whether anything else breaks as a consequence. Before asking the Vendor to remove all NOLOCKs from all their code I wanted to run a test to get a handle on what sort of performance difference, if any, there was, plus whether we get any side effects. I was also planning to compare (table by table, row by row, column by column) the data from the LIVE database copy with that of the Restore-and-Replay.

Client has reported to me that they periodically get duplicates, and data missing, which feels like NOLOCK misbehaving when it clashes with index page splits. As you say, replaying single-threaded isn't going to exercise that - but it might give me a DIFF on the compare, and that would clearly suggest (wouldn't it?) that the NOLOCK was causing a knock-on effect.

Whilst the Vendor claims that they don't have any NOLOCKs in "database modification code" I think they are kidding themselves. I see their code doing things like

INSERT INTO SomeWorkTable
SELECT Col1, Col2, ...
FROM MyTable (NOLOCK)
WHERE NOT EXISTS (SELECT * FROM MyOtherTable (NOLOCK) WHERE A = B)

and then some behind-the-scenes code processes the rows that it finds in [SomeWorkTable].

I think the SQL in the whole application is a crock-of-S, but I would like to get some concrete evidence, somehow, to prove that. All suggestions would be most welcome :)
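(The table-by-table compare can be sketched with EXCEPT - database and table names below are invented. Each query returns rows present on one side but missing from the other; set operators compare entire rows, treating NULLs as equal for this purpose.)

```sql
-- Rows in the live copy that the replayed copy is missing:
SELECT * FROM LiveCopy.dbo.MyTable
EXCEPT
SELECT * FROM ReplayCopy.dbo.MyTable;

-- And the reverse: rows the replay produced that live doesn't have.
SELECT * FROM ReplayCopy.dbo.MyTable
EXCEPT
SELECT * FROM LiveCopy.dbo.MyTable;
```

Both result sets empty means the two copies agree on that table.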
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-03 : 12:44:46
|
quote: Originally posted by tkizer You could look into using RML, but that's a huge undertaking. Every time I've looked into it, I gave up.
You've confirmed my gut feeling about the whole "Replay with NOLOCK removed from their code" scenario. If they had everything in SProcs it would be more easily doable. </Sigh>
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2013-12-03 : 12:48:12
|
Regarding the duplicates, do they have primary key/unique constraints? Missing data ... yeah, that could be due to a dirty read followed by a rollback of the transaction. Maybe their app isn't logging the rollback or notifying the user.

If you can capture a load that does include these issues, then replaying it single-threaded should show whether it's a NOLOCK issue or not (meaning these issues go away). You wouldn't need to remove the NOLOCK in the replay though. There won't be dirty reads with a replay that's single-threaded, but at least it would prove/disprove the issue.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
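(A generic duplicate check on whatever the user-visible "logical key" turns out to be - column names here are invented. Any row returned is a user-perceived duplicate even if the table's surrogate PK is unique.)

```sql
-- Group by the columns the business considers the identifying key;
-- a count above 1 is a "duplicate" from the user's perspective.
SELECT Col1, Col2, COUNT(*) AS occurrences
FROM SomeWorkTable
GROUP BY Col1, Col2
HAVING COUNT(*) > 1;
```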
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-03 : 12:56:38
|
quote: Originally posted by tkizer Regarding the duplicates, do they have primary key/unique constraints?
That's a good point, thanks. Yes, they definitely do, so the duplicate must be from a user's perspective. Perhaps the APP adds a row to SomeWorkTable (no PK on the fields that would define a user-logical PK), which then causes some background process to insert a row in a mainstream data table with an IDENTITY or somesuch. I'll ask the client to define what THEY mean by "Duplicate" and translate that into what might be going wrong in the DB.

quote: If you can capture a load that does include these issues, then replaying it single-threaded should show if it's a NOLOCK issue or not (meaning these issues go away). You wouldn't need to remove the NOLOCK in the replay though. There won't be dirty reads with a replay that's single-threaded, but at least it would prove/disprove the issue.

Good point. That alone will make it worth doing. Is it possible that the captured statements will run in a different order, such that the outcome in the data could be different? I mean: in an application that does NOT use NOLOCK, would a Capture and Replay be guaranteed to arrive at the same database state? Or are there subtle concurrency issues which could cause the single-threaded replay to be different in some way? (Data not identical at a given point in time, and thus subsequent queries would have different results, and outcomes.)
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2013-12-03 : 13:11:12
|
If the application/database is using read committed, then yes. But remember that RCSI is a snapshot of the data at that moment in time. RCSI vs RC could have different outcomes.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
|
|
denis_the_thief
Aged Yak Warrior
596 Posts |
Posted - 2013-12-03 : 13:55:18
|
quote: Originally posted by Kristen Thanks Tara. So turn on RCSI, which will most probably solve the concurrency issue that they put NOLOCK in for, and then get them to remove NOLOCK all over the place. Bet they say that we will have to pay to have them remove that ...
The whole scenario is a big issue that has so many subplots to it. I support Read Committed Snapshot too. We implemented it and I'm glad that we did.

In our app, the code was rampant with NOLOCKs. It happened because we were getting complaints of deadlocks, and the standard fix (this was before my time) was to add NOLOCKs. I remember a few years ago GilaMonster stating (in one of my posts on this board, http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=139824): NOLOCK neither acquires locks nor honors locks. But whenever I discussed this with Developers they thought that although it did not honor locks, it did acquire locks. So essentially, they applied a fix to a problem without understanding it fully.

We got into an endless cycle of: Production reports deadlocks on the new code, add the NOLOCK, create new code, Production reports deadlocks on the new code ...

My intent with implementing Snapshot isolation was to break this cycle. By using RCSI, we should avoid the deadlocks in the first place. The whole "Dirty Data" (i.e. reading uncommitted data) thing was secondary. I'd be a fool to say I was content with the possibility of generating and displaying a financial report that contained records which were backed out in a Rollback Transaction. However, there was no single documented or confirmed case, that I was aware of, where this happened in our system.

So we went with RCSI about a year ago, and I can say there have been no deadlocks reported since.

But I think your idea of timing things before and after is a good one. Since I represent Development, I wasn't in a position to do that. We haven't had any performance concerns or complaints related to RCSI, but unfortunately I can't say how much overhead it added.
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-03 : 17:13:23
|
quote: Originally posted by tkizer RCSI vs RC could have different outcomes.
Thanks Tara. I'll report back with my findings.

quote: Originally posted by denis_the_thief I support Read Committed Snapshot too. We implemented it and I'm glad that we did ...

We did that with our code too, about 3 or 4 years ago. We didn't have any NOLOCKs in the code (well, not as a Band-Aid anyway!); we programmed out any deadlocks we got - and they were precious few, as we were careful and methodical with our queries anyway.

quote: So essentially, they applied a fix to a problem without understanding it fully.

So true ... and I am sure that is what I have found with this Vendor. In the telecon I don't believe they even began to understand the consequences of what I was telling them (or they knew, and are terrified and pretending it's a non-issue - let's hope it's not that; my Client will sack them instantly if they get found out ...)

quote: However, there was no single documented or confirmed case, that I was aware of, where this happened in our system.

How would you know? The effect is rare and totally unreproducible; how many users would be smart enough to spot that it had happened AND bother to report it? And maybe they made business-critical decisions on faulty data but the decision didn't go bad - so no need for a post mortem to discover why ...

quote: We haven't had any performance concerns or complaints related to RCSI, but unfortunately I can't say how much overhead it added.

It's a good point, thanks. I need to run the test before I confront the Vendor and tell them to change their code. If I can find a fault for sure, the cost of the change will be on their dime; otherwise it might have to be on my client's dime.
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2013-12-03 : 17:36:37
|
You should see a performance boost for RCSI vs. RC, but nothing between RCSI vs. NOLOCK. Well, that's if you have enough threads and concurrency to show it. My most critical system has thousands of batches per second, and we saw a dramatic performance boost a few years ago when RCSI was implemented.

The only overhead to RCSI is extra tempdb utilization, as that's where the version store is. You might need extra storage wherever tempdb is located, and make sure the disks are performing well.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
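(One way to keep an eye on that tempdb overhead - not from the thread, just a sketch using the standard space-usage DMV. Counts are 8 KB pages, so * 8 / 1024.0 converts to MB.)

```sql
-- How much of tempdb the version store (and other categories) is using:
SELECT
    SUM(version_store_reserved_page_count)   * 8 / 1024.0 AS version_store_mb,
    SUM(user_object_reserved_page_count)     * 8 / 1024.0 AS user_objects_mb,
    SUM(internal_object_reserved_page_count) * 8 / 1024.0 AS internal_objects_mb
FROM tempdb.sys.dm_db_file_space_usage;
```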
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-03 : 19:34:13
|
Thanks, I'm figuring RCSI will give the same performance as NOLOCK - and the NOLOCKs probably got into the code (and the Vendor's company culture!) back in SQL 2000 or earlier ... before RCSI existed ... If performance is even-Stevens then there will be no reason for the Vendor not to take NOLOCK out :)

Thanks for the prompt on TEMPDB location. I'd best check that; odds-on it will be in C:\PROGRAM FILES\MICROSOFT SQL\... </sigh>
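(Checking where tempdb's files actually live is a one-liner against the catalog views:)

```sql
-- Name, physical path and size of each tempdb file;
-- size is in 8 KB pages, so * 8 / 1024 gives MB.
SELECT name, physical_name, size * 8 / 1024 AS size_mb
FROM tempdb.sys.database_files;
```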
|
|
denis_the_thief
Aged Yak Warrior
596 Posts |
Posted - 2013-12-04 : 10:13:06
|
quote:
quote: However, there was no single documented or confirmed case, that I was aware of, where this happened in our system.
How would you know? The effect is rare, totally unreproducible, how many users would be smart enough to spot that it had happened AND bother to report it? and maybe they made business critical decisions on faulty data but the decision didn't go bad - so no need for a post mortem to discover why ...
I agree with you. But it is business: there was proof that deadlocks were happening, but no proof that dirty data was happening.
|
|
denis_the_thief
Aged Yak Warrior
596 Posts |
Posted - 2013-12-04 : 10:24:20
|
quote: Originally posted by Kristen Thanks, I'm figuring RCSI will give same performance as NOLOCK
It should be slower, since I think it has to copy records to tempdb all the time. But then again, if it is insignificantly slower, that would be "the same performance". I don't know the performance overhead - maybe 1-5%.
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-04 : 12:17:10
|
You are right; by "the same performance" I meant from a user's perspective. I'm not expecting the user to notice any slowdown - which I hope will be the case, particularly if your 1-5% is correct. If I have TEMPDB on separate physical disks, is there a possibility I will actually get some speedup? Or is the snapshot in TEMPDB never reused by a second/third+ concurrent user?
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2013-12-04 : 12:25:13
|
Our standard config is like this; these are mount points from a storage array:

F:\ - system stuff such as system databases, MSSQL\* (BINN, LOG, etc)
F:\Data - mdf and ndf files for user databases
F:\Log - ldf files for user databases
F:\TempdbData - mdf and ndf files for tempdb
F:\TempdbLog - ldf for tempdb

On some systems, we have F:\Data1, F:\Data2, ... Just depends on the needs of that instance.

As far as I know, the snapshot is per session/query.

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-04 : 12:56:40
|
Thanks Tara. Do your "mount points" map to different physical drives? Or are F:\Data and F:\Log actually on the same physical media? And if so, is there not a risk that a controller failure (say) might write garbage to both the Data and Log files - i.e. a single point of failure?

I suspect your disk subsystems are way more posh and sophisticated than my Clients', but I find that when I ask for "on a different drive" what I get given is just a different virtual partition on what is, actually, the same physical disks - and the IT people think that's OK.

Bit like the data centre we used to use: they had specified two different electricity and telephone companies for redundancy, but the contractors put the cables from both of them in a single trench, so ... when some plonker dug a new hole with a mechanical digger he went straight through both sets of cables.
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2013-12-04 : 13:07:24
|
quote: Originally posted by Kristen Thanks Tara.Do your "mount points" map to different physical drives? or are F:\Data and F:\Log actually on the same physical media? And if so is there not a risk that a controller failure (say) might write garbage to both the Data and Log files - i.e. a single point of failure?
They are different disks, as far as I know/am told. We have some systems where we also have F:\Backup on that same storage array. We are working towards moving those backups to a NAS instead. We've run into a few issues with that, where either the NAS doesn't support 10GigE, or some device between the SQL Server and the NAS doesn't support 10GigE. Once we can get 10GigE for the backups, then we can move off the Backup mount point. We take the risk in the meantime, though we do still have our DR site (mirroring).

Tara Kizer
SQL Server MVP since 2007
http://weblogs.sqlteam.com/tarad/
|
|
Kristen
Test
22859 Posts |
Posted - 2013-12-04 : 13:12:07
|
Thanks Tara |
|
|