Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Administration (2000)
 Help! log backups cause queries to grind to a halt

Author  Topic 

aiken
Aged Yak Warrior

525 Posts

Posted - 2004-07-09 : 17:42:43
Our production system has developed an exceedingly troublesome problem: for about the past day, during transaction log backups, many of our high volume OLTP queries grind to a halt.

The problem may have been going on longer, but with normal system growth we've just gotten to the point where log backups take more than 30 seconds, so that's made the problem really obvious (lots of timeouts in the ASP client app, since most queries use the default 30 second timeout).

The log backups are already happening every 10 minutes; the obvious answer is to make them happen every 5 minutes. But that seems kind of cheesy and doesn't address the underlying issue.

So: is this normal? Is there any particular type of query that's more subject to this issue? What should I be doing? Help!

Thanks
-b

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-07-09 : 18:17:57
What sort of hardware are you running?
Are you backing up to from disk to disk or disk to tape?

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2004-07-09 : 18:19:22
what does profiler say? do you have blocking occuring? Do you have the data files, logfiles and backup destination all on separate disk? Have you looked at perfmon, specifically the memory counters?

We had a similar problem last year, it was resolved by adding more memory to the server as well as tuning some memory intensive queries. Our situation was caused by memory intensive queries conflicting with our normal backup jobs. When the job kicked off, SQL Server had to relinquish some memory, which caused our bad performance because of the increase in physcial IO.



-ec
Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2004-07-09 : 18:24:37
Sorry, I should have included more info.

The backup is disk to disk. The configuration is three seperate raid arrays (raid 1 for log, raid 5 for data, raid 5 for backup space), on three channels of a single 64 bit/66mhz PCI raid controller. It's not totally ideal, but I think it's pretty decent.

The server has 4GB ram, with SQL server using 3GB.

I have not looked at perfmon yet, but will see what it looks like. I should be looking at the SQL server memory counters, yes? I have been trying to catch it in the act with nr's excellent nrSpidByStatus procedure, but so far haven't managed to.

Thanks
-b
Go to Top of Page

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2004-07-09 : 18:31:44
http://www.sql-server-performance.com/backup_restore_tuning.asp

Tara
Go to Top of Page

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-07-09 : 18:40:45
Yeah, one thing that sticks out to me is this paragraph from that article:

For maximum backup speed, the local array you back up to should be RAID 1 or RAID 10 due to the high percentage of writes going on. A RAID 5 array is not recommended because they don't handle a high percentage of disk writes (which occurs during disk backups) efficiently.

Maybe the problem is that target disk array.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>
Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2004-07-09 : 19:31:02
We do have more disk on the way, and part of that will be raid 1 for backup space.

But regardless of disk throughput -- is it at all normal for log backups to somehow block some queries? Because this really doesn't seem like "things go slower during log backups" -- it seems like "some operations cannot be done during log backups; these operations *will not* run while log backup is going on, but the moment the backup comletes, any operations that haven't timed out finish instantly."

Cheers
-b
Go to Top of Page

derrickleggett
Pointy Haired Yak DBA

4184 Posts

Posted - 2004-07-09 : 23:38:39
on three channels of a single 64 bit/66mhz PCI raid controller

You might be saturating that RAID controller. What do your physical disk queue lengths look like when this all goes bad?

MeanOldDBA
derrickleggett@hotmail.com

When life gives you a lemon, fire the DBA.
Go to Top of Page

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-07-10 : 16:30:36
I think Derrick is on the right track here.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>
Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2004-07-11 : 16:51:16
Disk queues looked normal -- no logical disk spiked about 10 in the total queue, and none sat at values higher than 1 for any length of time.

I found part of the problem -- the transaction log was just too big, because a software bug was updating 300,000 rows once every minute or so. The table being updated was a critical, main table that almost every query joins against, so I expect there were locking issues as well. Fixing that bug has alleviated the log/backup problem, but I'm unclear about whether we'll run into the same issue if/when transaction volume gets to those levels from legitimate purposes.

Thanks for all of the help here!

Cheers
-b

Go to Top of Page

derrickleggett
Pointy Haired Yak DBA

4184 Posts

Posted - 2004-07-11 : 17:20:39
I still think it's related to the disk subsystem and the array those log files are on. You said you looked at the logical counters. Did you look at he physical disk counters? That's what you would be interested in here.

MeanOldDBA
derrickleggett@hotmail.com

When life gives you a lemon, fire the DBA.
Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2004-07-12 : 14:34:46
I believe that in this case the physical and logical counters would be the same -- the log array is RAID 1 between two disks, but given that the controller manages the RAID, as far as Windows is concerned, the logical and phyisical drives should be identical. That is, I don't think RAID controllers report the constituent physical drives to the OS.

In this case, wouldn't a more likely culprit be the RAID 5 array that the backups were going to? I personally don't think that was the case, since I would expect the write penalty to be offset by the greater number of spindles (5 on the RAID 5 versus 2 on the RAID 1).

Of course, the problem isn't manifesting at this point, so it's speculation until if/when it happens again.

Cheers
-b
Go to Top of Page
   

- Advertisement -