
 Copying large numbers of rows to another DB


Kristen
Test

22859 Posts

Posted - 2004-10-23 : 00:50:40
We have a LOG table which is recording about 1 million rows a day; this is expected to rise to 10 million/day in about 3 weeks time, so I'd like to get tooled up before the onslaught!

These are used for reporting current activity and, in a few reasonably rare situations, to diagnose problems (on an auditing basis).

I want to move old data to a separate database (or I think I do!)

Reasons are:

Reduce size of main database
Improve performance of Log table (in main database)

The LOG table, in main database, needs to be available 24/7

Plan is to have an SProc running in Quiet Hours to "top up" the archive database - i.e. copy across anything not already in the archive database - thus archive database is, at worst, 24 hours out of date.

Then to delete anything in the Main database which is, say, more than 7 days old (i.e. outside the period required for reporting)

Plan was to set Archive database Recovery model to Simple.

Should I be thinking about export/import using BCP etc. or just
INSERT INTO ArchiveDB.dbo.ArchiveTable
SELECT * FROM MainDB.dbo.LiveTable
WHERE LiveID > (SELECT MAX(ArchiveID) FROM ArchiveDB.dbo.ArchiveTable)

(Let's assume I would do this in reasonable-sized batches)
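
Something along these lines is what I had in mind for the batching - table and column names as above, but the 10,000-ID batch size is just a placeholder, and it assumes LiveID is an ever-increasing identity that gets copied straight into ArchiveID:

DECLARE @LastArchived int, @BatchTop int

SELECT @LastArchived = ISNULL(MAX(ArchiveID), 0) FROM ArchiveDB.dbo.ArchiveTable

WHILE @LastArchived < (SELECT MAX(LiveID) FROM MainDB.dbo.LiveTable)
BEGIN
    -- next batch = the next 10,000 ID values (gaps just make a batch smaller)
    SET @BatchTop = @LastArchived + 10000

    INSERT INTO ArchiveDB.dbo.ArchiveTable
    SELECT *
    FROM MainDB.dbo.LiveTable
    WHERE LiveID > @LastArchived
    AND LiveID <= @BatchTop

    SET @LastArchived = @BatchTop
END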

What about the delete where older than 7 days?

I've currently set something up which deletes N records, waits 5 seconds, then does another batch - where N is adjusted up/down according to how long the previous loop batch took. Is there a smarter way?
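
For illustration, the current loop is roughly along these lines (the LogDate column, the batch size and the timings are placeholders rather than the real figures):

DECLARE @BatchSize int, @Start datetime, @Elapsed int

SET @BatchSize = 10000                -- starting batch size

WHILE 1 = 1
BEGIN
    SET ROWCOUNT @BatchSize           -- limit the next DELETE to one batch
    SET @Start = GETDATE()

    DELETE FROM MainDB.dbo.LiveTable
    WHERE LogDate < DATEADD(day, -7, GETDATE())

    IF @@ROWCOUNT = 0 BREAK           -- nothing older than 7 days left

    -- crude self-tuning: shrink the batch if it took too long, grow it if it was quick
    SET @Elapsed = DATEDIFF(second, @Start, GETDATE())
    IF @Elapsed > 10 AND @BatchSize > 1000
        SET @BatchSize = @BatchSize / 2
    ELSE IF @Elapsed < 5
        SET @BatchSize = @BatchSize * 2

    WAITFOR DELAY '00:00:05'          -- wait 5 seconds between batches
END

SET ROWCOUNT 0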

Thanks

Kristen

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-10-23 : 02:10:55
We just did this at my place of work. We have audit tables that are populated from triggers, but were stored in the same database. Growth was getting out of hand, backups and reindexes were taking too long, so we moved them to a separate database and cut a lot of obsolete data in the process. It is now much easier because the audit database can be reindexed and backed up on a completely different schedule from the main DB.

I would suggest that your log table be moved out of the main database as well, because it is more likely to fragment the data file than if it's in a separate DB. Reindexing won't necessarily fix the problem, because you'll have a huge table in the database that has to share space with the other tables. If you absolutely must keep it in the same DB then create a new filegroup for its data and indexes and put the log table in it. I'd also recommend using triggers to perform the audit copy instead of using batches that run periodically. The overhead will be the same or less, the batches will be smaller, and they will be less likely to block other processes that may be running. Plus you won't have a problem with being out of date.

As far as deleting old data, bcp is definitely the way to go. I also did this recently at work with another table. To give you an idea, bcp exported 55 million rows (2 GB) in 5 minutes, and imported them back in about 7 minutes. The log grew from 50 MB to 115 MB, all of it page allocations, no row operations at all. A delete operation would've grown the log to 4-5 GB or more, not only for the deletes but also the index update operations. The procedure would be: bcp out the data you want to KEEP, using a queryout setting. Truncate the table and drop the non-clustered indexes (you'll rebuild them later). bcp in using the TABLOCK and ORDER hints, and specify a 32K packet size with the -a parameter.
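
Roughly, the steps look like this - the server name, file path and the LogDate/LiveID columns are examples only, and the bcp lines are run from a command prompt rather than Query Analyzer:

REM 1) Export the rows you want to KEEP (native format, 32K packet size)
bcp "SELECT * FROM MainDB.dbo.LiveTable WHERE LogDate >= DATEADD(day, -7, GETDATE())" queryout C:\Temp\LiveTable.bcp -n -T -S MyServer -a 32768

REM 2) In Query Analyzer: TRUNCATE TABLE MainDB.dbo.LiveTable and drop the
REM    non-clustered indexes (DROP INDEX LiveTable.IX_LiveTable_LogDate etc.)

REM 3) Reload, ordered to match the clustered index, holding a table lock
bcp MainDB.dbo.LiveTable in C:\Temp\LiveTable.bcp -n -T -S MyServer -a 32768 -h "TABLOCK, ORDER(LiveID ASC)"

REM 4) Recreate the non-clustered indexes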

Kristen
Test

22859 Posts

Posted - 2004-10-23 : 02:36:05
The trigger is a great idea. Much easier to set up "custom" for each client (i.e. configure the database names etc.) than using bespoke SProcs per customer. And, as you say, the overhead is spread evenly through the day.

The BCP out & back in is a bit of a problem. We don't have a timeslot where we could lock the LOG table for several minutes ... I wonder if I could do something that would allow (roughly sketched below):

1a) Set FLAG to indicate that House Keeping is starting
1b) All entries logged to LOG table will now also be logged to "spare" log table (by trigger)

2) BCP out the good data

3a) Lock LOG table
3b) Truncate LOG table
3c) Clear FLAG
3d) Unlock LOG table

(new entries will start appearing in LOG table, from the last Identity number)

4a) Copy rows back from Spare log table (SET IDENTITY_INSERT ON)
4b) Truncate Spare log table
4c) BCP back in the data from (1b)
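
Something like this for steps 3 and 4, maybe - the HouseKeepingFlag table and the LiveID/LogDate/Message columns are all made up for the sake of the example:

-- Step 3: take an exclusive lock, truncate, clear the flag, then release
BEGIN TRAN
    SELECT TOP 1 LiveID FROM MainDB.dbo.LiveTable WITH (TABLOCKX, HOLDLOCK)  -- 3a: block writers
    TRUNCATE TABLE MainDB.dbo.LiveTable                                      -- 3b
    -- NB: TRUNCATE resets the identity seed; a DBCC CHECKIDENT RESEED would be
    -- needed if new entries are to carry on from the last Identity number
    UPDATE MainDB.dbo.HouseKeepingFlag SET InProgress = 0                    -- 3c
COMMIT TRAN                                                                  -- 3d: lock released

-- Step 4: copy the diverted rows back, keeping their original identity values
SET IDENTITY_INSERT MainDB.dbo.LiveTable ON

INSERT INTO MainDB.dbo.LiveTable (LiveID, LogDate, Message)                  -- 4a
SELECT LiveID, LogDate, Message FROM MainDB.dbo.SpareLogTable

SET IDENTITY_INSERT MainDB.dbo.LiveTable OFF

TRUNCATE TABLE MainDB.dbo.SpareLogTable                                      -- 4b
-- 4c: then bcp the exported "good" data back in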

Viable? or Barking?!

Kristen

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-10-23 : 12:01:59
If you set up a spare log table, you can write the trigger to insert to it if the main log table is empty, like so:

CREATE TRIGGER LogRecords ON myTable FOR INSERT AS
SET NOCOUNT ON
-- while the main log has rows, log there as normal;
-- if it's empty (i.e. being truncated and reloaded), divert to the spare
IF EXISTS (SELECT * FROM mainLog)
    INSERT INTO mainLog SELECT * FROM inserted
ELSE
    INSERT INTO spareLog SELECT * FROM inserted


That way, if something is being inserted while you're truncating and reloading, it will be diverted to the spare log table. Once you're done reloading the data you can copy over anything from the spare log into the main and then truncate the spare.

Kristen
Test

22859 Posts

Posted - 2004-10-24 : 10:58:19
Good idea.

I could rename the "spare", and then copy back. That will stop more data accumulating in the spare.

However, I thought about the Trigger business a bit more. Is it adding to the "strain"? This table is pretty busy (actually we've got 4 log tables that all need "archiving", but the others are tiny by comparison - might as well use the same approach).

Whilst using a trigger to duplicate the data into the Archive table is attractive, in terms of spreading the load during the day, it also increases the activity whilst the DB is busy ... and I wonder if copying from live to Archive is better left to the "quiet hours"

Also ... if I can get away with NO trigger perhaps that is better also? (I could just CREATE the trigger when I want to do the housekeeping - I suppose I could even create an INSTEAD OF trigger for that purpose, to force any INSERTs to a SPARE table.)
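
Something along these lines, created just before the housekeeping run and dropped straight afterwards - LiveTable, SpareLogTable and the column names are examples only, and it would be created in the main database:

CREATE TRIGGER DivertLogInserts ON dbo.LiveTable
INSTEAD OF INSERT
AS
SET NOCOUNT ON
-- divert every new row to the spare table; the identity column is left out
-- because an INSTEAD OF trigger fires before a value has been assigned
INSERT INTO dbo.SpareLogTable (LogDate, Message)
SELECT LogDate, Message FROM inserted
GO

-- ... housekeeping on dbo.LiveTable happens here ...

DROP TRIGGER DivertLogInserts
GO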

I've got a bit of a hassle with some UPDATEs that occur to the log too ... most things UPDATE the original row to record the success/fail return code. They would all be within the last few minutes though ... so I could pre-copy those to the SPARE table ... there will be precious few of them during the Quiet Hours.

Kristen

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-10-24 : 13:15:24
Normally I'd avoid using a trigger too, but then you're left with comparing two large tables to find the rows you want to audit and then inserting them in one or more large batches. This is going to suck if you're talking about millions of rows in each. The trigger doesn't have to compare, it just inserts whatever the operation affects. As far as inserting data, overall the insert activity will be about the same.

As far as the table being busy, I would say test the trigger first; try hammering it if you can and see what happens. The overhead is usually a lot less than you think. What I'd be concerned about with the other method is the "quiet time" period. I've only seen these windows get smaller, and large operations grow longer. Unless you know you have a really good margin (75-100% more time than the operation will take) I think spreading the work out throughout the day will cause you less hassle.

Kristen
Test

22859 Posts

Posted - 2004-10-25 : 01:18:38
All good points, thanks Rob. The organisation will ship anywhere, but realistically they get 99.999% of their orders in the UK ... but they've just opened a shop in Hong Kong, so that's going to get their name known there and seriously disrupt the quiet period.

I'm sold! I'll toddle off and do some tests.

Kristen