Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
MMMY
Starting Member
14 Posts |
Posted - 2012-04-24 : 11:19:10
|
| Hello,I have a table with 600 MILLION records. One of the fields is "ID" and is supposed to be the primary key, but unfortunately is filled with many duplicate values. How can I repopulate this field with unique values while taking into account the table size?One of my attempts at creating a new field failed miserably, presumably due to the size of the table, with the following error after 18 hours of running a simple ALTER TABLE query, which is why I want to get everyone's input before I run into some errors:The statement has been terminated.Msg 9002, Level 17, State 4, Line 1The transaction log for database 'BLAHBLAH' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2012-04-24 : 11:30:21
|
| Is the ID used for anything? Do you need to retain the IDs for anything or can you replace them all? |
 |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2012-04-24 : 11:34:49
|
| Without knowing the details, my suggestion would be to create a new table (which you have done). Use something you can batch the inserts into the new table(Either with a loop or using SSIS). With SSIS, you can set the batch size to something smaller than the entire table (50,000 or 100,000 or 1,000,000 or whatever makes sense). That will prevent the tran log from blowing up. Then you can drop or rename the existing table and rename the new table to take the current tables place. |
 |
|
|
MMMY
Starting Member
14 Posts |
Posted - 2012-04-24 : 11:57:48
|
| I do not need to retain the ID, my intent is to repopulate the field for all records. Thank you for the help. I'm still not an expert at SQL Server, so I will try to research "batch" procedures. |
 |
|
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2012-04-24 : 12:44:15
|
If you are using sql 2008 this may be a nice use of composable DML...For example, a loop with a batchsize moving the data to the new table....SET NOCOUNT ONIF OBJECT_ID('tempdb..#sample') IS NOT NULL DROP TABLE #sampleCREATE TABLE #sample ( [SampleID] INT , [SampleValue] VARCHAR(16) ) INSERT #sample ([SampleID], [SampleValue])SELECT CAST(CAST(NEWID() AS VARBINARY(16)) AS TINYINT) , CAST(CAST(NEWID() AS VARBINARY(16)) AS VARCHAR(16))FROM sys.objects so-- Show the problemSELECT [SampleID] , COUNT([SampleValue]) AS [SampleIDCount]FROM #sampleGROUP BY [SampleID] -- New TableIF OBJECT_ID('tempdb..#result') IS NOT NULL DROP TABLE #resultCREATE TABLE #result ( [SampleID] INT IDENTITY(1,1) , [SampleValue] VARCHAR(16) , PRIMARY KEY ([SampleID]) ) DECLARE @batchSize SMALLINT = 500DECLARE @rows INT = 1DECLARE @totalRows INT = ( SELECT COUNT(*) FROM #sample )DECLARE @rowsGone INT = 0WHILE @rows > 0BEGIN INSERT #result ([sampleValue]) SELECT d.[SampleValue] FROM ( DELETE TOP (@batchSize) FROM #sample OUTPUT DELETED.[SampleValue] ) AS d SET @rows = @@ROWCOUNT SET @rowsGone += @rows RAISERROR('Processed %i out of %i...', 0, 1, @rowsGone, @totalRows) WITH NOWAITEND-- Show the resultSELECT [SampleID] , COUNT([SampleValue]) AS [SampleIDCount]FROM #sampleGROUP BY [SampleID]Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-04-24 : 17:51:58
|
This is how I would do it.1. Get a list of duplicate values.2. With this list, assign unique values to the duplicates. N 56°04'39.26"E 12°55'05.63" |
 |
|
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2012-04-24 : 18:11:54
|
quote: Originally posted by SwePeso This is how I would do it.1. Get a list of duplicate values.2. With this list, assign unique values to the duplicates. N 56°04'39.26"E 12°55'05.63"
You would still be in the position of having to throw a Primary key over that column though? (assuming the OP still wants to do that).they haven't told us about the clustering on the table (I would bet there is none) So even if you uniqued that column, applying the PK would probably take quite a long time and may throw the same error as before? That's assuming it's clustered.......Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
|
|
|
|
|
|