
 Populating a column with unique values (pseudokey)


MMMY
Starting Member

14 Posts

Posted - 2012-04-24 : 11:19:10
Hello,

I have a table with 600 MILLION records. One of the fields is "ID" and is supposed to be the primary key, but unfortunately is filled with many duplicate values. How can I repopulate this field with unique values while taking into account the table size?

One of my attempts at creating a new field failed miserably, presumably due to the size of the table: a simple ALTER TABLE query ran for 18 hours and then died with the error below. That's why I want to get everyone's input before I run into more errors:

The statement has been terminated.
Msg 9002, Level 17, State 4, Line 1
The transaction log for database 'BLAHBLAH' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
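For reference, the check that the error message points at would be something like this (the database name is taken from the message above):

SELECT name, log_reuse_wait_desc   -- e.g. ACTIVE_TRANSACTION, LOG_BACKUP
FROM sys.databases
WHERE name = N'BLAHBLAH';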

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2012-04-24 : 11:30:21
Is the ID used for anything? Do you need to retain the IDs for anything or can you replace them all?


Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2012-04-24 : 11:34:49
Without knowing the details, my suggestion would be to create a new table (which you have done). Then batch the inserts into the new table, either with a loop or using SSIS. With SSIS, you can set the batch size to something smaller than the entire table (50,000 or 100,000 or 1,000,000 or whatever makes sense). That will prevent the tran log from blowing up. Then you can drop or rename the existing table and rename the new table to take the current table's place.
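If SSIS isn't an option, a minimal sketch of the looped version might look like this (table and column names are hypothetical; it assumes the new table has an IDENTITY column to generate the replacement IDs, and SIMPLE recovery so the log can be reused between batches):

SET NOCOUNT ON;

DECLARE @batchSize INT = 100000;   -- tune to whatever the log can absorb
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- Move one batch; the IDENTITY column on dbo.BigTable_new supplies the new unique IDs
    DELETE TOP (@batchSize) FROM dbo.BigTable
    OUTPUT DELETED.Col1, DELETED.Col2          -- every column except the old ID
    INTO dbo.BigTable_new (Col1, Col2);

    SET @rows = @@ROWCOUNT;

    CHECKPOINT;   -- under SIMPLE recovery, lets log space be reused between batches
END;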

MMMY
Starting Member

14 Posts

Posted - 2012-04-24 : 11:57:48
I do not need to retain the IDs; my intent is to repopulate the field for all records. Thank you for the help. I'm still not an expert at SQL Server, so I will try to research "batch" procedures.

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-04-24 : 12:44:15
If you are using SQL 2008, this may be a nice use of composable DML.

For example, a loop with a batch size, moving the data to the new table:

SET NOCOUNT ON

-- Build a sample table whose SampleID column contains duplicates
IF OBJECT_ID('tempdb..#sample') IS NOT NULL DROP TABLE #sample
CREATE TABLE #sample (
      [SampleID]    INT
    , [SampleValue] VARCHAR(16)
)

INSERT #sample ([SampleID], [SampleValue])
SELECT
      CAST(CAST(NEWID() AS VARBINARY(16)) AS TINYINT)    -- only 256 possible values, so duplicates are very likely
    , CAST(CAST(NEWID() AS VARBINARY(16)) AS VARCHAR(16))
FROM
    sys.objects so

-- Show the problem: SampleID values that occur more than once
SELECT
      [SampleID]
    , COUNT([SampleValue]) AS [SampleIDCount]
FROM
    #sample
GROUP BY
    [SampleID]

-- New table: an IDENTITY column hands out the replacement (unique) IDs
IF OBJECT_ID('tempdb..#result') IS NOT NULL DROP TABLE #result
CREATE TABLE #result (
      [SampleID]    INT IDENTITY(1,1)
    , [SampleValue] VARCHAR(16)
    , PRIMARY KEY ([SampleID])
)

DECLARE @batchSize SMALLINT = 500
DECLARE @rows INT = 1
DECLARE @totalRows INT = ( SELECT COUNT(*) FROM #sample )
DECLARE @rowsGone INT = 0

-- Move the rows in batches: each pass deletes a batch from #sample and
-- inserts the deleted values into #result (composable DML)
WHILE @rows > 0
BEGIN

    INSERT #result ([SampleValue])
    SELECT d.[SampleValue]
    FROM (
        DELETE TOP (@batchSize) FROM #sample
        OUTPUT DELETED.[SampleValue]
    ) AS d

    SET @rows = @@ROWCOUNT
    SET @rowsGone += @rows
    RAISERROR('Processed %i out of %i...', 0, 1, @rowsGone, @totalRows) WITH NOWAIT
END

-- Show the result: every SampleID in the new table is unique
SELECT
      [SampleID]
    , COUNT([SampleValue]) AS [SampleIDCount]
FROM
    #result
GROUP BY
    [SampleID]


Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-04-24 : 17:51:58
This is how I would do it.

1. Get a list of duplicate values.
2. With this list, assign unique values to the duplicates.
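A minimal sketch of those two steps, assuming a hypothetical dbo.BigTable with the duplicated ID column (at 600 million rows the UPDATE itself would still need batching, and @maxID + seq may need a BIGINT column to stay in range):

-- Number the rows inside each duplicate ID group; every row after the first
-- gets a brand-new value above the current maximum, so it cannot collide
-- with any existing ID.
DECLARE @maxID BIGINT = (SELECT MAX(ID) FROM dbo.BigTable);

WITH numbered AS (
    SELECT  ID
          , ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS rn   -- position within its duplicate group
          , ROW_NUMBER() OVER (ORDER BY (SELECT NULL))                 AS seq  -- running number across the table
    FROM dbo.BigTable
)
UPDATE numbered
SET ID = @maxID + seq
WHERE rn > 1;   -- the first row of each group keeps its original ID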



N 56°04'39.26"
E 12°55'05.63"

Transact Charlie
Master Smack Fu Yak Hacker

3451 Posts

Posted - 2012-04-24 : 18:11:54
quote:
Originally posted by SwePeso

This is how I would do it.

1. Get a list of duplicate values.
2. With this list, assign unique values to the duplicates.



N 56°04'39.26"
E 12°55'05.63"


You would still be in the position of having to put a primary key on that column though? (Assuming the OP still wants to do that.)

They haven't told us about the clustering on the table (I would bet there is none), so even if you made that column unique, applying the PK would probably take quite a long time and may throw the same error as before. That's assuming it's clustered...
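For reference, applying the key afterwards would be something along these lines (hypothetical table name); the clustered version rebuilds the whole table sorted on ID, which is exactly the kind of operation that needs a lot of log space on 600 million rows:

-- Clustered PK: sorts and rebuilds the whole table on ID (the expensive option)
ALTER TABLE dbo.BigTable_new
    ADD CONSTRAINT PK_BigTable_new PRIMARY KEY CLUSTERED (ID);

-- Nonclustered alternative: leaves the heap as-is and only builds an index
-- ALTER TABLE dbo.BigTable_new
--     ADD CONSTRAINT PK_BigTable_new PRIMARY KEY NONCLUSTERED (ID);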



Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION
   
