Efficiently Reuse Gaps in an Identity Column

By Peter Larsson on 9 February 2010 | Tags: Identity

This article will demonstrate an efficient way to reuse gaps in an identity column. Please note that this is something you normally shouldn't be bothered about in a well-designed database or application. However, there are circumstances where you are forced to do this.

Missing identity values occur for a number of reasons. The most common reasons are roll backed transactions, failed inserts and deletes. If your table has TINYINT as identity column, it is very easy to max out this value. Here is an example of how to do this.

-- Create sample table
CREATE TABLE    #Sample
                (
                    RowID TINYINT IDENTITY(0, 1),
                    j CHAR(1)
                )
GO

-- Insert original sample data
INSERT  #Sample
SELECT  'P'

SELECT  *
FROM    #Sample
GO

-- Insert some sample data and rollback them
BEGIN TRAN
INSERT  #Sample
SELECT  'Q'
ROLLBACK TRAN

BEGIN TRAN
INSERT  #Sample
SELECT  'L'
ROLLBACK TRAN

SELECT  *
FROM    #Sample

-- Insert additional sample data
INSERT  #Sample
SELECT  'P'

SELECT  *
FROM    #Sample
GO

-- Try insert 200 failing sample data
INSERT  #Sample
SELECT  'PP'
GO 200

-- Insert additional sample data
INSERT  #Sample
SELECT  'P'

SELECT  *
FROM    #Sample
GO

-- Insert 45 additional sample data and remove them
INSERT  #Sample
SELECT  'Q'
GO 45

DELETE
FROM    #Sample
WHERE   j = 'Q'

SELECT  *
FROM    #Sample
GO

-- Insert additional sample data
INSERT  #Sample
SELECT  'P'

SELECT  *
FROM    #Sample
GO

-- Clean up
DROP TABLE #Sample
GO

After this example, you can see that there are only four possible values left! We are now dangerously close to breaking the application. When the maximum value has been reached there is nothing you can do except changing the data type to a larger data type such as SMALLINT. But that might break schemas and you also may need to rewrite a number of stored procedures and rebuilding indexes.

In my opinion, this is a task that should be included in a DBA's job description: getting control over your identity values.

How Do We Fix This Scenario?

I first did some research to see which was the most common method to deal with this situation and not surprisingly the method of iterating all records from start to end was used! This method is not efficient. What if you have 1 million records and there is only 1 gap at position 15? Do you really want to shuffle all 1 million records to be sure to get rid of the gap? The answer is of course NO.

Having this in mind, I developed an algorithm which I call a folding algorithm. It defaults to reassign identity values at 1.

Diagram of identity reassignment

The algorithm looks for how many missing ID's there are and pick the same number of records from the end of list. For the one million table example above, since there is only one ID missing, I only have to shuffle one record from the end of table to the missing position. Efficient and clean because only one record has to be touched and not the other 999 986 records!

Since an IDENTITY column most often also is used as a primary key, I have included one auxiliary table which references the base table to fix. In a production environment, this can be several tables, but I only include one for demonstration purposes.

Begin with creating some sample data like this:

--  Create sample table
CREATE TABLE    #Sample
            (
                ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                Data UNIQUEIDENTIFIER DEFAULT NEWID()
            )
GO

-- Populate sample data 
INSERT  #Sample
DEFAULT VALUES
GO 10
 
-- Remove some records to mimic real world scenario
DELETE
FROM    #Sample
WHERE   ID IN (1, 3, 4, 7, 10)
 
-- Check content of sample table
SELECT  *
FROM    #Sample

-- Create auxiliary table for foreign key issues
CREATE TABLE    #Aux
                (
                    AuxID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
                    SampleID INT NOT NULL REFERENCES #Sample(ID)
                )

INSERT  #Aux
SELECT  9 UNION ALL
SELECT  9 UNION ALL
SELECT  8

-- Check content of auxiliary table 
SELECT  *
FROM    #Aux

Now we have created a sample table with 10 records and then on purpose created some gaps by removing some records manually. We also created an auxiliary table referencing the base table. Since foreign key constraints are not enforced on temporary tables, this is for demonstration only.

The first step in the algorithm is to create the translation table. This table will hold all the missing ID's and which ID to shuffle. This piece of code is the heart of the algorithm. It will seek out all missing ID's, compare them to how many records there are in total, and "fold" them together. What the algorithm does, is to calculate the missing ID's and pick the appropriate number of records from end of table to shuffle and fill the gaps.

-- Set up ID translation with Peso's "folding" algorithm
CREATE TABLE    #Trans
                (
                    idNew INT NOT NULL,
                    idOld INT NOT NULL
                )

INSERT      #Trans
            (
                idNew,
                idOld
            )
SELECT      d.rn,
            w.ID
FROM        (
                SELECT      d.rn,
                            ROW_NUMBER() OVER (ORDER BY d.rn) AS recID
                FROM        (
                                SELECT  ROW_NUMBER() OVER (ORDER BY ID) AS rn
                                FROM    #Sample
                            ) AS d
                LEFT JOIN   #Sample AS s ON s.ID = d.rn
                WHERE       s.ID IS NULL
            ) AS d
INNER JOIN  (
                SELECT  ID,
                        ROW_NUMBER() OVER (ORDER BY ID DESC) AS recID
                 FROM   #Sample
            ) AS w ON w.recID = d.recID

With this in place we can now begin to create duplicate entries of the records to shuffle. This is risk free since new inserts are based on the current identity value so these forced inserted records will not collide.

Now when we have the #Trans control table, let's begin to shuffle the records and update auxiliary table(s).

-- Prepare source table for identity inserts
BEGIN TRAN
 
SET IDENTITY_INSERT #Sample ON
 
INSERT        #Sample
              (
                  ID,
                  data
              )
SELECT      t.idNew,
            s.data
FROM        #Trans AS t
INNER JOIN  #Sample AS s ON s.ID = t.idOld
 
IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRAN
        RAISERROR('Insert old records with new ID failed.', 18, 1)
        RETURN
    END
 
SET IDENTITY_INSERT #Sample OFF
 
UPDATE      w
SET         w.SampleID = t.idNew
FROM        #Aux AS w
INNER JOIN  #Trans AS t ON t.idOld = w.SampleID
 
IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRAN
        RAISERROR('Update #Aux table failed.', 18, 1)
        RETURN
    END
 
DELETE      s
FROM        #Sample AS s
INNER JOIN  #Trans AS t ON t.idOld = s.ID
 
IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRAN
        RAISERROR('Delete old records failed.', 18, 1)
        RETURN
    END
 
COMMIT TRAN

And now we're set, right? Well, not quite right. Since the IDENTITY value for the base table is stored in the Meta data for the table, we need to reset the current IDENTITY value so that we do not produce another gap at the end of IDENTITY sequence. You can do that like this

-- Cleanup
DECLARE @ID INT
 
SELECT  @ID = MAX(ID)
FROM    #Sample

DBCC CHECKIDENT(#Sample, RESEED, @ID)
 
DROP TABLE  #Sample,
            #Aux,
            #Trans

Remember to put this piece of code in the transaction as well, for continuity.

Conclusion

I have tested this technique in a production system with several users online, and for a reasonable amount of gaps you can run this piece of code during normal operation. But remember the base table will be locked for a small amount of time and it will slow down other processes accessing the base table. The best practice is to run this algorithm off-hours.

Good luck!