Anticipating Primary Key Violations

By Guest Authors on 23 November 2003 | Tags: INSERT


This article comes to us from Karen Gayda. She introduces a script she's written to catch primary key violations before they are inserted into a table.

Introduction

Years ago when I was just breaking into the programming field I worked with Paradox databases. While Paradox pales in comparison to a database engine like SQL Server, it did have one interesting feature that has not yet been duplicated by Microsoft. When performing an insert into a table, any key violation records would be shunted to a violation table while the good records would be inserted into the intended table. This feature allowed for graceful error handling and for tracking of bad data.

Recently I encountered a situation at one of my clients that required that I gracefully handle duplicate data being sent by customers. The goal was to allow the good data to be processed while capturing the bad data so that the customers could be notified of their data issues thus enabling them to correct transmission problems. Having a Paradox-style key violation table would be ideal to handle this problem. Since SQL Server does not include this functionality I decided I needed to write a generic routine to somewhat mimic Paradox.

Stored Procedure Script

This script creates a stored procedure that evaluates whether a select statement will cause Primary Key violations if data is inserted into a given target table. Key violation rows are saved to the designated violation table in the target database. (You can download the source code for this script and the required functions here.)

Calling this stored procedure prior to actually executing a select statement will cause the key violation keys to be saved to a table. Doing this allows for excluding those rows from being inserted if a where clause is added that checks for the existence of the key violation records.

SET QUOTED_IDENTIFIER OFF 
GO
SET ANSI_NULLS ON 
GO

IF EXISTS(SELECT * FROM dbo.sysobjects 
		WHERE id = object_id(N'[dbo].[sp_GetKeyViolations]') 
		AND OBJECTPROPERTY(id, N'IsProcedure') = 1
	)
	DROP PROCEDURE [dbo].[sp_GetKeyViolations]
GO



CREATE  PROCEDURE dbo.sp_GetKeyViolations
	@vcSelectStatement	varchar (8000),
	@sysTargetTable		sysname,
	@sysViolationTable	sysname,
	@sysTargetDatabase	sysname,
	@bitEmptyViolationTable bit	= 0
	
AS

SET NOCOUNT ON

DECLARE @vcSQL 		varchar(8000),
	@vcColumnList 	varchar(300),
	@sysColName	sysname
	
--Get the primary key columns in the target table
SET @vcSQL = 'DECLARE crCols CURSOR STATIC FOR SELECT DISTINCT
	 	name  FROM ' + @sysTargetDatabase + 
		'.dbo.PrimaryKeyColumns(''' + @sysTargetTable + ''')'
EXEC(@vcSQL)
OPEN crCols  

--For Each Column in PrimaryKey, concatenate to column list
SET @vcColumnList = ''
FETCH NEXT FROM crCols INTO @sysColName
WHILE(@@FETCH_STATUS = 0) 
BEGIN
	SET @vcColumnList = @vcColumnList + @sysColName + ','
	FETCH NEXT FROM crCols INTO @sysColName
END --Get next column
SET @vcColumnList = SUBSTRING(@vcColumnList,1,LEN(@vcColumnList) -1) --remove trailing comma
CLOSE crCols


--Get selected rows and save to a temporary table
IF EXISTS (select * from sysobjects where id = object_id('##tempSourceRecords') 
	and OBJECTPROPERTY(id, N'IsTable') = 1)
  DROP TABLE ##tempSourceRecords

SET @vcSQL = (SELECT Element FROM master.dbo.Split(@vcSelectStatement, 'FROM') 
			WHERE ElementID = 1) 
		+   ' INTO ##tempSourceRecords FROM ' 
		+ (SELECT Element FROM master.dbo.Split(@vcSelectStatement, 'FROM') 
			WHERE ElementID = 2)

PRINT(@vcSQL)
EXEC (@vcSQL)

	
--If key violation table does not already exist, create it
DECLARE	@nvcSQL			nvarchar(4000),
	@nvcParmDef		nvarchar(500),
	@bitExists		bit
	
        SET @nvcSQL = 'IF EXISTS (select * from '+ @sysTargetDatabase+ '.dbo.sysobjects 
				where name = ''' + @sysViolationTable + ''' AND
				xtype=''U'') ' +
			'SET @bitExistsOUT = 1 ' +
			'ELSE ' +
			'SET @bitExistsOUT = 0'
	
	SET @nvcParmDef = N'@bitExistsOUT bit OUTPUT'

	EXEC sp_executesql @nvcSQL, @nvcParmDef, @bitExistsOUT=@bitExists OUTPUT
        
	IF @bitExists = 0
	BEGIN
	--Create new table based upon supplied select list if needed
		SET @vcSQL = 'SELECT TOP 0 ' + @vcColumnList + 
		+ ' INTO ' + @sysTargetDatabase + '.dbo.'+ @sysViolationTable + 
		' FROM ##tempSourceRecords ' 
		
		EXEC (@vcSQL)
	END

--Empty violation table if caller indicates it should be cleared
IF @bitEmptyViolationTable = 1
BEGIN
	SET @vcSQL = 'DELETE FROM ' + @sysTargetDatabase + '.dbo.' +  @sysViolationTable
	PRINT @vcSQL
	EXEC (@vcSQL)
END

--Insert duplicate rows from target into key violation table
SET @vcSQL = 'INSERT INTO ' + @sysTargetDatabase + '.dbo.' + @sysViolationTable +
	' SELECT a.' + REPLACE(@vcColumnList, ',', ',a.') +
	' FROM ##tempSourceRecords a' +
	' INNER JOIN ' + @sysTargetDatabase + '.dbo.' + @sysTargetTable + ' b ' +
	' ON '

OPEN crCols
FETCH NEXT FROM crCols 
INTO @sysColName --get inner join columns to add to SQL insert statement

WHILE(@@FETCH_STATUS = 0) 
BEGIN
	SET @vcSQL  = @vcSQL + 'a.' + @sysColName + '=b.' + @sysColName + ' AND '
	FETCH NEXT FROM crCols INTO @sysColName
END --Get next column
SET @vcSQL = SUBSTRING(@vcSQL,1,LEN(@vcSQL) -4) --remove trailing AND

--Add rows that are duplicate in the source select that may cause violation
SET @vcSQL = @vcSQL + ' UNION SELECT ' + @vcColumnList + ' FROM ##TempSourceRecords ' +
	'GROUP BY ' + @vcColumnList + ' HAVING COUNT(*) > 1'

CLOSE crCOLS
DEALLOCATE crCols

PRINT @vcSQL
EXEC (@vcSQL)

DROP TABLE ##tempSourceRecords

RETURN (0)


GO

Listing 1 – sp_GetKeyViolations Code

Parameters

@vcSelectStatement. SQL statement that selects primary key columns for records being tested for key violations. IMPORTANT: Only key columns should be included in select. If source table columns are named differently than in the target then they must be aliased with the same name as the target. These restrictions were necessary to make the procedure generic.

@sysTargetTable. Table where data would be inserted if record is not a duplicate.

@sysViolationTable. Table where key violation rows will be saved

@sysTargetDatabase. Database where target table is located and where key violation table exists or will be created

@bitEmptyViolationTable. Optional, indicates whether pre-existing key violation rows should be deleted from violation table before adding new records. If omitted, records will be preserved.

Usage

This stored procedure should be called prior to actually executing a SELECT statement. For example, run the following statement from the master database (once the procedure and supporting functions have been created):

exec sp_GetKeyViolations 'SELECT TOP 10 CustomerID as CustomerID 
	FROM Northwind.dbo.Customers WHERE ContactTitle LIKE ''Sales%''',  
			'Customers', --target table
			'DupCustomers', -- table to store violation rows
			'NORTHWIND', -- target database
			0 -- do not clear violation table prior to insert

Now the duplicate records can be excluded by adding a check to the key violation table as follows:

SELECT TOP 10 CustomerID as CustomerID FROM Northwind.dbo.Customers C 
WHERE ContactTitle LIKE 'Sales%' AND 
NOT EXISTS( SELECT 1 FROM Northwind.dbo.DupCustomers D
		WHERE C.CustomerID = D.CustomerID)

Dependencies

Sp_GetKeyViolations calls 2 user-defined functions, dbo.Split and dbo.PrimaryKeyColumns. dbo.Split should be installed in the master database. dbo.PrimaryKeyColumns should be installed in any database that will be used as a target database. If this sp will be used extensively then I recommend adding dbo.PrimaryKeyColumns to the model database to reduce administration. The code for these functions is included in the source code

I created sp_GetKeyViolation in the master database. It is written so that it can be called from any DB for which a user has SP execute permissions.

Conclusion

Since duplicate data is a common problem, this procedure addresses the need to identify bad data without hampering the insert process in a very generic fashion.


Related Articles

Fast CSV Import in PowerShell to SQL Server (18 March 2014)

How to Insert Values into an Identity Column in SQL Server (6 August 2007)

Using Views to Enforce Business Rules (9 April 2007)

Using the OUTPUT Clause to Capture Identity Values on Multi-Row Inserts (14 August 2006)

Using EXISTS (12 October 2003)

Using SELECT to INSERT records (15 August 2000)

Other Recent Forum Posts

Query performance Call Center data (22h)

Looking for on Premises tool to read data from SQL logs and populate data warehouse (1d)

Possible SQL 2014 to 2016 Issue - Some Application Functions Slow/Failing (1d)

Working with multiple WHERE statements (1d)

Create a new field value that shows a difference in value from 2 fields (3d)

Hierarchy wise Sales Targets (3d)

Get the MaxDate in results (5d)

Switch Statement within a SSRS SQL Expression (6d)

- Advertisement -