Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 delete

Author  Topic 

musicman
Starting Member

1 Post

Posted - 2011-07-05 : 09:28:46
I am trying to delete duplicate records from a table that contains 2 million records. The problem I am facing is that the query is taking several days to complete. My query is below, it is basically comparing each record with another record & if all columns are the same data, it deletes the record. This technique has been successful for me on tables with smaller amounts of records, but I am not sure what I can do to speed up the query. My estimation is that there should only be about 200,000 unique records in this table which would indicate approximately 1.8 million records need to be deleted.

Anyone have any advise as to how I can accomplish this task quicker.

Thanks in advance.


delete from
FOXPRO_COMPLETE.CLAIMANT_NAME_T a
where
a.rowid >
any (select b.rowid
from
FOXPRO_COMPLETE.CLAIMANT_NAME_T b
where
(a.CLMNT_ID = b.CLMNT_ID or (a.CLMNT_ID is null and b.CLMNT_ID is null))
and
(a.CLMNT_FRST_NME = b.CLMNT_FRST_NME or (a.CLMNT_FRST_NME is null and b.CLMNT_FRST_NME is null))
and
(a.CLMNT_LAST_NME = b.CLMNT_LAST_NME or (a.CLMNT_LAST_NME is null and b.CLMNT_LAST_NME is null))
and
(a.CLMNT_MI = b.CLMNT_MI or (a.CLMNT_MI is null and b.CLMNT_MI is null))
)
;

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-07-05 : 09:36:38
Get the unique rows first
select CLMNT_ID, CLMNT_FRST_NME , CLMNT_LAST_NME , CLMNT_MI , rowid = min(rowid)
into #a
from FOXPRO_COMPLETE.CLAIMANT_NAME_T
group by CLMNT_ID, CLMNT_FRST_NME , CLMNT_LAST_NME , CLMNT_MI

now delete
delete FOXPRO_COMPLETE.CLAIMANT_NAME_T
where rowid not in (select rowid from #a)

You coukld do it in a single query but I prefer to see what's going to happen first.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-07-05 : 09:41:19
... and with a better answer I think.

I would think something like this would be quicker... but it's still going to take some time to delete that many records.

I assumed that CLMNT_ID was a positive integer (so -1 in place of null)
And I assumed that an empty string is similar to a null for 'value' purposes...


delete b
from FOXPRO_COMPLETE.CLAIMANT_NAME_T a
Inner Join FOXPRO_COMPLETE.CLAIMANT_NAME_T b
On isnull(a.CLMNT_ID,-1) = isnull(b.CLMNT_ID,-1)
and isnull(a.CLMNT_FRST_NME,'') = isnull(b.CLMNT_FRST_NME,'')
and isnull(a.CLMNT_LAST_NME,'') = isnull(b.CLMNT_LAST_NME,'')
and isnull(a.CLMNT_MI,'') = isnull(b.CLMNT_MI,'')
and a.rowId < b.rowId


Corey

I Has Returned!!
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-07-05 : 09:42:21
Assuming you only want the minimum rowid for each entry, this is the fastest way in SQL Server to delete them:
SELECT MIN(rowid) rowid, CLMNT_ID, CLMNT_FRST_NME, CLMNT_LAST_NME, CLMNT_MI 
INTO #temp
FROM FOXPRO_COMPLETE.CLAIMANT_NAME_T
GROUP BY CLMNT_ID, CLMNT_FRST_NME, CLMNT_LAST_NME, CLMNT_MI -- select rows to keep into a temp table

TRUNCATE TABLE FOXPRO_COMPLETE.CLAIMANT_NAME_T -- truncate original table
INSERT FOXPRO_COMPLETE.CLAIMANT_NAME_T SELECT * FROM #temp -- insert from the temp table

DROP TABLE #temp
Go to Top of Page

nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-07-05 : 09:43:40
I missed the goup by in my original post.
Rob's suggestion only works if this is all the columns in the table.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-07-05 : 09:57:09
If you use SQL Server 2005 and later, you can try this
DELETE	f
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY CLMNT_ID, CLMNT_FRST_NME, CLMNT_LAST_NME, CLMNT_MI ORDER BY RowID) AS RecID
FROM FOXPRO_COMPLETE.CLAIMANT_NAME_T
) AS f
WHERE RecID > 1



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
   

- Advertisement -