Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Removing duplicate records with criteria

Author  Topic 

cyph0id
Starting Member

3 Posts

Posted - 2015-03-09 : 19:58:07
Hi all,

I have regular work that requires me to extract a bunch of customer records from our database, and then remove duplicate address destinations (so we dont mail the same address more than once).

I can currently achieve this using a combination of my poor SQL skills and Excel, but it's really not working out for me, so I'm hoping someone can show me the SQL wizardry necessary to do it just in SQL.

Relevant fields:
Member.AddressBarcode (This is a unique barcode (Text representation of a base-3 number) based on the customer address. So if there's more than one record in the pulled records with the same barcode, we then look at Member.MemberTypeID to determine whether to include this record in the results or discard it as a duplicate. Note that AddressBarcode may be blank if the mailing address couldn't be validated, if it is blank we don't discard it since there is no easy way to detect duplicate addresses without the barcode)

Member.MemberTypeID (This is the type of member account. We have 3 types - Single, Joint Primary, Joint Secondary, represented in this field by the numbers 1/2/3. This is also the order of preference of who to mail. So if there is a Joint Primary and Joint Secondary with the same mailing barcode, we want to discard the Joint Secondary from the results, so that the Joint Primary is the record we include in the results of who to mail.)

Member.ID (Unique numeric ID for each customer. Kind of irrelevant here, but it's a key)

So some pseudo code for what I'm trying to achieve is:

(Member.MemberTypeID = 1)
OR (Member.MemberTypeID = 2 AND Member.AddressBarcode not in results of Member.MemberTypeID = 1)
OR (Member.MemberTypeID = 3 AND Member.AddressBarcode not in results of Member.MemberTypeID = 2 AND Member.AddressBarcode not in results of Member.MemberTypeID = 1)


Can anyone help me with this? I suspect it requires some sort of join...

Ifor
Aged Yak Warrior

700 Posts

Posted - 2015-03-10 : 07:16:21
Some consumable test data would be useful. In its absence, try something like the following:

WITH Ranks
AS
(
SELECT *
,RANK() OVER (PARTITION BY AddressBarcode ORDER BY MemberTypeID) AS rn
FROM Member
)
-- Check those with a rn greater than 1 are the ones you want to delete.
select * from Ranks;
-- If happy, remove the above select and uncomment the following.
--DELETE Ranks
--WHERE rn > 1;
Go to Top of Page

cyph0id
Starting Member

3 Posts

Posted - 2015-03-10 : 19:07:46
quote:
Originally posted by Ifor

Some consumable test data would be useful.



Here you go (tab separated):


ID AddressBarcode MemberTypeID
4196386 1301010220000221301220333113213312213 2
4050437 1301010220000221301220333113213312213 1
4171374 1301010220001221100121310022320122313 2
4143468 1301010220001221102122323233210023113 2
14116602 1301010220102112020201312230222023013 3
4116602 1301010220102112020201312230222023013 2
4160218 1301010220210201302010332210230303013 1
14136832 1301010220210201302002321221220110213 1
19092654 1301010220221201100021321322122113313 2
4130282 1301010222000001023001332230300100113 2
4088587 1301010222000001221030311031020033213 2
2425916 1301010222000011222002313033131313313 2
12425916 1301010222000011222002313033131313313 3
4050798 1301010222000100112100312312001120313 1
14186560 1301011000000002212012331020223020313 3
19124174 1301011000000022122000321121133111313 2
4077523 1301011000000030001000333103311133213 1
19164525 1301011000000111212200330302211320313 3
19164524 1301011000000111212200330302211320313 2



The above block of data has some duplicate barcodes to be filtered out. Thanks for your example, that's well beyond my SQL talents but this is how you learn!
Go to Top of Page

Ifor
Aged Yak Warrior

700 Posts

Posted - 2015-03-11 : 11:33:08
Comsumable test data should be consumable:

-- *** Test Data ***
CREATE TABLE #t
(
ID int NOT NULL
,AddressBarcode char(37) NOT NULL
,MemberTypeID tinyint NOT NULL
);
INSERT INTO #t
VALUES(4196386,'1301010220000221301220333113213312213', 2)
,(4050437,'1301010220000221301220333113213312213', 1)
,(4171374,'1301010220001221100121310022320122313', 2)
,(4143468,'1301010220001221102122323233210023113', 2)
,(14116602,'1301010220102112020201312230222023013', 3)
,(4116602,'1301010220102112020201312230222023013', 2)
,(4160218,'1301010220210201302010332210230303013', 1)
,(14136832,'1301010220210201302002321221220110213', 1)
,(19092654,'1301010220221201100021321322122113313', 2)
,(4130282,'1301010222000001023001332230300100113', 2)
,(4088587,'1301010222000001221030311031020033213', 2)
,(2425916,'1301010222000011222002313033131313313', 2)
,(12425916,'1301010222000011222002313033131313313', 3)
,(4050798,'1301010222000100112100312312001120313', 1)
,(14186560,'1301011000000002212012331020223020313', 3)
,(19124174,'1301011000000022122000321121133111313', 2)
,(4077523,'1301011000000030001000333103311133213', 1)
,(19164525,'1301011000000111212200330302211320313', 3)
,(19164524,'1301011000000111212200330302211320313', 2);
-- *** End Test Data ***


My original code seems to do what you want.

select * from #t;

WITH Ranks
AS
(
SELECT *
,RANK() OVER (PARTITION BY AddressBarcode ORDER BY MemberTypeID) AS rn
FROM #t
)
DELETE Ranks
WHERE rn > 1;

select * from #t;


If your real data still has duplicates which you wish to eliminate try something like:

WITH Ranks
AS
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY AddressBarcode ORDER BY MemberTypeID, ID) AS rn
FROM #t
)
DELETE Ranks
WHERE rn > 1;

Go to Top of Page

cyph0id
Starting Member

3 Posts

Posted - 2015-03-16 : 02:42:26
I'm struggling to get this to work with my existing code, like I said I am pretty talentless with SQL. So, assuming I already have code:

SELECT
vwPersons.ID,
vwPersons.AddressBarcode,
vwPersons.MemberTypeID

FROM
vwPersons

WHERE
Blah
AND Blah
Order by NewID()


How do I incorporate that into the WITH clause you supplied me with? I tried putting the WITH clause outside of my existing SELECT statement, but then you end up with more than one SELECT, and well, I got confused. Also, will the DELETE statement just affect the results the query returns or will it actually affect data in the database (given that I am not working with a temporarily created table as in your example)?

Go to Top of Page

huangchen
Starting Member

37 Posts

Posted - 2015-04-02 : 05:56:52
unspammed
Go to Top of Page
   

- Advertisement -