Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Distinct: Return all colums when using only 3 colu

Author  Topic 

Redser
Starting Member

9 Posts

Posted - 2012-01-29 : 06:28:44
Hi
I’ve been given a Customers table with no PK that contains lots of duplicate records. My goal is to move the data into a new table with a PK. Stage 1 is to identify all unique customers which I can do by using their Surname, Forename and DOB.

SELECT DISTINCT SURNAME, FORENAME, DOB
FROM Customers

My question is how can I bring back the additional fields for each record when using Distinct on just the 3 columns above . Or is there another way of doing this.

The additional fields for duplicate records don’t always contain the same data due to change of address or keying in errors. But that’s a separate issue all together.

Redser
Starting Member

9 Posts

Posted - 2012-01-29 : 10:01:06
Folks
I've done a bit more research and found a Solution that works perfectly for me. Sunitabeck came up with a solution on this forum on 3rd Aug 2011.
You don't have to use the Distinct function at all. Here is what works for me

SELECT
SURNAME,
FORENAMES,
DOB,
MAX(ADDRESS1),
MAX(ADDRESS2,
MAX(POSTCODE)

FROM dbo.Customers

GROUP BY
SURNAME,
FORENAMES,
DOB

ORDER BY
SURNAME

Hat's off to Sunitabeck.
Go to Top of Page

sunitabeck
Master Smack Fu Yak Hacker

5155 Posts

Posted - 2012-01-29 : 10:31:00
If the columns address1, address2, and postcode should be kept together, this may not be a good approach. For example, in the query below, it returns a random combination of street address, city and zip code, which may not be what you want.
CREATE TABLE #tmp(surname VARCHAR(32),forename VARCHAR(32), dob date, address1 VARCHAR(255), address2 VARCHAR(255),
postcode VARCHAR(255));

INSERT INTO #tmp VALUES
('Doe','Jane','20010101','234 Washington Ave', 'New York, NY', '10011'),
('Doe','Jane','20010101','123 Madison Ave', 'Washington, DC', '20011'),
('Doe','Jane','20010101','105 Lincoln Ave', 'San Jose, CA', '95103');

SELECT
surname, forename,dob, MAX(address1),MAX(address2), MAX(postcode)
FROM
#tmp
GROUP BY
surname, forename, dob;
An alternative might be to use the row_number() function as follows:
SELECT * FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY surname, forename, dob ORDER BY (SELECT NULL)) AS RN
FROM
#tmp
)s
WHERE RN = 1;
Go to Top of Page
   

- Advertisement -