Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Delete duplicates of all but 1 with highest value

Author  Topic 

pettittmj
Starting Member

3 Posts

Posted - 2012-07-11 : 13:12:48
Hello everyone. I was seeing if there was something I could get help on.

Say I have a table called Supplies

Collums will be Product_number, Obligated_amount, Water_amount, Air_amount


I would like to delete all duplicates that had the same Product_number and only keep the row where the
amount is the highest value between the three other fields (Obligated, water, and air)

Example:

Product_number / Obligated_amount / Water_amount / Air_amount

11515/ 0/ 0/ 15
11515/ 0/ 0/ 23
11515/ 4524/ 4524/ 0
11515/ 0/ 25/ 0
11515/ 0/ 15/ 15


result:

11515/ 4524/ 4524/ 0


So basically what ever row of the duplicate that has the highest value of the three colums would be saved and all other duplicates deleted.


Your time and help is greatly appreciated

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2012-07-11 : 14:47:36
Not sure I get it yet - which ever row has the biggest single amount or the total across all amounts? What about ties, does any amount column have precedence of the others?

which one should be kept?
11515/ 5/ 10/ 15
11515/ 15/ 10/ 5
11515/ 10/ 15/ 5
11515/ 0/ 0/ 30
11515/ 0/ 30/ 0
11515/ 30/ 0/ 0

Be One with the Optimizer
TG
Go to Top of Page

SM_SQL_newbie
Starting Member

1 Post

Posted - 2012-07-11 : 14:54:49
Hi Hoep this will help you,
1.Method 1:

DELETE from (
(select * from Supplies)
MINUS
(select product_number,max(obligated_amount),max(water_amount),max(air_amount)
from Supplies
group by product_number))

2.Method 2:

DELETE from
select product,obligated_amount,water_amount,air_amount,rank() over (partition by product_number order by obligated_amount+water_amount_air_amount desc ) rank
from supplies S
where rank >1


Thanks


Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-11 : 15:07:34
quote:
Originally posted by SM_SQL_newbie

Hi Hoep this will help you,
1.Method 1:

DELETE from (
(select * from Supplies)
MINUS
(select product_number,max(obligated_amount),max(water_amount),max(air_amount)
from Supplies
group by product_number))

2.Method 2:

DELETE from
select product,obligated_amount,water_amount,air_amount,rank() over (partition by product_number order by obligated_amount+water_amount_air_amount desc ) rank
from supplies S
where rank >1


Thanks





MINUS is ORACLE function

In SQL Server, you should be using EXCEPT instead

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

pettittmj
Starting Member

3 Posts

Posted - 2012-07-11 : 15:08:05
The product number is what I will be seeing to check the duplicates. All of the other three fields are completely indepentdent of each other. So it will be what ever entry has the highest single amount in any colum whether it be obligated, water, or air. Thanks for the help.
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2012-07-11 : 15:10:03
So no column precedence for ties? can be random?

Be One with the Optimizer
TG
Go to Top of Page

pettittmj
Starting Member

3 Posts

Posted - 2012-07-11 : 15:13:15
TG,

I belive you are asking me if any one colum is more important than the other or has a higher priority? If thats the case then no, just the highest single value in any of the three colums.
IE:

Product/Obligater/water/air
1151 /10/25/30
1151 /300/938/393
1151 /1000/0/0


the one i want selected would be 1000/0/0



In the real life application of this, It will be extremely rare for more than one entry to contain the same highest number because it is used as a large whole dollar number usually something like 357686
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2012-07-11 : 15:30:48
Ok, here is a query that will RETURN just one row for every product_number that should match your criteria. The problem with Deletes is that unless you have a unique constraint of some kind in the table there is NO WAY you can delete one row and leave another if these rows are duplicated across all columns.

you may need to use this SELECT to create a new table and then add a constraint on product_number. or at least an identity column so that you can distinguish one row from another.

;with supplies (Product_number, Obligated_amount, Water_amount, Air_amount)
as
(
select 11515, 0, 0, 15
union all select 11515, 0, 0, 23
union all select 11515, 4524, 4524, 0
union all select 11515, 0, 25, 0
union all select 11515, 0, 15, 15
)

select Product_number
,Obligated_amount
,Water_amount
,Air_amount
from (
select s.*
,row_number() over (order by s.obligated_amount, s.water_amount, s.air_amount) as tieBreaker
from (
select product_number
,max(amount) maxAmt
from supplies
unpivot (amount for cols in (Obligated_amount, Water_amount, Air_amount)) u
group by product_number
) d
join supplies s on s.product_number = d.product_number
where s.Obligated_amount = d.maxamt
or s.Water_amount = d.maxamt
or s.Air_amount = d.maxamt
) d
where tieBreaker = 1



Be One with the Optimizer
TG
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-11 : 15:49:04
shouldnt that be this?

;With CTE
AS
(
SELECT ProductID,
MAX(CASE WHEN Obligated > water THEN CASE WHEN Obligated > air THEN 1 ELSE 3 END ELSE 2 END) AS ID,
MAX(CASE WHEN Obligated > water THEN CASE WHEN Obligated > air THEN Obligated ELSE air END ELSE water END) AS Value
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Obligater DESC) AS OSeq,
ROW_NUMBER() OVER (PARTITION BY Product ORDER BY water DESC) AS wseq,
ROW_NUMBER() OVER (PARTITION BY Product ORDER BY air DESC) AS aseq, *
FROM table
)t
WHERE OSeq = 1 OR wseq=1 OR seq=1
GROUP BY ProductID
)

DELETE t
FROM table t
WHERE NOT EXISTS (SELECT 1 FROM CTE WHERE ProductID = t.ProductID
AND ((ID = 1 AND Obligated = Value)
OR (ID = 2 AND water = Value)
OR (ID = 3 AND air = Value))
)




------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2012-07-11 : 16:02:06
>>shouldnt that be this?
could be. but still doesn't address the true dupe issue. when there are say 3 rows all with the same values across the row then you can either leave them all or delete them all. but there's no way to distinguish one from another (for the actual delete).

Be One with the Optimizer
TG
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-11 : 16:06:18
quote:
Originally posted by TG

>>shouldnt that be this?
could be. but still doesn't address the true dupe issue. when there are say 3 rows all with the same values across the row then you can either leave them all or delete them all. but there's no way to distinguish one from another (for the actual delete).

Be One with the Optimizer
TG


for that I dont think Op specified a clear rule either

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page
   

- Advertisement -