Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Mattneedshelp
Starting Member
3 Posts |
Posted - 2012-05-14 : 09:41:05
|
| Hi,Here is my problem:We have a job at work that takes data from a staging table, places it all into a text file, and then transfers it to another system. If there are duplicate rows in the text file, the data transfer fails because duplicate data can't be placed into the system.But here is the problem, its not all of the row that's duplicated. So for example you have the tracking number, the action code, and the delivery time, but the action code can be whatever it likes, so if -Tracking no. Action code delivery time12345 C 230812345 B 2308If that occured, the job would fail. The action code is effectively ignored, but you can't have 2 instances of the same delivery time and tracking number.We currently run some SQL whereby it does a count of everything in the database, and brings back all of the duplicated rows. We then take the tracking numbers, and one by one have to do a 'find' in the .txt file, and remove each duplicate row. Bearing in mind most days there are about 80 duplicates, this whole process can take an hour.I'm thinking there MUST be a way of saying 'show me a distinct count of the tracking number + the delivery time in column a, and in column b show me the total number of rows for each tracking number + delivery time. Where there is more than 1 row, delete the appropriate number of rows to make it unique'.Is this far too complicated? Im hoping that if something like this was available, it could be done as an automated sql process in our batch schedule in the middle of the process, so there'd be a duplicate check before the data transfer is even attempted.I hope this makes sense, and I hope someone can help.Thank you! |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2012-05-14 : 09:53:33
|
| This will get you duplicatesselect [Tracking no],[delivery time] from tablegroup by [Tracking no],[delivery time] having count(*)>1MadhivananFailing to plan is Planning to fail |
 |
|
|
Mattneedshelp
Starting Member
3 Posts |
Posted - 2012-05-14 : 10:23:47
|
| Yeah, I have no problem finding the duplicates, it's the actual removal of them.Say for example I ran the code below -SELECT COUNT(*) AS Total, TrackingNumber + DeliveryTime AS Details FROM TrackingtableGROUP BY TrackingNumber + DeliveryTime ORDER BY COUNT(*) DESCWhat code could I include after that would say where 'Total' > 1, delete the duplicate rows to get the count back to 1? |
 |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2012-05-14 : 10:29:35
|
| Take a data backup for that table before running thisdelete t1 from table as t1 inner join(select [Tracking no],[delivery time] from tablegroup by [Tracking no],[delivery time] having count(*)>1) as t2on t1.[Tracking no]=t2.[Tracking no] and t1.[delivery time] =t2.[delivery time]MadhivananFailing to plan is Planning to fail |
 |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2012-05-14 : 10:30:50
|
quote: Originally posted by Mattneedshelp Yeah, I have no problem finding the duplicates, it's the actual removal of them.Say for example I ran the code below -SELECT COUNT(*) AS Total, TrackingNumber + DeliveryTime AS Details FROM TrackingtableGROUP BY TrackingNumber + DeliveryTime ORDER BY COUNT(*) DESCWhat code could I include after that would say where 'Total' > 1, delete the duplicate rows to get the count back to 1?
You can do it a few different ways. My favorite is using a CTE - see below. First run the code as it is; that will show you the rows that are going to be deleted. Then, uncomment the last line and comment the line above that and run to do the actual delete. As always, please test in a test environment before you pull the trigger:;WITH cte AS( SELECT *, ROW_NUMBER() OVER (PARTITION BY [Tracking no],[delivery time] ORDER BY Actioncode) AS RN FROM YourTable)SELECT * FROM cte WHERE RN > 1;--DELETE FROM cte WHERE RN > 1; |
 |
|
|
Mattneedshelp
Starting Member
3 Posts |
Posted - 2012-05-14 : 10:33:02
|
| Thanks Sunitabeck, are there alternative methods should that one not work? What is a CTE, i've never come across that before.Also, with the delete line, if the row is 2, and I run the delete line, is it not deleting both rows, or is it only deleting the one? |
 |
|
|
sunitabeck
Master Smack Fu Yak Hacker
5155 Posts |
Posted - 2012-05-14 : 11:21:09
|
quote: Originally posted by Mattneedshelp Thanks Sunitabeck, are there alternative methods should that one not work? What is a CTE, i've never come across that before.Also, with the delete line, if the row is 2, and I run the delete line, is it not deleting both rows, or is it only deleting the one?
There are some examples of other methods here (in the dowloadable zip file): http://www.sqlmag.com/article/tsql3/deleting-duplicate-recordsCTE is acronym for Common Table Expression. It is sort of like a sub query in this context; but CTE's are much more than just a subquery written on top. http://msdn.microsoft.com/en-US/library/ms190766(v=SQL.90).aspxTo answer your question - yes, it will delete only one row if there are two rows as in your example. Run the select query and you will see which rows are going to be delete. Remove the where clause and you will see all the rows. |
 |
|
|
madhivanan
Premature Yak Congratulator
22864 Posts |
Posted - 2012-05-16 : 04:25:30
|
| Have you tried my method too?MadhivananFailing to plan is Planning to fail |
 |
|
|
|
|
|
|
|