Author |
Topic |
meef
Posting Yak Master
113 Posts |
Posted - 2014-11-26 : 14:23:37
|
I am trying to pull a dataset here and keep getting duplicate records. I have a few joins so I thought maybe they were the problem but no matter what join I use I still get the same problem. I'm not sure what other information you may need so if I'm missing something let me know, maybe it's something simple that someone who knows more can pick out just by looking at the query.SELECT c.bol_id,c.bol_number,a.filename, a.scac, a.pro, CONVERT(varchar(20), CONVERT(date, CONVERT(varchar(8), a.delivered_date), 112),110)as date_delivered, b.status_code, convert(time(0),LEFT(b.status_time,2) +':'+ RIGHT(b.status_time,2)) as status_time, a.match_method, a.date_last_updated, a.message_dateFROM tbl_214_datatable AS a inner JOIN tbl_214_status AS b ON b.bol_id = a.bol_id inner join tbl_bol as c on b.bol_id=c.bol_id WHERE c.dba_id = '09acea7e-4cb5-4dba-a829-d1f8c411909b' AND (b.date_added between '20140901' and '20140930') AND (b.status_code = 'D1') ORDER BY date_delivered DESC |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-11-26 : 14:29:24
|
You are probably getting duplicates because of the 1:many relationship between the tables. Show us some sample data and expected output for us to help.Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
meef
Posting Yak Master
113 Posts |
Posted - 2014-11-26 : 14:59:03
|
Sample data from above query:bol_id bol_number filename scac pro date_delivered status_code status_time4392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 7:59:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 9:46:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 10:12:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 10:36:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 11:28:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 11:38:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 11:53:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 12:44:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 13:44:004392625C-3385-48F2-A9ED-13CE4B350B6F 26676919 O:\FTP\Carrier214\PYLE\PROC\VIG-11106139.TXT PYLE 298826108 11/7/2014 D1 16:38:00 Expected data would be to just have one single status. Each of these get a D1 status which means delivered, this is telling me that the same exact shipment has been delivered dozens of times at different times, and looking at the file where it's getting that data, there is indeed only one record it should be pulling. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-11-26 : 15:19:17
|
But there are multiple rows in tbl_214_datatable that match the join criteria. Run this to check:select * from tbl_214_datatable where bol_id = '4392625C-3385-48F2-A9ED-13CE4B350B6F'Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-11-26 : 15:19:47
|
Are you sure that only joining on bol_id is correct?Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
meef
Posting Yak Master
113 Posts |
Posted - 2014-12-01 : 08:33:16
|
There aren't multiple records though, that only returns one record if I try that query you posted.All of these tables have a unique BOL_ID field, that's what I usually join on. As far as I can tell it should be correct. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-01 : 11:34:06
|
Show us the output of these:select * from tbl_214_datatable where bol_id = '4392625C-3385-48F2-A9ED-13CE4B350B6F'select * from tbl_214_status where bol_id = '4392625C-3385-48F2-A9ED-13CE4B350B6F'select * from tbl_bol where bol_id = '4392625C-3385-48F2-A9ED-13CE4B350B6F'Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
meef
Posting Yak Master
113 Posts |
Posted - 2014-12-01 : 11:46:41
|
Funny you should say that, I was just replying to post results of that :)I get one record for everything but the tbl_214_status, there I get 889 records. Now I was expecting to have multiple records as each shipment has different statuses throughout its delivery, but not 889 spanned over a period of months. I think there's a data issue here with that table, would you agree? |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-01 : 12:17:06
|
I wouldn't be able to answer that for you as it would required business rules knowledge.Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
meef
Posting Yak Master
113 Posts |
Posted - 2014-12-01 : 12:46:05
|
Ok, well I'm sure that's the issue. At most there should be maybe 10-15 status records for one BOL_ID, there's no way a shipment would be out there for 6 months before being delivered.Is there any sort of bandaid effect I can apply to ignore these duplicates? Here is the output from the tbl_214_status:tbl_214_status_id tbl_214_id bol_id status_date status_time status_code26C78EC5-D6C7-4654-87FF-896346191917 35EC1F00-A494-4100-AC36-FA42D7B9F0EE 4392625C-3385-48F2-A9ED-13CE4B350B6F 20140723 1138 D1A7DF47FD-D584-4561-B782-A7E4C2C9A9E5 35EC1F00-A494-4100-AC36-FA42D7B9F0EE 4392625C-3385-48F2-A9ED-13CE4B350B6F 20140723 1236 D1BA21EB80-24CA-4FB5-B7A0-E4642B043D8D 6AB9969C-7C94-4686-8949-FA07FE44D8E1 4392625C-3385-48F2-A9ED-13CE4B350B6F 20140715 1200 NULL45D8D171-357D-42A1-AF2A-940AE487E20E 6AB9969C-7C94-4686-8949-FA07FE44D8E1 4392625C-3385-48F2-A9ED-13CE4B350B6F 20140710 0 AG3FB5F19B-7A2B-4251-9428-9C27E982E264 6AB9969C-7C94-4686-8949-FA07FE44D8E1 4392625C-3385-48F2-A9ED-13CE4B350B6F 20140709 1550 AF892F7CF9-8AB4-4DF1-973B-3CA62ABF0013 6AB9969C-7C94-4686-8949-FA07FE44D8E1 4392625C-3385-48F2-A9ED-13CE4B350B6F 20140710 1150 D1F0DF59C1-8B7C-48FB-9A5A-0B15F936DA73 6AB9969C-7C94-4686-8949-FA07FE44D8E1 4392625C-3385-48F2-A9ED-13CE4B350B6F 20140710 1153 D1 I think I could use the tbl_214_id field to group things together and maybe only show one record per group. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-01 : 12:52:57
|
For the sample data you posted, which one do you want to show? You'll likely need to use ROW_NUMBER() function.Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
meef
Posting Yak Master
113 Posts |
Posted - 2014-12-01 : 13:13:23
|
Looking at it again, maybe just each unique tbl_214_status_id? That's actually the primary key of that table. I think this is a problem of working with someone else who designed the tables and populates them based on a dozen different methods so when something goes wrong they don't even know how to correct the data which leaves me trying to clean it up on the fly. I don't even know how to tackle this anymore. |
|
|
|