Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Finding duplicates during a rolling time interval

Author  Topic 

Humate
Posting Yak Master

101 Posts

Posted - 2010-08-04 : 09:45:31
Hi All,

I would like to show unique records during any 5 minute period, where the Calltype and AgentID is the same. (Duplicates are indicated by a 1).

The desired result is like below, where the rolling 5 minute time interval is considered. I'm struggling to think how I can find the necessary method to identify the duplicates.

CallType---AgentID---Time---Round5Min---Duplicate---
10000-----1215-----09:01:00-----09:00:00-----0-----
12000-----1215-----10:01:00-----10:00:00-----0-----
12000-----1215-----10:02:00-----10:00:00-----1-----
12000-----1215-----10:02:31-----10:05:00-----1-----
12000-----1215-----10:02:38-----10:05:00-----1-----
12000-----1215-----10:05:59-----10:05:00-----1-----

If I try to make a duplicate key from the Calltype, AgentId and time numbers, rounding to 5 minutes, I get the wrong behaviour - as seen below when the records round to 10:05:00.

CallType---AgentID---Time---Round5Min---Duplicate---
10000-----1215-----09:01:00-----09:00:00-----0-----
12000-----1215-----10:01:00-----10:00:00-----0-----
12000-----1215-----10:02:00-----10:00:00-----1-----
12000-----1215-----10:02:31-----10:05:00-----0-----
12000-----1215-----10:02:38-----10:05:00-----1-----
12000-----1215-----10:05:59-----10:05:00-----1-----

Same problem if I round up 5 the nearest mintues as seen below with the last row of data.

CallType---AgentID---Time---RoundUP5Min---Duplicate---
10000-----1215-----09:01:00-----09:00:00-----0-----
12000-----1215-----10:01:00-----10:00:00-----0-----
12000-----1215-----10:02:00-----10:00:00-----1-----
12000-----1215-----10:02:31-----10:05:00-----1-----
12000-----1215-----10:02:38-----10:05:00-----1-----
12000-----1215-----10:05:59-----10:10:00-----0-----


Any help on this appreciated.
Thanks

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-04 : 10:01:35
[code]-- Prepare sample data
DECLARE @Sample TABLE
(
CallType TINYINT,
AgentID SMALLINT,
[Time] TIME(0)
)

INSERT @Sample
SELECT 10, 1215, '09:01:00' UNION ALL
SELECT 12, 1215, '10:01:00' UNION ALL
SELECT 12, 1215, '10:02:00' UNION ALL
SELECT 12, 1215, '10:02:31' UNION ALL
SELECT 12, 1215, '10:02:38' UNION ALL
SELECT 12, 1215, '10:05:59'

-- Display the wanted result
SELECT s.CallType,
s.AgentID,
s.[Time],
CASE
WHEN DATEDIFF(SECOND, f.[Time], s.[Time]) BETWEEN 1 AND 299 THEN 1
ELSE 0
END
FROM @Sample AS s
OUTER APPLY (
SELECT TOP(1) w.[Time]
FROM @Sample AS w
WHERE w.AgentID = s.AgentID
AND w.CallType = s.CallType
AND w.[Time] < s.[Time]
ORDER BY w.[Time]
) AS f([Time])[/code]


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-04 : 10:10:55
If you stop changing your original query, you actually might have a look at my suggestion.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Humate
Posting Yak Master

101 Posts

Posted - 2010-08-04 : 12:10:58
thanks Peter, this looks interesting. Although I'm not finding the duplicates I need, I think because I specified time only (I actually have a datetime), the query returns the first unique record, not subsequent records in a later time period?? I've altered code below for an example I have tested.

I want to use this query over a period of several months to find duplicates in any one period of 5 minutes. Record 50010 is what I want to identify as a duplicate again, does that make sense?

DECLARE @Sample TABLE
( intLeadID INT,
intCampaignID INT,
intUserID INT,
[dtmCreated] DATETIME
)
INSERT @Sample

SELECT 47851, 5552, 93, '2010-07-08 14:33:37.013' UNION ALL
SELECT 47852, 5552, 94, '2010-07-08 14:33:50.513' UNION ALL
SELECT 47854, 5552, 55, '2010-07-08 14:35:20.453' UNION ALL
SELECT 47855, 5552, 55, '2010-07-08 14:35:20.513' UNION ALL
SELECT 47856, 5552, 55, '2010-07-08 14:36:13.437' UNION ALL
SELECT 47857, 5552, 93, '2010-07-08 14:36:23.843' UNION ALL
SELECT 47858, 5552, 93, '2010-07-08 14:36:41.280' UNION ALL
SELECT 50009, 5552, 93, '2010-07-09 14:51:22.103' UNION ALL
SELECT 50010, 5552, 93, '2010-07-09 14:54:28.513'

SELECT s.intLeadID,
s.intCampaignID,
s.intUserID,
s.dtmCreated,
CASE
WHEN DATEDIFF(SECOND, f.dtmCreated, s.dtmCreated) BETWEEN 0 AND 299 THEN 1
ELSE 0
END
FROM @Sample AS s
OUTER APPLY (
SELECT TOP(1) w.dtmCreated
FROM @Sample AS w
WHERE w.intUserID = s.intUserID
AND w.intCampaignID = s.intCampaignID
AND w.dtmCreated < s.dtmCreated
ORDER BY w.dtmCreated
) AS f(dtmCreated)


Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-04 : 14:20:46
My bad. The inner ORDER BY statement should DESCending.
DECLARE	@Sample TABLE
(
intLeadID INT,
intCampaignID INT,
intUserID INT,
dtmCreated DATETIME
)

INSERT @Sample
SELECT 47851, 5552, 93, '2010-07-08 14:33:37.013' UNION ALL
SELECT 47852, 5552, 94, '2010-07-08 14:33:50.513' UNION ALL
SELECT 47854, 5552, 55, '2010-07-08 14:35:20.453' UNION ALL
SELECT 47855, 5552, 55, '2010-07-08 14:35:20.513' UNION ALL
SELECT 47856, 5552, 55, '2010-07-08 14:36:13.437' UNION ALL
SELECT 47857, 5552, 93, '2010-07-08 14:36:23.843' UNION ALL
SELECT 47858, 5552, 93, '2010-07-08 14:36:41.280' UNION ALL
SELECT 50009, 5552, 93, '2010-07-09 14:51:22.103' UNION ALL
SELECT 50010, 5552, 93, '2010-07-09 14:54:28.513'

SELECT s.intLeadID,
s.intCampaignID,
s.intUserID,
s.dtmCreated,
CASE
WHEN DATEDIFF(SECOND, f.dtmCreated, s.dtmCreated) BETWEEN 0 AND 299 THEN 1
ELSE 0
END
FROM @Sample AS s
OUTER APPLY (
SELECT TOP(1) w.dtmCreated
FROM @Sample AS w
WHERE w.intUserID = s.intUserID
AND w.intCampaignID = s.intCampaignID
AND w.dtmCreated < s.dtmCreated
ORDER BY w.dtmCreated DESC
) AS f(dtmCreated)



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Humate
Posting Yak Master

101 Posts

Posted - 2010-08-04 : 14:44:59
Nearly there! It's found a lot more of the duplicates I was expecting on my full table, however I can see some not found, eg for agent 45 I only have one duplicate (green). I can't see why the red records are not identified?

48172 5552 45 09/07/2010 12:36:33 0
48173 5552 45 09/07/2010 12:37:00 0
48176 5552 45 09/07/2010 12:38:24 0
48177 5552 45 09/07/2010 12:38:37 0


51459 11530 45 19/07/2010 12:51:07 0
51464 5552 45 19/07/2010 12:55:09 0
51465 11530 45 19/07/2010 12:55:14 1

51790 5552 45 20/07/2010 07:49:18 0
51791 5552 45 20/07/2010 07:49:50 0
51792 5552 45 20/07/2010 07:50:41 0

51827 2380 45 20/07/2010 08:48:20 0
51846 2380 45 20/07/2010 09:20:01 0
51852 5552 45 20/07/2010 09:31:25 0
51861 11323 45 20/07/2010 09:35:45 0
51873 2380 45 20/07/2010 09:43:03 0
51882 10976 45 20/07/2010 09:54:38 0
51886 8260 45 20/07/2010 09:59:19 0
51909 5552 45 20/07/2010 10:15:36 0
51910 11419 45 20/07/2010 10:15:50 0
52025 5552 45 20/07/2010 12:32:21 0
52026 5552 45 20/07/2010 12:32:29 0

56966 5552 45 30/07/2010 08:19:48 0
56968 5552 45 30/07/2010 08:20:21 0
56969 5552 45 30/07/2010 08:20:57 0
56970 5552 45 30/07/2010 08:21:39 0
56971 5552 45 30/07/2010 08:22:19 0
56972 5552 45 30/07/2010 08:23:03 0
56973 5552 45 30/07/2010 08:23:40 0

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-04 : 15:44:32
Look ok to me...
DECLARE	@Sample TABLE
(
SomeUnknownID INT,
CallType SMALLINT,
AgentID SMALLINT,
[DateTime] DATETIME,
SomeUnknownFlag TINYINT
)

SET DATEFORMAT DMY

INSERT @Sample
SELECT 48172, 5552, 45, '09/07/2010 12:36:33', 0 UNION ALL
SELECT 48173, 5552, 45, '09/07/2010 12:37:00', 0 UNION ALL
SELECT 48176, 5552, 45, '09/07/2010 12:38:24', 0 UNION ALL
SELECT 48177, 5552, 45, '09/07/2010 12:38:37', 0 UNION ALL
SELECT 51459, 11530, 45, '19/07/2010 12:51:07', 0 UNION ALL
SELECT 51464, 5552, 45, '19/07/2010 12:55:09', 0 UNION ALL
SELECT 51465, 11530, 45, '19/07/2010 12:55:14', 1 UNION ALL
SELECT 51790, 5552, 45, '20/07/2010 07:49:18', 0 UNION ALL
SELECT 51791, 5552, 45, '20/07/2010 07:49:50', 0 UNION ALL
SELECT 51792, 5552, 45, '20/07/2010 07:50:41', 0 UNION ALL
SELECT 51827, 2380, 45, '20/07/2010 08:48:20', 0 UNION ALL
SELECT 51846, 2380, 45, '20/07/2010 09:20:01', 0 UNION ALL
SELECT 51852, 5552, 45, '20/07/2010 09:31:25', 0 UNION ALL
SELECT 51861, 11323, 45, '20/07/2010 09:35:45', 0 UNION ALL
SELECT 51873, 2380, 45, '20/07/2010 09:43:03', 0 UNION ALL
SELECT 51882, 10976, 45, '20/07/2010 09:54:38', 0 UNION ALL
SELECT 51886, 8260, 45, '20/07/2010 09:59:19', 0 UNION ALL
SELECT 51909, 5552, 45, '20/07/2010 10:15:36', 0 UNION ALL
SELECT 51910, 11419, 45, '20/07/2010 10:15:50', 0 UNION ALL
SELECT 52025, 5552, 45, '20/07/2010 12:32:21', 0 UNION ALL
SELECT 52026, 5552, 45, '20/07/2010 12:32:29', 0 UNION ALL
SELECT 56966, 5552, 45, '30/07/2010 08:19:48', 0 UNION ALL
SELECT 56968, 5552, 45, '30/07/2010 08:20:21', 0 UNION ALL
SELECT 56969, 5552, 45, '30/07/2010 08:20:57', 0 UNION ALL
SELECT 56970, 5552, 45, '30/07/2010 08:21:39', 0 UNION ALL
SELECT 56971, 5552, 45, '30/07/2010 08:22:19', 0 UNION ALL
SELECT 56972, 5552, 45, '30/07/2010 08:23:03', 0 UNION ALL
SELECT 56973, 5552, 45, '30/07/2010 08:23:40', 0

SELECT s.SomeUnknownID,
s.CallType,
s.AgentID,
s.[DateTime],
s.SomeUnknownFlag,
CASE
WHEN DATEDIFF(SECOND, f.[DateTime], s.[DateTime]) BETWEEN 1 AND 299 THEN 1
ELSE 0
END AS FlagByPeso
FROM @Sample AS s
OUTER APPLY (
SELECT TOP(1) w.[DateTime]
FROM @Sample AS w
WHERE w.CallType = s.CallType
AND w.AgentID = s.AgentID
AND w.[DateTime] < s.[DateTime]
ORDER BY w.[DateTime] DESC
) AS f([DateTime])
ORDER BY s.CallType,
s.AgentID,
s.[DateTime]



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Humate
Posting Yak Master

101 Posts

Posted - 2010-08-04 : 16:48:31
Original one is working OK now.. hmmm. Maybe too a long a day for me!
Thanks again Peter, this helps a lot.
Go to Top of Page

Humate
Posting Yak Master

101 Posts

Posted - 2010-08-13 : 15:09:03
OK - so I'm hoping to use this same theory in another problem, looking for duplicate telephone number records in a rolling 7 day period, and think I found my issue.

I tried to implement the solution, but the Seconds difference calculates from the previous duplicate record, not since the first unique record in the series/period.

In the below example I would expect CallID 3 to be Unique, and have a second difference of 604801 from CallID 1, but it is only 600301 having calculated difference from CallID 2.


DECLARE @Sample TABLE
(
CallID INT,
TelephoneNo INT,
[StartTime] DATETIME
)

INSERT @Sample

SELECT 1, 901732, '2010-01-01 09:00:00.000' UNION ALL
SELECT 2, 901732, '2010-01-01 09:30:00.000' UNION ALL
SELECT 3, 901732, '2010-01-08 09:00:01.000' UNION ALL
SELECT 4, 901732, '2010-01-19 09:00:00.000' UNION ALL
SELECT 5, 123456, '2010-01-09 09:00:00.000' UNION ALL
SELECT 6, 123456, '2010-02-09 09:00:00.000'

SELECT s.CallID,
s.TelephoneNo,
s.StartTime,
CASE
WHEN DATEDIFF(SECOND, f.StartTime, s.StartTime) BETWEEN 0 AND 604800 THEN 0
ELSE 1
END AS UniqueFlag,
DATEDIFF(SECOND, f.StartTime, s.StartTime) AS SecondDifference
FROM @Sample AS s
OUTER APPLY (
SELECT TOP(1) w.StartTime
FROM @Sample AS w
WHERE w.TelephoneNo = s.TelephoneNo
AND w.StartTime < s.StartTime
ORDER BY w.StartTime DESC
) AS f(StartTime)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-13 : 15:33:33
If you change
ORDER BY	w.StartTime DESC
to
ORDER BY	w.StartTime ASC
what happens then?




N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Humate
Posting Yak Master

101 Posts

Posted - 2010-08-13 : 15:57:56
It doesn't work in a different way. I effectively want each record to be checked if the telephone number appears in last 7 days from it's Starttime. Not sure this is the way to do it.

Another record added below to illustrate. CallID 7 should now be a duplicate because of CallID 6 appearing in last 7 days from it - it's not because the difference is GREATER than 604800 due to calculating from CallID 5.

DECLARE @Sample TABLE
(
CallID INT,
TelephoneNo INT,
[StartTime] DATETIME
)

INSERT @Sample

SELECT 1, 901732, '2010-01-01 09:00:00.000' UNION ALL
SELECT 2, 901732, '2010-01-01 09:30:00.000' UNION ALL
SELECT 3, 901732, '2010-01-08 09:00:01.000' UNION ALL
SELECT 4, 901732, '2010-01-19 09:00:00.000' UNION ALL
SELECT 5, 123456, '2010-01-09 09:00:00.000' UNION ALL
SELECT 6, 123456, '2010-02-09 09:00:00.000' UNION ALL
SELECT 7, 123456, '2010-02-16 08:59:00.000'

SELECT s.CallID,
s.TelephoneNo,
s.StartTime,
CASE
WHEN DATEDIFF(SECOND, f.StartTime, s.StartTime) BETWEEN 0 AND 604800 THEN 0
ELSE 1
END AS UniqueFlag,
DATEDIFF(SECOND, f.StartTime, s.StartTime) AS SecondDifference
FROM @Sample AS s
OUTER APPLY (
SELECT TOP(1) w.StartTime
FROM @Sample AS w
WHERE w.TelephoneNo = s.TelephoneNo
AND w.StartTime < s.StartTime
ORDER BY w.StartTime ASC
) AS f(StartTime)
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-13 : 17:00:49
[code]DECLARE @Sample TABLE
(
CallID INT,
TelephoneNo INT,
[StartTime] DATETIME
)

INSERT @Sample

SELECT 1, 901732, '2010-01-01 09:00:00.000' UNION ALL
SELECT 2, 901732, '2010-01-01 09:30:00.000' UNION ALL
SELECT 3, 901732, '2010-01-08 09:00:01.000' UNION ALL
SELECT 4, 901732, '2010-01-19 09:00:00.000' UNION ALL
SELECT 5, 123456, '2010-01-09 09:00:00.000' UNION ALL
SELECT 6, 123456, '2010-02-09 09:00:00.000' UNION ALL
SELECT 7, 123456, '2010-02-16 08:59:00.000'

SELECT f.CallID,
f.TelephoneNo,
f.StartTime
FROM (
SELECT CallID,
TelephoneNo,
StartTime,
ROW_NUMBER() OVER (PARTITION BY TelephoneNo ORDER BY StartTime) AS RecID
FROM @Sample
) AS s
CROSS APPLY (
SELECT TOP(1) w.CallID,
w.TelephoneNo,
w.StartTime
FROM @Sample AS w
WHERE w.TelephoneNo = s.TelephoneNo
AND w.StartTime >= DATEADD(SECOND, 604800, s.StartTime)
ORDER BY w.StartTime
) AS f(CallID, TelephoneNo, StartTime)
WHERE s.RecID = 1
ORDER BY f.CallID[/code]


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Humate
Posting Yak Master

101 Posts

Posted - 2010-08-13 : 17:40:06
If it works in date order, I would expect the following results - last column idicates the unique calls in 7 day period with a value of 1.

DECLARE @Sample TABLE
(
CallID INT,
TelephoneNo INT,
[StartTime] DATETIME,
IsAUniqueCall INT
)

INSERT @Sample

SELECT 1, 901732, '2010-01-01 09:00:00.000', 1 UNION ALL
SELECT 2, 901732, '2010-01-01 09:30:00.000', 0 UNION ALL
SELECT 3, 901732, '2010-01-08 09:00:01.000', 0 UNION ALL
SELECT 4, 901732, '2010-01-19 09:00:00.000', 1 UNION ALL
SELECT 5, 123456, '2010-01-09 09:00:00.000', 1 UNION ALL
SELECT 6, 123456, '2010-02-09 09:00:00.000', 1 UNION ALL
SELECT 7, 123456, '2010-02-16 08:59:00.000', 0

SELECT * FROM @Sample
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-08-13 : 17:56:02
[code]DECLARE @Sample TABLE
(
CallID INT,
TelephoneNo INT,
[StartTime] DATETIME
)

INSERT @Sample

SELECT 1, 901732, '2010-01-01 09:00:00.000' UNION ALL
SELECT 2, 901732, '2010-01-01 09:30:00.000' UNION ALL
SELECT 3, 901732, '2010-01-08 09:00:01.000' UNION ALL
SELECT 4, 901732, '2010-01-19 09:00:00.000' UNION ALL
SELECT 5, 123456, '2010-01-09 09:00:00.000' UNION ALL
SELECT 6, 123456, '2010-02-09 09:00:00.000' UNION ALL
SELECT 7, 123456, '2010-02-16 08:59:00.000'

;WITH cteSource(CallID, TelephoneNo, StartTime, RecID)
AS (
SELECT CallID,
TelephoneNo,
StartTime,
ROW_NUMBER() OVER (PARTITION BY TelephoneNo ORDER BY StartTime) AS RecID
FROM @Sample
)
SELECT f.CallID,
f.TelephoneNo,
f.StartTime
FROM cteSource AS s
CROSS APPLY (
SELECT TOP(1) w.CallID,
w.TelephoneNo,
w.StartTime
FROM @Sample AS w
WHERE w.TelephoneNo = s.TelephoneNo
AND w.StartTime >= DATEADD(SECOND, 604800, s.StartTime)
ORDER BY w.StartTime
) AS f(CallID, TelephoneNo, StartTime)
WHERE s.RecID = 1

UNION ALL

SELECT CallID,
TelephoneNo,
StartTime
FROM cteSource
WHERE RecID = 1

ORDER BY CallID[/code]


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

Gilchrist
Starting Member

1 Post

Posted - 2010-11-10 : 13:41:38
quote:
Originally posted by Humate

Hi All,

I would like to show unique records during any 5 minute period, where the Calltype and AgentID is the same. (find duplicate files are indicated by a 1).

The desired result is like below, where the rolling 5 minute time interval is considered. I'm struggling to think how I can find the necessary method to identify the duplicates.

CallType---AgentID---Time---Round5Min---Duplicate---
10000-----1215-----09:01:00-----09:00:00-----0-----
12000-----1215-----10:01:00-----10:00:00-----0-----
12000-----1215-----10:02:00-----10:00:00-----1-----
12000-----1215-----10:02:31-----10:05:00-----1-----
12000-----1215-----10:02:38-----10:05:00-----1-----
12000-----1215-----10:05:59-----10:05:00-----1-----

If I try to make a duplicate key from the Calltype, AgentId and time numbers, rounding to 5 minutes, I get the wrong behaviour - as seen below when the records round to 10:05:00.

CallType---AgentID---Time---Round5Min---Duplicate---
10000-----1215-----09:01:00-----09:00:00-----0-----
12000-----1215-----10:01:00-----10:00:00-----0-----
12000-----1215-----10:02:00-----10:00:00-----1-----
12000-----1215-----10:02:31-----10:05:00-----0-----
12000-----1215-----10:02:38-----10:05:00-----1-----
12000-----1215-----10:05:59-----10:05:00-----1-----

Same problem if I round up 5 the nearest mintues as seen below with the last row of data.

CallType---AgentID---Time---RoundUP5Min---Duplicate---
10000-----1215-----09:01:00-----09:00:00-----0-----
12000-----1215-----10:01:00-----10:00:00-----0-----
12000-----1215-----10:02:00-----10:00:00-----1-----
12000-----1215-----10:02:31-----10:05:00-----1-----
12000-----1215-----10:02:38-----10:05:00-----1-----
12000-----1215-----10:05:59-----10:10:00-----0-----


Any help on this appreciated.
Thanks



Thank you i was in the same problem and thank god i found this forum.
Go to Top of Page
   

- Advertisement -