Best way to map timespans to calendardays

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2012 Forums

Transact-SQL (2012)

Best way to map timespans to calendardays

Author

Topic

CleaningWoman
Starting Member

13 Posts

Posted - 2014-12-17 : 08:23:16

Hi SQL-Experts,

i am confronted with a problem which i guess is already solved, but i do not now how to search for it or the best way to solve it, so i registered in the hope to find some nerd-expert-answers.

I have:
about 10,000,000 entries [car_id], [starttime], [endtime] (between 2011 and today)

I need:
A calendar list from 2011 until today, which contains the daily time a car is being used and which [car_id] is involved for each day (might be multiple times a day).
[date_id] [car_id] [starttime] [endtime]

Due to bad DB design i have about 12,000 different [car_id] which i cannot reduce to a lower number.

So the expert-sql-question is:
Does anyone have a good script solution to map a timespan over different calendar days to a one by one list of daily timespans?

My current way would be to scan each [car_id] in time-order and break it down into days and insert the results into a new table or to scan day by day all entries with starttime on that day and process this list via a #tempdb
anyway it will result in about 50,000,000 inserts.

What do you suggest?

Thank you very much, for even reading until here and even more having an answer.

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-17 : 08:49:22

Would you please post some examples of input and output?

CleaningWoman
Starting Member

13 Posts

Posted - 2014-12-17 : 10:08:42

quote:
Originally posted by gbritton

Would you please post some examples of input and output?

Yes, of course

available source-data


car_id	starttime		endtime			duration
64	2013-05-23 13:36:42.000	2013-05-25 04:10:51.000	138849
103	2013-05-23 20:38:43.000	2013-05-25 03:40:55.000	111732
104	2013-05-23 14:37:41.000	2013-05-25 04:10:02.000	135141
171	2013-05-23 11:32:21.000	2013-05-25 04:07:20.000	146099
189	2013-05-23 16:33:27.000	2013-05-27 11:05:44.000	325937
3283	2013-05-23 17:12:55.000	2013-05-25 14:53:57.000	164462
3956	2013-05-23 09:49:20.000	2013-05-24 10:29:51.000	88831
4107	2013-05-23 07:20:20.000	2013-05-24 09:35:09.000	94489
5237	2013-05-23 13:16:20.000	2013-05-25 00:49:04.000	127964
13813	2013-05-23 12:47:57.000	2013-05-24 15:17:29.000	95372
14265	2013-05-23 11:24:06.000	2013-05-24 13:32:00.000	94074
30829	2013-05-23 08:16:23.000	2013-05-24 10:13:40.000	93437
30832	2013-05-23 05:58:36.000	2013-05-24 06:08:28.000	86992

Well, i do not have result data yet. I expect the first row to result in the following entries:


[car_id][date_id]	[starttime]	[endtime]
64	20130523 	13:36:42.000	23:59:59.000
64	20130524	00:00:00.000	23:59:59.000
64	20130525	00:00:00.000	04:10:51.000

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-17 : 11:15:47

Tricky! Check this out:


declare @t table (car_id int, starttime datetime, endtime datetime, duration int)
insert into @t(car_id, starttime, endtime, duration) values

(64	,'2013-05-23 13:36:42.000',	'2013-05-25 04:10:51.000',	138849	),
(103	,'2013-05-23 20:38:43.000',	'2013-05-25 03:40:55.000',	111732	),
(104	,'2013-05-23 14:37:41.000',	'2013-05-25 04:10:02.000',	135141	),
(171	,'2013-05-23 11:32:21.000',	'2013-05-25 04:07:20.000',	146099	),
(189	,'2013-05-23 16:33:27.000',	'2013-05-27 11:05:44.000',	325937	),
(3283	,'2013-05-23 17:12:55.000',	'2013-05-25 14:53:57.000',	164462	),
(3956	,'2013-05-23 09:49:20.000',	'2013-05-24 10:29:51.000',	88831	),
(4107	,'2013-05-23 07:20:20.000',	'2013-05-24 09:35:09.000',	94489	),
(5237	,'2013-05-23 13:16:20.000',	'2013-05-25 00:49:04.000',	127964	),
(13813	,'2013-05-23 12:47:57.000',	'2013-05-24 15:17:29.000',	95372	),
(14265	,'2013-05-23 11:24:06.000',	'2013-05-24 13:32:00.000',	94074	),
(30829	,'2013-05-23 08:16:23.000',	'2013-05-24 10:13:40.000',	93437	),
(30832	,'2013-05-23 05:58:36.000',	'2013-05-24 06:08:28.000',	86992	)

select car_id, dts.d, dts.s, dts.e from @t

cross apply (
	select cast(starttime as date) sd
	     , cast(starttime as time) st 
	     , cast(endtime as date)   ed
	     , cast(endtime as time)   et
	     , cast('23:59:59' as time)
             , cast('00:00:00' as time)) dt(sd,st,ed,et, _1159, _0000)

cross apply (
	select distinct dt, s, e from (values
		(dt.sd, dt.st, dt._1159),
		(dateadd(day, 1, dt.sd), dt._0000, dt._1159),
		(dateadd(day, 2, dt.sd), dt._0000, dt._1159),
		(dateadd(day, 3, dt.sd), dt._0000, dt._1159),
		(dateadd(day, 4, dt.sd), dt._0000, dt._1159),
		(dateadd(day, 5, dt.sd), dt._0000, dt._1159),
		(dateadd(day, 6, dt.sd), dt._0000, dt._1159),
		(dt.ed, dt._0000, dt.et)) d(dt, s, e)
	where  1 = case 
			when d.dt < dt.ed then 1
			when d.dt = dt.ed and d.e <= dt.et then 1
			else 0
			end
	) dts(d, s, e)

Edit: removed unused code. Note: if you want to cover any number of days, you'll need to build the query as dynamic SQL, for the last CROSS APPLY step.

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-17 : 12:18:01

Here's another way, using a CTE tally table. This should handle any interval up to 10^8 days"


declare @t table (car_id int, starttime datetime, endtime datetime, duration int)
insert into @t(car_id, starttime, endtime, duration) values

(64	,'2013-05-23 13:36:42.000',	'2013-05-25 04:10:51.000',	138849	),
(103	,'2013-05-23 20:38:43.000',	'2013-05-25 03:40:55.000',	111732	),
(104	,'2013-05-23 14:37:41.000',	'2013-05-25 04:10:02.000',	135141	),
(171	,'2013-05-23 11:32:21.000',	'2013-05-25 04:07:20.000',	146099	),
(189	,'2013-05-23 16:33:27.000',	'2013-05-27 11:05:44.000',	325937	),
(3283	,'2013-05-23 17:12:55.000',	'2013-05-25 14:53:57.000',	164462	),
(3956	,'2013-05-23 09:49:20.000',	'2013-05-24 10:29:51.000',	88831	),
(4107	,'2013-05-23 07:20:20.000',	'2013-05-24 09:35:09.000',	94489	),
(5237	,'2013-05-23 13:16:20.000',	'2013-05-25 00:49:04.000',	127964	),
(13813	,'2013-05-23 12:47:57.000',	'2013-05-24 15:17:29.000',	95372	),
(14265	,'2013-05-23 11:24:06.000',	'2013-05-24 13:32:00.000',	94074	),
(30829	,'2013-05-23 08:16:23.000',	'2013-05-24 10:13:40.000',	93437	),
(30832	,'2013-05-23 05:58:36.000',	'2013-05-24 06:08:28.000',	86992	)


declare @d int = 1+ (
    select max(datediff(day, cast(starttime as date), cast(endtime as date)))
    from @t);

with n1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))v(n)),
     n2(n) as (select 1 from n1, n1 n),
     n4(n) as (select 1 from n2, n2 n),
     n8(n) as (select 1 from n4, n4 n),
     N(n)  as (select top(@d) row_number() over(order by (select 1)) from n8, n8 n)

select  car_id, dateadd(day, n-1,dt.sd) startdate
    ,case when dt.sd = dateadd(day, n-1,dt.sd) then dt.st else dt._0000 end startime
    ,case when dt.ed = dateadd(day, n-1,dt.sd) then dt.et else dt._1159 end endtime
from @t
cross apply (
    select cast(starttime as date), cast(starttime as time),
           cast(endtime as date), cast(endtime as time),
           cast('23:59:59' as time), cast('00:00:00' as time)
    ) dt(sd, st, ed, et, _1159, _0000)
cross join N
where n -1 <= datediff(day, dt.sd, dt.ed)
order by car_id, startdate

CleaningWoman
Starting Member

13 Posts

Posted - 2014-12-17 : 12:26:24

quote:
Originally posted by gbritton

Tricky! Check this out:

WOW! THANX SO MUCH. I have learned so much right now. There has opened a whole new SQL-World for me with DTS. I did not know of DTS.

But there is a minor drawback with rows where the car is returned on the same day.

If you change above code the first row of data to:


(64	,'2013-05-23 13:36:42.000',	'2013-05-23 14:10:51.000',	2049	),

the returned results leave the first day with a wrong starttime from 00:00:00

I guess it is a minor typo, but as i do not understand what is going on, i will and have to train myself first to find it. Do you see it right away ?

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-17 : 13:08:58

My second solution fixes that

CleaningWoman
Starting Member

13 Posts

Posted - 2014-12-17 : 13:13:01

quote:
Originally posted by gbritton

My second solution fixes that

YES, it does !!
I already implemented your solution into my database and processed all data in 1 minute and 6 seconds .

I will go asleep tonight a different person.
And tomorrow i will use the saved time to understand what you have made me do. ;-)

THANX, THANX, THANX.

Muj9
Yak Posting Veteran

75 Posts

Posted - 2014-12-18 : 07:10:42

your a sql beast "gbritton".

Can you walk us through your sql ?

what do all the n's mean?

Thanks

gbritton
Master Smack Fu Yak Hacker

2780 Posts

Posted - 2014-12-18 : 09:04:42

@Muj9 Ha! I claim no credit here. All the n's are a compact form of a CTE-based tally table. Jeff Moden wrote a great article on that and how to use a tally table to parse strings:

http://www.sqlservercentral.com/articles/Tally+Table/72993/

The basic idea is to build up a sequence of enough numbers to cover what you want to do. Looking at the CTEs again:


with n1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))v(n)),
     n2(n) as (select 1 from n1, n1 n),
     n4(n) as (select 1 from n2, n2 n),
     n8(n) as (select 1 from n4, n4 n),
     N(n)  as (select top(@d) row_number() over(order by (select 1)) from n8, n8 n)

n1 is just 10 rows of the number 1. I call it n1 since there are 10^1 = 10 rows.
n2 is n1*n1 -- that is, the cartesian product of n1 with itself. Call it n1^2 if you like. Now we have 10^1 * 10^1 = 10^2 = 100 rows
n4 continues the idea, resulting in 10^4 = 10,000 rows
n8 has 10^8 rows

Finally we come to big N. I use N since it reminds me of the double-struck N, the symbol used to represent the natural numbers in mathematics (see http://en.wikipedia.org/wiki/Natural_number). This CTE selects some of the rows of n8. In this case the number of rows is controlled by the number of days (inclusive) between the earliest start date and latest end date. (In practice I suspect this will be (a lot!) shorter than 10^8). This CTE uses the SQL ROW_NUMBER function to generate the sequence we want. ORDER BY is required and ORDER BY (SELECT 1) is equivalent to no ordering. You could say (SELECT NULL) or (SELECT 0) or (SELECT $) or (SELECT 'UNORDERED') with the same result.

Continuing on, the first CROSS APPLY:


cross apply (
    select cast(starttime as date), cast(starttime as time),
           cast(endtime as date), cast(endtime as time),
           cast('23:59:59' as time), cast('00:00:00' as time)
    ) dt(sd, st, ed, et, _1159, _0000)

just does typecasts and aliases the results for easy reference. Note that using CROSS APPLY this way, the optimizer will do the work in-line and only adds a minuscule constant factor to the overall cost. However, it makes the main SELECT clause simpler to write.

The CROSS JOIN just ensures that I have enough sequence numbers for each row of input. (So I can cover each day).

The main SELECT clause:


select  car_id, dateadd(day, n-1,dt.sd) startdate
    ,case when dt.sd = dateadd(day, n-1,dt.sd) then dt.st else dt._0000 end startime
    ,case when dt.ed = dateadd(day, n-1,dt.sd) then dt.et else dt._1159 end endtime

gets the car id and start date. Then it computes the start and end times. The idea is that if this is day 1 (dt.sd = dateadd(day, n-1,dt.sd)), the start time comes from the input data, otherwise it is midnight (00:00:00). Same idea with the end time.

PS: I just realized that I can simplify this even further and make it DRY-er by reordering the CROSS JOIN and CROSS APPLY and pre-computing dateadd(day, n-1,dt.sd):


select  car_id, day_n startdate
    ,case when dt.sd = day_n then dt.st else dt._0000 end startime
    ,case when dt.ed = day_n then dt.et else dt._1159 end endtime
from @t
cross join N
cross apply (
    select cast(starttime as date), cast(starttime as time),
           cast(endtime as date), cast(endtime as time),
           cast('23:59:59' as time), cast('00:00:00' as time),
	   cast(dateadd(day, n-1,starttime) as date) 
    ) dt(sd, st, ed, et, _1159, _0000, day_n)

where n -1 <= datediff(day, dt.sd, dt.ed)
order by car_id, startdate

Muj9
Yak Posting Veteran

75 Posts

Posted - 2014-12-18 : 09:23:25

Thank you for the explanation its very impressive. Just Brilliant.

CleaningWoman
Starting Member

13 Posts

Posted - 2015-01-08 : 10:22:45

quote:
Originally posted by gbritton

My second solution

Thank you again. I studied a while on it, i tested a few different szenarios and learned very much. The routine is implemented and working stable and fine.

And i will build a training about "CTE" and "cross apply" out of it for my colleagues.

Thanks a lot.
(I was of on x-mas holidays and now as i am back, otherwise i would have answered earlier)

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2015-01-08 : 12:45:23

Keep it simple. I expect this code to run much faster than 66 seconds.

-- SwePeso
SELECT		t.Car_ID,
		DATEADD(DAY, DATEDIFF(DAY, '19000101', t.StartTime), v.Number) AS Date_ID,
		CASE
			WHEN v.Number = 0 THEN CAST(t.StartTime AS TIME(3))
			ELSE CAST('00:00:00.000' AS TIME(3))
		END AS StartTime,
		CASE
			WHEN v.Number = DATEDIFF(DAY, t.StartTime, t.EndTime) THEN CAST(t.EndTime AS TIME(3))
			ELSE CAST('23:59:59.000' AS TIME(3))
		END AS EndTime
FROM		@t AS t
INNER JOIN	master.dbo.spt_values AS v ON v.Type = N'P'
			AND v.Number BETWEEN 0 AND DATEDIFF(DAY, t.StartTime, t.EndTime);

Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2015-01-08 : 12:59:58

Even faster.

First create this function in your database.

CREATE FUNCTION [dbo].[GetNums](@low AS BIGINT, @high AS BIGINT) RETURNS TABLE
AS
RETURN
  WITH
    L0   AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)),
    L1   AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
    L2   AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
    L3   AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
    L4   AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
    L5   AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
    Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
            FROM L5)
  SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
  FROM Nums
  ORDER BY rownum;
GO

And then run this code

-- SwePeso
SELECT		t.Car_ID,
		DATEADD(DAY, DATEDIFF(DAY, '19000101', t.StartTime), v.n) AS Date_ID,
		CASE
			WHEN v.n = 0 THEN CAST(t.StartTime AS TIME(3))
			ELSE CAST('00:00:00.000' AS TIME(3))
		END AS StartTime,
		CASE
			WHEN v.n = DATEDIFF(DAY, t.StartTime, t.EndTime) THEN CAST(t.EndTime AS TIME(3))
			ELSE CAST('23:59:59.000' AS TIME(3))
		END AS EndTime
FROM		@t AS t
CROSS APPLY	dbo.GetNums(0, DATEDIFF(DAY, t.StartTime, t.EndTime)) AS v;

Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources