Author |
Topic |
CleaningWoman
Starting Member
13 Posts |
Posted - 2014-12-17 : 08:23:16
|
Hi SQL-Experts,i am confronted with a problem which i guess is already solved, but i do not now how to search for it or the best way to solve it, so i registered in the hope to find some nerd-expert-answers.I have: about 10,000,000 entries [car_id], [starttime], [endtime] (between 2011 and today) I need:A calendar list from 2011 until today, which contains the daily time a car is being used and which [car_id] is involved for each day (might be multiple times a day).[date_id] [car_id] [starttime] [endtime]Due to bad DB design i have about 12,000 different [car_id] which i cannot reduce to a lower number.So the expert-sql-question is:Does anyone have a good script solution to map a timespan over different calendar days to a one by one list of daily timespans?My current way would be to scan each [car_id] in time-order and break it down into days and insert the results into a new table or to scan day by day all entries with starttime on that day and process this list via a #tempdbanyway it will result in about 50,000,000 inserts.What do you suggest?Thank you very much, for even reading until here and even more having an answer. |
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-12-17 : 08:49:22
|
Would you please post some examples of input and output? |
|
|
CleaningWoman
Starting Member
13 Posts |
Posted - 2014-12-17 : 10:08:42
|
quote: Originally posted by gbritton Would you please post some examples of input and output?
Yes, of courseavailable source-datacar_id starttime endtime duration64 2013-05-23 13:36:42.000 2013-05-25 04:10:51.000 138849103 2013-05-23 20:38:43.000 2013-05-25 03:40:55.000 111732104 2013-05-23 14:37:41.000 2013-05-25 04:10:02.000 135141171 2013-05-23 11:32:21.000 2013-05-25 04:07:20.000 146099189 2013-05-23 16:33:27.000 2013-05-27 11:05:44.000 3259373283 2013-05-23 17:12:55.000 2013-05-25 14:53:57.000 1644623956 2013-05-23 09:49:20.000 2013-05-24 10:29:51.000 888314107 2013-05-23 07:20:20.000 2013-05-24 09:35:09.000 944895237 2013-05-23 13:16:20.000 2013-05-25 00:49:04.000 12796413813 2013-05-23 12:47:57.000 2013-05-24 15:17:29.000 9537214265 2013-05-23 11:24:06.000 2013-05-24 13:32:00.000 9407430829 2013-05-23 08:16:23.000 2013-05-24 10:13:40.000 9343730832 2013-05-23 05:58:36.000 2013-05-24 06:08:28.000 86992 Well, i do not have result data yet. I expect the first row to result in the following entries:[car_id][date_id] [starttime] [endtime]64 20130523 13:36:42.000 23:59:59.00064 20130524 00:00:00.000 23:59:59.00064 20130525 00:00:00.000 04:10:51.000 |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-12-17 : 11:15:47
|
Tricky! Check this out:declare @t table (car_id int, starttime datetime, endtime datetime, duration int)insert into @t(car_id, starttime, endtime, duration) values(64 ,'2013-05-23 13:36:42.000', '2013-05-25 04:10:51.000', 138849 ),(103 ,'2013-05-23 20:38:43.000', '2013-05-25 03:40:55.000', 111732 ),(104 ,'2013-05-23 14:37:41.000', '2013-05-25 04:10:02.000', 135141 ),(171 ,'2013-05-23 11:32:21.000', '2013-05-25 04:07:20.000', 146099 ),(189 ,'2013-05-23 16:33:27.000', '2013-05-27 11:05:44.000', 325937 ),(3283 ,'2013-05-23 17:12:55.000', '2013-05-25 14:53:57.000', 164462 ),(3956 ,'2013-05-23 09:49:20.000', '2013-05-24 10:29:51.000', 88831 ),(4107 ,'2013-05-23 07:20:20.000', '2013-05-24 09:35:09.000', 94489 ),(5237 ,'2013-05-23 13:16:20.000', '2013-05-25 00:49:04.000', 127964 ),(13813 ,'2013-05-23 12:47:57.000', '2013-05-24 15:17:29.000', 95372 ),(14265 ,'2013-05-23 11:24:06.000', '2013-05-24 13:32:00.000', 94074 ),(30829 ,'2013-05-23 08:16:23.000', '2013-05-24 10:13:40.000', 93437 ),(30832 ,'2013-05-23 05:58:36.000', '2013-05-24 06:08:28.000', 86992 )select car_id, dts.d, dts.s, dts.e from @tcross apply ( select cast(starttime as date) sd , cast(starttime as time) st , cast(endtime as date) ed , cast(endtime as time) et , cast('23:59:59' as time) , cast('00:00:00' as time)) dt(sd,st,ed,et, _1159, _0000)cross apply ( select distinct dt, s, e from (values (dt.sd, dt.st, dt._1159), (dateadd(day, 1, dt.sd), dt._0000, dt._1159), (dateadd(day, 2, dt.sd), dt._0000, dt._1159), (dateadd(day, 3, dt.sd), dt._0000, dt._1159), (dateadd(day, 4, dt.sd), dt._0000, dt._1159), (dateadd(day, 5, dt.sd), dt._0000, dt._1159), (dateadd(day, 6, dt.sd), dt._0000, dt._1159), (dt.ed, dt._0000, dt.et)) d(dt, s, e) where 1 = case when d.dt < dt.ed then 1 when d.dt = dt.ed and d.e <= dt.et then 1 else 0 end ) dts(d, s, e) Edit: removed unused code. Note: if you want to cover any number of days, you'll need to build the query as dynamic SQL, for the last CROSS APPLY step. |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-12-17 : 12:18:01
|
Here's another way, using a CTE tally table. This should handle any interval up to 10^8 days"declare @t table (car_id int, starttime datetime, endtime datetime, duration int)insert into @t(car_id, starttime, endtime, duration) values(64 ,'2013-05-23 13:36:42.000', '2013-05-25 04:10:51.000', 138849 ),(103 ,'2013-05-23 20:38:43.000', '2013-05-25 03:40:55.000', 111732 ),(104 ,'2013-05-23 14:37:41.000', '2013-05-25 04:10:02.000', 135141 ),(171 ,'2013-05-23 11:32:21.000', '2013-05-25 04:07:20.000', 146099 ),(189 ,'2013-05-23 16:33:27.000', '2013-05-27 11:05:44.000', 325937 ),(3283 ,'2013-05-23 17:12:55.000', '2013-05-25 14:53:57.000', 164462 ),(3956 ,'2013-05-23 09:49:20.000', '2013-05-24 10:29:51.000', 88831 ),(4107 ,'2013-05-23 07:20:20.000', '2013-05-24 09:35:09.000', 94489 ),(5237 ,'2013-05-23 13:16:20.000', '2013-05-25 00:49:04.000', 127964 ),(13813 ,'2013-05-23 12:47:57.000', '2013-05-24 15:17:29.000', 95372 ),(14265 ,'2013-05-23 11:24:06.000', '2013-05-24 13:32:00.000', 94074 ),(30829 ,'2013-05-23 08:16:23.000', '2013-05-24 10:13:40.000', 93437 ),(30832 ,'2013-05-23 05:58:36.000', '2013-05-24 06:08:28.000', 86992 )declare @d int = 1+ ( select max(datediff(day, cast(starttime as date), cast(endtime as date))) from @t);with n1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))v(n)), n2(n) as (select 1 from n1, n1 n), n4(n) as (select 1 from n2, n2 n), n8(n) as (select 1 from n4, n4 n), N(n) as (select top(@d) row_number() over(order by (select 1)) from n8, n8 n)select car_id, dateadd(day, n-1,dt.sd) startdate ,case when dt.sd = dateadd(day, n-1,dt.sd) then dt.st else dt._0000 end startime ,case when dt.ed = dateadd(day, n-1,dt.sd) then dt.et else dt._1159 end endtimefrom @tcross apply ( select cast(starttime as date), cast(starttime as time), cast(endtime as date), cast(endtime as time), cast('23:59:59' as time), cast('00:00:00' as time) ) dt(sd, st, ed, et, _1159, _0000)cross join Nwhere n -1 <= datediff(day, dt.sd, dt.ed)order by car_id, startdate |
|
|
CleaningWoman
Starting Member
13 Posts |
Posted - 2014-12-17 : 12:26:24
|
quote: Originally posted by gbritton Tricky! Check this out:
WOW! THANX SO MUCH. I have learned so much right now. There has opened a whole new SQL-World for me with DTS. I did not know of DTS.But there is a minor drawback with rows where the car is returned on the same day.If you change above code the first row of data to:(64 ,'2013-05-23 13:36:42.000', '2013-05-23 14:10:51.000', 2049 ), the returned results leave the first day with a wrong starttime from 00:00:00 I guess it is a minor typo, but as i do not understand what is going on, i will and have to train myself first to find it. Do you see it right away ? |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-12-17 : 13:08:58
|
My second solution fixes that |
|
|
CleaningWoman
Starting Member
13 Posts |
Posted - 2014-12-17 : 13:13:01
|
quote: Originally posted by gbritton My second solution fixes that
YES, it does !!I already implemented your solution into my database and processed all data in 1 minute and 6 seconds .I will go asleep tonight a different person. And tomorrow i will use the saved time to understand what you have made me do. ;-)THANX, THANX, THANX. |
|
|
Muj9
Yak Posting Veteran
75 Posts |
Posted - 2014-12-18 : 07:10:42
|
your a sql beast "gbritton".Can you walk us through your sql ?what do all the n's mean?Thanks |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-12-18 : 09:04:42
|
@Muj9 Ha! I claim no credit here. All the n's are a compact form of a CTE-based tally table. Jeff Moden wrote a great article on that and how to use a tally table to parse strings:http://www.sqlservercentral.com/articles/Tally+Table/72993/The basic idea is to build up a sequence of enough numbers to cover what you want to do. Looking at the CTEs again:with n1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))v(n)), n2(n) as (select 1 from n1, n1 n), n4(n) as (select 1 from n2, n2 n), n8(n) as (select 1 from n4, n4 n), N(n) as (select top(@d) row_number() over(order by (select 1)) from n8, n8 n) n1 is just 10 rows of the number 1. I call it n1 since there are 10^1 = 10 rows.n2 is n1*n1 -- that is, the cartesian product of n1 with itself. Call it n1^2 if you like. Now we have 10^1 * 10^1 = 10^2 = 100 rowsn4 continues the idea, resulting in 10^4 = 10,000 rowsn8 has 10^8 rowsFinally we come to big N. I use N since it reminds me of the double-struck N, the symbol used to represent the natural numbers in mathematics (see http://en.wikipedia.org/wiki/Natural_number). This CTE selects some of the rows of n8. In this case the number of rows is controlled by the number of days (inclusive) between the earliest start date and latest end date. (In practice I suspect this will be (a lot!) shorter than 10^8). This CTE uses the SQL ROW_NUMBER function to generate the sequence we want. ORDER BY is required and ORDER BY (SELECT 1) is equivalent to no ordering. You could say (SELECT NULL) or (SELECT 0) or (SELECT $) or (SELECT 'UNORDERED') with the same result. Continuing on, the first CROSS APPLY:cross apply ( select cast(starttime as date), cast(starttime as time), cast(endtime as date), cast(endtime as time), cast('23:59:59' as time), cast('00:00:00' as time) ) dt(sd, st, ed, et, _1159, _0000) just does typecasts and aliases the results for easy reference. Note that using CROSS APPLY this way, the optimizer will do the work in-line and only adds a minuscule constant factor to the overall cost. However, it makes the main SELECT clause simpler to write.The CROSS JOIN just ensures that I have enough sequence numbers for each row of input. (So I can cover each day).The main SELECT clause:select car_id, dateadd(day, n-1,dt.sd) startdate ,case when dt.sd = dateadd(day, n-1,dt.sd) then dt.st else dt._0000 end startime ,case when dt.ed = dateadd(day, n-1,dt.sd) then dt.et else dt._1159 end endtime gets the car id and start date. Then it computes the start and end times. The idea is that if this is day 1 (dt.sd = dateadd(day, n-1,dt.sd)), the start time comes from the input data, otherwise it is midnight (00:00:00). Same idea with the end time.PS: I just realized that I can simplify this even further and make it DRY-er by reordering the CROSS JOIN and CROSS APPLY and pre-computing dateadd(day, n-1,dt.sd):select car_id, day_n startdate ,case when dt.sd = day_n then dt.st else dt._0000 end startime ,case when dt.ed = day_n then dt.et else dt._1159 end endtimefrom @tcross join Ncross apply ( select cast(starttime as date), cast(starttime as time), cast(endtime as date), cast(endtime as time), cast('23:59:59' as time), cast('00:00:00' as time), cast(dateadd(day, n-1,starttime) as date) ) dt(sd, st, ed, et, _1159, _0000, day_n)where n -1 <= datediff(day, dt.sd, dt.ed)order by car_id, startdate |
|
|
Muj9
Yak Posting Veteran
75 Posts |
Posted - 2014-12-18 : 09:23:25
|
Thank you for the explanation its very impressive. Just Brilliant. |
|
|
CleaningWoman
Starting Member
13 Posts |
Posted - 2015-01-08 : 10:22:45
|
quote: Originally posted by gbritton My second solution
Thank you again. I studied a while on it, i tested a few different szenarios and learned very much. The routine is implemented and working stable and fine.And i will build a training about "CTE" and "cross apply" out of it for my colleagues. Thanks a lot. (I was of on x-mas holidays and now as i am back, otherwise i would have answered earlier) |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2015-01-08 : 12:45:23
|
Keep it simple. I expect this code to run much faster than 66 seconds.-- SwePesoSELECT t.Car_ID, DATEADD(DAY, DATEDIFF(DAY, '19000101', t.StartTime), v.Number) AS Date_ID, CASE WHEN v.Number = 0 THEN CAST(t.StartTime AS TIME(3)) ELSE CAST('00:00:00.000' AS TIME(3)) END AS StartTime, CASE WHEN v.Number = DATEDIFF(DAY, t.StartTime, t.EndTime) THEN CAST(t.EndTime AS TIME(3)) ELSE CAST('23:59:59.000' AS TIME(3)) END AS EndTimeFROM @t AS tINNER JOIN master.dbo.spt_values AS v ON v.Type = N'P' AND v.Number BETWEEN 0 AND DATEDIFF(DAY, t.StartTime, t.EndTime); Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2015-01-08 : 12:59:58
|
Even faster. First create this function in your database.CREATE FUNCTION [dbo].[GetNums](@low AS BIGINT, @high AS BIGINT) RETURNS TABLEASRETURN WITH L0 AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)), L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum FROM L5) SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n FROM Nums ORDER BY rownum;GO And then run this code-- SwePesoSELECT t.Car_ID, DATEADD(DAY, DATEDIFF(DAY, '19000101', t.StartTime), v.n) AS Date_ID, CASE WHEN v.n = 0 THEN CAST(t.StartTime AS TIME(3)) ELSE CAST('00:00:00.000' AS TIME(3)) END AS StartTime, CASE WHEN v.n = DATEDIFF(DAY, t.StartTime, t.EndTime) THEN CAST(t.EndTime AS TIME(3)) ELSE CAST('23:59:59.000' AS TIME(3)) END AS EndTimeFROM @t AS tCROSS APPLY dbo.GetNums(0, DATEDIFF(DAY, t.StartTime, t.EndTime)) AS v; Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
|
|
|