Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-03 : 12:27:16
|
Hi everyone ,I think this one is a tough one.I need to know how many user where logged during a time period based on a activity tableIn the activity table the user’s status is stored with a timestamp. Here is the TableUserID | State | TimeStamp1 | 4 | 2013-03-29 09:00:391 | 3 | 2013-03-29 12:00:002 | 4 | 2013-03-29 10:30:002 | 3 | 2013-03-29 12:00:00State = 4 means User has logged in and 3 means agent has logged outFor example between 10h and 11h the result would be 1.5 because “User 2” logged in the middle of this interval and the user 1 was logged in during the whole period.So for each user I have to sum the time between the logged in state and the logged out state then to divide it by the given period in order to get the user activity for this period, and then make the sum for all users…Considering that the user status may have or not change during the given interval, I guess I should work with rows for the given interval plus the first row just before the interval because I need to know the state of the user before the beginning of the interval.Am I clear?!? Does this sound possible to you?NB : I've uploaded a csv file containing sample data to give you a "quick" start if needed http://www.sendspace.com/file/tcvy6vThank you for your help!Regards,Quentin |
|
UnemployedInOz
Yak Posting Veteran
54 Posts |
Posted - 2013-04-03 : 20:09:07
|
Create Table #TEST(UserID INT,State INT,TimeStamp DATETIME)-- truncate table #TestiNSERT INTO #TEST VALUES (1, 4, '2013-03-29 09:00:39'), (1, 3, '2013-03-29 12:00:00'), (2, 4, '2013-03-29 10:30:00'), (2, 3, '2013-03-29 12:00:00'), (1, 4, '2013-04-29 09:00:39'), (1, 3, '2013-04-29 12:00:00'), (3, 4, '2013-03-29 11:00:39')Declare @RangeFrom datetime, @RangeTo datetimeSELECT @RangeFrom = '2013-03-29 10:00:00.000', @RangeTo = '2013-03-29 11:00:00.000';With src as (SELECT a.UserID,a.TimeStamp as TFrom, ISNULL((Select min(b.TimeStamp) from #test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3 and b.TimeStamp >= @RangeFrom),@RangeTo) as TToFROM #TEST a WHERE State = 4 and a.TimeStamp < @RangeTo)SELECT SUM(Datediff(s,TFrom2,TTo2))/3600.0 FROM ( Select UserID, case when TFrom < @RangeFrom then @RangeFrom else TFrom end TFrom2, case when TTo > @RangeTo then @RangeTo else TTo end TTo2 from src) as z; |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-03 : 21:13:10
|
Wahoow !!!Your query looks very good!Tough for me to understand it...It works great with the test values, but looks strange on my real data...I've just finished my dirty query using many temporary tables, my resultseems consistents but quite different from your query.It's more than 3am now for me, so I won't go further for today.I'll give another try tomorrow to understand your query and the results it gives, and probably post here my dirty query when I'll have added some comments.Anyways, big thanks to you!Hope you'll be around there tomorrow Quentin |
|
|
UnemployedInOz
Yak Posting Veteran
54 Posts |
Posted - 2013-04-03 : 21:43:20
|
I start off by creating rows of information consisting of UserID, StartTime and EndTime together.To find the Endtime that goes with the Starttime, you look for the first Endtime for that user after the starttime. Hence the min(b.TimeStamp). If there is no endtime in your table, we assume that the user is still connected i.e. ISNULL(min(b.TimeStamp),@RangeTo) and we use the @RangeTo instead.Once we have the rows of information which is created within the WITH statement and allocated to table name src we can use this information. If the starttime is before the @RangeFrom time we then use the @RangeFrom time and not the startTime becuase we are only interest in this @RangeFrom period. By the same token, if the @RangeTo is before the EndTime, we then want to use the @RangeTo time as we are not interested in anything after that. Therefore with the new times I find the number of seconds between the two times and sum them up. I then convert back to hours to get the answer. |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-04 : 08:43:35
|
Thanks for your explanations, it helped me get a better understanding of your query.If think the problem occurs when I have previous rows from previous daysLet’s use to following values about only 1 user over a few daysUsing the following range:SELECT @RangeFrom = '2013-03-29 10:00:00.000',@RangeTo = '2013-03-29 11:30:00.000';Which is more than 1h; I replace the division by 3600.0 by CONVERT(float,DATEDIFF(s,@RangeFrom,@RangeTo)), casting it as a float in order to get the result as a floatCreate Table #TEST(UserID INT,State INT,TimeStamp DATETIME)-- truncate table #TestiNSERT INTO #TEST VALUES (1, 3, '2013-03-27 00:00:01'),(1, 4, '2013-03-27 08:33:22'),(1, 3, '2013-03-27 12:11:26'),(1, 4, '2013-03-27 14:07:16'),(1, 3, '2013-03-27 18:00:45'),(1, 3, '2013-03-28 00:00:02'),(1, 4, '2013-03-28 08:04:55'),(1, 3, '2013-03-28 12:03:41'),(1, 4, '2013-03-28 13:23:05'),(1, 3, '2013-03-28 18:04:13'),(1, 3, '2013-03-29 00:00:02'),(1, 4, '2013-03-29 08:10:32'),(1, 3, '2013-03-29 10:10:32'),(1, 4, '2013-03-29 10:40:32'),(1, 3, '2013-03-29 11:51:03'),(1, 4, '2013-03-29 12:54:10'),(1, 3, '2013-03-29 17:47:47')The nested Select:With src as (SELECT a.UserID, a.TimeStamp as TFrom, ISNULL((Select min(b.TimeStamp) from #test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3 and b.TimeStamp >= @RangeFrom),@RangeTo) as TTo FROM #TEST a WHERE State = 4 and a.TimeStamp < @RangeTo)SELECT * FROM SRCGives the following:1 2013-03-27 08:33:22.000 2013-03-29 10:10:32.0001 2013-03-27 14:07:16.000 2013-03-29 10:10:32.0001 2013-03-28 08:04:55.000 2013-03-29 10:10:32.0001 2013-03-28 13:23:05.000 2013-03-29 10:10:32.0001 2013-03-29 08:10:32.000 2013-03-29 10:10:32.0001 2013-03-29 10:40:32.000 2013-03-29 11:51:03.000Then we correct the TFrom and the TToWith src as (SELECT a.UserID,a.TimeStamp as TFrom, ISNULL((Select min(b.TimeStamp) from #test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3 and b.TimeStamp >= @RangeFrom),@RangeTo) as TToFROM #TEST a WHERE State = 4 and a.TimeStamp < @RangeTo)Select UserID, case when TFrom < @RangeFrom then @RangeFrom else TFrom end TFrom2, case when TTo > @RangeTo then @RangeTo else TTo end TTo2 from srcThis gives the following:1 2013-03-29 10:00:00.000 2013-03-29 10:10:32.0001 2013-03-29 10:00:00.000 2013-03-29 10:10:32.0001 2013-03-29 10:00:00.000 2013-03-29 10:10:32.0001 2013-03-29 10:00:00.000 2013-03-29 10:10:32.0001 2013-03-29 10:00:00.000 2013-03-29 10:10:32.0001 2013-03-29 10:40:32.000 2013-03-29 11:30:00.000The final result is 1.13…. When the value for 1 user cannot be greater than 1 The correct Result would be obtained using the two last Rows1 2013-03-29 10:00:00.000 2013-03-29 10:10:32.0001 2013-03-29 10:40:32.000 2013-03-29 11:30:00.000Which would return 0.66…. which looks Ok as the user was connected around 60min over a 90min rangeTo work properly we need the first row before the @RangeFrom to determine to initial state of the user.So I guess min (b.TimeStamp) would need an extra condition on the where clause to limit the rows returned by the nested select statement…I really appreciate your help !!Quentin |
|
|
UnemployedInOz
Yak Posting Veteran
54 Posts |
Posted - 2013-04-04 : 17:52:16
|
With src as (SELECT a.UserID,a.TimeStamp as TFrom,ISNULL((Select min(b.TimeStamp) from #test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3),@RangeTo) as TToFROM #TEST a WHERE State = 4 and a.TimeStamp < @RangeTo)SELECT SUM(Datediff(s,TFrom2,TTo2))/3600.0 FROM (Select UserID,case when TFrom < @RangeFrom then @RangeFrom else TFrom end TFrom2,case when TTo > @RangeTo then @RangeTo else TTo end TTo2from src) as z WHERE TFrom2 < TTo2; |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-05 : 07:41:39
|
Fantastic !!Your query gives the same result as my dirty nested cursor/fetch 200 lines query If you agree let me take my need one step further.I would like to have a GROUP BY clause grouping result by hour or half hour...I already used something like Select Count(*), DateAdd(Minute, 30 * (DateDiff(Minute, 0, SomeDate) / 30), 0)From #TempGroup By DateAdd(Minute, 30 * (DateDiff(Minute, 0, SomeDate) / 30), 0)in a previous query to count rows by half hour...So focusing on the 03/29 the result I would like would be...|8h-8h30 | 8h30-9h | 9h-9h30 | 9h30-10h | 10h-10h30 | 0.648...| 1 | 1 | 1 | 0.648...I could use a while loop and process your request 48 times to get the job done...Have you got a better idea?Thank you for everything !!Quentin |
|
|
UnemployedInOz
Yak Posting Veteran
54 Posts |
Posted - 2013-04-05 : 18:11:35
|
quote: Originally posted by QuentinL Fantastic !!So focusing on the 03/29 the result I would like would be...|8h-8h30 | 8h30-9h | 9h-9h30 | 9h30-10h | 10h-10h30 | 0.648...| 1 | 1 | 1 | 0.648...Quentin
These results are double what they should be...Create Table #TEST(UserID INT,State INT,TimeStamp DATETIME)Create Table #Range(RangeFrom datetime,RangeTo datetime)-- truncate table #Test-- truncate table #RangeiNSERT INTO #TEST VALUES (1, 3, '2013-03-27 00:00:01'),(1, 4, '2013-03-27 08:33:22'),(1, 3, '2013-03-27 12:11:26'),(1, 4, '2013-03-27 14:07:16'),(1, 3, '2013-03-27 18:00:45'),(1, 3, '2013-03-28 00:00:02'),(1, 4, '2013-03-28 08:04:55'),(1, 3, '2013-03-28 12:03:41'),(1, 4, '2013-03-28 13:23:05'),(1, 3, '2013-03-28 18:04:13'),(1, 3, '2013-03-29 00:00:02'),(1, 4, '2013-03-29 08:10:32'),(1, 3, '2013-03-29 10:10:32'),(1, 4, '2013-03-29 10:40:32'),(1, 3, '2013-03-29 11:51:03'),(1, 4, '2013-03-29 12:54:10'),(1, 3, '2013-03-29 17:47:47')Declare @RangeFrom datetime,@RangeTo datetimeSELECT @RangeFrom = '2013-03-29 08:00:00.000',@RangeTo = '2013-03-29 23:30:00.000';While @RangeFrom < @RangetoBEGIN INSERT INto #Range Values (@RangeFrom,DateAdd(mi,30,@RangeFrom)) SELECT @RangeFrom = DateAdd(mi,30,@RangeFrom)END;-- select * from #test-- select * from #RangeWith src as (SELECT RangeFrom,RangeTo,a.TimeStamp as TFrom, ISNULL((Select min(b.TimeStamp) from #test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3),RangeTo) as TTo FROM #TEST a CROSS JOIN #Range WHERE State = 4 and a.TimeStamp < RangeTo)SELECT RangeFrom,RangeTo,SUM(Datediff(s,TFrom2,TTo2))/3600.0 FROM (Select RangeFrom,RangeTo,case when TFrom < RangeFrom then RangeFrom else TFrom end TFrom2,case when TTo > RangeTo then RangeTo else TTo end TTo2from src) as z WHERE TFrom2 < TTo2Group By RangeFrom,RangeTo; |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-06 : 20:21:20
|
Great!!I made a little mistake in the last message about the expected results.The correct ones are:2013-03-29 08:00:00.000 2013-03-29 08:30:00.000 0.6488882013-03-29 08:30:00.000 2013-03-29 09:00:00.000 1.0000002013-03-29 09:00:00.000 2013-03-29 09:30:00.000 1.0000002013-03-29 09:30:00.000 2013-03-29 10:00:00.000 1.0000002013-03-29 10:00:00.000 2013-03-29 10:30:00.000 0.3511112013-03-29 10:30:00.000 2013-03-29 11:00:00.000 0.648888The result is not doubled as they need to be divided by 1800.0 instead of 3600.0 Anyway your query is just great, I wouldn't have thought of making a JOIN on a table containing the half hour ranges.I we can go further, I'm trying to optimize the query.I think that replacing the temp table by the following should help:Declare @RangeFrom datetime,@RangeTo datetimeSELECT @RangeFrom = '2013-03-25 08:00:00.000',@RangeTo = '2013-03-29 23:30:00.000';;With RangeByHalfHours As ( Select @RangeFrom As RangeFrom,Dateadd(mi,30,@RangeFrom) As RangeTo Union All Select DateAdd(mi,30,RangeFrom),DateAdd(mi,30,RangeTo) From RangeByHalfHours Where RangeTo < @RangeTo ) select * from RangeByHalfHours Option ( Maxrecursion 0);Is it a good idea? If yes let me explain some limitations I experienceThe following works because of Maxrecursion 0, if not present there shouldn't be more than 100 half hour ranges...In the full query below I couldn't find where to put the Maxrecursion option.;With RangeByHalfHours As ( Select @FromDate As RangeFrom,Dateadd(mi,30,@FromDate) As RangeTo Union All Select DateAdd(mi,30,RangeFrom),DateAdd(mi,30,RangeTo) From RangeByHalfHours Where RangeFrom < @ToDate ),src as (SELECT RangeFrom,RangeTo,a.TimeStamp as TFrom,ISNULL((Select min(b.TimeStamp) from Test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3),RangeTo) as TToFROM Test a CROSS JOIN RangeByHalfHoursWHERE State = 4 and a.TimeStamp < RangeTo)SELECT RangeFrom,RangeTo,SUM(Datediff(s,TFrom2,TTo2))/1800.0 as CoeffFROM (Select RangeFrom,RangeTo,case when TFrom < RangeFrom then RangeFrom else TFrom end TFrom2,case when TTo > RangeTo then RangeTo else TTo end TTo2from src) as z WHERE TFrom2 < TTo2Group By RangeFrom,RangeTo;Have you got an idea?Thank you very much!Quentin |
|
|
UnemployedInOz
Yak Posting Veteran
54 Posts |
Posted - 2013-04-07 : 02:28:20
|
You could just replaceWhile @RangeFrom < @RangetoBEGININSERT INto #Range Values (@RangeFrom,DateAdd(mi,30,@RangeFrom))SELECT @RangeFrom = DateAdd(mi,30,@RangeFrom)END;withWith RangeByHalfHours As(Select @RangeFrom As RangeFrom,Dateadd(mi,30,@RangeFrom) As RangeToUnion AllSelect DateAdd(mi,30,RangeFrom),DateAdd(mi,30,RangeTo)From RangeByHalfHoursWhere RangeTo < @RangeTo)select * from RangeByHalfHoursOption ( Maxrecursion 0);I wouldnt worry about performance as I dont think trying to do it your way will add any significant improvement |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-07 : 04:03:20
|
The problem in that in the full query we don'tselect * from RangeByHalfHoursas we CROSS join the table RangeByHalfHoursso I can't find where to put the Maxrecursion option in the full query Nice to see you're here during week end Thank you |
|
|
UnemployedInOz
Yak Posting Veteran
54 Posts |
Posted - 2013-04-07 : 05:56:27
|
Create Table #TEST(UserID INT,State INT,TimeStamp DATETIME)Create Table #Range(RangeFrom datetime,RangeTo datetime)-- truncate table #Test-- truncate table #RangeiNSERT INTO #TEST VALUES (1, 3, '2013-03-27 00:00:01'),(1, 4, '2013-03-27 08:33:22'),(1, 3, '2013-03-27 12:11:26'),(1, 4, '2013-03-27 14:07:16'),(1, 3, '2013-03-27 18:00:45'),(1, 3, '2013-03-28 00:00:02'),(1, 4, '2013-03-28 08:04:55'),(1, 3, '2013-03-28 12:03:41'),(1, 4, '2013-03-28 13:23:05'),(1, 3, '2013-03-28 18:04:13'),(1, 3, '2013-03-29 00:00:02'),(1, 4, '2013-03-29 08:10:32'),(1, 3, '2013-03-29 10:10:32'),(1, 4, '2013-03-29 10:40:32'),(1, 3, '2013-03-29 11:51:03'),(1, 4, '2013-03-29 12:54:10'),(1, 3, '2013-03-29 17:47:47')Declare @RangeFrom datetime,@RangeTo datetimeSELECT @RangeFrom = '2013-03-29 08:00:00.000',@RangeTo = '2013-03-29 23:30:00.000';With RangeByHalfHours As(Select @RangeFrom As RangeFrom,Dateadd(mi,30,@RangeFrom) As RangeToUnion AllSelect DateAdd(mi,30,RangeFrom),DateAdd(mi,30,RangeTo)From RangeByHalfHoursWhere RangeTo < @RangeTo)Insert Into #Rangeselect * from RangeByHalfHoursOption ( Maxrecursion 0);-- select * from #test-- select * from #RangeWith src as (SELECT RangeFrom,RangeTo,a.TimeStamp as TFrom,ISNULL((Select min(b.TimeStamp) from #test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3),RangeTo) as TToFROM #TEST a CROSS JOIN #RangeWHERE State = 4 and a.TimeStamp < RangeTo)SELECT RangeFrom,RangeTo,SUM(Datediff(s,TFrom2,TTo2))/3600.0 FROM (Select RangeFrom,RangeTo,case when TFrom < RangeFrom then RangeFrom else TFrom end TFrom2,case when TTo > RangeTo then RangeTo else TTo end TTo2from src) as z WHERE TFrom2 < TTo2Group By RangeFrom,RangeTo; |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-07 : 08:29:22
|
I finally succeed placing the maxrecursion option.From what I've read :-the] OPTION clause can be used only at the statement level-You can not change the default value of that option inside a udf. You will have to do it in the statement referencing the udf.So I added the option at the end of the query after the GROUP BY statement,which I'm sure I've tried yesterday but I guess I had forgot to remove the semicolon of the GROUP BY, and it works like a charm. Declare @RangeFrom datetime,@RangeTo datetimeSELECT @RangeFrom = '20130327 08:00:00',@RangeTo = '20130330 12:00:00';;WITH RangeByHalfHours AS( Select @RangeFrom As RangeFrom,Dateadd(mi,30,@RangeFrom) As RangeTo Union All Select DateAdd(mi,30,RangeFrom),DateAdd(mi,30,RangeTo) From RangeByHalfHours Where RangeFrom < @RangeTo),src AS ( SELECT RangeFrom,RangeTo,a.TimeStamp as TFrom, ISNULL((Select min(b.TimeStamp) from Test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3),RangeTo) as TTo FROM Test a CROSS JOIN RangeByHalfHours WHERE State = 4 and a.TimeStamp < RangeTo)SELECT RangeFrom,RangeTo,SUM(Datediff(s,TFrom2,TTo2))/1800.0 as CoeffFROM (Select RangeFrom,RangeTo,case when TFrom < RangeFrom then RangeFrom else TFrom end TFrom2,case when TTo > RangeTo then RangeTo else TTo end TTo2from src) as z WHERE TFrom2 < TTo2Group By RangeFrom,RangeToOption ( Maxrecursion 0 )As expected, if I remove the Option the query fails with the error seen previously:The statement terminated. The maximum recursion 100 has been exhausted before statement completion.I think it's all for the moment thanks a lot for your help !!! Quentin |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-08 : 20:18:50
|
One more question please, I have not been able to include rows with value 0, how would you do this?Thank you |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-09 : 20:24:34
|
There it'is ;With RangeByHalfHours As ( Select @FromDate As RangeFrom,Dateadd(mi,30,@FromDate) As RangeTo Union All Select DateAdd(mi,30,RangeFrom),DateAdd(mi,30,RangeTo) From RangeByHalfHours Where cast(RangeTo as Date) <= @ToDate ),src as ( SELECT RangeFrom,RangeTo,a.TimeStamp as TFrom, ISNULL((Select min(b.TimeStamp) from Test b where a.UserID = b.UserID and a.TimeStamp <= b.TimeStamp and b.State = 3),RangeTo) as TTo FROM Test a CROSS JOIN RangeByHalfHours WHERE State = 4 and a.TimeStamp < RangeTo),smooth as ( Select RangeFrom,RangeTo, case when TFrom < RangeFrom then RangeFrom else TFrom end TFrom2, case when TTo > RangeTo then RangeTo else TTo end TTo2 from src),rslt as ( SELECT smooth.RangeFrom,smooth.RangeTo,SUM(Datediff(s,TFrom2,TTo2))/1800.0 as Coeff FROM smooth WHERE TFrom2 < TTo2 Group By smooth.RangeFrom,smooth.RangeTo)SELECT * from rsltUNION SELECT RangeFrom, RangeTo, 0 as CoeffFROM RangeByHalfHours WHERE RangeFrom NOT IN (select RangeFrom from rslt)Option ( Maxrecursion 0 ) |
|
|
UnemployedInOz
Yak Posting Veteran
54 Posts |
Posted - 2013-04-09 : 21:11:29
|
You may want to throw in an Order By RangeFromat the end... |
|
|
QuentinL
Starting Member
11 Posts |
Posted - 2013-04-10 : 04:09:46
|
Thank you, I missed the order by clause!Surprisingly the results are sorted by default.I don't know why... That probably lead me to miss the order by.Furthermore, even if not sorted I wouldn't really need the Order by as the results are used in a chart, the data will be sorted by the axis.But I would have add the order by for a better readability when using the query alone.Thank you for spotting this |
|
|
lilyyang
Starting Member
3 Posts |
Posted - 2013-07-08 : 03:43:08
|
unspammed |
|
|
|
|
|
|
|