Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-07-23 : 16:23:16
|
Edit: There seems to be some interaction between GROUP BY/NEWID() that can cause problems if these functions are used directly in an aggregate query. Please read the posts on this thread about this, and use caution.This script creates three functions, F_RANDOM_INTEGER, F_RANDOM_SAMPLE, and F_RANDOM_DATETIME. The last parameter of each function must be function NEWID() to generate the random number.Theses functions are designed for applications where it is necessary to generate random integers, random datetimes, or take random samples from sets of data. Typical applications would be software testing, inventory control, auditing, and product quality testing.Function F_RANDOM_INTEGER returns a random integer in the range of the input parameters so that the return value is >= @START_INT and <= @END_INT. It is valid for any range of two integer values.Function F_RANDOM_SAMPLE returns a 1 or a 0 to determine if a sample should be selected, based on the input sample rate. Input parameter @SAMPLE_RATE should be between 1 and 999,999. The sample rate determines how many samples should be selected out of each 1,000,000 samples. A sample rate below 1 will never select a sample, and a sample rate above 999,999 will always select a sample. A sample rate of 1,000 = 0.1%, 50,000 = 5%, 63,775 = 6.3775%, 100,000 = 10%, and 500,000 = 50%.F_RANDOM_DATETIME returns a random datetime value >= @START_TIME and < @END_TIME. It is valid for any datetime range. Input parameters default, if null, to @START_TIME = '19000101' and @END_TIME = '19000102’. The datetime is random to the level of clock ticks (1/300 of as second). Note that the latest time is not included in the range of datatime values that can be returned. This is to allow selection of times within adjacent time periods, without having to specify times to the level of milliseconds. This means a range of 1990-12-01 01:00:00.000 through 1990-12-01 02:00:00.000 will never return a value of 1990-12-01 02:00:00.000.The NEWID() function is the basis of the random numbers. These functions should not to be considered random for purposes of data encryption or other high security applications. However, they should be adequate for business applications of the types mentioned above. I conducted extensive testing with the functions where I generated millions of results, analyzed the results various ways to look for non-random patterns, and I saw no evidence of non-random results.The script also includes a demo of each function, and sample output from the demos is also included.The demo script uses the number table function, F_TABLE_NUMBER_RANGE, available on this link:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685if objectproperty(object_id('dbo.F_RANDOM_INTEGER'),'IsScalarFunction') = 1 begin drop function dbo.F_RANDOM_INTEGER endgocreate function dbo.F_RANDOM_INTEGER ( @START_INT int, @END_INT int, @NEWID uniqueidentifier )returns intas/*Function: F_RANDOM_INTEGER This function returns a random integer value >= @START_INT and <= @END_INT. Valid for any integer range. Requires newid() to be input parameter @NEWID to generate the random number.-- Return random integer between -100, 200000select [Random Integer] = [dbo].[F_RANDOM_INTEGER](-100, 200000,newid())*/begindeclare @sn bigintdeclare @en bigintdeclare @mod bigintdeclare @rand_bigint bigintdeclare @rand_result int-- Set default values for input dates if they are nullif @START_INT is null begin set @START_INT = 0 endif @END_INT is null begin set @END_INT =1000000 end-- Set order of input parameters so that return value is always-- the same no matter what order the input values are passed.if @START_INT > @END_INT select @sn = @END_INT, @en = @START_INTelse select @sn = @START_INT, @en = @END_INT-- Return start int if start int = end intif @sn = @en return @sn-- Get modulusselect @mod = @en-@sn+1-- Get random bigint from input parameter @NEWIDselect @rand_bigint = abs(convert(bigint,convert(varbinary(20),@NEWID)))-- Get random integerselect @rand_result = @sn+(@rand_bigint%@mod)return @rand_resultendgogrant execute on dbo.F_RANDOM_INTEGER to publicgoif objectproperty(object_id('dbo.F_RANDOM_SAMPLE'),'IsScalarFunction') = 1 begin drop function dbo.F_RANDOM_SAMPLE endgocreate function dbo.F_RANDOM_SAMPLE ( @SAMPLE_RATE int, @NEWID uniqueidentifier )returns intas/*Function: F_RANDOM_SAMPLE This function returns a 1 or a 0 to determine if a sample should be selected, based on the sample rate. It is designed to select random samples at a specific rate. Input parameter @SAMPLE_RATE should be between 1 and 999,999. The sample rate determines how many samples should be selected out of each 1,000,000 samples. A sample rate below 1 will never select a sample, and a sample rate above 999,999 will always select a sample. 1,000 = 0.1%, 50,000 = 5%, 63,775 = 6.3775%, 100,000 = 10%, and 500,000 = 50% Requires newid() to be input parameter @NEWID to generate the random number.-- Select sample 200,000 times in 1,000,000 samples (20%)select [Random Sample] = [dbo].[F_RANDOM_SAMPLE](200000,newid())*/begindeclare @rand_bigint bigint-- Get random bigint from @NEWIDselect @rand_bigint = abs(convert(bigint,convert(varbinary(20),@NEWID)))-- Select sample if the modulus of @rand_bigint is less than the sample rate return case when @rand_bigint%1000000 < @SAMPLE_RATE then 1 else 0 endendgogrant execute on dbo.F_RANDOM_SAMPLE to publicgoif objectproperty(object_id('dbo.F_RANDOM_DATETIME'),'IsScalarFunction') = 1 begin drop function dbo.F_RANDOM_DATETIME endgocreate function dbo.F_RANDOM_DATETIME ( @START_TIME datetime, @END_TIME datetime, @NEWID uniqueidentifier )returns datetimeas/*Function: F_RANDOM_DATETIME This function returns a random datetime value >= @START_TIME and < @END_TIME. Valid for any datetime range. Input parameters default, if null, to: @START_TIME '19000101' @END_TIME '19000102' Requires newid() to be input parameter @NEWID to generate the random number.-- Return random time between 08:30 and 12:00select [Random Time] = [dbo].[F_RANDOM_DATETIME]('08:30:00.000','12:00:00.000',newid())*/begindeclare @st datetimedeclare @et datetimedeclare @hours intdeclare @ms intdeclare @ticks bigintdeclare @rand_ticks bigintdeclare @rand_bigint bigintdeclare @remaining_ticks intdeclare @return_hours intdeclare @return_ms int-- Set default values for input dates if they are nullif @START_TIME is null begin set @START_TIME = '19000101' endif @END_TIME is null begin set @END_TIME = '19000102' end-- Set order of input parameters so that return value is always-- the same no matter what order the input values are passed.if @START_TIME > @END_TIME select @st = @END_TIME, @et = @START_TIMEelse select @st = @START_TIME, @et = @END_TIME-- Return start time if start time = end timeif @st = @et return @st-- Get hours boundary difference.-- Subtract 1 from diff, before dividing by 2 and multiplying by 2-- so the milliseconds remaining is always positive and-- hours is always >= zero.set @hours = ((datediff(hh,@st,@et)-1)/2)*2-- Get remainder millisecondsset @ms = datediff(ms,0,@et-dateadd(hh,@hours,@st))-- Convert remainder milliseconds to-- SQL Server 'clock ticks' of 1/300 of a secondset @ticks = ((@ms/10)*3) + ((@ms%10)/3)-- Add hours * tick per hour (3600*300) to give total-- ticks between @START_TIME and @END_TIMEset @ticks = @ticks + (@hours * 0000001080000 )-- Get random bigint from input parameter @NEWIDselect @rand_bigint = abs(convert(bigint,convert(varbinary(20),@NEWID)))-- Get random number of ticksselect @rand_ticks = @rand_bigint%@ticks-- Get hours component of random ticksselect @return_hours = @rand_ticks/1080000-- Get left over ticks after removing hours.select @remaining_ticks = @rand_ticks%1080000--Convert remaining clock ticks back to millisecondsselect @return_ms = ((@remaining_ticks/3)*10) + floor(((@remaining_ticks%3)*3.5))-- Return the random time between the start and end timereturn dateadd(ms,@return_ms,dateadd(hh,@return_hours,@st))endgogrant execute on dbo.F_RANDOM_DATETIME to publicgoprint '-----------------------------------------------------------------'print ' Demo F_RANDOM_INTEGER function'print '-----------------------------------------------------------------'print ''declare @t table ([Random Integer] int not null )insert into @tselect -- Get integert in range of 1 to 10,000,000 [Random Integer] = [dbo].[F_RANDOM_INTEGER](1,10000000,newid() )from -- Function F_TABLE_NUMBER_RANGE -- available in Script Library forum F_TABLE_NUMBER_RANGE(1,100000)select [Right Int] = [Random Integer]%10, [Count] = count(*)from @t agroup by [Random Integer]%10order by 1,2select [Million Range] = [Random Integer]/1000000, [Count] = count(*)from @t agroup by [Random Integer]/1000000order by 1,2goprint '-----------------------------------------------------------------'print ' Demo F_RANDOM_SAMPLE function'print '-----------------------------------------------------------------'print ''declare @t table ([Sample Taken] int not null )insert into @tselect -- Sample rate = 6.3775% [Sample Taken] = [dbo].[F_RANDOM_SAMPLE](63775,newid())from -- Function F_TABLE_NUMBER_RANGE -- available in Script Library forum F_TABLE_NUMBER_RANGE(1,100000)select [Sample Taken], [Result Count] = count(*)from @t agroup by [Sample Taken]order by 1,2goprint '-----------------------------------------------------------------'print ' Demo F_RANDOM_DATETIME function'print '-----------------------------------------------------------------'print ''select Random_Datetime = convert(varchar(23),[dbo].[F_RANDOM_DATETIME]( a.ST, a.ET,newid() ) ,121) , [Start] = convert(varchar(23),a.ST ,121) , [End] = convert(varchar(23),a.ET ,121) , a.Commentfrom(select ST = getdate(), ET = getdate()+2 , Comment = 'Now thru 2 days from now' union allselect '20060101', '20060102' , 'One day diff' union allselect '20030101', '20030101' ,'Both times same' union allselect '20030101', '20030108' ,'One week diff' union allselect '20021228', '20030104' ,'One week diff' union allselect '20010701', '20010713' ,'12 day diff' union allselect '20010701', '20010714' ,'13 day diff' union allselect '20010630', '20010713' ,'13 day diff' union allselect '19901201 01:00:00.000', '19901201 02:00:00.000' ,'1 hour diff' union allselect '19901201 01:00:33.003', '19901201 02:00:33.003' ,'1 hour diff' union allselect '19901201 01:00:00.000', '19901201 01:30:00.000' ,'30 min diff' union allselect '19901201 01:00:33.447', '19901201 01:30:33.447' ,'30 min diff' union allselect '19901201 01:00:00.000', '19901201 01:05:00.000' ,'5 min diff' union allselect '19901201 01:00:29.123', '19901201 01:05:29.123' ,'5 min diff' union allselect '19901201 01:00:00.000', '19901201 01:01:00.000' ,'1 min diff' union allselect '19901201 01:00:00.000', '19901201 01:00:01.000' ,'1 sec diff' union allselect '19901201 01:00:00.000', '19901201 01:00:00.100' ,'100 ms diff' union allselect '19901201 01:00:00.000', '19901201 01:00:00.050' ,'50 ms diff' union allselect '19901201 01:00:00.000', '19901201 01:00:00.023' ,'23 ms diff' union allselect '19901201 01:00:00.000', '19901201 01:00:00.020' ,'20 ms diff' union allselect '19901201 01:00:00.000', '19901201 01:00:00.013' ,'13 ms diff' union allselect '19901201 01:00:00.000', '19901201 01:00:00.010' ,'10 ms diff' union allselect '19901201 01:00:00.000', '19901201 01:00:00.007' ,'7 ms diff' union allselect '19901201 01:00:00.000', '19901201 01:00:00.003' ,'3 ms diff' union allselect '20030101', '20030201' ,'One month diff 31 days' union allselect '20030101', '20040101' ,'One year diff' union allselect '20050101', '20070101' ,'Two year diff' union allselect '20060101', '20060301' ,'2 month diff' union allselect null, '20060101' ,'Start time null' union allselect '20060102', null ,'End time null' union allselect null, null ,'Both null' union allselect '17530101', '99991231 23:59:59.997' ,'Max datetime diff' union allselect '99991231 23:59:59.997','17530101' ,'Max datetime diff reversed') a Demo Results:----------------------------------------------------------------- Demo F_RANDOM_INTEGER function----------------------------------------------------------------- (100000 row(s) affected)Right Int Count ----------- ----------- 0 99291 100552 100093 100824 99195 100226 99147 99858 100989 9987(10 row(s) affected)Million Range Count ------------- ----------- 0 100091 99852 101423 100474 99675 99076 100787 100718 97909 10004(10 row(s) affected)----------------------------------------------------------------- Demo F_RANDOM_SAMPLE function----------------------------------------------------------------- (100000 row(s) affected)Sample Taken Result Count ------------ ------------ 0 936691 6331(2 row(s) affected)----------------------------------------------------------------- Demo F_RANDOM_DATETIME function----------------------------------------------------------------- Random_Datetime Start End Comment ----------------------- ----------------------- ----------------------- -------------------------- 2006-07-24 08:21:22.593 2006-07-23 17:54:47.283 2006-07-25 17:54:47.283 Now thru 2 days from now2006-01-01 06:44:36.897 2006-01-01 00:00:00.000 2006-01-02 00:00:00.000 One day diff2003-01-01 00:00:00.000 2003-01-01 00:00:00.000 2003-01-01 00:00:00.000 Both times same2003-01-05 01:57:02.183 2003-01-01 00:00:00.000 2003-01-08 00:00:00.000 One week diff2003-01-01 20:02:05.550 2002-12-28 00:00:00.000 2003-01-04 00:00:00.000 One week diff2001-07-02 11:35:11.147 2001-07-01 00:00:00.000 2001-07-13 00:00:00.000 12 day diff2001-07-02 16:39:57.433 2001-07-01 00:00:00.000 2001-07-14 00:00:00.000 13 day diff2001-07-06 12:33:53.087 2001-06-30 00:00:00.000 2001-07-13 00:00:00.000 13 day diff1990-12-01 01:15:42.530 1990-12-01 01:00:00.000 1990-12-01 02:00:00.000 1 hour diff1990-12-01 01:02:21.647 1990-12-01 01:00:33.003 1990-12-01 02:00:33.003 1 hour diff1990-12-01 01:21:06.267 1990-12-01 01:00:00.000 1990-12-01 01:30:00.000 30 min diff1990-12-01 01:26:17.983 1990-12-01 01:00:33.447 1990-12-01 01:30:33.447 30 min diff1990-12-01 01:00:56.327 1990-12-01 01:00:00.000 1990-12-01 01:05:00.000 5 min diff1990-12-01 01:03:20.423 1990-12-01 01:00:29.123 1990-12-01 01:05:29.123 5 min diff1990-12-01 01:00:21.617 1990-12-01 01:00:00.000 1990-12-01 01:01:00.000 1 min diff1990-12-01 01:00:00.443 1990-12-01 01:00:00.000 1990-12-01 01:00:01.000 1 sec diff1990-12-01 01:00:00.050 1990-12-01 01:00:00.000 1990-12-01 01:00:00.100 100 ms diff1990-12-01 01:00:00.000 1990-12-01 01:00:00.000 1990-12-01 01:00:00.050 50 ms diff1990-12-01 01:00:00.010 1990-12-01 01:00:00.000 1990-12-01 01:00:00.023 23 ms diff1990-12-01 01:00:00.017 1990-12-01 01:00:00.000 1990-12-01 01:00:00.020 20 ms diff1990-12-01 01:00:00.007 1990-12-01 01:00:00.000 1990-12-01 01:00:00.013 13 ms diff1990-12-01 01:00:00.000 1990-12-01 01:00:00.000 1990-12-01 01:00:00.010 10 ms diff1990-12-01 01:00:00.003 1990-12-01 01:00:00.000 1990-12-01 01:00:00.007 7 ms diff1990-12-01 01:00:00.000 1990-12-01 01:00:00.000 1990-12-01 01:00:00.003 3 ms diff2003-01-14 09:00:09.520 2003-01-01 00:00:00.000 2003-02-01 00:00:00.000 One month diff 31 days2003-08-27 11:47:04.100 2003-01-01 00:00:00.000 2004-01-01 00:00:00.000 One year diff2006-11-23 03:57:21.737 2005-01-01 00:00:00.000 2007-01-01 00:00:00.000 Two year diff2006-01-12 08:50:40.717 2006-01-01 00:00:00.000 2006-03-01 00:00:00.000 2 month diff1933-11-15 13:39:10.050 NULL 2006-01-01 00:00:00.000 Start time null1997-05-28 06:42:32.407 2006-01-02 00:00:00.000 NULL End time null1900-01-01 01:50:42.743 NULL NULL Both null2758-10-18 13:50:47.987 1753-01-01 00:00:00.000 9999-12-31 23:59:59.997 Max datetime diff8426-03-24 13:51:08.407 9999-12-31 23:59:59.997 1753-01-01 00:00:00.000 Max datetime diff reversed(33 row(s) affected) CODO ERGO SUM |
|
Kristen
Test
22859 Posts |
Posted - 2006-07-23 : 16:38:44
|
Nice job MVJ |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2006-07-23 : 16:41:32
|
I would be very cautious about using that (in fact I wouldn't).Functions are expected to be deterministic and can produce incorrect results if you manage to fool the server.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
|
|
Kristen
Test
22859 Posts |
Posted - 2006-07-23 : 16:56:41
|
"I would be very cautious about using that"Do you mean the workaround in order to use NewID() from a function?Kristen |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-07-23 : 16:57:22
|
quote: Originally posted by nr I would be very cautious about using that (in fact I wouldn't).Functions are expected to be deterministic and can produce incorrect results if you manage to fool the server.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy.
I have to disagree that “Functions are expected to be deterministic”. Any function that calls an extended stored procedure is non-deterministic, and this is specifically allowed in the documentation in BOL.However, if you are really nervous about getting the NEWID() from the view, you could modify the functions to allow the NEWID() to be passed as an input parameter.CODO ERGO SUM |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2006-07-23 : 17:19:10
|
>> I have to disagree that “Functions are expected to be deterministic”. Any function that calls an extended stored procedure is non-deterministic, and this is specifically allowed in the documentation in BOL.Ok - maybe I should have said it won't be deterministic.Odd - if you pass in the newid() it takes 1 sec to process my sample but 20 secs if call the view to get it.That's runningselect d, count(*) c from (select d = dbo.F_RANDOM_DATETIME2('1 jan 2006','1 jan 2006 00:00:01', newid())from sysobjects, sysobjects b, sysobjects c) agroup by dorder by dselect d, count(*) c from (select d = dbo.F_RANDOM_DATETIME('1 jan 2006','1 jan 2006 00:00:01')from sysobjects, sysobjects b, sysobjects c) agroup by dorder by dEven passing in the newid() some duplicate dates are produced but with the view very few of the duplicates are removed (on my system)==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2006-07-23 : 19:30:57
|
>>> "Even passing in the newid() some duplicate dates are produced but with the view very few of the duplicates are removed (on my system)"I think you are correct that there is some problem with using the view; it seems to foul up the group by operation. I removed the view and added @NEWID as a parameter to each function and reposted the code.However, I think there is still some unexplained interaction with the group by and NEWID() that is causing a problem. Notice how in the following code, the TOP expression causes the problem to go away. I would almost call this a bug in SQL server. Or at least an undocumented feature.Edit: Could someone try the code below in SQL 2005 to see if it productes the same results?drop table #tempgoselect numberinto #tempfrom F_TABLE_NUMBER_RANGE(1,10000) agoprint 'No duplicates with top'select nm = rnd%10from ( select top 100 percent rnd=abs(convert(bigint,convert(varbinary(20),newid() ))) from #temp ) agroup by rnd%10order by rnd%10goprint 'Gives duplicates'select nm = rnd%10from ( select rnd=abs(convert(bigint,convert(varbinary(20),newid() ))) from #temp ) agroup by rnd%10order by rnd%10 Results:(10000 row(s) affected)No duplicates with topnm -------------------- 0123456789(10 row(s) affected)Gives duplicatesnm -------------------- 1115567889(10 row(s) affected) CODO ERGO SUM |
|
|
|
|
|
|
|