Using NEWID to Randomly Sort RecordsBy Bill Graziano on 16 April 2002 | Tags: Randomness We've got a quite a few articles on randomly selecting and ordering rows. I'm still amazed how popular that topic is. I recently stumbled across a post in the forums that led me to a better way to accomplish this. This has actually been posted a few times before. I'm just finally catching up.
There was a post in the forums that led me to Randomly Sorting Query Results in the Tips and Tricks section of Microsoft's SQL Server web site. They said that
SELECT * FROM Northwind..Orders ORDER BY NEWID() would randomly sort the records in the table. This would be much better than our current solution which uses a temp table AND a cursor. TestingMy first goal was to test this approach and see how random it really was. I wrote a simple T-SQL script to build a temp table and select the first record 100,000 times. You can download the script if you'd like. It loops around this piece of code: select @found = ID from @results order by NEWID() My script also summarzes the results and this is what it returned: ID HitCount ----------- ----------- 1 9905 2 9970 3 10042 4 10028 5 9988 6 9944 7 10011 8 10084 9 9991 10 10037 Std Dev ------------------------ 52.05 This says it selected the first record 9905 times, the second record 9970 times, etc. Unfortunately I'm not a great statistician. I'm not even a poor statistician. I've run this code again and again and again and the results seem pretty random to me. The result you see above is pretty typical. If someone out there can give me a statistical model for randomness I'll certainly test this against it. That only tested the whether the first item in the list was random. I also needed to know if it sorted them in random order. So I wrote another script to track the position of each row. I limited by table to five rows this time. The entire result set is too large to print out but the results for the 4th row are pretty typical: ID Position HitCount ----------- ----------- ----------- 4 1 413 4 2 424 4 3 395 4 4 371 4 5 397 Again if you want more detail you can download the script and run them for yourselves. They only use temp tables and table variables so they won't create any objects in your database. This test also seemed like it generated random results. Rewriting the ArticlesRewriting the query in our original article, Returning Rows in Random Order, gives us this:SELECT ID FROM FOO ORDER BY NEWID() which is much better than the cursor over a temp table in the original article. I also wrote Returning a Single Random Row. Using our new method you could select a single random row like this: SELECT TOP 1 ID FROM FOO ORDER BY NEWID() This is much simpler than the approach in the earlier article which built a sequential number and then picked the record at random. The You'll also find a comment in this thread that indicates this is only possible on Windows 2000. I was able to test SQL7 on NT4 and confirmed that NEWID generated a non-random (i.e. sequential) result. I tested SQL7 on Windows 2000 and it generated random results so that post seems to be accurate. In summary, this is a better way to randomly order rows in SQL Server 2000. It has some performance concerns but it's very easy to implement. And if you're especially curious you can find out a little about the history of random number generation.
|
- Advertisement - |