Multiple Random Records w/Single Select

By Bill Graziano on 4 October 2000 | Tags: Randomness

Reverend writes "This has got me. I am looking for a way to create a record set with multiple random values using a single, non-iterated/non-looped, select statement. On a 265 record DB I even tried . . .

The full question is "This has got me. I am looking for a way to create a record set with multiple random values using a single, non-iterated/non-looped, select statement. On a 265 record DB I even tried:

SELECT DISTINCT UnitID FROM tblG4 where UnitID = (round((rand() * 264), 0) + 1) or UnitID = (round((rand() * 264), 0) + 1)

This looked like a fix in scenarios where only two values were needed until for whatever reason, until several executions returned a single value. About 1/300.

I am also looking into an option with CURSORS and FETCHING the data, but I hate the concept.

The table is set up so UnitID is a unique integer identifier from 1 to 265. Looking to pull 2, 5, 7, and 14 distinct random values into a recordset. Thanx in advance for any insight you can provide.

Rev. BK"

Note: The approach in this article has been replaced by Using NEWID to Randomly Sort Records if you're using Windows 2000 or higher.

I've been getting quite a lot of random record questions. I never knew this topic would be so popular. I would start by reading the articles on Sequentially numbering records and Returning a single random row.

In your current solution you mention that you are getting a single record about once every 300 attempts. That makes sense. What happens is your two random numbers are returning the same value at that point. It should happen once every 264 times if you had a large enough sample. I'm actually suprised that your solution works at all. Multiple RAND functions in a single batch statement often return identical values all the time (article).

If you need multiple random records you can use an IN clause. Please see the second query of this article. What you might think about doing is dynamically building your IN clause and the SQL SELECT statement. We have articles on building a CSV and dynamic queries (and here). This doesn't solve the problem of selecting the same record twice. You can add code to this solution to check for duplicates but it gets kind of tricky. That should get a single non-looped, non-iterated SELECT statement. However, you'll need to code a loop to generate unique random numbers.

I don't know how firm your commitment to a single select statement is but I'd consider using a temp table and putting records into it. This will let SQL Server deal with the uniqueness aspect. Your temp table will be in RAM and will be pretty darn fast. Think of a temp table as a giant variable that you can run SQL statements against. Transact-SQL doesn't have arrays, it has temporary tables.

You can select a single random record into the temp table using the following code (which should look familiar):

INSERT #RandomRows
SELECT DISTINCT UnitID 
FROM tblG4 
WHERE UnitID = (round((rand() * 264), 0) + 1)

Your second select statement needs to select only records that aren't already in the temp table. You can do that like this:

INSERT #RandomRows
SELECT DISTINCT UnitID 
FROM tblG4 
WHERE UnitID = (round((rand() * 264), 0) + 1)
AND UnitID NOT IN (SELECT UnitID FROM #RandomRows)

After this statement runs you can query the @@ROWCOUNT variable to see how many rows where inserted. It should be either zero or one. I'd suggest building a loop based on how many records you need and adding @@ROWCOUNT to your counter each time. If it picked a record that was already in the table, @@ROWCOUNT will return 0 and your counter won't be incremented. That will fill the table with the correct number of unique random records. You can package the whole thing up in a stored procedure and pass it the number of random records to return. Good luck on this one, it sounds interesting.