Custom Auto-Generated Sequences with SQL ServerBy Jeff Smith on 24 April 2007 | Tags: Table Design , Identity This article by Jeff Smith covers different ways to create sequence numbers. It starts with the basic identity and GUIDs and quickly moves to much more complicated sequence numbers including those with auto-incrementing mixed letters and numbers such as "A0001", etc. Jeff shows one of the easiest ways I've ever seen to efficiently handle very odd sequence numbers. IntroductionSometimes you'd like the SQL Server itself to automatically generate a sequence for entities in your table as they are created. For example, assigning each new Customer added to your table a unique "CustomerNumber". There are effectively two ways to do using the built-in features that T-SQL provides:
alter table YourTable add ID int identity(start, increment)
alter table YourTable add ID uniqueidentifier default newid() However, we often see questions in the forums regarding how to create other types of auto-generated sequences in tables. For example, you might want your customers to automatically be assigned "Customer Numbers" as formatted like this: C0001 C0002 ... C9998 C9999 In other cases, people would like to use incrementing letters instead of numbers, some combination of both, or for the digits to be "reset" on some specific condition, and so on. The most important and crucial part of implementing this is not writing the code! It is clearly defining your specification and ensuring that it is logical and works for you. Before you can write code that will automatically generate sequences for you, you must consider:
So, the very first step is to clearly, accurately, and completely define how your sequence values will be generated. You must explicitly map out how to handle all possible situations and you must do some research to ensure that your specification will work for the data you are handling. A primary key of "A0" through "Z9" will only work for 26*10 = 260 values -- is this really what you want? There are a few different approaches you can take in order to facilitate this behavior in SQL Server, so let's take a look. Option 1: Determine the next value by querying a tableThis first approach is usually the most common, and in my opinion also the worst. What people try to do here is to query the existing table to retrieve the last sequence value created, and then use that value to create the next one. For example, you might create a User-Defined Function that you could use in a trigger like this: create function NextCustomerNumber() returns char(5) as begin declare @lastval char(5) set @lastval = (select max(customerNumber) from Customers) if @lastval is null set @lastval = 'C0001' declare @i int set @i = right(@lastval,4) + 1 return 'C' + right('000' + convert(varchar(10),@i),4) end This can cause some issues, however:
Overall, if this approach is absolutely required, then it's what you've got to do, but be sure that you consider the next two options first, which are much easier to implement and will generally work in most cases. Option 2: Make it a presentation issueThe most common rules seem to be the simple ones, such as previous example ("C0000" to "C9999"), or something similar. Looking at this closely, we see that it is really just simply a number from 1-x, formatted with leading zeroes to be 4 digits, and then prefixed with a "C". Which means that all the database needs to do is generate a number from 1-x, which means ... why not just use an identity and let your front-end format the sequence value? Do you really need to store it in the database as a VARCHAR? What does this gain? If you simply use an identity and return an integer to the front-end, it is trivial to format it with a "C" in the front and with the necessary leading zeroes. By doing this, you have all of the advantages of a built-in SQL Server generated identity value with none of the headaches -- no worries about concurrency, performance, set-based triggers, and so on. So, really carefully think to yourself: "Do I really need to format a simple integer in T-SQL and store that formatted value in my tables? Or can I simply use that integer internally throughout the database and format it any way I want at my presentation layer?" The advantage of this approach is simplicity, but the disadvantage is that you must rely on your front-end applications and reports to understand how to format your codes. Option 3: Let an Identity be your guideIf you really need to store the actual, auto-generated sequence in your tables, or if a simple incrementing integer formatted a certain way isn't enough, there is another simple option: Use a regular identity column internally as your table's primary key, but then use some math to calculate your external "auto-generated sequence" in another column using the identity value generated. This column can be stored in your table via a trigger, added as a computed column, or calculated using a View. You can implement it any way you wish. The advantage of this approach is that we are using the database's built-in feature that guarantees that no two identities in a table will be alike, and that every row will get one; therefore, if we map each integer to a unique value in our sequence, we are guaranteed that all of our sequence values will also be unique. And this will work for set-based operations as well. Let's start with the simple "C0000" - "C9999" example. First, let's create our Customers table like this: create table Customers ( dbID int identity not null primary key, CustomerName varchar(100) ) Note that the dbID column is standard, database-generated identity which will be our physical primary key of the table. However, we will add a CustomerNumber column which will be what we expose to the outside world in the "C0000" format, as described. Let's create a function accepts an integer, and uses that integer to return our CustomerNumber: create function CustomerNumber (@id int) returns char(5) as begin return 'C' + right('0000' + convert(varchar(10), @id), 4) end Using that function, we can simply add a computed column to our table like this: alter table Customers add CustomerNumber as dbo.CustomerNumber(dbID) Or, we could also create a column in our table to store the Customer Number, and use a trigger to populate it: alter Customers add CustomerNumber varchar(10) create trigger Customers_insert on Customers after insert as update Customers set Customers.customerNumber = dbo.CustomerNumber(Customers.dbID) from Customers inner join inserted on Customers.dbID= inserted.dbID Using either method, once they are in place, we can simply insert into our table, and for each Row added a unique "Customer Number" is assigned: insert into Customers (CustomerName) values ('jeff') select * from Customers returns: (1 row(s) affected) dbID CustomerName CustomerNumber ----------- ------------ -------------- 1 jeff C0001 (1 row(s) affected) The key to making this work is the formula that maps a 1:1 relation between integers and your sequence values. That example was very trivial to implement; let's try some more complicated ones. For example, suppose that the CustomerNumber will be in this format: AAAA AAAB ... AAAZ AABA ... ZZZZ How can we take an integer and map it to this? In this case, it is the same way you convert digits from decimal to hexadecimal or any other "base" -- we are converting from a base 10 (decimal) to base 26 (A-Z). Since we are working with a 4 "digit" value in base 26, this gives us 26 to the power of 4 different possibilities -- 456,976 different Customer Numbers can generated. (Remember the most important step before writing any code: is this acceptable?) A simple algorithm to convert our integers to this base 26 number could be like this: Starting with an integer identity value x:
To convert a number from 0-25 to a letter from A-Z, we add 65 to it and use the CHAR() function, since the ASCII value of "A" is 65 and the rest of the letters all follow in sequence (i.e., "B" is 66, "C" is 67, etc). Thus, to implement this method, our CustomerNumber function becomes: create function CustomerNumber (@id int) returns char(5) as begin return char(@id / power(26,3) % 26 + 65) + char(@id / power(26,2) % 26 + 65) + char(@id / 26 % 26 + 65) + char(@id % 26 + 65) end Equally as important as writing the algorithm out on paper is then testing your implementation. We can do some simple testing like this to help us ensure that our function is working: select x, dbo.CustomerNumber2(x) from ( select 1 as x union all select 25 union all select 26 union all select 27 union all select 51 union all select 52 ) x which returns: x ----------- ----- 1 AAAB 25 AAAZ 26 AABA 27 AABB 51 AABZ 52 AACA (6 row(s) affected) And that appears to do what we need. You should of course test the higher values as well. The key is that we have mapped our integer values using our function to Customer Numbers from AAAA-ZZZZ, all guaranteed to be unique, and we don't need to worry about set-based inserts or lookups on our existing table. Let's try one more. How about AA000 AA001 ... AA999 AB000 AB001 ... AZ999 BA000 BA001 ... ZZ999 In this one, the left 2 digits are base 26; the right 3 are base 10. This gives us 26*26*10*10*10 = 676,000 possible values. We use the same basic algorithm, starting from an integer x and working right to left:
Let's put this logic into a User Defined Function, which makes testing easier: create function CustomerNumber(@i int) returns char(5) as begin return (char(@i / 26000 % 26 + 65) + char(@i / 1000 % 26 + 65) + char(@i / 100 % 10 + 48) + char(@i / 10 % 10 + 48) + char(@i % 10 + 48)) end And let's test this function to ensure that it works: select x, dbo.customerNumber(x) from ( select 9 as x union all select 99 union all select 100 union all select 999 union all select 1000 union all select 25999 union all select 26000 union all select 51999 union all select 52000 ) x x ----------- ----- 9 AA009 99 AA099 100 AA100 999 AA999 1000 AB000 25999 AZ999 26000 BA000 51999 BZ999 52000 CA000 (9 row(s) affected) And, again, that is just a guideline, but you should do as much testing as you can before implementing any of these features to ensure that they accurately do what you need. The key is to identify and test the "boundary" numbers where the values need to reset or change in a more complicated manner than simply incrementing a single digit. If you need to add dashes, or a constant prefix or suffix or anything else, you can easily do it as well, all in your UDF. The key is to make your logic entirely dependant on an integer value, and to map each integer value to a unique value in your designated sequence. By doing this, you are letting SQL Server do the hard part -- ensure that your keys are consistent and unique -- but now you have the flexibility of creating your "CustomerNumbers" or other values in the exact format that you choose. SummarySo, if you do find that you need to custom sequences of some sort at the database layer, here's my recommendation:
It can sometimes be tricky to come up with an algorithm and/or write the code to convert integers into your format, but if you can do it, I have found that in general it is the best approach to take.
|
- Advertisement - |