Monday, February 20, 2012

looking for equivalent to alphanumeric identity column

hello everyone,
i'm looking for a solution that would generate the equivalent of a
case-inspecific, alphanumeric identity column.
which is to say a column that iterates through guaranteed unique
values, but using case-inspecific alphanumeric characters instead of
just an integer value. or, another way to think of it would be a base
36 number, perhaps.
i have written a method that will turn turn a 32 bit integer value into
the appropriate 6-character string, shaving off some of the higher
order bits which are excess. so i was thinking i could just keep a
table that stores the "last used integer" and when a new alphanumeric
identity is needed, get the last used integer value, increment it,
convert it to a string for use as the alphanumeric, then update the
table with the new integer value.
would that be transactionally sound, though? for example, could someone
come in after another person read the value, but before the the table
was updated with the incremented value, and end up with duplicate
alphanumerics?
any other solutions also welcome.
NOTE: one possibly complicating factor is that while most of the
alphanumeric strings can just be the alphanumeric version of the
integer, one of the strings actually has to be a bit-scrambled version
of the integer. i've already written the routine to do the
bit-scrambling as well, but that's something to consider when making
suggestions. if there's a better way to a random or semi-random looking
sequence of strings out of what is originally an incrementing integer
value, feel free to suggest that as well!
thanks for any help,
jason>> if there's a better way to a random or semi-random looking sequence of st
rings out of what is originally an incrementing integer value, feel free to
suggest that as well! <<
Here is an implementation of the additive congruential method of
generating values in pseudo-random order and is due to Roy Hann of
Rational Commerce Limited, a CA-Ingres consulting firm. It is based on
a shift-register and an XOR-gate, and it has its origins in
cryptography. While there are other ways to do this, this code is nice
because:
1) The algorithm can be written in C or another low level language for
speed. But math is fairly simple even in base ten.
2) The algorithm tends to generate successive values that are (usually)
"far apart", which is handy for improving the performance of tree
indexes. You will tend to put data on separate physical data pages in
storage.
3) The algorithm does not cycle until it has generated every possible
value, so we don't have to worry about duplicates. Just count how many
calls have been made to the generator.
4) The algorithm produces uniformly distributed values, which is a nice
mathematical property to have. It also does not include zero.
Generalizing the algorithm to arbitrary binary word sizes, and
therefore longer number sequences, is not as easy as you might think.
Finding the "tap" positions where bits are extracted for feedback
varies according to the word-size in an extremely non-obvious way.
Choosing incorrect tap positions results in an incomplete and usually
very short cycle, which is unusable. If you want the details and tap
positions for words of one to 100 bits, see E. J. Watson, "Primitive
Polynomials (Mod 2)", Mathematics of Computation, v.16, 1962,
p.368-369. Here is code for a 31-bit integer, which you can use:
UPDATE Generator31
SET keyval =
keyval/2 + MOD(MOD(keyval, 2) + MOD(keyval/8, 2), 2)*2^30;
Or if you prefer, the algorithm in C:
int Generator31 ()
{static int n = 1;
n = n >> 1 | ((n^n >> 3) & 1) << 30;
return n;
}|||wow, this looks perfect. much better than the NOT and bit array
shuffling approach i have been tinkering with so far. thanks so much
for all the information!
a question about the transactional timing of the SQL version of this
algorithm: what is the most reliable way to get the value of keyval in
such a way as to ensure that the procedure calling the update statement
is the only procedure that receives the corresponding keyval value.
sort of like a classic semaphore, read-and-set? will putting the update
and select statements into a transaction accomplish this singularity of
selected keyval values per update?
thanks again,
jason|||hmm, having trouble finding the E. J. Watson article you mention at my
local college. the only reference i've found to it is in a mathsci
journal database online, which costs a few thousand per year to
subscribe to :)
i assume this article would give me the key "tap" points with which to
modify the algorithm to apply to bit patterns of different size (such
as 32 instead of 31)?|||1) Do updates and the audit stuff with a SERIALIZABLE isolation level
for the procedure. And look at tabel lockign options.
2) Look up "Roy Hann" and "Ingres" for an article on this. You do not
want to do 32 bits -- it is messy and causes an overflaow on a 32 bit
machine. We are talking about over 2 billion numbers already; are you
McDonald's?|||thanks for the feedback.
re: 2) hehe, well, i was hoping for a 6 character alphanumeric string.
that's inherently 36^6 possibilities, which is just a tiny bit more
than what can fit in a 31 bit pattern. though i suppose i wouldn't have
to drop a whole character from the string. i could simply stick to
whatever 6 string values are available in the 31 bit pattern.
ideally, i could find a way to incorporate the restriction i'm imposing
on the possible string values, which is no more than 2 consecutive
characters (to avoid accidental curse words and such). that
significantly reduces the number of combinations to well within the 31
bit range, but i'm not sure how i could collapse that reduced range
into the bit pattern. it seems like i could only iterate through the
bit patterns with this algorithm, test if it is a "valid" bit pattern
with regard to the character restriction, and if so, use it, if not,
iterate again.
to to somehow iterate through the bit pattern, and have every number
correspond to a known valid string, would involve coming up with some
kind of lookup-table where the valid values have been collapsed, and
are available by some kind of iterable index? that seems like a lot of
work, and all it would do is: (1) let me reduce the size of the bit
pattern from 31 to 30 bits, and (2) allow me to get ALL of the possible
values in the range, instead of those within the 31 bit pattern range.
i doubt it's worth it.|||just found Roy Hann's article called "Key Points about Surrogate Keys"
... holy crap, i love this man. this article is awesome. thanks so much
for suggesting it, it is EXACTLY what i need. covers the algorithmic
and database performance concerns all at once!
jason|||


Create function dbo.IncrementingAlphaNumericKey (@.priorAphaNumKey varchar(10))



Returns varchar(10)



/*



select dbo.IncrementingAlphaNumericKey ('BT00000ZZZ')



*/



BEGIN --function



declare @.singleChar char(1),



@.fieldLength int,



@.nextAphaNumKey varchar(10)



set @.fieldLength = Len(@.priorAphaNumKey)



-- select priorAphaNumKey = @.priorAphaNumKey,



-- fieldLength = Cast(@.fieldLength as varchar(4)),



-- lastChar = substring(@.priorAphaNumKey, @.fieldLength, 1),



-- stuff(@.priorAphaNumKey, @.fieldLength, 1,'A')



 



While @.fieldLength > 0



Begin



set @.singleChar = Substring(@.priorAphaNumKey, @.fieldLength, 1)



If @.singleChar = 'Z'



Begin



set @.nextAphaNumKey = Stuff(@.priorAphaNumKey, @.fieldLength, 1,'A')



--Print 'PriorAphaNumKey = ' + @.priorAphaNumKey + ', nextAphaNumKey = ' + @.nextAphaNumKey



End



Else



Begin



set @.singleChar = Char(ASCII(@.singleChar) + 1)



set @.nextAphaNumKey = Stuff(@.priorAphaNumKey, @.fieldLength, 1,@.singleChar)



--Print 'priorAphaNumKey = ' + @.priorAphaNumKey + ' , nextAphaNumKey = ' + @.nextAphaNumKey



break



End



set @.fieldLength = @.fieldLength - 1



set @.priorAphaNumKey = @.nextAphaNumKey --returning to the



End --While loop



 -- Print @.nextAphaNumKey



Return @.nextAphaNumKey



END --function

No comments:

Post a Comment