Ubuntu Manpage: Session::Token - Secure, efficient, simple random session token generation

Provided by: libsession-token-perl_1.502-1build1_amd64

NAME

       Session::Token - Secure, efficient, simple random session token generation

SYNOPSIS

   Simple 128-bit session token
           my $token = Session::Token->new->get;
           ## 74da9DABOqgoipxqQDdygw

   Keep generator around
           my $generator = Session::Token->new;

           my $token = $generator->get;
           ## bu4EXqWt5nEeDjTAZcbTKY

           my $token2 = $generator->get;
           ## 4Vez56Zc7el5Ggx4PoXCNL

   Custom minimum entropy in bits
           my $token = Session::Token->new(entropy => 256)->get;
           ## WdLiluxxZVkPUHsoqnfcQ1YpARuj9Z7or3COA4HNNAv

   Custom alphabet and length
           my $token = Session::Token->new(alphabet => 'ACGT', length => 100_000_000)->get;
           ## AGTACTTAGCAATCAGCTGGTTCATGGTTGCCCCCATAG...

DESCRIPTION

This module provides a secure, efficient, and simple interface for creating session
tokens, password reset codes, temporary passwords, random identifiers, and anything else
you can think of.

When a Session::Token object is created, 1024 bytes are read from "/dev/urandom" (Linux,
Solaris, most BSDs), "/dev/arandom" (some older BSDs), or
Crypt::Random::Source::Strong::Win32 (Windows). These bytes are used to seed the ISAAC-32
<http://www.burtleburtle.net/bob/rand/isaacafa.html> pseudorandom number generator.

Once a generator is created, you can repeatedly call the "get" method on the generator
object and it will return a new token each time.

IMPORTANT: If your application calls "fork", make sure that any generators are re-created
in one of the processes after the fork since forking will duplicate the generator state
and both parent and child processes will go on to produce identical tokens (just like
perl's rand after it is seeded).
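
A minimal sketch of the fork advice above, assuming the parent keeps its existing generator and the child builds a fresh one. It uses only the constructor and "get" method described in this manual; the variable names are illustrative and Session::Token must be installed.

```perl
use strict;
use warnings;
use Session::Token;

# Generator created before the fork: its state would be copied into the child.
my $generator = Session::Token->new;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: discard the inherited state and re-seed from the kernel.
    $generator = Session::Token->new;
    print "child:  ", $generator->get, "\n";
    exit 0;
}

# Parent: keeps using the original generator.
waitpid($pid, 0);
print "parent: ", $generator->get, "\n";
```

Without the re-creation in the child branch, both processes would continue from the same ISAAC state and print the same token.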

After the generator context is created, no system calls are used to generate tokens. This
is one way that Session::Token helps with efficiency. However, this is only important for
certain use cases (generally not web sessions).

ISAAC is a cryptographically secure PRNG that improves on the well-known RC4 algorithm in
some important areas. For instance, it doesn't have short cycles or initial bias like RC4
does. A theoretical shortest possible cycle in ISAAC is "2**40", although no cycles this
short have ever been found (and probably don't exist at all). On average, ISAAC cycles are
"2**8295".

Creators of server applications must choose whether a single generator will be kept around
and used to generate all tokens, or if a new Session::Token object will be created every
time a token is needed.

Generally speaking, the generator should be kept around and re-used. Probably the most
important reason for this is that generating a new token from an existing generator cannot
fail due to a full file descriptor table. Creating a new Session::Token object for every
token can fail because the constructor opens "/dev/urandom" which will not succeed if all
allotted descriptors are in use. Programs that re-use the generator are also more
efficient and are less likely to cause problems in "chroot"ed environments where
"/dev/urandom" can no longer be opened.

However, re-using a generator may be undesirable because servers are typically started
immediately after a system reboot, when the kernel's randomness pool might still be poorly
seeded. In that case, all subsequently generated tokens may be based on a weak/predictable
seed. For this reason, you might choose to defer creating the generator until the first
request actually comes in and/or periodically re-create the generator object.

Aside: Some crappy (usually C) programs that assume opening "/dev/urandom" will always
succeed can return session tokens based only on the contents of nulled or uninitialised
memory (unix really ought to provide a system call for random data). The Session::Token
constructor throws an exception if it can't seed itself.

CUSTOM ALPHABETS

       Being able to choose exactly which characters appear in your token is sometimes useful.
       This set of characters is called the alphabet. The default alphabet size is 62 characters:
       uppercase letters, lowercase letters, and digits ("a-zA-Z0-9").

       For some purposes, base-62 is a sweet spot. It is more compact than hexadecimal encoding
       which helps with efficiency because session tokens are usually transferred over the
       network many times during a session (often uncompressed in HTTP headers).

       Also, base-62 tokens don't use "wacky" characters like base-64 encodings do. These
       characters sometimes cause encoding/escaping problems (e.g. when embedded in URLs) and
       are annoying because often you can't select tokens by double-clicking on them.

       Although the default is base-62, there are all kinds of reasons for using another
       alphabet. One example is if your users are reading tokens from a print-out or SMS or
       whatever, you may choose to omit characters like "o", "O", and 0 that can easily be
       confused.

       To set a custom alphabet, just pass in either a string or an array of characters to the
       "alphabet" parameter of the constructor:

           Session::Token->new(alphabet => '01')->get;
           Session::Token->new(alphabet => ['0', '1'])->get; # same thing
           Session::Token->new(alphabet => ['a'..'z'])->get; # character range

       Constructor args can be a hash-ref too:

           Session::Token->new({ alphabet => ['a'..'z'] })->get;

ENTROPY

       There are two ways to specify the length of tokens. The most primitive is in terms of
       characters:

           print Session::Token->new(length => 5)->get;
           ## -> wpLH4

       But the primary way is to specify their minimum entropy in terms of bits:

           print Session::Token->new(entropy => 24)->get;
           ## -> Fo5SX

       In the above example, the resulting token contains at least 24 bits of entropy. Given the
       default base-62 alphabet, we can compute the exact entropy of a 5 character token as
       follows:

           $ perl -E 'say 5 * log(62)/log(2)'
           29.7709815519344

       So these tokens have about 29.8 bits of entropy. Note that if we removed one character
       from this token, it would bring it below our desired 24 bits of entropy:

           $ perl -E 'say 4 * log(62)/log(2)'
           23.8167852415475

       The default minimum entropy is 128 bits. Default tokens are 22 characters long and
       therefore have about 131 bits of entropy:

           $ perl -E 'say 22 * log(62)/log(2)'
           130.992318828511

       An interesting observation is that in base-64 representation, 128-bit minimum tokens also
       require 22 characters and that these tokens contain only 1 more bit of entropy.

       Another Session::Token design criterion is that all tokens should be the same length. The
       default token length is 22 characters and the tokens are always exactly 22 characters (no
       more, no less). Instead of tokens that are exactly "N" characters, some libraries that use
       arbitrary precision arithmetic end up creating tokens of at most "N" characters.

       A fixed token length is nice because it makes writing matching regular expressions easier,
       simplifies storage (you never have to store length), causes various log files and things
       to line up neatly on your screen, and ensures that encrypted tokens won't leak token
       entropy due to length (see "VARIABLE LENGTH TOKENS").

       In summary, the default token length of exactly 22 characters is a consequence of these
       decisions: base-62 representation, 128 bit minimum token entropy, and fixed token length.
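
       The relationship between minimum entropy, alphabet size, and token length can be
       sketched as follows. "min_length" is a hypothetical helper written for this
       illustration, not a function provided by Session::Token, and it assumes the rule
       implied above: use the shortest length whose token space holds at least 2**entropy
       tokens. Math::BigInt keeps the comparison exact.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper (not part of Session::Token): smallest token length
# such that alphabet_size**length >= 2**entropy_bits.
sub min_length {
    my ($entropy_bits, $alphabet_size) = @_;
    die "alphabet needs at least 2 characters" if $alphabet_size < 2;

    my $target = Math::BigInt->new(2)->bpow($entropy_bits);
    my $space  = Math::BigInt->new(1);
    my $length = 0;

    # Add one character at a time until the token space is large enough.
    while ($space < $target) {
        $space->bmul($alphabet_size);
        $length++;
    }
    return $length;
}

print min_length(128, 62), "\n";   # 22
print min_length(24,  62), "\n";   # 5
print min_length(128, 64), "\n";   # 22
```

       The first and third results match the observation that base-62 and base-64 both need 22
       characters for a 128-bit minimum.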

MOD BIAS

       Some token generation libraries that implement custom alphabets will generate a random
       value, compute its modulus over the size of an alphabet, and then use this modulus to
       index into the alphabet to determine an output character.

       Assume we have a uniform random number source that generates values in the set "[0,1,2,3]"
       (most PRNGs provide sequences of bits, in other words power-of-2 size sets) and wish to
       use the alphabet "abc".

       If we use the naïve modulus algorithm described above then 0 maps to "a", 1 maps to "b", 2
       maps to "c", and 3 also maps to "a". This results in the following biased distribution for
       each character in the token:

           P(a) = 2/4 = 1/2
           P(b) = 1/4
           P(c) = 1/4

       Of course in an unbiased distribution, each character would have the same chance:

           P(a) = 1/3
           P(b) = 1/3
           P(c) = 1/3

       Bias is undesirable because certain tokens become obvious starting points when guessing
       tokens and certain other tokens are very unlikely. Tokens that are unbiased are equally
       likely and therefore there is no obvious starting point with them.

       Session::Token provides unbiased tokens regardless of the size of your alphabet (though
       see the "INTRODUCING BIAS" section for a mis-use warning). It does this in the same way
       that you might simulate producing unbiased random numbers from 1 to 5 given an unbiased
       6-sided die: Re-roll every time a 6 comes up.

       In the above example, Session::Token eliminates bias by only using values of 0, 1, and 2
       (the "t/no-mod-bias.t" test contains some more notes on this topic).

       Note that mod bias can be made arbitrarily small by increasing the amount of data consumed
       from a random number generator (provided that arbitrary precision modulus is available).
       Because this module fundamentally avoids mod bias, it can use each of the 4 bytes from an
       ISAAC-32 word for a separate character (excepting "re-rolls").
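
       The "abc" example above can be checked by enumerating every value the 2-bit source can
       produce. This is only an illustration of the two strategies, not the module's
       implementation; all variable names are made up for the sketch.

```perl
use strict;
use warnings;

my @alphabet = ('a', 'b', 'c');
my @source   = (0, 1, 2, 3);    # every value a 2-bit source can produce

my (%naive, %rerolled);
my $rejected = 0;

for my $value (@source) {
    # Naive approach: fold the value into range with a modulus.
    $naive{ $alphabet[$value % @alphabet] }++;

    # Re-roll approach: ignore values that fall outside the alphabet.
    if ($value < @alphabet) {
        $rerolled{ $alphabet[$value] }++;
    }
    else {
        $rejected++;
    }
}

my $total    = @source;
my $accepted = $total - $rejected;

print "naive:    ", join(' ', map { "$_=$naive{$_}/$total" } sort keys %naive), "\n";
print "rerolled: ", join(' ', map { "$_=$rerolled{$_}/$accepted" } sort keys %rerolled), "\n";
# naive:    a=2/4 b=1/4 c=1/4
# rerolled: a=1/3 b=1/3 c=1/3
```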

EFFICIENCY OF RE-ROLLING

Throwing away a portion of random data in order to avoid mod bias is slightly inefficient.
How many bytes from ISAAC do we expect to consume for every character in the token? It
depends on the size of the alphabet.

Session::Token masks off each byte with a bit mask that is one less than the smallest power
of two greater than or equal to the alphabet size, so the probability that any particular
byte can be used is:

P = alphabet_size / next_power_of_two(alphabet_size)

For example, with the default base-62 alphabet "P" is "62/64".

In order to find the average number of bytes consumed for each character, calculate the
expected value "E". There is a probability "P" that the first byte will be used and
therefore only one byte will be consumed, and a probability "1 - P" that "1 + E" bytes
will be consumed:

E = P*1 + (1 - P)*(1 + E)

E = P + 1 + E - P - P*E

0 = 1 - P*E

P*E = 1

E = 1/P

So for the default base-62 alphabet, the average number of bytes consumed for each
character in a token is:

E = 1/(62/64) = 64/62 ≅ 1.0323

Because of the next power of two masking optimisation described above, "E" will always be
less than 2. In the worst case scenario of an alphabet with 129 characters, "E" is roughly
1.9845.
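
The figures above can be reproduced with a few lines of Perl. Both helper names are invented for this sketch; the code only evaluates the formula E = 1/P derived above and does not touch Session::Token itself.

```perl
use strict;
use warnings;

# Smallest power of two that is >= $n.
sub next_power_of_two {
    my ($n) = @_;
    my $p = 1;
    $p *= 2 while $p < $n;
    return $p;
}

# Expected number of ISAAC bytes consumed per token character: E = 1/P.
sub expected_bytes {
    my ($alphabet_size) = @_;
    return next_power_of_two($alphabet_size) / $alphabet_size;
}

printf "%3d characters: E = %.4f\n", $_, expected_bytes($_) for 62, 64, 129, 256;
#  62 characters: E = 1.0323
#  64 characters: E = 1.0000
# 129 characters: E = 1.9845
# 256 characters: E = 1.0000
```

Alphabets whose size is exactly a power of two never re-roll, while sizes just above a power of two are the least efficient.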

This minor inefficiency isn't an issue because the ISAAC implementation used is quite fast
and this module is very thrifty in how it uses ISAAC's output.

INTRODUCING BIAS

       If your alphabet contains the same character two or more times, this character will be
       more biased than a character that only occurs once. You should be careful that your
       alphabets don't repeat in this way if you are trying to create random session tokens.

       However, if you wish to introduce bias this library doesn't try to stop you. (Maybe it
       should print a warning?)

           Session::Token->new(alphabet => '0000001', length => 5000)->get; # don't do this
           ## -> 0000000000010000000110000000000000000000000100...

       Due to a limitation discussed below, alphabets larger than 256 characters aren't
       currently supported so your bias can't get very granular.

       Aside: If you have a constant-biased output stream like the above example produces then
       you can re-construct an un-biased bit sequence with the von Neumann algorithm. This works
       by comparing pairs of bits. If the pair consists of identical bits, it is discarded.
       Otherwise the order of the different bits is used to determine an output bit, i.e. 00 and
       11 are discarded but 01 and 10 are mapped to output bits of 0 and 1 respectively. This
       only works if the bias in each bit is constant (like all characters in a Session::Token
       are).
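
       A small sketch of the pair-comparison procedure described in the aside, operating on a
       string of "0" and "1" characters such as the biased token shown above. The function
       name "von_neumann" is an assumption made for this example.

```perl
use strict;
use warnings;

# Von Neumann extractor over a string of '0'/'1' characters.
sub von_neumann {
    my ($bits) = @_;
    my $out = '';

    # Walk the input two bits at a time; a trailing odd bit is ignored.
    while ($bits =~ /\G([01])([01])/g) {
        next if $1 eq $2;    # 00 and 11: discard the pair
        $out .= $1;          # 01 -> 0, 10 -> 1
    }
    return $out;
}

print von_neumann('0001101100'), "\n";   # 01
```

       The more biased the input, the more pairs are discarded, so the output is much shorter
       than the input.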

ALPHABET SIZE LIMITATION

       Due to a limitation in this module's code, alphabets can't be larger than 256 characters.
       Everywhere the above manual says "characters" it actually means bytes. This isn't a
       Unicode limitation per se, just the maximum size of the alphabet. If you like, you can map
       tokens onto new alphabets as long as they aren't more than 256 characters long. Here is
       how to generate a 128-bit minimum entropy token using the lowercase Greek alphabet (note
       that both forms of lowercase sigma are included which may not be desirable):

           use utf8;
           my $token = Session::Token->new(alphabet => [map {chr} 0..24])->get; # 25 letters, α to ω
           $token = join '', map {chr} map {ord($_) + ord('α')} split //, $token;
           # ρφνδαπξδββφδοςλχτμγσψδψζειετ

       Here's an interesting way to generate a uniform random integer between 0 and 999 inclusive:

           0 + Session::Token->new(alphabet => ['0'..'9'], length => 3)->get

       If you wanted to natively support high code points, there is no point in hard-coding a
       limitation on the size of Unicode or even the (higher) limitation of perl characters.
       Instead, arbitrary precision "characters" should be supported with bigint. Here's an
       example of something similar in lisp: isaac.lisp <http://hcsw.org/downloads/isaac.lisp>.

       This module is not however designed to be the ultimate random number generator and at this
       time I think changing the design as described above would interfere with its goal of being
       secure, efficient, and simple.

TOKEN TEMPLATES

       String::Random has a method called "randpattern" where you provide a pattern that serves
       as a template when creating the token. You define the meaning of 1 or more template
       characters and each one that occurs in the pattern is replaced by a random character from
       a corresponding alphabet.

       Andrew Beverley requested this feature for Session::Token and I suggested approximately
       the following:

           use Session::Token;

           sub token_template {
             my (%m) = @_;

             %m = map { $_ => Session::Token->new(alphabet => $m{$_}, length => 1) } keys %m;

             return sub {
               my $v = shift;
               $v =~ s/(.)/exists $m{$1} ? $m{$1}->get : $1/eg;
               return $v;
             };
           }

       In order to use "token_template" you should pass it key-value pairs of the different token
       characters and the alphabets they represent. It will return a sub that should be passed
       the template pattern and it will return the resulting random tokens.

       For example, here is how to create UUID version 4
       <https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_.28random.29>
       tokens:

           sub uuid_v4_generator {
             my $t = token_template(
                   x => [ 0..9, 'a'..'f' ],
                   y => [ 8, 9, 'a', 'b' ],
                 );

             return sub {
               return $t->('xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx');
             }
           }

       "uuid_v4_generator" returns a generator function that will return tokens of the following
       form:

           1b782499-9913-4726-a80a-25e7b2221a7c
           90f85a64-d826-43bf-98e7-94ba87406bfb
           b8b73175-3cce-4861-b43b-3dec5ed5d641
           3afb64ab-6de3-4647-bbff-eb94dfa7d4b0
           447d2001-2aec-4d32-9910-8c289ae34c48

       Note that characters in the pattern which don't have template characters defined ("-" and
       4 in the above example) are passed through to the output token.

SEEDING

       This module is designed to always seed itself from your kernel's secure random number
       source. You should never need to seed it yourself.

       However if you know what you're doing you can pass in a custom seed as a 1024 byte long
       string. For example, here is how to create a "null seeded" generator:

           my $gen = Session::Token->new(seed => "\x00" x 1024);

       This is done in the test-suite to compare against Jenkins' reference ISAAC output, but
       obviously don't do this in regular applications because the generated tokens will be the
       same every time your program is run.

       One valid reason for manually seeding is if you have some reason to believe that there
       isn't enough entropy in your kernel's randomness pool and therefore you don't trust
       "/dev/urandom". In this case you should acquire your own seed data from somewhere
       trustworthy (maybe "/dev/random" or a previously stored trusted seed).
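
       One way to collect such a seed is sketched below. "read_seed" is a hypothetical helper,
       not part of this module, and the choice of source is left to you. It uses "sysread" so
       that no more bytes are drawn from the device than requested; on some systems reading
       this much from "/dev/random" may block for a long time.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $nbytes raw bytes from $path.
sub read_seed {
    my ($path, $nbytes) = @_;

    open(my $fh, '<:raw', $path) or die "can't open $path: $!";

    my $seed = '';
    while (length($seed) < $nbytes) {
        # Append to $seed at its current end; short reads are retried.
        my $got = sysread($fh, $seed, $nbytes - length($seed), length($seed));
        die "read from $path failed: $!" unless defined $got;
        die "unexpected end of file on $path" if $got == 0;
    }

    close($fh);
    return $seed;
}

# Usage, assuming Session::Token is installed:
#   my $gen = Session::Token->new(seed => read_seed('/dev/random', 1024));
```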

VARIABLE LENGTH TOKENS

As mentioned above, all tokens produced by a Session::Token generator are the same length.
If you prefer tokens of variable length, it is possible to post-process the tokens in
order to achieve this so long as you keep some things in mind.

If you randomly truncate tokens created by Session::Token, be careful not to introduce
bias. For example, if you choose the length of the token as a uniformly distributed random
length between 8 and 10, then the output will be biased towards shorter token sizes.
Length 8 tokens should appear less frequently than length 9 or 10 tokens because there are
fewer of them.
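
To see how lopsided an unbiased length choice would have to be, the sketch below computes the share of all tokens that have each length. "length_probabilities" is a name invented for this example. For base-62 and lengths 8 to 10 the counts are in the ratio 1 : 62 : 3844.

```perl
use strict;
use warnings;

# Hypothetical helper: if every token of the given lengths is equally likely,
# return the probability that a token has each length.
sub length_probabilities {
    my ($alphabet_size, @lengths) = @_;

    # Number of distinct tokens of each length.
    my %count = map { $_ => $alphabet_size ** $_ } @lengths;

    my $total = 0;
    $total += $_ for values %count;

    return map { $_ => $count{$_} / $total } @lengths;
}

my %p = length_probabilities(62, 8, 9, 10);
printf "length %2d: %.6f\n", $_, $p{$_} for sort { $a <=> $b } keys %p;
```

Choosing each length with probability 1/3 instead would make length 8 tokens far more common than they should be.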

Another approach is to eliminate leading characters of a given value in the same way as
leading 0s are commonly eliminated from numeric representations. Although this approach
doesn't introduce bias, the tokens 1 and 01 are not distinct so it does not increase token
entropy given a fixed maximum token length which is the main reason for preferring
variable length tokens. The ideal variable length algorithm would generate both 1 and 01
tokens (with identical frequency of course).

Implementing unbiased, variable-length tokens would complicate the Session::Token
implementation especially since you should still be able to specify minimum entropy
variable-length tokens. Minimum entropy is the primary input to Session::Token, not token
length. This is the reason that the default token length of 22 isn't hard-coded anywhere
in the Session::Token source code (but 128 is).

The final reason that Session::Token discourages variable length tokens is that they can
leak token information through a side-channel. This could occur when a message is
encrypted but the length of the original message can be inferred from the encrypted
ciphertext.

BUGS

       Should check for biased alphabets and print warnings.

       Would be cool if it could detect forks and warn or re-seed in the child process (without
       incurring "getpid" overhead).

       There is currently no way to extract the seed from a Session::Token object. Note when
       implementing this: The saved seed must either store the current state of the ISAAC round
       as well as the 1024 byte "randsl" array or else do some kind of minimum fast forwarding in
       order to protect against a partially duplicated output-stream bug.

       Doesn't work on perl 5.6 and below due to the use of ":raw" (thanks CPAN testers). It
       could probably use "binmode" instead, but meh.

       On Windows we use Crypt::Random::Source::Strong::Win32 which has a big dependency tree. We
       should instead use a slimmer module like Crypt::Random::Seed.

COMMAND-LINE APP

       There is a command-line application called App::Session::Token which is a convenience
       wrapper around Session::Token. You can generate session tokens by running the
       "session-token" binary:

           $ echo "Your password is `session-token`"
           Your password is 8Yom6z4AeB1RXxCGzklJFt

       It supports all the options of this module via command line parameters, and multiple
       session tokens can be generated with the "--num" (aka "-n") switch. For example:

           $ session-token --alphabet ABC --entropy 32 --num 5
           BACAACABCCCCAACBBBCAB
           BCBACACBBCACCBABABCBA
           ABBBCBABBACBBBCBBBCCA
           AACCBBBCCAAACBABACABC
           CCABCABBCCCAACAAACCAA

AUTHOR

       Doug Hoyte, "<doug@hcsw.org>"

COPYRIGHT & LICENSE

       Copyright 2012-2014 Doug Hoyte.

       This module is licensed under the same terms as perl itself.

       ISAAC code:

           By Bob Jenkins.  My random number generator, ISAAC.  Public Domain