Ubuntu Manpage: dieharder - A testing and benchmarking tool for random number generators.

NAME

       dieharder - A testing and benchmarking tool for random number generators.

SYNOPSIS

       dieharder [-a] [-d dieharder test number] [-f filename] [-B]
                 [-D output flag [-D output flag] ... ] [-F] [-c separator]
                 [-g generator number or -1] [-h] [-k ks_flag] [-l]
                 [-L overlap] [-m multiply_p] [-n ntuple]
                 [-p number of p samples] [-P Xoff]
                 [-o filename] [-s seed strategy] [-S random number seed]
                 [-t number of test samples] [-v verbose flag]
                 [-W weak] [-X fail] [-Y Xtrategy]
                 [-x xvalue] [-y yvalue] [-z zvalue]

dieharder OPTIONS

-a runs all the tests with standard/default options to create a
user-controllable report. To control the formatting of the report, see -D below.
To control the power of the test (which uses default values for tsamples that
cannot generally be varied and psamples which generally can) see -m below as a
"multiplier" of the default number of psamples (used only in a -a run).

-d test number - selects specific diehard test.

-f filename - generators 201 (raw binary) and 202 (formatted ASCII) read
numbers from a file for testing. Generator 200 reads
in raw binary numbers from stdin. Note well: many tests with default parameters
require a lot of rands! To see a sample of the (required) header for ASCII
formatted input, run

dieharder -o -f example.input -t 10

and then examine the contents of example.input. Raw binary input reads 32 bit
increments of the specified data stream. stdin_input_raw accepts a pipe from a raw
binary stream.

-B binary mode (used with -o below) causes output rands to be written in raw binary, not
formatted ascii.

-D output flag - permits fields to be selected for inclusion in
dieharder output. Each flag can be entered as a binary number that turns on a
specific output field or header or by flag name; flags are aggregated. To see all
currently known flags use the -F command.

-F - lists all known flags by name and number.

-c table separator - where separator is e.g. ',' (CSV) or ' ' (whitespace).

-g generator number - selects a specific generator for testing. Using
-g -1 causes all known generators to be printed out to the display.

-h prints context-sensitive help -- usually Usage (this message) or a
test synopsis if entered as e.g. dieharder -d 3 -h.

-k ks_flag - selects the Kolmogorov-Smirnov (KS) test variant, where ks_flag is:

0 is fast but slightly sloppy for psamples > 4999 (default).

1 is MUCH slower but more accurate for larger numbers of psamples.

2 is slower still, but (we hope) accurate to machine precision for any number of
psamples up to some as yet unknown numerical upper limit (it has been tested out to
at least hundreds of thousands).

3 is kuiper ks, fast, quite inaccurate for small samples, deprecated.

-l list all known tests.

-L overlap

1 (use overlap, default)

0 (don't use overlap)

in operm5 or other tests that support overlapping and non-overlapping sample modes.

-m multiply_p - multiply default # of psamples in -a(ll) runs to crank
up the resolution of failure.

-n ntuple - set ntuple length for tests on short bit
strings that permit the length to be varied (e.g. rgb bitdist).

-o filename - output -t count random numbers from current generator to file.

-p count - sets the number of p-value samples per test (default 100).

-P Xoff - sets the number of psamples that will accumulate before deciding
that a generator is "good" and really, truly passes even a -Y 2 T2D (test to destruction) run. Currently
the default is 100000; eventually it will be set from AES-derived T2D test failure
thresholds for fully automated reliable operation, but for now it is more a
"boredom" threshold set by how long one might reasonably want to wait on any given
test run.

-S seed - where seed is a uint. Overrides the default random seed
selection. Ignored for file or stdin input.

-s strategy - if strategy is the (default) 0, dieharder reseeds (or
rewinds) once at the beginning when the random number generator is selected and
then never again. If strategy is nonzero, the generator is reseeded or rewound at
the beginning of EACH TEST. If -S seed was specified, or a file is used, this
means every test is applied to the same sequence (which is useful for validation
and testing of dieharder, but not a good way to test rngs). Otherwise a new random
seed is selected for each test.

-t count - sets the number of random entities used in each test, where
possible. Be warned -- some tests have fixed sample sizes; others are variable but
have practical minimum sizes. It is suggested you begin with the values used in -a
and experiment carefully on a test by test basis.

-W weak - sets the "weak" threshold to make the test(s) more or less
forgiving during e.g. a test-to-destruction run. Default is currently 0.005.

-X fail - sets the "fail" threshold to make the test(s) more or less
forgiving during e.g. a test-to-destruction run. Default is currently 0.000001,
which is basically "certain failure of the null hypothesis", the desired mode of
reproducible generator failure.

-Y Xtrategy - the Xtrategy flag controls the new "test to failure" (T2F)
modes. These flags and their modes act as follows:

0 - just run dieharder with the specified number of tsamples and psamples, do not
dynamically modify a run based on results. This is the way it has always run, and
is the default.

1 - "resolve ambiguity" (RA) mode. If a test returns "weak", this is an
undesired result. What does that mean, after all? If you run a long test series,
you will see occasional weak returns for a perfect generator because p is
uniformly distributed and will appear in any finite interval from time to time.
Even if a test run returns more than one weak result, you cannot be certain that
the generator is failing. RA mode adds psamples (usually in blocks of 100) until
the test result ends up solidly not weak or proceeds to unambiguous failure. This
is morally equivalent to running the test several times to see if a weak result is
reproducible, but eliminates the bias of personal judgement in the process since
the default failure threshold is very small and very unlikely to be reached by
random chance even in many runs.

This option should only be used with -k 2.

2 - "test to destruction" mode. Sometimes you just want to know where or if a
generator will ever fail a test (or test series). -Y 2 causes psamples to be
added 100 at a time until a test returns an overall pvalue lower than the failure
threshold or a specified maximum number of psamples (see -P) is reached.

Note well! In this mode one may well fail due to the alternate null hypothesis --
the test itself is a bad test and fails! Many dieharder tests, despite our best
efforts, are numerically unstable or have only approximately known target
statistics or are straight up asymptotic results, and will eventually return a
failing result even for a gold-standard generator (such as AES), or for the
hypercautious the XOR generator with AES, threefish, kiss, all loaded at once and
xor'd together. It is therefore safest to use this mode comparatively,
executing a T2D run on AES to get an idea of the test failure threshold(s)
(something I will eventually do and publish on the web so everybody doesn't have to
do it independently) and then running it on your target generator. Failure with
numbers of psamples within an order of magnitude of the AES thresholds should
probably be considered possible test failures, not generator failures. Failures at
levels significantly less than the known gold standard generator failure thresholds
are, of course, probably failures of the generator.

This option should only be used with -k 2.

-v verbose flag -- controls the verbosity of the output for debugging
only. Probably of little use to non-developers, and developers can read the
enum(s) in dieharder.h and the test sources to see which flag values turn on output
on which routines. A value of 1 results in a highly detailed trace of program activity.

-x,-y,-z number - Some tests have parameters that can safely be varied
from their default value. For example, in the diehard birthdays test, one can vary
the number of "dates" drawn from the "year" of some length, which can also be
varied. -x 2048 -y 30 alters these two
values but should still run fine. These parameters should be documented internally
(where they exist) in the e.g. -d 0 -h visible notes.

NOTE WELL: The assessment(s) for the rngs may, in fact, be completely incorrect or
misleading. There are still "bad tests" in dieharder, although we are working to
fix and improve them (and try to document them in the test descriptions visible
with -d testnumber -h). In particular, 'Weak' pvalues should occur one test in two
hundred, and 'Failed' pvalues should occur one test in a million with the default
thresholds - that's what p MEANS. Use them at your Own Risk! Be Warned!

Or better yet, use the new -Y 1 and -Y 2 resolve ambiguity or test to destruction
modes above, comparing to similar runs on one of the as-good-as-it-gets
cryptographic generators, AES or threefish.
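
As a rough illustration of the -Y 1 and -Y 2 logic described above, the following
Python sketch adds p-values in blocks of 100 until the overall p-value is no longer
weak, drops below the failure threshold, or the Xoff limit is reached. It is not
dieharder's code: run_adaptive, overall_pvalue and draw_pvalue are hypothetical
names, the overall p-value is a simple z-test on the mean of the p-values standing
in for the final KS test, only the low tail is checked, and applying the Xoff cap
in mode 1 is an assumption.

```python
import math
import random

def overall_pvalue(pvalues):
    """Stand-in for the final KS test (an assumption, not dieharder's routine):
    two-sided z-test of the mean, which is 1/2 with variance 1/(12 n)
    when the p-values are uniform on [0, 1]."""
    n = len(pvalues)
    z = (sum(pvalues) / n - 0.5) * math.sqrt(12.0 * n)
    return math.erfc(abs(z) / math.sqrt(2.0))

def run_adaptive(draw_pvalue, mode, weak=0.005, fail=0.000001,
                 xoff=100000, block=100):
    """mode 1: resolve ambiguity; mode 2: test to destruction.
    Returns (assessment, number of p-values used, overall p-value)."""
    pvalues = [draw_pvalue() for _ in range(block)]
    while True:
        p = overall_pvalue(pvalues)
        if p < fail:
            return "FAILED", len(pvalues), p
        if mode == 1 and p >= weak:
            # Solidly not weak: stop adding samples.
            return "PASSED", len(pvalues), p
        if len(pvalues) >= xoff:
            # "Boredom" threshold reached without an unambiguous failure.
            return ("WEAK" if p < weak else "PASSED"), len(pvalues), p
        pvalues.extend(draw_pvalue() for _ in range(block))

if __name__ == "__main__":
    good = random.Random(1)      # uniform p-values, as a good generator gives
    skewed = random.Random(2)    # p-values piled up near 0, as a bad one gives
    print(run_adaptive(good.random, mode=2, xoff=2000))
    print(run_adaptive(lambda: skewed.random() ** 3, mode=2))
```

The small xoff in the first call only keeps the demonstration quick; it plays the
role of the -P limit.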

DESCRIPTION

       dieharder

       Welcome to the current snapshot of the dieharder random number  tester.   It  encapsulates
       all  of  the  Gnu  Scientific  Library  (GSL) random number generators (rngs) as well as a
       number  of  generators  from  the  R  statistical  library,  hardware  sources   such   as
       /dev/*random,  "gold  standard"  cryptographic  quality  generators  (useful  for  testing
       dieharder and for purposes  of  comparison  to  new  generators)  as  well  as  generators
       contributed  by  users or found in the literature into a single harness that can time them
       and subject them to various tests for randomness.  These tests are  variously  drawn  from
       George  Marsaglia's  "Diehard  battery  of random number tests", the NIST Statistical Test
       Suite, and again from other sources such as personal invention, user  contribution,  other
       (open source) test suites, or the literature.

       The  primary  point of dieharder is to make it easy to time and test (pseudo)random number
       generators, including both software and hardware rngs, with a fully open source tool.   In
       addition  to  providing  "instant" access to testing of all built-in generators, users can
       choose one of three ways to test their own random number generators or  sources:   a  unix
       pipe  of  a  raw binary (presumed random) bitstream; a file containing a (presumed random)
       raw binary bitstream or formatted ascii uints or floats; and embedding your  generator  in
       dieharder's  GSL-compatible  rng harness and adding it to the list of built-in generators.
       The stdin and file input methods are described below in their own section, as is suggested
       "best practice" for newbies to random number generator testing.

       An important motivation for using dieharder is that the entire test suite is fully GNU
       General Public License (GPL) open source code and hence, rather than being prohibited from "looking
       underneath  the  hood" all users are openly encouraged to critically examine the dieharder
       code for errors, add new tests or generators or user interfaces, or use it freely as is to
       test  their  own favorite candidate rngs subject only to the constraints of the GPL.  As a
       result of its openness, literally  hundreds  of  improvements  and  bug  fixes  have  been
       contributed  by  users  to  date, resulting in a far stronger and more reliable test suite
       than would have been possible with closed and locked down sources  or  even  open  sources
       (such  as  STS)  that  lack  the dynamical feedback mechanism permitting corrections to be
       shared.

       Even small errors in test  statistics  permit  the  alternative  (usually  unstated)  null
       hypothesis  to become an important factor in rng testing -- the unwelcome possibility that
       your generator is just fine but it is the test that  is  failing.   One  extremely  useful
       feature  of  dieharder is that it is at least moderately self validating.  Using the "gold
       standard" aes and threefish cryptographic generators, you can observe how these generators
       perform  on  dieharder runs to the same general degree of accuracy that you wish to use on
       the generators you are testing.  In general, dieharder tests that consistently fail at any
       given  level  of precision (selected with e.g. -a -m 10) on both of the gold standard rngs
       (and/or the better GSL generators, mt19937, gfsr4, taus) are probably unreliable  at  that
       precision and it would hardly be surprising if they failed your generator as well.

       Experts  in  statistics  are  encouraged to give the suite a try, perhaps using any of the
       example calls below at first and then using it freely on their  own  generators  or  as  a
       harness for adding their own tests.  Novices (to either statistics or random number
       generator testing) are strongly encouraged to read the next section on p-values and the
       null hypothesis and to run the test suite a few times with a more verbose output report
       to learn how the whole thing works.

QUICK START EXAMPLES

       Examples for how to set up pipe or file input are given below.  However, it is recommended
       that  a  user play with some of the built in generators to gain familiarity with dieharder
       reports and tests before tackling their own favorite generator or file  full  of  possibly
       random numbers.

       To see dieharder's default standard test report for its default generator (mt19937) simply
       run:

          dieharder -a

       To increase the resolution of possible failures of the standard -a(ll) test, use the -m
       "multiplier" for the test default numbers of pvalues.  (The defaults are selected more to
       make a full test run take an hour or so instead of days than because they truly give an
       exhaustive test sequence.)  Run:

          dieharder -a -m 10

       To test a different generator (say the gold standard AES_OFB) simply specify the generator
       on the command line with a flag:

          dieharder -g 205 -a -m 10

       Arguments can be in any order.  The generator can also be selected by name:

          dieharder -g AES_OFB -a

       To apply only the diehard opso test to the AES_OFB generator, specify the test by name  or
       number:

          dieharder -g 205 -d 5

       or

          dieharder -g 205 -d diehard_opso

       Nearly  every  aspect  or  field in dieharder's output report format is user-selectable by
       means of display option flags.  In addition, the field separator character can be selected
       by the user to make the output particularly easy for them to parse (-c ' ') or import into
       a spreadsheet (-c ',').  Try:

          dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues

       to see an extremely terse, easy to import report or

          dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D description

       to see a verbose report good for a "beginner" that includes a  full  description  of  each
       test itself.

       Finally,  the  dieharder  binary is remarkably autodocumenting even if the man page is not
       available. All users should try the following commands to see what they do:

          dieharder -h

       (prints the command synopsis like the one above).

          dieharder -a -h
          dieharder -d 6 -h

       (prints the test descriptions only for -a(ll) tests or for the specific test indicated).

          dieharder -l

       (lists all known tests, including how reliable rgb thinks that they are as things stand).

          dieharder -g -1

       (lists all known rngs).

          dieharder -F

       (lists all the currently known display/output control flags used with -D).

       Both beginners and experts should be aware that the assessment provided  by  dieharder  in
       its  standard report should be regarded with great suspicion.  It is entirely possible for
       a generator to "pass" all tests as far as their individual p-values are concerned and  yet
       to fail utterly when considering them all together.  Similarly, it is probable that a rng
       will at the very least show up as "weak" on 0, 1 or 2 tests in a typical -a(ll) run, and
       may even "fail" 1 test in roughly one such run in 10.  To understand why this is so, it is
       necessary to understand something of rng testing, p-values, and the null hypothesis!

P-VALUES AND THE NULL HYPOTHESIS

       dieharder returns "p-values".  To understand what a p-value is and how to use  it,  it  is
       essential to understand the null hypothesis, H0.

       The  null  hypothesis  for random number generator testing is "This generator is a perfect
       random number generator, and for any choice of seed produces an infinitely long, unique
       sequence  of  numbers that have all the expected statistical properties of random numbers,
       to all orders".  Note well that we know that this hypothesis is technically false for  all
       software  generators  as they are periodic and do not have the correct entropy content for
       this statement to ever be true.  However, many hardware generators fail a priori as  well,
       as  they  contain  subtle  bias  or  correlations  due  to  the deterministic physics that
       underlies them.  Nature is often unpredictable but it is rarely random and the  two  words
       don't (quite) mean the same thing!

       The  null  hypothesis  can  be  practically  true,  however.   Both  software and hardware
       generators can be "random" enough that their sequences cannot be distinguished from random
       ones, at least not easily or with the available tools (including dieharder!).  Hence the
       null hypothesis is a practical, not a theoretically pure, statement.

       To test H0, one uses the rng in question to generate a sequence of presumably random
       numbers.  From these numbers one can compute any one of a wide range of test statistics:
       empirically computed numbers that, subject to H0, are random samples drawn from a known
       distribution.  (Successive samples may or may not be covariant, depending on whether
       overlapping sequences of random numbers are used to generate them.)  From a knowledge of
       the target distribution of the statistic(s), the associated cumulative distribution
       function (CDF), and the empirical value of the randomly generated statistic(s), one can
       read off the probability of obtaining the empirical result if the sequence was truly
       random, that is, if the null hypothesis is true and the generator in question is a "good"
       random number generator!  This probability is the "p-value" for the particular test run.

       For example, to test a coin (or a sequence of bits) we might simply count  the  number  of
       heads  and tails in a very long string of flips.  If we assume that the coin is a "perfect
       coin", we expect the number of heads and tails to be binomially distributed and can easily
       compute  the  probability  of  getting  any  particular  number of heads and tails.  If we
       compare our recorded number of heads and tails from the test series to this distribution
       and find that the probability of getting the count we obtained is very low (with, say, far
       more heads than tails), we'd suspect the coin wasn't a perfect coin.  dieharder applies this
       very  test  (made  mathematically  precise)  and  many  others  that  operate on this same
       principle to the string of random bits produced by the  rng  being  tested  to  provide  a
       picture of how "random" the rng is.
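
       The coin example can be made concrete with a short Python sketch.  It computes an exact
       two-sided p-value from the binomial distribution of a fair coin; coin_pvalue is a
       hypothetical helper written for this illustration and is not taken from dieharder's
       sources.

```python
from math import comb

def coin_pvalue(heads, flips):
    """Probability, for a fair coin, of a head count at least as far from
    flips/2 as the one observed (exact two-sided binomial tail)."""
    deviation = abs(heads - flips / 2)
    # Sum the binomial weights of every outcome that is at least this extreme.
    tail = sum(comb(flips, k) for k in range(flips + 1)
               if abs(k - flips / 2) >= deviation)
    return tail / 2 ** flips

if __name__ == "__main__":
    print(coin_pvalue(5, 10))     # perfectly balanced count
    print(coin_pvalue(8, 10))     # mildly lopsided count
    print(coin_pvalue(600, 1000)) # strongly lopsided count
```

       A balanced count gives a p-value of 1, while a strongly lopsided count in a long series
       gives a very small one.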

       Note that the usual dogma is that if the p-value is low -- typically less than 0.05 -- one
       "rejects" the null hypothesis.  In a word, it is improbable that one would get the  result
       obtained  if the generator is a good one.  If it is any other value, one does not "accept"
       the generator as good, one "fails to reject" the generator  as  bad  for  this  particular
       test.   A  "good  random  number generator" is hence one that we haven't been able to make
       fail yet!

       This criterion is, of course, naive in the extreme and cannot be used with dieharder!   It
       makes just as much sense to reject a generator that has p-values of 0.95 or more!  Both of
       these p-value ranges are equally unlikely on any given test run, and  should  be  returned
       for  (on  average)  5% of all test runs by a perfect random number generator.  A generator
       that fails to produce p-values less than 0.05 5% of the time it is tested  with  different
       seeds  is  a  bad random number generator, one that fails the test of the null hypothesis.
       Since dieharder returns over 100 pvalues  by  default  per  test,  one  would  expect  any
       perfectly  good  rng  to "fail" such a naive test around five times by this criterion in a
       single dieharder run!
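
       The arithmetic behind this claim is easy to check.  The Python sketch below (function
       names are hypothetical) computes how many of 100 uniformly distributed p-values are
       expected to fall below 0.05, the chance that at least one does, and a small simulation
       of the same thing for a "perfect" generator.

```python
import random

def naive_rejections(n_pvalues=100, alpha=0.05):
    """Expected count of p-values below alpha, and the probability that
    at least one is, when the p-values are uniform on [0, 1]."""
    expected = n_pvalues * alpha
    p_at_least_one = 1.0 - (1.0 - alpha) ** n_pvalues
    return expected, p_at_least_one

def simulate(runs=2000, n_pvalues=100, alpha=0.05, seed=12345):
    """Average number of naive 'failures' per run over many simulated runs."""
    rng = random.Random(seed)
    total = sum(1 for _ in range(runs * n_pvalues) if rng.random() < alpha)
    return total / runs

if __name__ == "__main__":
    print(naive_rejections())
    print(simulate())
```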

       The p-values themselves, as it turns out, are test statistics!  By their nature,  p-values
       should  be  uniformly  distributed  on  the range 0-1.  In 100+ test runs with independent
       seeds, one should not be surprised to obtain 0, 1, 2, or even  (rarely)  3  p-values  less
       than  0.01.  On the other hand obtaining 7 p-values in the range 0.24-0.25, or seeing that
       70 of the p-values are greater than 0.5 should make the generator highly suspect!  How can
       a  user determine when a test is producing "too many" of any particular value range for p?
       Or too few?

       Dieharder does it for you, automatically.  One can in fact convert a set of p-values  into
       a  p-value by comparing their distribution to the expected one, using a Kolmogorov-Smirnov
       test against the expected uniform distribution of p.
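
       A minimal sketch of such a test in Python follows.  It is not one of dieharder's -k
       routines: ks_uniform_pvalue is a hypothetical name, and the p-value comes from the
       standard asymptotic Kolmogorov series with a small-sample correction to the statistic,
       which is an approximation.

```python
import math
import random

def ks_uniform_pvalue(pvalues):
    """Approximate KS p-value for the hypothesis that `pvalues` are
    uniformly distributed on [0, 1]."""
    xs = sorted(pvalues)
    n = len(xs)
    # D = largest gap between the empirical CDF and the uniform CDF F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        # The series converges slowly here and its value already exceeds 0.9999.
        return 1.0
    # Asymptotic Kolmogorov distribution: 2 * sum (-1)^(k-1) exp(-2 k^2 lam^2).
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return min(1.0, max(0.0, q))

if __name__ == "__main__":
    rng = random.Random(7)
    uniform = [rng.random() for _ in range(100)]
    skewed = [rng.random() ** 2 for _ in range(1000)]  # too many small values
    print(ks_uniform_pvalue(uniform))
    print(ks_uniform_pvalue(skewed))
```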

       These p-values obtained from looking at the distribution of p-values  should  in  turn  be
       uniformly  distributed  and  could  in  principle  be  subjected to still more KS tests in
       aggregate.  The distribution of p-values for a good generator should be  idempotent,  even
       across different test statistics and multiple runs.

       A failure of the distribution of p-values at any level of aggregation signals trouble.  In
       fact, if the p-values of any given test are subjected to a KS test, and those p-values are
       then  subjected  to  a  KS  test,  as  we add more p-values to either level we will either
       observe idempotence of the resulting distribution of p to uniformity, or we  will  observe
       idempotence to a single p-value of zero!  That is, a good generator will produce a roughly
       uniform distribution of  p-values,  in  the  specific  sense  that  the  p-values  of  the
       distributions  of  p-values are themselves roughly uniform and so on ad infinitum, while a
       bad generator will produce a non-uniform distribution of p-values, and  as  more  p-values
       drawn  from  the  non-uniform  distribution  are  added  to its KS test, at some point the
       failure will be absolutely unmistakeable as the resulting  p-value  approaches  0  in  the
       limit.  Trouble indeed!

       The  question  is,  trouble  with  what?   Random  number  tests  are  themselves  complex
       computational objects, and there is a probability that their code is incorrectly framed or
       that  roundoff  or  other  numerical  --  not  methodical  -- errors are contributing to a
       distortion of the distribution of some of the p-values obtained.   This  is  not  an  idle
       observation;  when  one  works on writing random number generator testing programs, one is
       always testing the tests themselves with "good" (we hope) random number generators so that
       egregious  failures  of the null hypothesis signal not a bad generator but an error in the
       test code.  The null hypothesis above is correctly framed  from  a  theoretical  point  of
       view,  but  from  a  real and practical point of view it should read: "This generator is a
       perfect random number generator, and for any choice of seed produces an infinitely long,
       unique  sequence  of  numbers  that have all the expected statistical properties of random
       numbers, to all orders and this test is a perfect test and returns  precisely  correct  p-
       values  from  the test computation."  Observed "failure" of this joint null hypothesis H0'
       can come from failure of either or both of these disjoint components, and comes  from  the
       second  as  often  or more often than the first during the test development process.  When
       one cranks up the "resolution" of the test (discussed next) to where a generator starts to
       fail  some  test one realizes, or should realize, that development never ends and that new
       test regimes will always reveal new failures not only of the generators but of the code.

       With that said, one of dieharder's most significant advantages  is  the  control  that  it
       gives  you  over  a  critical  test parameter.  From the remarks above, we can see that we
       should feel very uncomfortable about "failing" any given random number  generator  on  the
       basis  of  a  5%,  or  even  a  1%,  criterion, especially when we apply a test suite like
       dieharder that returns over 100 (and climbing) distinct  test  p-values  as  of  the  last
       snapshot.  We want failure to be unambiguous and reproducible!

       To  accomplish  this,  one  can  simply crank up its resolution.  If we ran any given test
       against a random number generator and it returned a p-value of  (say)  0.007328,  we'd  be
       perfectly  justified  in  wondering  if  it  is  really  a  good  generator.  However, the
       probability of getting this result isn't really all that small -- when one uses  dieharder
       for  hours  at  a  time numbers like this will definitely happen quite frequently and mean
       nothing.  If one runs the same test again (with a different seed or  part  of  the  random
       sequence)  and  gets  a  p-value  of 0.009122, and a third time and gets 0.002669 -- well,
       that's three 1% (or less) shots in a row and that should happen  only  one  in  a  million
       times.   One  way to clearly resolve failures, then, is to increase the number of p-values
       generated in a test run.  If the actual distribution of p being returned by  the  test  is
       not  uniform,  a  KS  test  will  eventually  return  a p-value that is not some ambiguous
       0.035517 but is instead 0.000000, with the latter produced time after time as we rerun.

       For this reason, dieharder is extremely conservative about announcing rng "weakness" or
       "failure" relative to any given test.  Its internal criteria for these things are
       currently p < 0.5% or p > 99.5% for weakness (at the 1% level total) and a considerably more
       stringent criterion for failure: p < 0.05% or p > 99.95%.  Note well that the ranges are
       symmetric -- too high a value of p is just as bad (and unlikely) as too  low,  and  it  is
       critical  to  flag  it, because it is quite possible for a rng to be too good, on average,
       and not to produce enough low p-values on the full spectrum of dieharder tests.   This  is
       where the final kstest is of paramount importance, and where the "histogram" option can be
       very useful to help you visualize the failure in the distribution of p -- run e.g.:

         dieharder [whatever] -D default -D histogram

       and you will see a crude ascii histogram of the pvalues that failed (or passed) any  given
       level of test.
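
       The symmetric criterion can be sketched in a few lines of Python.  The names assess and
       histogram are hypothetical, the default thresholds are the ones quoted in this section
       (the OPTIONS section lists 0.000001 as the current -X default), and the bar output is
       only a rough imitation of a histogram, not dieharder's actual report format.

```python
def assess(p, weak=0.005, fail=0.0005):
    """Label one p-value, treating both tails symmetrically."""
    if p < fail or p > 1.0 - fail:
        return "FAILED"
    if p < weak or p > 1.0 - weak:
        return "WEAK"
    return "PASSED"

def histogram(pvalues, bins=10):
    """Count p-values per equal-width bin on [0, 1]."""
    counts = [0] * bins
    for p in pvalues:
        # p == 1.0 is folded into the last bin.
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

if __name__ == "__main__":
    for p in (0.5, 0.003, 0.998, 0.0001, 0.9999):
        print(p, assess(p))
    for i, count in enumerate(histogram([0.05, 0.15, 0.15, 0.95, 1.0])):
        print("%.1f-%.1f |%s" % (i / 10, (i + 1) / 10, "*" * count))
```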

       Scattered  reports  of  weakness  or  marginal  failure in a preliminary -a(ll) run should
       therefore not be immediate cause for alarm.  Rather, they are tests to  repeat,  to  watch
       out for, to push the rng harder on using the -m option to -a or simply increasing -p for a
       specific test.  Dieharder permits one to increase the number of p-values generated for any
       test, subject only to the availability of enough random numbers (for file based tests) and
       time, to make failures unambiguous.  A test that is truly  weak  at  -p  100  will  almost
       always  fail  egregiously  at  some  larger value of psamples, be it -p 1000 or -p 100000.
       However, because dieharder is a research tool  and  is  under  perpetual  development  and
       testing, it is strongly suggested that one always consider the alternative null hypothesis
       -- that the failure is a failure of the test code in dieharder itself  in  some  limit  of
       large  numbers  -- and take at least some steps (such as running the same test at the same
       resolution on a "gold standard" generator) to ensure that the failure is  indeed  probably
       in the rng and not the dieharder code.

       Lacking a source of perfect random numbers to use as a reference, validating the tests
       themselves is not easy and always leaves one with some ambiguity (even with aes or threefish).
       During development the best one can usually do is to rely heavily on these "presumed good"
       random number generators.  There are a number  of  generators  that  we  have  theoretical
       reasons  to  expect  to be extraordinarily good and to lack correlations out to some known
       underlying dimensionality, and that also test out extremely well quite  consistently.   By
       using  several  such  generators and not just one, one can hope that those generators have
       (at the very least) different correlations and should not all uniformly fail a test in the
       same  way and with the same number of p-values.  When all of these generators consistently
       fail a test at a given level, I tend to suspect that the problem is in the test code,  not
       the  generators,  although  it  is  very  difficult  to  be  certain,  and  many errors in
       dieharder's code have been discovered and ultimately fixed in just this way by  myself  or
       others.

       One  advantage of dieharder is that it has a number of these "good generators" immediately
       available  for  comparison  runs,  courtesy  of  the  Gnu  Scientific  Library  and   user
       contribution  (notably  David  Bauer,  who  kindly encapsulated aes and threefish).  I use
       AES_OFB, Threefish_OFB, mt19937_1999, gfsr4, ranldx2 and taus2 (as well as  "true  random"
       numbers  from random.org) for this purpose, and I try to ensure that dieharder will "pass"
       in particular the -g 205 -S 1 -s 1 generator at any reasonable p-value resolution  out  to
       -p 1000 or farther.

       Tests (such as the diehard operm5 and sums tests) that consistently fail at these high
       resolutions are flagged as being "suspect" -- possible failures of  the  alternative  null
       hypothesis  -- and they are strongly deprecated!  Their results should not be used to test
       random number generators pending agreement in the statistics and random  number  community
       that those tests are in fact valid and correct so that observed failures can indeed safely
       be attributed to a failure of the intended null hypothesis.

       As I keep emphasizing (for good reason!) dieharder is community  supported.   I  therefore
       openly ask the users of dieharder who are expert in statistics to help me fix the
       code or algorithms being implemented.  I would like to see this test suite  ultimately  be
       validated  by  the  general statistics community in hard use in an open environment, where
       every possible failure of the testing mechanism itself is subject to scrutiny and eventual
       correction.  In this way we will eventually achieve a very powerful suite of tools indeed,
       ones that may well give us very specific information not just about failure but about the
       mode of failure as well, just how the sequence tested deviates from randomness.

       Thus  far,  dieharder  has  benefitted  tremendously from the community.  Individuals have
       openly contributed tests, new generators to be tested, and fixes for existing  tests  that
       were revealed by their own work with the testing instrument.  Efforts are underway to make
       dieharder more portable so that it will build on more platforms and faster  so  that  more
       thorough testing can be done.  Please feel free to participate.

FILE INPUT

       The  simplest  way  to  use  dieharder with an external generator that produces raw binary
       (presumed random) bits is to pipe the raw binary output from this generator  (presumed  to
       be a binary stream of 32 bit unsigned integers) directly into dieharder, e.g.:

         cat /dev/urandom | ./dieharder -a -g 200

       Go  ahead  and  try  this example.  It will run the entire dieharder suite of tests on the
       stream produced by the Linux built-in generator /dev/urandom (using /dev/random is not
       recommended as it is too slow to test in a reasonable amount of time).

       Alternatively,  dieharder  can  be  used  to test files of numbers produced by a candidate
       random number generator:

         dieharder -a -g 201 -f random.org_bin

       for raw binary input or

         dieharder -a -g 202 -f random.org.txt

       for formatted ascii input.

       A formatted ascii input file can accept either uints (integers in the range 0 to 2^32-1,
       one per line) or decimal uniform deviates with at least ten significant digits (that can
       be multiplied by UINT_MAX = 2^32-1 to produce a uint without dropping precision), also one
       per line.  Floats with fewer digits will almost certainly fail bit-level tests, although
       they may pass some of the tests that act on uniform deviates.
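
       The "at least ten significant digits" requirement follows from simple arithmetic: each
       decimal digit resolves about 3.32 bits, so a 32 bit word needs ten of them.  A minimal
       sketch (the function name is illustrative and not part of dieharder):

```python
import math

def bits_resolved(decimal_digits: int) -> float:
    """Number of binary digits that `decimal_digits` decimal digits can distinguish."""
    return decimal_digits * math.log2(10)

# Six digits fall well short of 32 bits; ten digits are just enough.
for digits in (6, 10):
    print(digits, "digits resolve about", round(bits_resolved(digits), 1), "bits")
```

       A deviate written with only six digits therefore leaves the low-order bits of the
       resulting uint determined by rounding rather than by the generator, which is why the
       bit-level tests are expected to fail.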

       Finally, one can fairly easily wrap any generator in the same (GSL) random number  harness
       used  internally by dieharder and simply test it the same way one would any other internal
       generator recognized by dieharder.  This is strongly recommended  where  it  is  possible,
       because  dieharder needs to use a lot of random numbers to thoroughly test a generator.  A
       built-in generator can simply let dieharder determine how many it needs and generate them
       on demand, whereas a file that is too small will "rewind" and render suspect the results
       of any test in which a rewind occurs.

       Note well that file input rands are delivered to the tests on  demand,  but  if  the  test
       needs  more than are available it simply rewinds the file and cycles through it again, and
       again, and again as needed.  Obviously this significantly reduces the sample space and can
       lead  to  completely  incorrect results for the p-value histograms unless there are enough
       rands to run EACH test without repetition (it  is  harmless  to  reuse  the  sequence  for
       different tests).  Let the user beware!
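
       A rough way to check a raw binary file before testing is to count its 32 bit words and
       compare that with the number a test will draw.  The sketch below assumes you already know
       (or have estimated) how many rands a test consumes; that figure, words_required, is a
       hypothetical input and is not something this document specifies.

```python
import os

def words_in_file(path: str) -> int:
    """Number of whole 32-bit words held by a raw binary file."""
    return os.path.getsize(path) // 4

def rewinds_needed(words_required: int, words_available: int) -> int:
    """How many times the file must be rewound to supply words_required words."""
    if words_available <= 0:
        raise ValueError("file holds no complete 32-bit words")
    if words_required <= 0:
        return 0
    # Zero means the test runs without repetition; anything larger is suspect.
    return (words_required - 1) // words_available
```

       Any result other than zero means the test would see repeated data, so its p-values should
       not be trusted.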

BEST PRACTICE

A frequently asked question from new users wishing to test a generator they are working on
for fun or profit (or both) is "How should I get its output into dieharder?" This is a
nontrivial question, as dieharder consumes enormous numbers of random numbers in a full
test cycle, and then there are features like -m 10 or -m 100 that let one effortlessly
demand 10 or 100 times as many to stress a new generator even more.

Even with large file support in dieharder, it is difficult to provide enough random
numbers in a file to really make dieharder happy. It is therefore strongly suggested that
you either:

a) Edit the output stage of your random number generator and get it to write its
production to stdout as a random bit stream -- basically create 32 bit unsigned random
integers and write them directly to stdout as e.g. char data or raw binary. Note that
this is not the same as writing raw floating point numbers (that will not be random at all
as a bitstream) and that "endianness" of the uints should not matter for the null
hypothesis of a "good" generator, as random bytes are random in any order. Crank the
generator and feed this stream to dieharder in a pipe as described above.

b) Use the samples of GSL-wrapped dieharder rngs to similarly wrap your generator (or
calls to your generator's hardware interface). Follow the examples in the ./dieharder
source directory to add it as a "user" generator in the command line interface, rebuild,
and invoke the generator as a "native" dieharder generator (it should appear in the list
produced by -g -1 when done correctly). The advantage of doing it this way is that you
can then (if your new generator is highly successful) contribute it back to the dieharder
project if you wish! Not to mention the fact that it makes testing it very easy.

Most users will probably go with option a) at least initially, but be aware that b) is
probably easier than you think. The dieharder maintainers may be able to give you a hand
with it if you get into trouble, but no promises.
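
Option a) might look something like the following sketch. The xorshift32 routine is only a
toy stand-in for "your generator", and the script name used afterwards is hypothetical;
the one part that matters is that each 32 bit unsigned integer is packed as four raw bytes
and written to stdout.

```python
import struct
import sys

def xorshift32(seed: int):
    """Toy stand-in for the generator under test: yields 32-bit unsigned integers."""
    x = seed & 0xFFFFFFFF
    if x == 0:
        x = 1  # xorshift state must be nonzero
    while True:
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        yield x

def write_words(gen, out, n_words: int, batch: int = 4096) -> None:
    """Write raw 32-bit words to a binary stream; n_words == 0 runs until the pipe closes."""
    written = 0
    while n_words == 0 or written < n_words:
        k = batch if n_words == 0 else min(batch, n_words - written)
        words = [next(gen) for _ in range(k)]
        # Raw binary, not text and not floating point; byte order is a free choice here.
        out.write(struct.pack("<%dI" % k, *words))
        written += k

if __name__ == "__main__":
    # Default to a short finite run; pass 0 for an endless stream.
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 1024
    try:
        write_words(xorshift32(12345), sys.stdout.buffer, count)
        sys.stdout.buffer.flush()
    except BrokenPipeError:
        pass  # the reader (e.g. dieharder) exited first
```

Saved as, say, mygen.py, this could be run as

python3 mygen.py 0 | dieharder -a -g 200

An interpreted generator like this one will be slow; the same pattern in a compiled
language is preferable for a full test cycle.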

WARNING!

A warning for those who are testing files of random numbers. dieharder is a tool that
tests random number generators, not files of random numbers! It is extremely
inappropriate to try to "certify" a file of random numbers as being random just because it
fails to "fail" any of the dieharder tests in e.g. a dieharder -a run. To put it bluntly,
if one rejects all such files that fail any test at the 0.05 level (or any other), the one
thing one can be certain of is that the files in question are not random, as a truly
random sequence would fail any given test at the 0.05 level 5% of the time!

To put it another way, any file of numbers produced by a generator that "fails to fail"
the dieharder suite should be considered "random", even if it contains sequences that
might well "fail" any given test at some specific cutoff. One has to presume that, in passing
the broader tests of the generator itself, it was determined that the p-values for the
test involved were globally correctly distributed, so that e.g. failure at the 0.01 level
occurs neither more nor less than 1% of the time, on average, over many, many tests. If
one particular file generates a failure at this level, one can therefore safely presume
that it is a random file pulled from many thousands of similar files the generator might
create that have the correct distribution of p-values at all levels of testing and
aggregation.
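
The arithmetic behind this warning is easy to check. The sketch below assumes the tests are
independent, which is a simplification, and the test counts are arbitrary illustrative
values rather than the size of any particular dieharder run.

```python
def prob_some_failure(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that a truly random sequence 'fails' at least one of n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n in (1, 10, 30):
    print(n, "tests:", round(prob_some_failure(n), 3))
```

With a 0.05 cutoff, a perfectly random file has roughly a 40% chance of "failing" at least
one of 10 tests and roughly a 79% chance with 30, so rejecting files on that basis mostly
rejects good data.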

To sum up, use dieharder to validate your generator (via input from files or an embedded
stream). Then by all means use your generator to produce files or streams of random
numbers. Do not use dieharder as an accept/reject tool to validate the files themselves!

EXAMPLES

       To demonstrate all tests, run on the default GSL rng, enter:

         dieharder -a

       To demonstrate a test of an external generator of a raw binary stream  of  bits,  use  the
       stdin (raw) interface:

         cat /dev/urandom | dieharder -g 200 -a

       To use it with an ascii formatted file:

         dieharder -g 202 -f testrands.txt -a

       (testrands.txt should consist of a header such as:

        #==================================================================
        # generator mt19937_1999  seed = 1274511046
        #==================================================================
        type: d
        count: 100000
        numbit: 32
        3129711816
          85411969
        2545911541

       etc.).
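
       A file in this layout could be produced by something like the sketch below.  The header
       fields are copied from the example above; whether dieharder needs the comment lines is
       not stated here, so they are reproduced as shown.  The function name and the use of
       Python's random module as a stand-in source are assumptions made for illustration.

```python
import random
import sys

def write_ascii_rands(out, values, generator_name="example", seed=0):
    """Write 32-bit unsigned integers to a text stream, header first, one value per line."""
    values = list(values)
    for v in values:
        if not 0 <= v <= 0xFFFFFFFF:
            raise ValueError("not a 32-bit unsigned integer: %r" % (v,))
    bar = "#" + "=" * 66
    out.write(bar + "\n")
    out.write("# generator %s  seed = %d\n" % (generator_name, seed))
    out.write(bar + "\n")
    out.write("type: d\n")                    # as in the example header
    out.write("count: %d\n" % len(values))    # number of values that follow
    out.write("numbit: 32\n")
    for v in values:
        out.write("%10d\n" % v)               # right-aligned, one per line

if __name__ == "__main__":
    rng = random.Random(1)  # stand-in source; substitute the generator under test
    write_ascii_rands(sys.stdout, [rng.getrandbits(32) for _ in range(5)], "python_random", 1)
```

       The five values written by the demonstration are far too few for real testing; the
       sample produced by dieharder -o remains the authoritative reference for the format.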

       To use it with a binary file:

         dieharder -g 201 -f testrands.bin -a

       or

         cat testrands.bin | dieharder -g 200 -a

       For an example that demonstrates the use of "prefixes" on the output lines, which make it
       relatively easy to filter off the different parts of the output report and chop them up
       into numbers that can be used in other programs or in spreadsheets, try:

         dieharder -a -c ',' -D default -D prefix

DISPLAY OPTIONS

As of version 3.x.x, dieharder has a single output interface that produces tabular data
per test, with common information in headers. The display control options and flags can
be used to customize the output to your specific needs.

The options are controlled by binary flags. The flags, and their text versions, are
displayed if you enter:

dieharder -F

by itself on a line.

The flags can be entered all at once by adding up all the desired option flags. For
example, a very sparse output could be selected by adding the flags for the test_name (8)
and the associated pvalues (128) to get 136:

dieharder -a -D 136

Since the flags are cumulated from zero (unless no flag is entered and the default is
used) you could accomplish the same display via:

dieharder -a -D 8 -D pvalues
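
The way the flags combine can be sketched as follows. Only the two flag values quoted
above are used; the full list comes from dieharder -F. Treating the flags as a bitmask is an
assumption about the mechanism, though for distinct flags it gives the same result as
adding them up.

```python
# Flag values quoted in the text above; `dieharder -F` is the authoritative list.
FLAGS = {"test_name": 8, "pvalues": 128}

def combine(*flags) -> int:
    """OR together flags given by name or by numeric value, starting from zero."""
    total = 0
    for flag in flags:
        total |= FLAGS[flag] if isinstance(flag, str) else int(flag)
    return total

print(combine("test_name", "pvalues"))  # 136
print(combine(8, "pvalues"))            # 136
```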

Note that you can enter flags by value or by name, in any combination. Because people use
dieharder to obtain values and then wish to export them into spreadsheets (comma separated
values) or into filter scripts, you can change the field separator character. For
example:

dieharder -a -c ',' -D default -D -1 -D -2

produces output that is ideal for importing into a spreadsheet (note that one can subtract
field values from the base set of fields provided by the default option as long as it is
given first).
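
For a filter script rather than a spreadsheet, separated output can be split into fields
along these lines. This is a sketch only: it assumes that table rows are lines of
separator-delimited fields and that lines beginning with '#' are header or comment lines.
The sample text is made up for illustration and is not real dieharder output.

```python
def parse_rows(report: str, sep: str = ","):
    """Split separator-delimited report lines into lists of stripped fields."""
    rows = []
    for line in report.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' lines carry no table data
        rows.append([field.strip() for field in line.split(sep)])
    return rows

# Hypothetical sample text, NOT real dieharder output.
sample = """# header line
test_a,  0.1234, PASSED
test_b,  0.9876, PASSED
"""
for row in parse_rows(sample):
    print(row)
```

Check the field order against an actual run before relying on column positions.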

An interesting option is the -D prefix flag, which turns on a field identifier prefix to
make it easy to filter out particular kinds of data. However, it is equally easy to turn
on any particular kind of output to the exclusion of others directly by means of the
flags.

Two other flags of interest to novices to random number generator testing are the -D
histogram (turns on a histogram of the underlying pvalues, per test) and -D description
(turns on a complete test description, per test). These flags turn the output table into
more of a series of "reports" of each test.

PUBLICATION RULES

       dieharder is entirely original code and can be modified and used  at  will  by  any  user,
       provided that:

         a)  The  original  copyright  notices  are maintained and that the source, including all
       modifications, is made publicly available at the time of any derived publication.  This
       is  open  source  software according to the precepts and spirit of the Gnu Public License.
       See the accompanying file COPYING, which also must accompany any redistribution.

         b) The primary author of the code (Robert G. Brown) is  appropriately  acknowledged  and
       referenced in any derived publication.  It is strongly suggested that George Marsaglia and
       the Diehard suite and the various authors of  the  Statistical  Test  Suite  be  similarly
       acknowledged,  although  this  suite  shares  no actual code with these random number test
       suites.

         c) Full responsibility for the accuracy, suitability, and effectiveness of  the  program
       rests  with  the  users  and/or  modifiers.   As  is  clearly  stated  in the accompanying
       copyright.h:

       THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING  ALL
       IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS
       BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES  OR  ANY  DAMAGES  WHATSOEVER
       RESULTING  FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
       OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR  PERFORMANCE  OF
       THIS SOFTWARE.

ACKNOWLEDGEMENTS

       The  author  of  this  suite  gratefully  acknowledges George Marsaglia (the author of the
       diehard test suite) and the various authors of  NIST  Special  Publication  800-22  (which
       describes  the  Statistical  Test  Suite  for  testing  pseudorandom number generators for
       cryptographic applications), for excellent  descriptions  of  the  tests  therein.   These
       descriptions enabled this suite to be developed with a GPL.

       The  author  also  wishes  to  reiterate that the academic correctness and accuracy of the
       implementation of these tests is his sole responsibility and not that of  the  authors  of
       the  Diehard or STS suites.  This is especially true where he has seen fit to modify those
       tests from their strict original descriptions.

COPYRIGHT

       GPL 2b; see the file COPYING that accompanies the source of this  program.   This  is  the
       "standard  Gnu  General Public License version 2 or any later version", with the one minor
       (humorous) "Beverage" modification listed below.  Note that this modification is  probably
       not legally defensible and can be followed really pretty much according to the honor rule.

       As  to  my  personal  preferences in beverages, red wine is great, beer is delightful, and
       Coca Cola or coffee or tea or even milk acceptable to those who for religious or  personal
       reasons wish to avoid stressing my liver.

       The Beverage Modification to the GPL:

       Any  satisfied  user  of  this  software shall, upon meeting the primary author(s) of this
       software for the first time under the appropriate circumstances, offer to buy him  or  her
       or  them a beverage.  This beverage may or may not be alcoholic, depending on the personal
       ethical and moral views of the offerer.  The beverage cost need not exceed one U.S. dollar
       (although  it  certainly may at the whim of the offerer:-) and may be accepted or declined
       with no further obligation on the part of the offerer.  It is not necessary to repeat  the
       offer after the first meeting, but it can't hurt...