Provided by: dieharder_3.31.1.1-1build1_amd64

NAME

       dieharder - A testing and benchmarking tool for random number generators.

SYNOPSIS

       dieharder [-a] [-d dieharder test number] [-f filename] [-B]
                 [-D output flag [-D output flag] ... ] [-F] [-c separator]
                 [-g generator number or -1] [-h] [-k ks_flag] [-l]
                 [-L overlap] [-m multiply_p] [-n ntuple]
                 [-p number of p samples] [-P Xoff]
                 [-o filename] [-s seed strategy] [-S random number seed]
                 [-t number of test samples] [-v verbose flag]
                 [-W weak] [-X fail] [-Y Xtrategy]
                 [-x xvalue] [-y yvalue] [-z zvalue]

dieharder OPTIONS

       -a runs all the tests with standard/default options to create a
              user-controllable  report.  To control the formatting of the report, see -D below.  To control the
              power of the test (which uses default values for tsamples that  cannot  generally  be  varied  and
              psamples  which  generally  can)  see -m below as a "multiplier" of the default number of psamples
              (used only in a -a run).

       -d test number - selects a specific dieharder test.

       -f filename - generators 201 or 202 permit either raw binary or
              formatted ASCII numbers to be read in from a file for testing.  generator 200 reads in raw  binary
              numbers from stdin.  Note well: many tests with default parameters require a lot of rands!  To see
              a sample of the (required) header for ASCII formatted input, run

                       dieharder -o -f example.input -t 10

              and then examine the contents of example.input.  Raw binary input reads 32 bit increments  of  the
              specified data stream.  stdin_input_raw accepts a pipe from a raw binary stream.

       -B binary mode (used with -o below) causes output rands to be written in raw binary, not formatted ascii.

       -D output flag - permits fields to be selected for inclusion in
              dieharder  output.   Each  flag  can be entered as a binary number that turns on a specific output
              field or header or by flag name; flags are aggregated.  To see all currently known flags  use  the
              -F command.

       -F - lists all known flags by name and number.

       -c table separator - where separator is e.g. ',' (CSV) or ' ' (whitespace).

       -g generator number - selects a specific generator for testing.  Using
              -g -1 causes all known generators to be printed out to the display.

       -h prints context-sensitive help -- usually Usage (this message) or a
              test synopsis if entered as e.g. dieharder -d 3 -h.

       -k ks_flag - ks_flag

              0 is fast but slightly sloppy for psamples > 4999 (default).

              1 is MUCH slower but more accurate for larger numbers of psamples.

              2  is  slower  still, but (we hope) accurate to machine precision for any number of psamples up to
              some as yet unknown numerical upper limit (it  has  been  tested  out  to  at  least  hundreds  of
              thousands).

              3 is kuiper ks, fast, quite inaccurate for small samples, deprecated.

       -l list all known tests.

       -L overlap

              1 (use overlap, default)

              0 (don't use overlap)

              in operm5 or other tests that support overlapping and non-overlapping sample modes.

       -m multiply_p - multiply default # of psamples in -a(ll) runs to crank
              up the resolution of failure.

       -n ntuple - set ntuple length for tests on short bit strings that
              permit the length to be varied (e.g. rgb bitdist).

       -o filename - output -t count random numbers from current generator to file.

       -p count - sets the number of p-value samples per test (default 100).

       -P Xoff - sets the number of psamples that will cumulate before deciding
              that a generator is "good" and really, truly passes even a -Y 2 T2D run.  Currently the default is
              100000; eventually it will be set from AES-derived T2D test failure thresholds for fully automated
              reliable operation, but for now it is more a  "boredom"  threshold  set  by  how  long  one  might
              reasonably want to wait on any given test run.

       -S seed - where seed is a uint.  Overrides the default random seed
              selection.  Ignored for file or stdin input.

       -s strategy - if strategy is the (default) 0, dieharder reseeds (or
              rewinds)  once at the beginning when the random number generator is selected and then never again.
              If strategy is nonzero, the generator is reseeded or rewound at the beginning of EACH TEST.  If -S
              seed  was  specified,  or  a  file  is used, this means every test is applied to the same sequence
              (which is useful for validation and testing of dieharder, but  not  a  good  way  to  test  rngs).
              Otherwise a new random seed is selected for each test.

       -t count - sets the number of random entities used in each test, where
              possible.  Be warned -- some tests have fixed sample sizes; others are variable but have practical
              minimum sizes.  It is suggested you begin with the values used in -a and experiment carefully on a
              test by test basis.

       -W weak - sets the "weak" threshold to make the test(s) more or less
              forgiving during e.g. a test-to-destruction run.  Default is currently 0.005.

       -X fail - sets the "fail" threshold to make the test(s) more or less
              forgiving  during  e.g.  a  test-to-destruction  run.   Default  is  currently  0.000001, which is
              basically "certain failure of the null hypothesis", the desired  mode  of  reproducible  generator
              failure.

       -Y Xtrategy - the Xtrategy flag controls the new "test to failure" (T2F)
              modes.  These flags and their modes act as follows:

                0  -  just  run dieharder with the specified number of tsamples and psamples, do not dynamically
              modify a run based on results.  This is the way it has always run, and is the default.

                1 - "resolve ambiguity" (RA) mode.  If a test returns "weak", this is an undesired result.  What
              does  that  mean,  after all?  If you run a long test series, you will see occasional weak returns
              for a perfect generator because p is uniformly distributed and will appear in any finite interval
              from  time  to  time.  Even if a test run returns more than one weak result, you cannot be certain
              that the generator is failing.  RA mode adds psamples (usually in blocks of 100)  until  the  test
              result ends up solidly not weak or proceeds to unambiguous failure.  This is morally equivalent to
              running the test several times to see if a weak result is reproducible, but eliminates the bias of
              personal  judgement  in  the  process  since  the default failure threshold is very small and very
              unlikely to be reached by random chance even in many runs.

              This option should only be used with -k 2.

                 2 - "test to destruction" mode.  Sometimes you just want to know where or if a generator will
              ever  fail  a  test (or test series).  -Y 2 causes psamples to be added 100 at a time until a test
              returns an overall pvalue lower than the failure  threshold  or  a  specified  maximum  number  of
              psamples (see -P) is reached.

              Note well!  In this mode one may well fail due to the alternate null hypothesis -- the test itself
              is a bad test and fails!  Many dieharder tests, despite our best efforts, are numerically unstable
              or have only approximately known target statistics or are straight up asymptotic results, and will
              eventually return a failing result even for a gold-standard generator (such as AES),  or  for  the
              hypercautious  the XOR generator with AES, threefish, kiss, all loaded at once and xor'd together.
              It is therefore safest to use this mode comparatively, executing a T2D run on AES to get an
              idea  of  the  test failure threshold(s) (something I will eventually do and publish on the web so
              everybody doesn't have to do it independently) and then  running  it  on  your  target  generator.
              Failure  with  numbers  of  psamples  within  an  order  of magnitude of the AES thresholds should
              probably be considered possible  test  failures,  not  generator  failures.   Failures  at  levels
              significantly  less  than  the  known  gold  standard generator failure thresholds are, of course,
              probably failures of the generator.

              This option should only be used with -k 2.

       -v verbose flag -- controls the verbosity of the output for debugging
              only.  Probably of  little  use  to  non-developers,  and  developers  can  read  the  enum(s)  in
              dieharder.h and the test sources to see which flag values turn on output on which routines.  A
              value of 1 results in a highly detailed trace of program activity.

       -x,-y,-z number - Some tests have parameters that can safely be varied
              from their default value.  For example, in the diehard birthdays test, -x 2048 -y 30 alters two
              such parameters but should still run
              fine.  These parameters should be documented internally (where they exist) in the  e.g.  -d  0  -h
              visible notes.

              NOTE  WELL:  The  assessment(s)  for the rngs may, in fact, be completely incorrect or misleading.
              There are still "bad tests" in dieharder, although we are working to fix and improve them (and try
              to document them in the test descriptions visible with -d testnumber -h).  In particular, 'Weak'
              pvalues should occur one test in two hundred, and 'Failed' pvalues should  occur  one  test  in  a
              million with the default thresholds - that's what p MEANS.  Use them at your Own Risk!  Be Warned!

              Or  better  yet,  use  the new -Y 1 and -Y 2 resolve ambiguity or test to destruction modes above,
              comparing to similar runs on one  of  the  as-good-as-it-gets  cryptographic  generators,  AES  or
              threefish.

DESCRIPTION

       dieharder

       Welcome to the current snapshot of the dieharder random number tester.  It encapsulates into a single
       harness all of the Gnu Scientific Library (GSL) random number generators (rngs), a number of generators
       from the R statistical library, hardware sources such as /dev/*random, "gold standard" cryptographic
       quality generators (useful for testing dieharder and for purposes of comparison to new generators), and
       generators contributed by users or found in the literature, and it can time them and subject them to
       various tests for randomness.  These tests are variously drawn from George Marsaglia's
       "Diehard  battery  of random number tests", the NIST Statistical Test Suite, and again from other sources
       such as personal invention, user contribution, other (open source) test suites, or the literature.

       The primary point of dieharder is to make it easy to time  and  test  (pseudo)random  number  generators,
       including  both  software  and  hardware  rngs,  with a fully open source tool.  In addition to providing
       "instant" access to testing of all built-in generators, users can choose one of three ways to test  their
       own random number generators or sources:  a unix pipe of a raw binary (presumed random) bitstream; a file
       containing a (presumed random) raw binary bitstream or formatted ascii uints  or  floats;  and  embedding
       your  generator  in  dieharder's  GSL-compatible  rng  harness  and  adding  it  to  the list of built-in
       generators.  The stdin and file input methods are described below in their own section, as  is  suggested
       "best practice" for newbies to random number generator testing.

       An  important  motivation  for  using dieharder is that the entire test suite is fully Gnu Public License
       (GPL) open source code and hence rather than being prohibited from  "looking  underneath  the  hood"  all
       users  are  openly  encouraged  to  critically  examine  the  dieharder code for errors, add new tests or
       generators or user interfaces, or use it freely as is to test their own favorite candidate  rngs  subject
       only  to the constraints of the GPL.  As a result of its openness, literally hundreds of improvements and
       bug fixes have been contributed by users to date, resulting in a far  stronger  and  more  reliable  test
       suite  than  would  have  been possible with closed and locked down sources or even open sources (such as
       STS) that lack the dynamical feedback mechanism permitting corrections to be shared.

       Even small errors in test statistics permit the alternative (usually unstated) null hypothesis to  become
       an  important  factor in rng testing -- the unwelcome possibility that your generator is just fine but it
       is the test that is failing.  One extremely useful feature of dieharder is that it is at least moderately
       self  validating.   Using the "gold standard" aes and threefish cryptographic generators, you can observe
       how these generators perform on dieharder runs to the same general degree of accuracy that  you  wish  to
       use  on  the generators you are testing.  In general, dieharder tests that consistently fail at any given
       level of precision (selected with e.g. -a -m 10) on both of the gold standard rngs (and/or the better GSL
       generators,  mt19937,  gfsr4,  taus)  are  probably  unreliable  at that precision and it would hardly be
       surprising if they failed your generator as well.

       Experts in statistics are encouraged to give the suite a try, perhaps using  any  of  the  example  calls
       below  at  first  and  then  using it freely on their own generators or as a harness for adding their own
       tests.  Novices (to either statistics or random number generator testing) are strongly encouraged to read
       the  next  section on p-values and the null hypothesis and running the test suite a few times with a more
       verbose output report to learn how the whole thing works.

QUICK START EXAMPLES

       Examples for how to set up pipe or file input are given below.  However, it is recommended  that  a  user
       play  with  some  of  the built in generators to gain familiarity with dieharder reports and tests before
       tackling their own favorite generator or file full of possibly random numbers.

       To see dieharder's default standard test report for its default generator (mt19937) simply run:

          dieharder -a

       To increase the resolution of possible failures of the standard -a(ll) test, use the -m "multiplier" of
       the default number of pvalues per test (a default chosen more to make a full test run take an hour or so
       instead of days than because it constitutes a truly exhaustive test sequence) and run:

          dieharder -a -m 10

       To test a different generator (say the gold standard AES_OFB) simply specify the generator on the command
       line with a flag:

          dieharder -g 205 -a -m 10

       Arguments can be in any order.  The generator can also be selected by name:

          dieharder -g AES_OFB -a

       To apply only the diehard opso test to the AES_OFB generator, specify the test by name or number:

          dieharder -g 205 -d 5

       or

          dieharder -g 205 -d diehard_opso

       Nearly  every  aspect or field in dieharder's output report format is user-selectable by means of display
       option flags.  In addition, the field separator character can be selected by the user to make the  output
       particularly easy for them to parse (-c ' ') or import into a spreadsheet (-c ',').  Try:

          dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues

       to see an extremely terse, easy to import report or

          dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D description

       to see a verbose report good for a "beginner" that includes a full description of each test itself.

       Finally,  the  dieharder  binary is remarkably autodocumenting even if the man page is not available. All
       users should try the following commands to see what they do:

          dieharder -h

       (prints the command synopsis like the one above).

          dieharder -a -h
          dieharder -d 6 -h

       (prints the test descriptions only for -a(ll) tests or for the specific test indicated).

          dieharder -l

       (lists all known tests, including how reliable rgb thinks that they are as things stand).

          dieharder -g -1

       (lists all known rngs).

          dieharder -F

       (lists all the currently known display/output control flags used with -D).

       Both beginners and experts should be aware that the assessment provided  by  dieharder  in  its  standard
       report  should  be  regarded with great suspicion.  It is entirely possible for a generator to "pass" all
       tests as far as their individual p-values are concerned and yet to fail utterly when considering them all
       together.   Similarly,  it  is  probable that a rng will at the very least show up as "weak" on 0, 1 or 2
       tests in a typical -a(ll) run, and may even "fail" 1 test in roughly one such run in 10.  To understand why
       this is so, it is necessary to understand something of rng testing, p-values, and the null hypothesis!

P-VALUES AND THE NULL HYPOTHESIS

       dieharder  returns  "p-values".   To  understand  what a p-value is and how to use it, it is essential to
       understand the null hypothesis, H0.

       The null hypothesis for random number generator testing is "This generator is  a  perfect  random  number
       generator, and for any choice of seed produces an infinitely long, unique sequence of numbers that have
       all the expected statistical properties of random numbers, to all orders".  Note well that we  know  that
       this hypothesis is technically false for all software generators as they are periodic and do not have the
       correct entropy content for this statement to ever be true.  However, many  hardware  generators  fail  a
       priori  as  well,  as  they  contain  subtle  bias  or correlations due to the deterministic physics that
       underlies them.  Nature is often unpredictable but it is rarely random and the two  words  don't  (quite)
       mean the same thing!

       The  null  hypothesis  can  be  practically  true, however.  Both software and hardware generators can be
       "random" enough that their sequences cannot be distinguished from random ones, at  least  not  easily  or
       with  the  available  tools  (including  dieharder!)  Hence  the  null  hypothesis  is a practical, not a
       theoretically pure, statement.

       To test H0, one uses the rng in question to generate a sequence of presumably random numbers.  Using
       these numbers one can compute any of a wide range of test statistics -- empirically computed numbers
       that, if H0 holds, are random samples drawn from a known distribution (samples that may or may not be
       covariant, depending on whether overlapping sequences of random numbers are used to generate successive
       samples of the statistic(s)).  From a knowledge of the target distribution of the
       statistic(s)  and  the  associated  cumulative distribution function (CDF) and the empirical value of the
       randomly generated statistic(s), one can read off the probability of obtaining the  empirical  result  if
       the sequence was truly random, that is, if the null hypothesis is true and the generator in question is a
       "good" random number generator!  This probability is the "p-value" for the particular test run.

       For example, to test a coin (or a sequence of bits) we might simply count the number of heads  and  tails
       in  a very long string of flips.  If we assume that the coin is a "perfect coin", we expect the number of
       heads and tails to be binomially distributed and can  easily  compute  the  probability  of  getting  any
       particular number of heads and tails.  If we compare our recorded number of heads and tails from the test
       series to this distribution and find that the probability of getting the count we obtained  is  very  low
       with, say, way more heads than tails we'd suspect the coin wasn't a perfect coin.  dieharder applies this
       very test (made mathematically precise) and many others that operate on this same principle to the string
       of random bits produced by the rng being tested to provide a picture of how "random" the rng is.
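
       To make the arithmetic concrete, here is a minimal sketch of that coin-counting idea in C (an
       illustration using the normal approximation to the binomial, not dieharder's own implementation; the
       program and its names are hypothetical):

          /* Count the 1 bits in a raw binary stream read from stdin and convert the
           * excess over n/2 into a two-sided p-value.  Under H0 the number of heads
           * is Binomial(n, 1/2), approximately Normal(n/2, n/4) for large n. */
          #include <math.h>
          #include <stdio.h>

          static double coin_pvalue(const unsigned int *buf, size_t nwords)
          {
              double n = 32.0 * nwords, heads = 0.0;
              for (size_t i = 0; i < nwords; i++)
                  for (unsigned int w = buf[i]; w; w >>= 1)
                      heads += w & 1u;                    /* count the "heads" bits */
              double z = fabs(heads - n / 2.0) / sqrt(n / 4.0);
              return erfc(z / sqrt(2.0));                 /* two-sided p-value */
          }

          int main(void)
          {
              static unsigned int buf[25000];             /* 100000 bytes of input */
              size_t n = fread(buf, sizeof buf[0], 25000, stdin);
              if (n == 0) return 1;
              printf("p = %g\n", coin_pvalue(buf, n));
              return 0;
          }

       Fed a raw bit stream on stdin (e.g. cat /dev/urandom | ./cointest), a good generator should return
       p-values that are themselves roughly uniform over repeated runs -- which is exactly the property the
       dieharder tests examine far more carefully.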

       Note  that the usual dogma is that if the p-value is low -- typically less than 0.05 -- one "rejects" the
       null hypothesis.  In a word, it is improbable that one would get the result obtained if the generator  is
       a good one.  If it is any other value, one does not "accept" the generator as good, one "fails to reject"
       the generator as bad for this particular test.  A "good random number generator" is  hence  one  that  we
       haven't been able to make fail yet!

       This  criterion  is, of course, naive in the extreme and cannot be used with dieharder!  It makes just as
       much sense to reject a generator that has p-values of 0.95 or more!  Both of  these  p-value  ranges  are
       equally  unlikely on any given test run, and should be returned for (on average) 5% of all test runs by a
       perfect random number generator.  A generator that fails to produce p-values less than  0.05  5%  of  the
       time  it  is tested with different seeds is a bad random number generator, one that fails the test of the
       null hypothesis.  Since dieharder returns over 100 pvalues by default per  test,  one  would  expect  any
       perfectly  good rng to "fail" such a naive test around five times by this criterion in a single dieharder
       run!

       The p-values themselves, as it turns out, are test statistics!   By  their  nature,  p-values  should  be
       uniformly  distributed  on  the  range  0-1.  In 100+ test runs with independent seeds, one should not be
       surprised to obtain 0, 1, 2, or even (rarely) 3 p-values less than 0.01.  On the other hand  obtaining  7
       p-values  in  the range 0.24-0.25, or seeing that 70 of the p-values are greater than 0.5 should make the
       generator highly suspect!  How can a user determine when a test is producing "too many" of any particular
       value range for p?  Or too few?

       Dieharder  does  it  for you, automatically.  One can in fact convert a set of p-values into a p-value by
       comparing their distribution to the expected one, using a Kolmogorov-Smirnov test  against  the  expected
       uniform distribution of p.

       These  p-values  obtained  from  looking  at  the  distribution  of  p-values should in turn be uniformly
       distributed and could in principle be subjected to still more KS tests in aggregate.  The distribution of
       p-values  for  a  good generator should be idempotent, even across different test statistics and multiple
       runs.

       A failure of the distribution of p-values at any level of aggregation signals trouble.  In fact,  if  the
       p-values  of  any  given  test  are subjected to a KS test, and those p-values are then subjected to a KS
       test, as we add more p-values to either level  we  will  either  observe  idempotence  of  the  resulting
       distribution  of p to uniformity, or we will observe idempotence to a single p-value of zero!  That is, a
       good generator will produce a roughly uniform distribution of p-values, in the specific sense that the p-
       values  of  the  distributions of p-values are themselves roughly uniform and so on ad infinitum, while a
       bad generator will produce a non-uniform distribution of p-values, and as more p-values  drawn  from  the
       non-uniform  distribution  are  added  to  its  KS  test,  at  some  point the failure will be absolutely
       unmistakeable as the resulting p-value approaches 0 in the limit.  Trouble indeed!

       The question is, trouble with what?  Random number tests are themselves  complex  computational  objects,
       and  there  is a probability that their code is incorrectly framed or that roundoff or other numerical --
       not methodical -- errors are contributing to a distortion of the distribution of  some  of  the  p-values
       obtained.   This  is  not  an idle observation; when one works on writing random number generator testing
       programs, one is always testing the tests themselves with "good" (we hope) random  number  generators  so
       that  egregious failures of the null hypothesis signal not a bad generator but an error in the test code.
       The null hypothesis above is correctly framed from a theoretical point of  view,  but  from  a  real  and
       practical point of view it should read: "This generator is a perfect random number generator, and for any
       choice of seed produces an infinitely long, unique sequence of numbers that have all the expected
       statistical  properties  of  random  numbers,  to  all orders and this test is a perfect test and returns
       precisely correct p-values from the test computation."  Observed "failure" of this joint null  hypothesis
       H0'  can  come  from failure of either or both of these disjoint components, and comes from the second as
       often or more often than the first  during  the  test  development  process.   When  one  cranks  up  the
       "resolution"  of the test (discussed next) to where a generator starts to fail some test one realizes, or
       should realize, that development never ends and that new test regimes will always reveal new failures not
       only of the generators but of the code.

       With  that  said,  one of dieharder's most significant advantages is the control that it gives you over a
       critical test parameter.  From the remarks above, we can see that we should feel very uncomfortable about
       "failing"  any  given  random  number generator on the basis of a 5%, or even a 1%, criterion, especially
       when we apply a test suite like dieharder that returns over 100 (and climbing) distinct test p-values  as
       of the last snapshot.  We want failure to be unambiguous and reproducible!

       To  accomplish  this,  one can simply crank up its resolution.  If we ran any given test against a random
       number generator and it returned a p-value of (say) 0.007328, we'd be perfectly justified in wondering if
       it  is  really  a  good generator.  However, the probability of getting this result isn't really all that
       small -- when one uses dieharder for hours at a time numbers  like  this  will  definitely  happen  quite
       frequently  and  mean  nothing.   If  one  runs the same test again (with a different seed or part of the
       random sequence) and gets a p-value of 0.009122, and a third time and gets 0.002669 -- well, that's three
       1%  (or  less)  shots  in  a  row and that should happen only one in a million times.  One way to clearly
       resolve failures, then, is to increase the number of p-values generated in a test  run.   If  the  actual
       distribution  of  p being returned by the test is not uniform, a KS test will eventually return a p-value
       that is not some ambiguous 0.035517 but is instead 0.000000, with the latter produced time after time  as
       we rerun.

       For  this  reason,  dieharder  is  extremely  conservative  about  announcing rng "weakness" or "failure"
       relative to any given test.  Its internal criteria for these things are currently p < 0.5% or p > 99.5%
       weakness  (at the 1% level total) and a considerably more stringent criterion for failure: p < 0.05% or p
       > 99.95%.  Note well that the ranges are symmetric -- too high a value of p is just as bad (and unlikely)
       as  too  low,  and  it  is critical to flag it, because it is quite possible for a rng to be too good, on
       average, and not to produce enough low p-values on the full spectrum of dieharder tests.  This  is  where
       the  final kstest is of paramount importance, and where the "histogram" option can be very useful to help
       you visualize the failure in the distribution of p -- run e.g.:

         dieharder [whatever] -D default -D histogram

       and you will see a crude ascii histogram of the pvalues that failed (or passed) any given level of test.

       Scattered reports of weakness or marginal failure in a preliminary -a(ll) run  should  therefore  not  be
       immediate cause for alarm.  Rather, they are tests to repeat, to watch out for, to push the rng harder on
       using the -m option to -a or simply increasing -p for a specific test.  Dieharder permits one to increase
       the  number of p-values generated for any test, subject only to the availability of enough random numbers
       (for file based tests) and time, to make failures unambiguous.  A test that is truly weak at -p 100  will
       almost  always  fail  egregiously at some larger value of psamples, be it -p 1000 or -p 100000.  However,
       because dieharder is a research tool and is under perpetual  development  and  testing,  it  is  strongly
       suggested  that  one  always consider the alternative null hypothesis -- that the failure is a failure of
       the test code in dieharder itself in some limit of large numbers -- and take at least some steps (such as
       running  the  same test at the same resolution on a "gold standard" generator) to ensure that the failure
       is indeed probably in the rng and not the dieharder code.

       Lacking a source of perfect random numbers to use as a reference, validating the tests themselves is  not
       easy  and always leaves one with some ambiguity (even aes or threefish).  During development the best one
       can usually do is to rely heavily on these "presumed good" random number generators.  There are a  number
       of  generators  that  we  have  theoretical  reasons  to  expect  to  be extraordinarily good and to lack
       correlations out to some known underlying dimensionality, and that also test  out  extremely  well  quite
       consistently.  By using several such generators and not just one, one can hope that those generators have
       (at the very least) different correlations and should not all uniformly fail a test in the same  way  and
       with  the  same  number  of  p-values.   When all of these generators consistently fail a test at a given
       level, I tend to suspect that the problem is in the test code, not the generators, although  it  is  very
       difficult to be certain, and many errors in dieharder's code have been discovered and ultimately fixed in
       just this way by myself or others.

       One advantage of dieharder is that it has a number of these "good generators" immediately  available  for
       comparison  runs,  courtesy of the Gnu Scientific Library and user contribution (notably David Bauer, who
       kindly encapsulated aes and threefish).  I use AES_OFB, Threefish_OFB, mt19937_1999, gfsr4, ranlxd2 and
       taus2  (as  well  as  "true  random"  numbers from random.org) for this purpose, and I try to ensure that
       dieharder will "pass" in particular the -g 205 -S 1 -s 1 generator at any reasonable  p-value  resolution
       out to -p 1000 or farther.

       Tests  (such  as  the  diehard operm5 and sums test) that consistently fail at these high resolutions are
       flagged as being "suspect" -- possible failures of the  alternative  null  hypothesis  --  and  they  are
       strongly deprecated!  Their results should not be used to test random number generators pending agreement
       in the statistics and random number community that those tests are in fact  valid  and  correct  so  that
       observed failures can indeed safely be attributed to a failure of the intended null hypothesis.

       As I keep emphasizing (for good reason!) dieharder is community supported.  I therefore openly ask
       users of dieharder who are expert in statistics to help me fix the code or algorithms being
       implemented.   I  would  like  to  see  this test suite ultimately be validated by the general statistics
       community in hard use in an open environment, where every  possible  failure  of  the  testing  mechanism
       itself  is  subject  to  scrutiny and eventual correction.  In this way we will eventually achieve a very
       powerful suite of tools indeed, ones that may well give us  very  specific  information  not  just  about
       failure but of the mode of failure as well, just how the sequence tested deviates from randomness.

       Thus  far, dieharder has benefitted tremendously from the community.  Individuals have openly contributed
       tests, new generators to be tested, and fixes for existing tests that were revealed  by  their  own  work
       with  the testing instrument.  Efforts are underway to make dieharder more portable so that it will build
       on more platforms and faster so that more thorough testing can be done.  Please feel free to participate.

FILE INPUT

       The simplest way to use dieharder with an external generator that produces raw binary  (presumed  random)
       bits  is  to  pipe  the  raw  binary output from this generator (presumed to be a binary stream of 32 bit
       unsigned integers) directly into dieharder, e.g.:

         cat /dev/urandom | ./dieharder -a -g 200

       Go ahead and try this example.  It will run the entire dieharder suite of tests on the stream produced by
       the linux built-in generator /dev/urandom (using /dev/random is not recommended as it is too slow to test
       in a reasonable amount of time).

       Alternatively, dieharder can be used to test files of numbers  produced  by  a  candidate  random  number
       generators:

         dieharder -a -g 201 -f random.org_bin

       for raw binary input or

         dieharder -a -g 202 -f random.org.txt

       for formatted ascii input.

       A formatted ascii input file can accept either uints (integers in the range 0 to 2^32-1, one per line) or
       decimal uniform deviates with at least ten significant digits (which can be multiplied by 2^32 to produce
       a uint without dropping precision), also one per line.  Floats with fewer digits will almost
       certainly fail bitlevel tests, although they may pass some of the tests that act on uniform deviates.
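
       For illustration only, a small C program that writes such a formatted ascii file (the header fields
       mirror the sample header shown in the EXAMPLES section below; rand() is just a placeholder for the
       generator you actually want to test) might look like:

          /* Write a formatted ascii input file of 32 bit uints suitable for -g 202. */
          #include <stdio.h>
          #include <stdlib.h>

          int main(void)
          {
              const unsigned long count = 100000;
              FILE *f = fopen("testrands.txt", "w");
              if (!f) return 1;
              fprintf(f, "#==================================================================\n");
              fprintf(f, "# generator my_generator  seed = 1\n");
              fprintf(f, "#==================================================================\n");
              fprintf(f, "type: d\n");
              fprintf(f, "count: %lu\n", count);
              fprintf(f, "numbit: 32\n");
              for (unsigned long i = 0; i < count; i++) {
                  unsigned int r = (unsigned int)rand();   /* placeholder -- substitute your
                                                              generator's 32 bit output here */
                  fprintf(f, "%10u\n", r);
              }
              fclose(f);
              return 0;
          }

       The resulting file can then be tested exactly as above with dieharder -a -g 202 -f testrands.txt.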

       Finally, one can fairly easily wrap any generator in the same (GSL) random number harness used internally
       by  dieharder  and  simply  test  it  the  same  way one would any other internal generator recognized by
       dieharder.  This is strongly recommended where it is possible, because dieharder needs to use  a  lot  of
       random  numbers  to thoroughly test a generator.  A built in generator can simply let dieharder determine
       how many it needs and generate them on demand, whereas a file that is too small will "rewind", rendering
       the results of any test in which a rewind occurs suspect.

       Note well that file input rands are delivered to the tests on demand, but if the test needs more than are
       available it simply rewinds the file and cycles through  it  again,  and  again,  and  again  as  needed.
       Obviously  this  significantly  reduces the sample space and can lead to completely incorrect results for
       the p-value histograms unless there are enough rands to run EACH test without repetition (it is  harmless
       to reuse the sequence for different tests).  Let the user beware!

BEST PRACTICE

       A  frequently  asked  question  from new users wishing to test a generator they are working on for fun or
       profit (or both) is "How should I get its output into dieharder?"  This  is  a  nontrivial  question,  as
       dieharder  consumes  enormous numbers of random numbers in a full test cycle, and then there are features
       like -m 10 or -m 100 that let one effortlessly demand 10 or 100 times as many to stress a  new  generator
       even more.

       Even  with large file support in dieharder, it is difficult to provide enough random numbers in a file to
       really make dieharder happy.  It is therefore strongly suggested that you either:

       a) Edit the output stage of your random number generator and get it to write its production to stdout  as
       a random bit stream -- basically create 32 bit unsigned random integers and write them directly to stdout
       as e.g. char data or raw binary.  Note that this is not the same as writing raw  floating  point  numbers
       (that  will not be random at all as a bitstream) and that "endianness" of the uints should not matter for
       the null hypothesis of a "good" generator, as random bytes are random in any order.  Crank the  generator
       and feed this stream to dieharder in a pipe as described above.
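
       As a minimal sketch of such an output stage (the rand() call is only a stand-in for your own generator):

          /* Emit 32 bit unsigned integers to stdout as a raw binary stream suitable
           * for piping into dieharder -g 200. */
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>

          int main(void)
          {
              for (;;) {
                  uint32_t r = ((uint32_t)rand() << 16) ^ (uint32_t)rand();  /* stand-in rng */
                  if (fwrite(&r, sizeof r, 1, stdout) != 1)
                      return 0;                           /* stop on a write error */
              }
          }

       which can then be run as ./mygen | dieharder -g 200 -a.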

       b)  Use  the  samples  of  GSL-wrapped  dieharder rngs to similarly wrap your generator (or calls to your
       generator's hardware interface).  Follow the examples in the ./dieharder source directory to add it as  a
       "user" generator in the command line interface, rebuild, and invoke the generator as a "native" dieharder
       generator (it should appear in the list produced by -g -1 when done correctly).  The advantage  of  doing
       it  this  way is that you can then (if your new generator is highly successful) contribute it back to the
       dieharder project if you wish!  Not to mention the fact that it makes testing it very easy.
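
       For flavor, the GSL side of option b) looks roughly like the following sketch (the xorshift update is
       only a stand-in for your generator, and the dieharder-specific registration steps follow the examples in
       the source tree rather than anything shown here):

          /* A GSL-compatible generator type, the pattern dieharder's built-in rngs use. */
          #include <stdint.h>
          #include <gsl/gsl_rng.h>

          typedef struct { uint32_t x; } mygen_state_t;

          static void mygen_set(void *vstate, unsigned long int s)
          {
              ((mygen_state_t *)vstate)->x = s ? (uint32_t)s : 1u;   /* never all zero */
          }

          static unsigned long int mygen_get(void *vstate)
          {
              mygen_state_t *st = vstate;
              st->x ^= st->x << 13;                       /* xorshift32 stand-in */
              st->x ^= st->x >> 17;
              st->x ^= st->x << 5;
              return st->x;
          }

          static double mygen_get_double(void *vstate)
          {
              return mygen_get(vstate) / 4294967296.0;    /* map to [0,1) */
          }

          static const gsl_rng_type mygen_type = {
              "mygen",            /* name that would appear in the -g -1 list */
              0xffffffffUL,       /* max */
              0,                  /* min */
              sizeof(mygen_state_t),
              &mygen_set,
              &mygen_get,
              &mygen_get_double
          };

       A generator defined this way can be allocated with gsl_rng_alloc(&mygen_type) and driven through the
       ordinary GSL calls, which is all dieharder needs from it.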

       Most users will probably go with option a) at least initially, but be aware that b)  is  probably  easier
       than  you  think.   The  dieharder  maintainers  may  be  able to give you a hand with it if you get into
       trouble, but no promises.

WARNING!

       A warning for those who are testing files of random numbers.  dieharder  is  a  tool  that  tests  random
       number generators, not files of random numbers!  It is extremely inappropriate to try to "certify" a file
       of random numbers as being random just because it fails to "fail" any of the dieharder tests  in  e.g.  a
       dieharder  -a run.  To put it bluntly, if one rejects all such files that fail any test at the 0.05 level
       (or any other), the one thing one can be certain of is that the files in question are not  random,  as  a
       truly random sequence would fail any given test at the 0.05 level 5% of the time!

       To  put  it  another  way, any file of numbers produced by a generator that "fails to fail" the dieharder
       suite should be considered "random", even if it contains sequences that might well "fail" any given  test
       at some specific cutoff.  One has to presume that, in passing the broader tests of the generator itself,
       it was determined that the p-values for the test involved were globally correctly distributed, so that e.g.
       failure  at  the  0.01 level occurs neither more nor less than 1% of the time, on average, over many many
       tests.  If one particular file generates a failure at this level, one can therefore safely  presume  that
       it  is a random file pulled from many thousands of similar files the generator might create that have the
       correct distribution of p-values at all levels of testing and aggregation.

       To sum up, use dieharder to validate your generator (via input from files or an embedded  stream).   Then
       by  all  means use your generator to produce files or streams of random numbers.  Do not use dieharder as
       an accept/reject tool to validate the files themselves!

EXAMPLES

       To demonstrate all tests, run on the default GSL rng, enter:

         dieharder -a

       To demonstrate a test of an external generator of a raw binary  stream  of  bits,  use  the  stdin  (raw)
       interface:

         cat /dev/urandom | dieharder -g 200 -a

       To use it with an ascii formatted file:

         dieharder -g 202 -f testrands.txt -a

       (testrands.txt should consist of a header such as:

        #==================================================================
        # generator mt19937_1999  seed = 1274511046
        #==================================================================
        type: d
        count: 100000
        numbit: 32
        3129711816
          85411969
        2545911541

       etc.).

       To use it with a binary file

         dieharder -g 201 -f testrands.bin -a

       or

         cat testrands.bin | dieharder -g 200 -a

       For an example that demonstrates the use of "prefixes" on the output lines, which make it relatively easy
       to filter off the different parts of the output report and chop them up into numbers that can be used in
       other programs or in spreadsheets, try:

         dieharder -a -c ',' -D default -D prefix

DISPLAY OPTIONS

       As  of  version  3.x.x, dieharder has a single output interface that produces tabular data per test, with
       common information in headers.  The display control options and flags can be used to customize the output
       to your specific needs.

       The  options  are  controlled  by binary flags.  The flags, and their text versions, are displayed if you
       enter:

         dieharder -F

       by itself on a line.

       The flags can be entered all at once by adding up all the desired option  flags.   For  example,  a  very
       sparse  output  could  be  selected  by adding the flags for the test_name (8) and the associated pvalues
       (128) to get 136:

         dieharder -a -D 136

       Since the flags are cumulated from zero (unless no flag is entered and the default  is  used)  you  could
       accomplish the same display via:

         dieharder -a -D 8 -D pvalues

       Note that you can enter flags by value or by name, in any combination.  Because people use dieharder to
       obtain values and then wish to export them into spreadsheets (comma separated values) or into filter
       scripts, you can change the field separator character.  For example:

         dieharder -a -c ',' -D default -D -1 -D -2

       produces  output  that is ideal for importing into a spreadsheet (note that one can subtract field values
       from the base set of fields provided by the default option as long as it is given first).

       An interesting option is the -D prefix flag, which turns on a field identifier prefix to make it easy  to
       filter  out  particular  kinds  of  data.   However, it is equally easy to turn on any particular kind of
       output to the exclusion of others directly by means of the flags.

       Two other flags of interest to novices to random number generator testing are the -D histogram (turns  on
       a  histogram  of  the  underlying  pvalues,  per  test)  and  -D  description  (turns  on a complete test
       description, per test).  These flags turn the output table into more of a series  of  "reports"  of  each
       test.

PUBLICATION RULES

       dieharder is entirely original code and can be modified and used at will by any user, provided that:

         a)  The  original copyright notices are maintained and that the source, including all modifications, is
       made publicly available at the time of any derived publication.  This is open source software according
       to the precepts and spirit of the Gnu Public License.  See the accompanying file COPYING, which also must
       accompany any redistribution.

         b) The primary author of the code (Robert G. Brown) is appropriately acknowledged and referenced in any
       derived  publication.   It  is  strongly  suggested  that  George Marsaglia and the Diehard suite and the
       various authors of the Statistical Test Suite be similarly acknowledged, although this  suite  shares  no
       actual code with these random number test suites.

         c)  Full  responsibility for the accuracy, suitability, and effectiveness of the program rests with the
       users and/or modifiers.  As is clearly stated in the accompanying copyright.h:

       THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES  WITH  REGARD  TO  THIS  SOFTWARE,  INCLUDING  ALL  IMPLIED
       WARRANTIES  OF  MERCHANTABILITY  AND  FITNESS,  IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
       SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA  OR
       PROFITS,  WHETHER  IN  AN  ACTION  OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
       CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

ACKNOWLEDGEMENTS

       The author of this suite gratefully acknowledges George Marsaglia (the author of the diehard test  suite)
       and  the  various  authors of NIST Special Publication 800-22 (which describes the Statistical Test Suite
       for testing pseudorandom number generators for cryptographic applications), for excellent descriptions of
       the tests therein.  These descriptions enabled this suite to be developed and released under the GPL.

       The  author  also wishes to reiterate that the academic correctness and accuracy of the implementation of
       these tests is his sole responsibility and not that of the authors of the Diehard or STS suites.  This is
       especially true where he has seen fit to modify those tests from their strict original descriptions.

       GPL  2b;  see  the  file  COPYING that accompanies the source of this program.  This is the "standard Gnu
       General Public License version 2 or  any  later  version",  with  the  one  minor  (humorous)  "Beverage"
       modification  listed  below.   Note  that this modification is probably not legally defensible and can be
       followed really pretty much according to the honor rule.

       As to my personal preferences in beverages, red wine is great, beer  is  delightful,  and  Coca  Cola  or
       coffee  or  tea  or  even  milk  acceptable  to those who for religious or personal reasons wish to avoid
       stressing my liver.

       The Beverage Modification to the GPL:

       Any satisfied user of this software shall, upon meeting the primary author(s) of this  software  for  the
       first  time  under  the  appropriate  circumstances,  offer  to  buy him or her or them a beverage.  This
       beverage may or may not be alcoholic, depending on the personal ethical and moral views of  the  offerer.
       The  beverage  cost  need  not  exceed  one  U.S.  dollar  (although  it certainly may at the whim of the
       offerer:-) and may be accepted or declined with no further obligation on the part of the offerer.  It  is
       not necessary to repeat the offer after the first meeting, but it can't hurt...