
Provided by: libstatistics-normality-perl_0.01-1_all

NAME

       Statistics::Normality - test whether an empirical distribution can be taken as being drawn
       from a normally-distributed population

VERSION

       Version 0.01

SYNOPSIS

           use Statistics::Normality ':all';
           use Statistics::Normality 'shapiro_wilk_test';
           use Statistics::Normality 'dagostino_k_square_test';

DESCRIPTION

       Various situations call for testing whether an empirical sample can be presumed to have
       been drawn from a normally (Gaussian <http://en.wikipedia.org/wiki/Normal_distribution>)
       distributed population, especially because many downstream significance tests depend upon
       the assumption of normality.  This package implements some of the more well-known tests
       <http://en.wikipedia.org/wiki/Normality_test> from the mathematical statistics literature,
       though there are also others that are not included.  The tests here are all so-called
       omnibus tests that find departures from normality on the basis of skewness and/or kurtosis
       [Dagostino71].  Note that, although the Kolmogorov-Smirnov test
       <http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test> can also be used in this
       capacity, it is a distance test and therefore not advisable [Dagostino71].  Neither it nor
       other distance tests (e.g. chi-square) are implemented here.

TESTS

       The subtleties and esoterica of various statistical tests for normality require some
       familiarity with the mathematical statistics literature.  We give rules-of-thumb for
       specific tests, where they exist, but it may be advisable to try several different tests
       to check the consistency of the conclusion.  It is probably also a good idea to check
       results graphically, either by direct plotting or by a Q-Q plot
       <http://en.wikipedia.org/wiki/Q-Q_plot>.  Small samples will often pass a normality test
       simply because they carry too little information to reveal a departure from normality,
       should one exist, so a passing result on a small sample should be read with caution.
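
       As a rough illustration of the graphical check, the following sketch computes the
       coordinate pairs of a normal Q-Q plot.  It is not part of this module: the names
       normal_cdf, normal_quantile and qq_points are hypothetical, the plotting positions
       (i - 0.375)/(n + 0.25) follow Blom's approximation, and the normal CDF uses a standard
       polynomial approximation of erf, so the quantiles are accurate only to plotting
       precision.

```perl
use strict;
use warnings;

# Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation of erf
# (absolute error about 1.5e-7).  Uses only core Perl.
sub normal_cdf {
    my ($z) = @_;
    my $x = abs($z) / sqrt(2);
    my $t = 1 / (1 + 0.3275911 * $x);
    my $poly = $t * (0.254829592 + $t * (-0.284496736 + $t * (1.421413741
             + $t * (-1.453152027 + $t * 1.061405429))));
    my $erf = 1 - $poly * exp(-$x * $x);
    return $z >= 0 ? 0.5 * (1 + $erf) : 0.5 * (1 - $erf);
}

# Inverse of normal_cdf by bisection on [-10, 10]; adequate for plotting.
sub normal_quantile {
    my ($p) = @_;
    my ($lo, $hi) = (-10, 10);
    for (1 .. 80) {
        my $mid = ($lo + $hi) / 2;
        if (normal_cdf($mid) < $p) { $lo = $mid } else { $hi = $mid }
    }
    return ($lo + $hi) / 2;
}

# Return a list of [theoretical_quantile, observed_value] pairs.
# Assumption: Blom's plotting positions (i - 0.375) / (n + 0.25), i = 1..n.
sub qq_points {
    my ($data) = @_;
    my @sorted = sort { $a <=> $b } @$data;
    my $n = @sorted;
    return map {
        my $p = ($_ + 1 - 0.375) / ($n + 0.25);
        [ normal_quantile($p), $sorted[$_] ]
    } 0 .. $#sorted;
}

my @pairs = qq_points([0.34, -0.2, 0.8, 1.1, -0.7, 0.05, 0.4]);
printf "%8.4f %8.4f\n", @$_ for @pairs;
```

       If the sample is close to normal, the printed pairs fall near a straight line when
       plotted; systematic curvature at the ends suggests skewness or heavy tails.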

       Each of the methods here is a frequentist test, i.e. one that tests against the null
       hypothesis <http://en.wikipedia.org/wiki/Null_hypothesis> that the sample was drawn from a
       normal population.  In other words, a low p-value recommends rejecting the null
       hypothesis of normality.

EXPORT

       The functions described below can be exported individually by name, or all together with
       the ':all' tag, as shown in the SYNOPSIS.

   Shapiro-Wilk Test
       The Shapiro-Wilk W-Statistic test <http://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test>
       [Shapiro65] is considered to be among the most objective tests of normality [Royston92]
       and also one of the most powerful ones for detecting non-normality [Chen71].  Its
       statistic is essentially the ratio of the square of the (roughly) best linear unbiased
       estimator of the population standard deviation to the sample variance [Dagostino71].  The
       test is mathematically complex and
       most implementations use several conventional approximations (as we do here), including
       Blom's formula for the expected value of the order statistics [Harter61] and
       transformation to standard normal distribution for evaluation, especially for large
       samples [Royston92].

               $pval = shapiro_wilk_test ([0.34, -0.2, 0.8, ...]);
               ($pval, $w_statistic) = shapiro_wilk_test ([0.34, -0.2, 0.8, ...]);

       This test may not be the best if there are many repeated values in the test distribution
       or when the number of points in the test distribution is very large, e.g. more than 5000.
       The routine will carp about the latter, but not the former.  This particular
       implementation of the test also requires at least 6 data points in the sample distribution
       and will croak otherwise.
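
       The size limits above can be checked before calling the routine, so that a sample that
       is too small is handled deliberately rather than through an exception.  The following is
       a minimal sketch: the helper sample_size_status and its return strings are hypothetical
       and not part of this module, and the thresholds 6 and 5000 are simply those stated in
       the text above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Normality): classify a sample
# by the size limits described in the documentation above.
sub sample_size_status {
    my ($data) = @_;
    my $n = scalar @$data;
    return 'too_small'  if $n < 6;      # the routine is documented to croak
    return 'very_large' if $n > 5000;   # the routine is documented to carp
    return 'ok';
}

my @sample = (0.34, -0.2, 0.8, 1.1, -0.7, 0.05, 0.4);
my $status = sample_size_status(\@sample);

# Call the test only if the sample is large enough and the module is installed.
if ($status ne 'too_small' && eval { require Statistics::Normality; 1 }) {
    my ($pval, $w) = Statistics::Normality::shapiro_wilk_test(\@sample);
    printf "W = %.4f, p = %.4f\n", $w, $pval;
}
```

       Repeated values are not detected by this sketch; as noted above, the routine itself does
       not warn about them either.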

   D'Agostino K-Squared Test
       The D'Agostino K-Squared test
       <http://en.wikipedia.org/wiki/D%27Agostino%27s_K-squared_test> is a good test against
       non-normality arising from kurtosis <http://en.wikipedia.org/wiki/Kurtosis> and/or
       skewness <http://en.wikipedia.org/wiki/Skewness> [Dagostino90].

               $pval = dagostino_k_square_test ([0.34, -0.2, ...]);
               ($pval, $ksq_statistic) = dagostino_k_square_test ([0.34, -0.2, ...]);

       The test statistic depends upon both the sample kurtosis and skewness, as well as the
       moments of these parameters from a normal population, as quantified by Pearson's
       coefficients [Pearson31].  These are transformed [Dagostino70, Anscombe83] to expressions
       that sum to the K-squared statistic, which is approximately chi-square distributed with 2
       degrees of freedom [Dagostino90].  The kurtosis transform, and thus the overall test,
       generally works best when the sample distribution has at least 20 data points [Anscombe83]
       and the routine will carp otherwise.
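
       To make the ingredients of the statistic concrete, the sketch below computes the sample
       skewness and kurtosis from central moments, and the p-value that corresponds to a given
       K-squared value under a chi-square distribution with 2 degrees of freedom, for which
       the upper tail probability is exactly exp(-K^2/2).  The function names are hypothetical
       and this is not the module's own code; the transformations of [Dagostino70] and
       [Anscombe83] that turn skewness and kurtosis into K-squared are deliberately omitted.

```perl
use strict;
use warnings;

# Sample skewness m3 / m2**1.5 and kurtosis m4 / m2**2, where m_k is the
# k-th central moment computed with divisor n.
sub skewness_kurtosis {
    my ($data) = @_;
    my $n = @$data;
    die "need at least one data point\n" unless $n;
    my $mean = 0;
    $mean += $_ / $n for @$data;
    my ($m2, $m3, $m4) = (0, 0, 0);
    for my $x (@$data) {
        my $d = $x - $mean;
        $m2 += $d**2 / $n;
        $m3 += $d**3 / $n;
        $m4 += $d**4 / $n;
    }
    die "sample has zero variance\n" if $m2 == 0;
    return ($m3 / $m2**1.5, $m4 / $m2**2);
}

# Upper-tail probability of a chi-square variate with 2 degrees of freedom:
# P(X > k) = exp(-k / 2).
sub chi2_2dof_pvalue {
    my ($ksq) = @_;
    return exp(-$ksq / 2);
}

my ($skew, $kurt) = skewness_kurtosis([1, 2, 3, 4, 5]);
printf "skewness = %.4f, kurtosis = %.4f\n", $skew, $kurt;
printf "p-value for K^2 = 5.99: %.4f\n", chi2_2dof_pvalue(5.99);
```

       For reference, a normal population has skewness 0 and kurtosis 3, so large deviations of
       either sample value from these figures drive the K-squared statistic up and the p-value
       down.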

REFERENCES

       •   [Anscombe83] Anscombe, F. J. and Glynn, W. J. (1983) Distribution of the Kurtosis
           Statistic B2 for Normal Samples, Biometrika 70(1), 227-234.

       •   [Chen71] Chen, E. H. (1971) The Power of the Shapiro-Wilk W Test for Normality in
           Samples from Contaminated Normal Distributions, Journal of the American Statistical
           Association 66(336), 760-762.

       •   [Dagostino70] D'Agostino, R. B. (1970) Transformation to Normality of the Null
           Distribution of G1, Biometrika 57(3), 679-681.

       •   [Dagostino71] D'Agostino, R. B. (1971) An Omnibus Test of Normality for Moderate and
           Large Size Samples, Biometrika 58(2), 341-348.

       •   [Dagostino90] D'Agostino, R. B. et al. (1990) A Suggestion for Using Powerful and
           Informative Tests of Normality, American Statistician 44(4), 316-321.

       •   [Harter61] Harter, H. L. (1961) Expected values of normal order statistics, Biometrika
           48(1/2), 151-165.

       •   [Pearson31] Pearson, E. S. (1931) Notes on Tests for Normality, Biometrika 22(3/4),
           423-424.

       •   [Royston92] Royston, J. P. (1992) Approximating the Shapiro-Wilk W-test for
           non-normality, Statistics and Computing 2(3), 117-119.

       •   [Shapiro65] Shapiro, S. S. and Wilk, M. B. (1965) An analysis of variance test for
           normality - complete samples, Biometrika 52(3/4), 591-611.

AUTHOR

       Mike Wendl, "<mwendl at genome.wustl.edu>"

BUGS

       Please report any bugs or feature requests to "bug-statistics-normality at rt.cpan.org",
       or through the web interface at
       <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Normality>.  I will be
       notified, and then you'll automatically be notified of progress on your bug as I make
       changes.

SUPPORT

       You can find documentation for this module with the perldoc command.

           perldoc Statistics::Normality

       You can also look for information at:

       •   RT: CPAN's request tracker

           <http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Normality>

       •   AnnoCPAN: Annotated CPAN documentation

           <http://annocpan.org/dist/Statistics-Normality>

       •   CPAN Ratings

           <http://cpanratings.perl.org/d/Statistics-Normality>

       •   Search CPAN

           <http://search.cpan.org/dist/Statistics-Normality/>

COPYRIGHT & LICENSE

       Copyright (C) 2011 Washington University

       This program is free software; you can redistribute it and/or modify it under the terms of
       the GNU General Public License as published by the Free Software Foundation; either
       version 2 of the License, or (at your option) any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
       without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
       See the GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along with this program;
       if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
       MA 02111-1307, USA.