Ubuntu Manpage: Math::GSL::Statistics

Provided by: libmath-gsl-perl_0.39-1build2_amd64

NAME

       Math::GSL::Statistics - Statistical functions

SYNOPSIS

           use Math::GSL::Statistics qw /:all/;

           my $data     = [17.2, 18.1, 16.5, 18.3, 12.6];
           my $mean     = gsl_stats_mean($data, 1, 5);
           my $variance = gsl_stats_variance($data, 1, 5);
           my $largest  = gsl_stats_max($data, 1, 5);
           my $smallest = gsl_stats_min($data, 1, 5);
           print qq{
           Dataset : @$data
           Sample mean           $mean
           Estimated variance    $variance
           Largest value         $largest
           Smallest value        $smallest
           };

DESCRIPTION

       Here is a list of all the functions in this module :

       • "gsl_stats_mean($data, $stride, $n)" - This function returns the arithmetic mean of the array reference
         $data,  a  dataset of length $n with stride $stride. The arithmetic mean, or sample mean, is denoted by
         \Hat\mu and defined as, \Hat\mu = (1/N) \sum x_i where x_i are the elements of the dataset  $data.  For
         samples drawn from a gaussian distribution the variance of \Hat\mu is \sigma^2 / N.

       • "gsl_stats_variance($data,  $stride, $n)" - This function returns the estimated, or sample, variance of
         data, an array reference of length $n with  stride  $stride.  The  estimated  variance  is  denoted  by
         \Hat\sigma^2  and  is  defined  by,  \Hat\sigma^2  = (1/(N-1)) \sum (x_i - \Hat\mu)^2 where x_i are the
         elements of the dataset data. Note that the normalization factor of 1/(N-1) results from the derivation
         of \Hat\sigma^2 as an unbiased estimator of the population variance \sigma^2. For samples drawn from  a
         gaussian distribution the variance of \Hat\sigma^2 itself is 2 \sigma^4 / N. This function computes the
         mean  via a call to gsl_stats_mean. If you have already computed the mean then you can pass it directly
         to gsl_stats_variance_m.

       • "gsl_stats_sd($data, $stride, $n)"

       • "gsl_stats_sd_m($data, $stride, $n, $mean)"

         The standard deviation is defined as the square root of the variance. These functions return the square
         root of the corresponding variance functions above.

       • "gsl_stats_variance_with_fixed_mean($data, $stride, $n, $mean)" - This function calculates the standard
         deviation of the array reference $data for a fixed population mean $mean. The result is the square root
         of the corresponding variance function.

       • "gsl_stats_sd_with_fixed_mean($data, $stride, $n, $mean)" - This function computes an unbiased estimate
         of the variance of data when the population mean $mean  of  the  underlying  distribution  is  known  a
         priori.  In this case the estimator for the variance uses the factor 1/N and the sample mean \Hat\mu is
         replaced by the known population mean \mu, \Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2

       • "gsl_stats_tss($data, $stride, $n)"

       • "gsl_stats_tss_m($data, $stride, $n, $mean)"

         These functions return the total sum of squares (TSS) of data about the mean. For  gsl_stats_tss_m  the
         user-supplied  value  of mean is used, and for gsl_stats_tss it is computed using gsl_stats_mean. TSS =
         \sum (x_i - mean)^2

       • "gsl_stats_absdev($data, $stride, $n)" - This function computes the absolute deviation from the mean of
         data, a dataset of length $n with stride $stride. The absolute deviation from the mean is  defined  as,
         absdev   =  (1/N)  \sum  |x_i  -  \Hat\mu| where x_i are the elements of the array reference $data. The
         absolute deviation from the mean provides a more robust measure of the width of a distribution than the
         variance. This function computes the mean of data via a call to gsl_stats_mean.

       • "gsl_stats_skew($data, $stride, $n)" - This function computes the skewness of $data, a dataset  in  the
         form  of  an array reference of length $n with stride $stride. The skewness is defined as, skew = (1/N)
         \sum ((x_i - \Hat\mu)/\Hat\sigma)^3 where x_i are the elements  of  the  dataset  $data.  The  skewness
         measures  the  asymmetry  of  the tails of a distribution. The function computes the mean and estimated
         standard deviation of data via calls to gsl_stats_mean and gsl_stats_sd.

       • "gsl_stats_skew_m_sd($data, $stride, $n, $mean, $sd)" - This function  computes  the  skewness  of  the
         array reference $data using the given values of the mean $mean and standard deviation $sd, skew = (1/N)
         \sum ((x_i - mean)/sd)^3. These functions are useful if you have already computed the mean and standard
         deviation of $data and want to avoid recomputing them.

       • "gsl_stats_kurtosis($data,  $stride,  $n)"  -  This  function  computes  the kurtosis of data, an array
         reference of length $n with stride $stride. The kurtosis is defined as, kurtosis = ((1/N) \sum ((x_i  -
         \Hat\mu)/\Hat\sigma)^4)   -  3. The kurtosis measures how sharply peaked a distribution is, relative to
         its width. The kurtosis is normalized to zero for a gaussian distribution.

       • "gsl_stats_kurtosis_m_sd($data, $stride, $n, $mean, $sd)" - This function computes the kurtosis of  the
         array  reference  $data using the given values of the mean $mean and standard deviation $sd, kurtosis =
         ((1/N) \sum ((x_i - mean)/sd)^4) - 3. This function is useful if you have already computed the mean and
         standard deviation of data and want to avoid recomputing them.

       • "gsl_stats_lag1_autocorrelation($data, $stride, $n)" - This function computes the lag-1 autocorrelation
         of the array reference data.
          a_1 = {\sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)
           \over
          \sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}

       • "gsl_stats_lag1_autocorrelation_m($data, $stride, $n,  $mean)"  -  This  function  computes  the  lag-1
         autocorrelation of the array reference $data using the given value of the mean $mean.

       • "gsl_stats_covariance($data1,  $stride1, $data2, $stride2, $n)" - This function computes the covariance
         of the array reference $data1 and $data2 which must both be of the same length $n. covar = (1/(n -  1))
         \sum_{i = 1}^{n} (x_i - \Hat x) (y_i - \Hat y)

       • "gsl_stats_covariance_m($data1,  $stride1,  $data2,  $stride2,  $n,  $mean1,  $mean2)"  - This function
         computes the covariance of the array reference $data1 and $data2 using the given values of  the  means,
         $mean1  and $mean2. This is useful if you have already computed the means of $data1 and $data2 and want
         to avoid recomputing them.

       • "gsl_stats_correlation($data1, $stride1, $data2, $stride2, $n)" - This  function  efficiently  computes
         the Pearson correlation coefficient between the array reference $data1 and $data2 which must both be of
         the same length $n.
          r = cov(x, y) / (\Hat\sigma_x \Hat\sigma_y)
            = {1/(n-1) \sum (x_i - \Hat x) (y_i - \Hat y)
               \over
               \sqrt{1/(n-1) \sum (x_i - \Hat x)^2} \sqrt{1/(n-1) \sum (y_i - \Hat y)^2}
              }

       • "gsl_stats_variance_m($data, $stride, $n, $mean)" - This function returns the sample variance of $data,
         an  array  reference,  relative  to  the  given  value  of $mean. The function is computed with \Hat\mu
         replaced by the value of mean that you supply, \Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2

       • "gsl_stats_absdev_m($data, $stride, $n, $mean)" - This function computes the absolute deviation of  the
         dataset  $data,  an  array reference, relative to the given value of $mean, absdev  = (1/N) \sum |x_i -
         mean|. This function is useful if you have already computed  the  mean  of  data  (and  want  to  avoid
         recomputing  it),  or wish to calculate the absolute deviation relative to another value (such as zero,
         or the median).

       • "gsl_stats_wmean($w, $wstride, $data, $stride, $n)" - This function returns the weighted  mean  of  the
         dataset  $data array reference with stride $stride and length $n, using the set of weights $w, which is
         an array reference, with stride $wstride and length $n. The weighted mean  is  defined  as,  \Hat\mu  =
         (\sum w_i x_i) / (\sum w_i)

       • "gsl_stats_wvariance($w,  $wstride, $data, $stride, $n)" - This function returns the estimated variance
         of the dataset $data, which is the dataset, with stride $stride and length $n, using the set of weights
         $w (as an array reference) with stride $wstride and length $n. The estimated  variance  of  a  weighted
         dataset  is  defined  as,   \Hat\sigma^2  =  ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2))) \sum w_i (x_i -
         \Hat\mu)^2. Note that this expression reduces to an  unweighted  variance  with  the  familiar  1/(N-1)
         factor when there are N equal non-zero weights.

       • "gsl_stats_wvariance_m($w,  $wstride,  $data,  $stride,  $n, $wmean, $wsd)" - This function returns the
         estimated variance of the weighted dataset $data (which is an array reference) using the given weighted
         mean $wmean.

       • "gsl_stats_wsd($w, $wstride, $data, $stride, $n)" - The standard deviation is  defined  as  the  square
         root  of  the  variance.  This  function returns the square root of the corresponding variance function
         gsl_stats_wvariance above.

       • "gsl_stats_wsd_m($w, $wstride, $data, $stride, $n, $wmean)" - This function returns the square root  of
         the corresponding variance function gsl_stats_wvariance_m above.

       • "gsl_stats_wvariance_with_fixed_mean($w, $wstride, $data, $stride, $n, $mean)" - This function computes
         an  unbiased  estimate of the variance of weighted dataset $data (which is an array reference) when the
         population mean $mean of the underlying distribution is known a priori. In this case the estimator  for
         the  variance  replaces  the sample mean \Hat\mu by the known population mean \mu, \Hat\sigma^2 = (\sum
         w_i (x_i - \mu)^2) / (\sum w_i)

       • "gsl_stats_wsd_with_fixed_mean($w, $wstride, $data, $stride, $n, $mean)" - The  standard  deviation  is
         defined  as the square root of the variance. This function returns the square root of the corresponding
         variance function above.

       • "gsl_stats_wtss($w, $wstride, $data, $stride, $n)"

       • "gsl_stats_wtss_m($w, $wstride, $data, $stride, $n, $wmean)" -  These  functions  return  the  weighted
         total  sum  of  squares  (TSS)  of data about the weighted mean. For gsl_stats_wtss_m the user-supplied
         value of $wmean is used, and for gsl_stats_wtss it is computed using gsl_stats_wmean. TSS =   \sum  w_i
         (x_i - wmean)^2

       • "gsl_stats_wabsdev($w,  $wstride,  $data,  $stride, $n)" - This function computes the weighted absolute
         deviation from the weighted mean of $data, which is an array reference. The absolute deviation from the
         mean is defined as, absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i)

       • "gsl_stats_wabsdev_m($w, $wstride, $data, $stride, $n, $wmean)" - This function computes  the  absolute
         deviation of the weighted dataset $data (an array reference) about the given weighted mean $wmean.

       • "gsl_stats_wskew($w,  $wstride,  $data, $stride, $n)" - This function computes the weighted skewness of
         the dataset $data, an array reference. skew = (\sum w_i ((x_i - xbar)/\sigma)^3) / (\sum w_i)

       • "gsl_stats_wskew_m_sd($w, $wstride, $data, $stride, $n, $wmean, $wsd)" -  This  function  computes  the
         weighted  skewness  of  the  dataset  $data  using  the  given values of the weighted mean and weighted
         standard deviation, $wmean and $wsd.

       • "gsl_stats_wkurtosis($w, $wstride, $data, $stride, $n)" - This function computes the weighted  kurtosis
         of the dataset $data, an array reference. kurtosis = ((\sum w_i ((x_i - xbar)/sigma)^4) / (\sum w_i)) -
         3

       • "gsl_stats_wkurtosis_m_sd($w, $wstride, $data, $stride, $n, $wmean, $wsd)" - This function computes the
         weighted kurtosis of the dataset $data, an array reference, using the given values of the weighted mean
         and weighted standard deviation, $wmean and $wsd.

       • "gsl_stats_pvariance($data, $stride, $n, $data2, $stride2, $n2)"

       • "gsl_stats_ttest($data1, $stride1, $n1, $data2, $stride2, $n2)"

       • "gsl_stats_max($data,  $stride,  $n)"  -  This  function  returns  the maximum value in the $data array
         reference, a dataset of length $n with stride $stride. The maximum value is defined as the value of the
         element x_i which satisfies x_i >= x_j for all j. If you want instead to  find  the  element  with  the
         largest  absolute  magnitude  you  will  need  to  apply  fabs  or abs to your data before calling this
         function.

       • "gsl_stats_min($data, $stride, $n)" - This function returns the minimum value in  $data  (which  is  an
         array  reference) a dataset of length $n with stride $stride. The minimum value is defined as the value
         of the element x_i which satisfies x_i <= x_j for all j. If you want instead to find the  element  with
         the  smallest  absolute  magnitude  you will need to apply fabs or abs to your data before calling this
         function.

       • "gsl_stats_minmax($data, $stride, $n)" - This function finds both the minimum  and  maximum  values  in
         $data, which is an array reference, in a single pass and returns them in this order.

       • "gsl_stats_max_index($data,  $stride,  $n)"  -  This function returns the index of the maximum value in
         $data array reference, a dataset of length $n with stride $stride. The maximum value is defined as  the
         value  of  the  element  x_i which satisfies x_i >= x_j for all j. When there are several equal maximum
         elements then the first one is chosen.

       • "gsl_stats_min_index($data, $stride, $n)" - This function returns the index of  the  minimum  value  in
         $data  array reference, a dataset of length $n with stride $stride. The minimum value is defined as the
         value of the element x_i which satisfies x_i <= x_j for all j. When there  are  several  equal  minimum
         elements then the first one is chosen.

       • "gsl_stats_minmax_index($data,  $stride,  $n)"  -  This function returns the indexes of the minimum and
         maximum values in $data, an array reference in a single pass. The value are returned in this order.

       • "gsl_stats_median_from_sorted_data($sorted_data, $stride, $n)" - This function returns the median value
         of $sorted_data (which is an array reference), a dataset of length $n with stride $stride. The elements
         of the array must be in ascending numerical order. There are no checks to  see  whether  the  data  are
         sorted,  so  the  function  gsl_sort  should  always  be  used first. This function can be found in the
         Math::GSL::Sort module.  When the dataset has an odd number of elements the  median  is  the  value  of
         element  (n-1)/2.  When  the  dataset  has an even number of elements the median is the mean of the two
         nearest middle values, elements (n-1)/2 and n/2. Since the algorithm for computing the median  involves
         interpolation this function always returns a floating-point number, even for integer data types.

       • "gsl_stats_quantile_from_sorted_data($sorted_data, $stride, $n, $f)" - This function returns a quantile
         value  of  $sorted_data,  a  double-precision  array  reference  of  length $n with stride $stride. The
         elements of the array must be in ascending numerical order. The quantile is  determined  by  the  f,  a
         fraction  between  0  and 1. For example, to compute the value of the 75th percentile f should have the
         value 0.75. There are no checks to see whether the data are sorted, so  the  function  gsl_sort  should
         always  be  used first. This function can be found in the Math::GSL::Sort module. The quantile is found
         by interpolation, using the formula quantile = (1 - \delta) x_i + \delta x_{i+1} where i is floor((n  -
         1)f) and \delta is (n-1)f - i. Thus the minimum value of the array (data[0*stride]) is given by f equal
         to  zero,  the  maximum  value  (data[(n-1)*stride]) is given by f equal to one and the median value is
         given by f equal to 0.5. Since the  algorithm  for  computing  quantiles  involves  interpolation  this
         function always returns a floating-point number, even for integer data types.

       The following function are simply variants for int and char of the last functions:

       •   "gsl_stats_int_mean "

       •   "gsl_stats_int_variance "

       •   "gsl_stats_int_sd "

       •   "gsl_stats_int_variance_with_fixed_mean "

       •   "gsl_stats_int_sd_with_fixed_mean "

       •   "gsl_stats_int_tss "

       •   "gsl_stats_int_tss_m "

       •   "gsl_stats_int_absdev "

       •   "gsl_stats_int_skew "

       •   "gsl_stats_int_kurtosis "

       •   "gsl_stats_int_lag1_autocorrelation "

       •   "gsl_stats_int_covariance "

       •   "gsl_stats_int_correlation "

       •   "gsl_stats_int_variance_m "

       •   "gsl_stats_int_sd_m "

       •   "gsl_stats_int_absdev_m "

       •   "gsl_stats_int_skew_m_sd "

       •   "gsl_stats_int_kurtosis_m_sd "

       •   "gsl_stats_int_lag1_autocorrelation_m "

       •   "gsl_stats_int_covariance_m "

       •   "gsl_stats_int_pvariance "

       •   "gsl_stats_int_ttest "

       •   "gsl_stats_int_max "

       •   "gsl_stats_int_min "

       •   "gsl_stats_int_minmax "

       •   "gsl_stats_int_max_index "

       •   "gsl_stats_int_min_index "

       •   "gsl_stats_int_minmax_index "

       •   "gsl_stats_int_median_from_sorted_data "

       •   "gsl_stats_int_quantile_from_sorted_data "

       •   "gsl_stats_char_mean "

       •   "gsl_stats_char_variance "

       •   "gsl_stats_char_sd "

       •   "gsl_stats_char_variance_with_fixed_mean "

       •   "gsl_stats_char_sd_with_fixed_mean "

       •   "gsl_stats_char_tss "

       •   "gsl_stats_char_tss_m "

       •   "gsl_stats_char_absdev "

       •   "gsl_stats_char_skew "

       •   "gsl_stats_char_kurtosis "

       •   "gsl_stats_char_lag1_autocorrelation "

       •   "gsl_stats_char_covariance "

       •   "gsl_stats_char_correlation "

       •   "gsl_stats_char_variance_m "

       •   "gsl_stats_char_sd_m "

       •   "gsl_stats_char_absdev_m "

       •   "gsl_stats_char_skew_m_sd "

       •   "gsl_stats_char_kurtosis_m_sd "

       •   "gsl_stats_char_lag1_autocorrelation_m "

       •   "gsl_stats_char_covariance_m "

       •   "gsl_stats_char_pvariance "

       •   "gsl_stats_char_ttest "

       •   "gsl_stats_char_max "

       •   "gsl_stats_char_min "

       •   "gsl_stats_char_minmax "

       •   "gsl_stats_char_max_index "

       •   "gsl_stats_char_min_index "

       •   "gsl_stats_char_minmax_index "

       •   "gsl_stats_char_median_from_sorted_data "

       •   "gsl_stats_char_quantile_from_sorted_data "

       You  have to add the functions you want to use inside the qw /put_funtion_here /.  You can also write use
       Math::GSL::Statistics qw/:all/; to use all available functions  of  the  module.   Other  tags  are  also
       available, here is a complete list of all tags for this module :

       all
       int
       char

       For   more   information   on   the   functions,   we   refer  you  to  the  GSL  offcial  documentation:
       <http://www.gnu.org/software/gsl/manual/html_node/>

AUTHORS

       Jonathan "Duke" Leto <jonathan@leto.net> and Thierry Moisan <thierry.moisan@gmail.com>

COPYRIGHT AND LICENSE

       Copyright (C) 2008-2011 Jonathan "Duke" Leto and Thierry Moisan

       This program is free software; you can redistribute it and/or modify it under  the  same  terms  as  Perl
       itself.

perl v5.26.0                                       2017-08-06                         Math::GSL::Statistics(3pm)