Provided by: mlpack-bin_2.0.1-1_amd64

NAME

       mlpack_kernel_pca - kernel principal components analysis

SYNOPSIS

       mlpack_kernel_pca [-h] [-v] -i string -k string -o string [-b double] [-c] [-D double] [-S double] [-d int] [-n] [-O double] [-s string] [-V]

DESCRIPTION

       This  program  performs  Kernel  Principal  Components  Analysis (KPCA) on the specified dataset with the
       specified kernel. This will transform the data onto  the  kernel  principal  components,  and  optionally
       reduce the dimensionality by ignoring the kernel principal components with the smallest eigenvalues.

       For the case where a linear kernel is used, this reduces to regular PCA.
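
       As an illustration of the general technique (not of mlpack's internal implementation), the sketch
       below builds a kernel matrix, double-centers it, and extracts the first kernel principal component
       by power iteration. The function names are hypothetical, and row-per-point input is an assumption
       of this sketch. With the linear kernel, the resulting scores match ordinary PCA scores up to sign.

```python
import math


def linear_kernel(x, y):
    """K(x, y) = x^T y."""
    return sum(a * b for a, b in zip(x, y))


def centered_kernel_matrix(points, kernel):
    """Build the n x n kernel matrix and double-center it in feature space."""
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so the column means equal the row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def top_component_scores(Kc, iterations=200):
    """Project the training points onto the first kernel principal component.

    Power iteration finds the dominant eigenvector v (unit norm) and its
    eigenvalue; the scores are sqrt(eigenvalue) * v. The sign is arbitrary.
    """
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            return [0.0] * n  # degenerate case: no variance to explain
        v = [a / norm for a in w]
        eigenvalue = norm  # for unit v, ||Kc v|| converges to the eigenvalue
    return [math.sqrt(eigenvalue) * a for a in v]
```

       For one-dimensional input such as [[1.0], [2.0], [3.0], [6.0]], the scores returned for the linear
       kernel are the mean-centered values themselves, up to a common sign.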

       For example, the following will perform KPCA on the 'input.csv' file using the Gaussian kernel and store
       the transformed data in the 'transformed.csv' file.

       $ mlpack_kernel_pca -i input.csv -k gaussian -o transformed.csv

       The kernels that are supported are listed below:

              •  'linear': the standard linear dot product (same as normal PCA): K(x, y) = x^T y

              •  'gaussian': a Gaussian kernel; requires bandwidth: K(x, y) = exp(-(|| x - y || ^ 2) / (2 *
                 (bandwidth ^ 2)))

              •  'polynomial': polynomial kernel; requires offset and degree: K(x, y) = (x^T y + offset) ^
                 degree

              •  'hyptan': hyperbolic tangent kernel; requires scale and offset: K(x, y) = tanh(scale * (x^T y)
                 + offset)

              •  'laplacian': Laplacian kernel; requires bandwidth: K(x, y) = exp(-(|| x - y ||) / bandwidth)

              •  'epanechnikov': Epanechnikov kernel; requires bandwidth: K(x, y) = max(0, 1 - || x - y || ^ 2 /
                 bandwidth ^ 2)

              •  'cosine': cosine distance: K(x, y) = 1 - (x^T y) / (|| x || * || y ||)
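
       The formulas in the list above can be written out directly. The following is a minimal sketch in
       plain Python that transcribes each formula as stated; the function names, the parameter names, and
       the default values are assumptions of this sketch, and it is not mlpack's implementation.

```python
import math


def dot(x, y):
    """Inner product x^T y."""
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x, y):
    """Squared Euclidean distance || x - y || ^ 2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear(x, y):
    return dot(x, y)


def gaussian(x, y, bandwidth=1.0):
    return math.exp(-sq_dist(x, y) / (2.0 * bandwidth ** 2))


def polynomial(x, y, offset=0.0, degree=1.0):
    # Note: a negative base with a non-integer degree is not real-valued.
    return (dot(x, y) + offset) ** degree


def hyptan(x, y, scale=1.0, offset=0.0):
    return math.tanh(scale * dot(x, y) + offset)


def laplacian(x, y, bandwidth=1.0):
    return math.exp(-math.sqrt(sq_dist(x, y)) / bandwidth)


def epanechnikov(x, y, bandwidth=1.0):
    return max(0.0, 1.0 - sq_dist(x, y) / bandwidth ** 2)


def cosine(x, y):
    # Written exactly as in the list above; undefined for zero vectors.
    return 1.0 - dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
```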

       The parameters for each of the kernels should be specified with the options --bandwidth,  --kernel_scale,
       --offset, or --degree (or a combination of those options).
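
       To make the pairing of kernels and parameter options concrete, the sketch below assembles an
       argument list for mlpack_kernel_pca from a kernel name and its parameters. The mapping follows the
       kernel list above; the helper name build_command and its keyword arguments are hypothetical, and
       the sketch only builds the list without running the program.

```python
# Options each kernel takes, as described in the kernel list above.
KERNEL_OPTIONS = {
    "linear": [],
    "gaussian": ["--bandwidth"],
    "polynomial": ["--offset", "--degree"],
    "hyptan": ["--kernel_scale", "--offset"],
    "laplacian": ["--bandwidth"],
    "epanechnikov": ["--bandwidth"],
    "cosine": [],
}


def build_command(input_file, kernel, output_file, **params):
    """Return an argv list; params are keyed by option name without '--'."""
    if kernel not in KERNEL_OPTIONS:
        raise ValueError("unknown kernel: %s" % kernel)
    allowed = [opt[2:] for opt in KERNEL_OPTIONS[kernel]]
    for name in params:
        if name not in allowed:
            raise ValueError("%s is not a parameter of the %s kernel" % (name, kernel))
    argv = ["mlpack_kernel_pca",
            "--input_file", input_file,
            "--kernel", kernel,
            "--output_file", output_file]
    for name in allowed:
        if name in params:
            argv += ["--" + name, str(params[name])]
    return argv
```

       The returned list can be passed to a process launcher such as subprocess.run. Parameters that are
       left out fall back to the defaults documented under OPTIONS.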

       Optionally, the Nyström method ("Using the Nystroem method to speed up kernel machines", 2001) can be
       used to approximate the kernel matrix by specifying the --nystroem_method (-n) option. This approach
       works by using a subset of the data as a basis to reconstruct the kernel matrix. The sampling scheme
       used to choose that subset is set with the --sampling (-s) option and must be one of: kmeans, random,
       ordered.
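
       The idea of reconstructing the kernel matrix from a subset of the data can be sketched as follows.
       With C holding the kernel values between all points and the chosen basis points, and W holding the
       kernel values among the basis points, the approximation is C W^-1 C^T. This is a plain-Python
       illustration, not mlpack's code: the function names are hypothetical, 'ordered' is assumed to mean
       the first m points, the 'kmeans' scheme is omitted, and W is assumed to be invertible (a robust
       implementation would use a pseudo-inverse).

```python
import math
import random


def gaussian(x, y, bandwidth=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def select_basis(points, m, sampling="ordered", seed=0):
    """Return the indices of m basis points."""
    if sampling == "ordered":
        return list(range(m))  # assumption: the first m points
    if sampling == "random":
        return random.Random(seed).sample(range(len(points)), m)
    raise ValueError("this sketch only covers 'ordered' and 'random'")


def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basis kernel matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def nystroem(points, kernel, basis):
    """Approximate the full n x n kernel matrix as C W^-1 C^T."""
    n, m = len(points), len(basis)
    C = [[kernel(points[i], points[j]) for j in basis] for i in range(n)]  # n x m
    W = [[kernel(points[i], points[j]) for j in basis] for i in basis]     # m x m
    Ct = [[C[i][k] for i in range(n)] for k in range(m)]                   # m x n
    Z = solve(W, Ct)                                                       # W^-1 C^T
    return [[sum(C[i][k] * Z[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]
```

       Only n * m kernel evaluations are needed instead of n * n. Entries that involve at least one basis
       point are reproduced exactly; the remaining entries are approximations.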

REQUIRED OPTIONS

       --input_file (-i) [string]
              Input dataset to perform KPCA on.

       --kernel (-k) [string]
              The kernel to use; see the above documentation for the list of usable kernels.

       --output_file (-o) [string]
              File to save modified dataset to.

OPTIONS

       --bandwidth (-b) [double]
              Bandwidth, for 'gaussian', 'laplacian', and 'epanechnikov' kernels. Default value 1.

       --center (-c)
              If set, the transformed data will be centered about the origin.

       --degree (-D) [double]
              Degree of polynomial, for 'polynomial' kernel.  Default value 1.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option.  Default value ''.

       --kernel_scale (-S) [double]
              Scale, for 'hyptan' kernel. Default value 1.

       --new_dimensionality (-d) [int]
              If not 0, reduce the dimensionality of the output dataset by  ignoring  the  dimensions  with  the
              smallest eigenvalues. Default value 0.

       --nystroem_method (-n)
              If set, the Nyström method will be used to approximate the kernel matrix.

       --offset (-O) [double]
              Offset, for 'hyptan' and 'polynomial' kernels.  Default value 0.

       --sampling (-s) [string]
              Sampling scheme to use for the Nyström method: 'kmeans', 'random', or 'ordered'. Default value
              'kmeans'.

       --verbose (-v)
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

                                                                                            mlpack_kernel_pca(1)