Ubuntu Manpage: mlpack_kernel_pca - kernel principal components analysis

Provided by: mlpack-bin_3.4.2-5ubuntu1_amd64

NAME

       mlpack_kernel_pca - kernel principal components analysis

SYNOPSIS

        mlpack_kernel_pca -i string -k string [-b double] [-c bool] [-D double] [-S double] [-d int] [-n bool] [-O double] [-s string] [-V bool] [-o string] [-h -v]

DESCRIPTION

       This  program  performs  Kernel  Principal  Components  Analysis (KPCA) on the specified dataset with the
       specified kernel. This will transform the data onto  the  kernel  principal  components,  and  optionally
       reduce the dimensionality by ignoring the kernel principal components with the smallest eigenvalues.

       For the case where a linear kernel is used, this reduces to regular PCA.

       The kernels that are supported are listed below:

              •  'linear': the standard linear dot product (same as normal PCA): K(x, y) = x^T y

              •  'gaussian': a Gaussian kernel; requires bandwidth: K(x, y) = exp(-(|| x - y || ^ 2) / (2 *
                 (bandwidth ^ 2)))

              •  'polynomial': polynomial kernel; requires offset and degree: K(x, y) = (x^T y + offset) ^
                 degree

              •  'hyptan': hyperbolic tangent kernel; requires scale and offset: K(x, y) = tanh(scale * (x^T y)
                 + offset)

              •  'laplacian': Laplacian kernel; requires bandwidth: K(x, y) = exp(-(|| x - y ||) / bandwidth)

              •  'epanechnikov': Epanechnikov kernel; requires bandwidth: K(x, y) = max(0, 1 - || x - y ||^2 /
                 bandwidth^2)

              •  'cosine': cosine distance: K(x, y) = 1 - (x^T y) / (|| x || * || y ||)

       The parameters for each of the kernels should be specified with the options '--bandwidth (-b)',
       '--kernel_scale (-S)', '--offset (-O)', or '--degree (-D)' (or a combination of those parameters).
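
       The kernel formulas listed above can be written out directly. The following is a minimal
       sketch in plain Python, not mlpack's implementation: the function names are illustrative,
       the keyword arguments mirror the command-line options, and the defaults are the ones
       documented under OPTIONAL INPUT OPTIONS.

```python
import math


def dot(x, y):
    """Dot product x^T y of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm || x ||."""
    return math.sqrt(dot(x, x))


def dist(x, y):
    """Euclidean distance || x - y ||."""
    return norm([a - b for a, b in zip(x, y)])


# Each function transcribes one formula from the kernel list above.
def linear(x, y):
    return dot(x, y)


def gaussian(x, y, bandwidth=1.0):
    return math.exp(-dist(x, y) ** 2 / (2 * bandwidth ** 2))


def polynomial(x, y, offset=0.0, degree=1.0):
    return (dot(x, y) + offset) ** degree


def hyptan(x, y, scale=1.0, offset=0.0):
    return math.tanh(scale * dot(x, y) + offset)


def laplacian(x, y, bandwidth=1.0):
    return math.exp(-dist(x, y) / bandwidth)


def epanechnikov(x, y, bandwidth=1.0):
    return max(0.0, 1 - dist(x, y) ** 2 / bandwidth ** 2)


def cosine(x, y):
    return 1 - dot(x, y) / (norm(x) * norm(y))


x, y = [1.0, 2.0], [3.0, 4.0]
print(polynomial(x, y, offset=1.0, degree=2.0))  # (11 + 1) ^ 2
```

       Writing the formulas this way makes it easy to see which option affects which kernel: for
       example, 'linear' and 'cosine' take no parameters at all.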

       Optionally, the Nystroem method ("Using the Nystroem method to speed up kernel machines", 2001) can be
       used to calculate the kernel matrix by specifying the '--nystroem_method (-n)' parameter. This approach
       works by using a subset of the data as a basis to reconstruct the kernel matrix; to specify the sampling
       scheme, the '--sampling (-s)' parameter is used. The sampling scheme for the Nystroem method can be
       chosen from the following list: 'kmeans', 'random', 'ordered'.
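
       The idea of reconstructing the kernel matrix from a subset of the data can be sketched as
       follows. This is a minimal illustration in plain Python of the textbook Nystroem
       approximation K ~ C W^-1 C^T, where W is the kernel matrix of the chosen subset
       (the "landmarks") and C holds the kernel values between every point and the landmarks. It
       is not mlpack's implementation. All names are illustrative, and taking the first three
       points as landmarks is an assumption made only for this example, not a description of any
       of the sampling schemes.

```python
import math


def gaussian(x, y, bandwidth=1.0):
    """Gaussian kernel, as in the kernel list above."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def solve(a, b):
    """Solve a X = b for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def nystroem(data, landmarks, kernel):
    """Approximate the n x n kernel matrix of data as C W^-1 C^T."""
    c = [[kernel(x, z) for z in landmarks] for x in data]       # n x m
    w = [[kernel(a, b) for b in landmarks] for a in landmarks]  # m x m
    c_t = [list(col) for col in zip(*c)]                        # m x n
    x = solve(w, c_t)                                           # W^-1 C^T
    m, n = len(landmarks), len(data)
    return [[sum(c[i][k] * x[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
exact = [[gaussian(a, b) for b in data] for a in data]
approx = nystroem(data, data[:3], gaussian)  # assumption: first 3 points as landmarks
max_error = max(abs(exact[i][j] - approx[i][j])
                for i in range(5) for j in range(5))
print("largest entrywise error with 3 landmarks:", max_error)
```

       Only the n x m and m x m blocks are ever evaluated with the kernel, which is where the
       savings come from when the subset is much smaller than the dataset. Rows and columns that
       belong to the landmarks are reproduced exactly; the remaining entries are approximate.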

       For example, the following command will perform KPCA on the dataset 'input.csv' using the Gaussian
       kernel, and save the transformed data to 'transformed.csv':

       $ mlpack_kernel_pca --input_file input.csv --kernel gaussian --output_file transformed.csv
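
       When the program is driven from a script, the same command line can be assembled
       programmatically. The helper below is a hypothetical sketch (kernel_pca_command is not part
       of mlpack); it only builds an argument list from the long option names documented on this
       page and does not run anything.

```python
import shlex


def kernel_pca_command(input_file, kernel, output_file=None, **options):
    """Build an argument list for mlpack_kernel_pca (hypothetical helper).

    Keyword names are the long option names from this page, for example
    bandwidth, degree, offset, kernel_scale, new_dimensionality, center,
    nystroem_method, sampling.
    """
    argv = ["mlpack_kernel_pca", "--input_file", input_file, "--kernel", kernel]
    for name, value in options.items():
        if value is True:
            argv.append("--" + name)                # boolean flag, no value
        elif value is not False and value is not None:
            argv.extend(["--" + name, str(value)])  # option with a value
    if output_file is not None:
        argv.extend(["--output_file", output_file])
    return argv


argv = kernel_pca_command(
    "input.csv", "gaussian", output_file="transformed.csv",
    bandwidth=2.0, new_dimensionality=2, center=True,
)
print(shlex.join(argv))
```

       The resulting list can be passed to subprocess.run(argv, check=True) on a system where
       mlpack_kernel_pca is installed.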

REQUIRED INPUT OPTIONS

       --input_file (-i) [string]
              Input dataset to perform KPCA on.

       --kernel (-k) [string]
              The kernel to use; see the above documentation for the list of usable kernels.

OPTIONAL INPUT OPTIONS

       --bandwidth (-b) [double]
              Bandwidth, for 'gaussian', 'laplacian', and 'epanechnikov' kernels. Default value 1.

       --center (-c) [bool]
              If set, the transformed data will be centered about the origin.

       --degree (-D) [double]
              Degree of polynomial, for 'polynomial' kernel.  Default value 1.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --kernel_scale (-S) [double]
              Scale, for 'hyptan' kernel. Default value 1.

       --new_dimensionality (-d) [int]
              If not 0, reduce the dimensionality of the output dataset by  ignoring  the  dimensions  with  the
              smallest eigenvalues. Default value 0.

       --nystroem_method (-n) [bool]
              If set, the Nystroem method will be used.

       --offset (-O) [double]
              Offset, for 'hyptan' and 'polynomial' kernels.  Default value 0.

       --sampling (-s) [string]
              Sampling scheme to use for the Nystroem method: 'kmeans', 'random', or 'ordered'. Default value
              'kmeans'.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              Matrix to save modified dataset to.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations,  and  theory,  consult  the  documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack-3.4.2                                      11 April 2022                             mlpack_kernel_pca(1)