Provided by: mlpack-bin_2.0.1-1_amd64

NAME

       mlpack_nca - neighborhood components analysis (nca)

SYNOPSIS

        mlpack_nca [-h] [-v] -i string -o string [-A double] [-l string] [-L] [-n int] [-T int] [-M double] [-m double] [-N] [-B int] [-O string] [-s int] [-a double] [-t double] [-V] [-w double]

DESCRIPTION

       This  program  implements  Neighborhood  Components Analysis, both a linear dimensionality
       reduction technique and a distance learning technique. The  method  seeks  to  improve  k-
       nearest-neighbor  classification  on  a  dataset  by scaling the dimensions. The method is
       nonparametric, and does not require a value of k. It works by using stochastic ("soft")
       neighbor assignments and optimizing, via gradient-based methods, the expected accuracy of
       those neighbor assignments.
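
       Concretely, in the standard NCA formulation (given here only as a sketch of the usual
       definition, not taken from this manual), a linear transformation A defines soft neighbor
       probabilities

              p_ij = exp(-||A x_i - A x_j||^2) / sum_{k != i} exp(-||A x_i - A x_k||^2),  p_ii = 0

       and the objective to be maximized is the expected number of correctly classified points,

              f(A) = sum_i sum_{j in C_i} p_ij,   where C_i = { j : label(j) = label(i) }.

       The 'Denominator of p_i is 0!' warning discussed below corresponds to the denominator of
       p_ij underflowing to zero when points are far apart.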

       To work, this algorithm needs labeled data. The labels can be given either as the last row
       of the input dataset (--input_file) or in a separate file (--labels_file).
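
       For example, a minimal invocation (the file names below are placeholders) might look like:

              $ mlpack_nca -i dataset.csv -l labels.csv -o distance.csv -v

       If the labels are already stored as the last row of dataset.csv, --labels_file can simply
       be omitted.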

       This implementation of NCA uses either stochastic gradient descent or the L-BFGS optimizer.
       Neither optimizer guarantees convergence to a global optimum for a nonconvex objective
       function (and NCA's objective function is nonconvex), so the final results may depend on the
       random seed and other optimizer parameters.

       Stochastic gradient descent, specified by --optimizer  "sgd",  depends  primarily  on  two
       parameters:   the   step   size   (--step_size)  and  the  maximum  number  of  iterations
       (--max_iterations). In addition, a normalized starting point can  be  used  (--normalize),
       which is necessary if many warnings of the form 'Denominator of p_i is 0!' are given.
       Tuning the step size can be a tedious affair. In general, the step size is  too  large  if
       the  objective  is not mostly uniformly decreasing, or if zero-valued denominator warnings
       are being issued.  The step size is too small if the objective is  changing  very  slowly.
       Setting  the  termination  condition can be done easily once a good step size parameter is
       found; either increase the maximum iterations to a large number and allow SGD  to  find  a
       minimum,  or  set  the  maximum iterations to 0 (allowing infinite iterations) and set the
       tolerance (--tolerance) to define the maximum allowed difference  between  objectives  for
       SGD to terminate. Be careful: if the tolerance is used instead of the maximum number of
       iterations, the optimization can take a very long time and, due to the properties of the SGD
       optimizer, may never converge at all.
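
       As a sketch, an SGD run with an explicit step size, a capped number of iterations, and a
       normalized starting point (placeholder file names) might look like:

              $ mlpack_nca -i dataset.csv -l labels.csv -o distance.csv \
                    --optimizer sgd --step_size 0.01 --max_iterations 200000 --normalize -v

       To use tolerance-based termination instead, pass --max_iterations 0 --tolerance 1e-7,
       keeping the caveat above in mind.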

       The  L-BFGS  optimizer, specified by --optimizer "lbfgs", uses a back-tracking line search
       algorithm to minimize a function. The following parameters are used by L-BFGS: --num_basis
       (specifies   the   number   of   memory   points   used   by   L-BFGS),  --max_iterations,
       --armijo_constant, --wolfe, --tolerance (the optimization is terminated when the  gradient
       norm is below this value), --max_line_search_trials, --min_step and --max_step (which both
       refer to the line search routine). For more  details  on  the  L-BFGS  optimizer,  consult
       either  the  mlpack  L-BFGS  documentation  (in  lbfgs.hpp)  or  the vast set of published
       literature on L-BFGS.
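
       As a sketch, an L-BFGS run with a few of these parameters adjusted (placeholder file names,
       illustrative values) might look like:

              $ mlpack_nca -i dataset.csv -l labels.csv -o distance.csv \
                    --optimizer lbfgs --num_basis 10 --max_iterations 1000 --tolerance 1e-6 \
                    --max_line_search_trials 100 -v

       The remaining line search options (--armijo_constant, --wolfe, --min_step, --max_step) can
       usually be left at their defaults.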

       By default, the SGD optimizer is used.

REQUIRED OPTIONS

       --input_file (-i) [string]
              Input dataset to run NCA on.

       --output_file (-o) [string]
              Output file for learned distance matrix.

OPTIONS

       --armijo_constant (-A) [double]
              Armijo constant for L-BFGS. Default value 0.0001.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option.  Default value ''.

       --labels_file (-l) [string]
              File of labels for input dataset. Default value ''.

       --linear_scan (-L)
              Don't shuffle the order in which data points are visited for SGD.

       --max_iterations (-n) [int]
              Maximum number of iterations for SGD or L-BFGS  (0  indicates  no  limit).  Default
              value 500000.

       --max_line_search_trials (-T) [int]
              Maximum number of line search trials for L-BFGS. Default value 50.

       --max_step (-M) [double]
              Maximum step of line search for L-BFGS. Default value 1e+20.

       --min_step (-m) [double]
              Minimum step of line search for L-BFGS. Default value 1e-20.

       --normalize (-N)
              Use  a  normalized  starting point for optimization. This is useful for when points
              are far apart, or when SGD is returning NaN.

       --num_basis (-B) [int]
              Number of memory points to be stored for L-BFGS. Default value 5.

       --optimizer (-O) [string]
              Optimizer to use; "sgd" or "lbfgs". Default value 'sgd'.

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --step_size (-a) [double]
              Step size for stochastic gradient descent (alpha). Default value 0.01.

       --tolerance (-t) [double]
              Maximum tolerance for termination of SGD or L-BFGS. Default value 1e-07.

       --verbose (-v)
              Display informational messages and the full list of parameters and  timers  at  the
              end of execution.

       --version (-V)
              Display the version of mlpack.

       --wolfe (-w) [double]
              Wolfe condition parameter for L-BFGS. Default value 0.9.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.

                                                                                    mlpack_nca(1)