Provided by: mlpack-bin_2.2.5-1build1_amd64

NAME

       mlpack_nca - neighborhood components analysis (nca)

SYNOPSIS

        mlpack_nca [-h] [-v]

DESCRIPTION

       This  program  implements  Neighborhood  Components  Analysis,  both  a  linear  dimensionality reduction
       technique  and  a  distance  learning  technique.  The  method  seeks   to   improve   k-nearest-neighbor
       classification  on a dataset by scaling the dimensions. The method is nonparametric, and does not require
       a value of k. It  works  by  using  stochastic  ("soft")  neighbor  assignments  and  using  optimization
       techniques over the gradient of the accuracy of the neighbor assignments.
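
       The soft neighbor assignments can be illustrated with a small sketch. It follows the
       formulation commonly given in the NCA literature, where point i picks point j as its neighbor
       with probability proportional to exp(-||Ax_i - Ax_j||^2). It is not mlpack's implementation, and
       the function name nca_objective is hypothetical.

```python
import math

def nca_objective(points, labels, transform):
    """Expected number of correctly classified points under soft neighbor
    assignments, for a linear transform given as a list of rows."""
    # Project every point: y = A x.
    projected = [[sum(a * x for a, x in zip(row, p)) for row in transform]
                 for p in points]
    total = 0.0
    for i, yi in enumerate(projected):
        # Unnormalized weight exp(-||y_i - y_j||^2) for every other point j.
        weights = [0.0 if j == i else
                   math.exp(-sum((u - v) ** 2 for u, v in zip(yi, yj)))
                   for j, yj in enumerate(projected)]
        denom = sum(weights)
        if denom == 0.0:
            # Every neighbor weight underflowed to zero; skip this point.
            continue
        # p_i: probability that point i picks a neighbor with its own label.
        total += sum(w for j, w in enumerate(weights)
                     if labels[j] == labels[i]) / denom
    return total
```

       A transform that keeps the classes apart scores close to the number of points, while one that
       collapses the separating dimension scores lower. An optimizer searches for the transform that
       maximizes this quantity (or, equivalently, minimizes its negation).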

       To work, this algorithm needs labeled data. The labels can be given as the last row of the input
       dataset (--input_file), or alternatively in a separate file (--labels_file).
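
       As an illustration of the first layout, the sketch below treats the dataset as an in-memory
       matrix with one column per point and the labels in the last row. The helper name split_labels
       and the use of integer labels are assumptions made for the example. How the matrix maps to the
       on-disk file layout is not covered here.

```python
def split_labels(dataset_rows):
    """Split a matrix (list of rows, one column per point) into the
    feature rows and the label row stored last."""
    if len(dataset_rows) < 2:
        raise ValueError("need at least one feature row plus a label row")
    features = dataset_rows[:-1]
    # Assumption for this sketch: labels are whole numbers.
    labels = [int(v) for v in dataset_rows[-1]]
    return features, labels

# Three points in two dimensions; the third row carries the class labels.
dataset = [
    [1.0, 1.2, 8.0],
    [0.5, 0.4, 9.1],
    [0.0, 0.0, 1.0],
]
features, labels = split_labels(dataset)
```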

       This implementation of NCA uses stochastic gradient descent, mini-batch stochastic gradient  descent,  or
       the  L-BFGS  optimizer.  These  optimizers  do not guarantee global convergence for a nonconvex objective
       function (NCA's objective function is nonconvex), so the final results could depend on the random seed or
       other optimizer parameters.

       Stochastic gradient descent, specified by --optimizer "sgd", depends primarily  on  two  parameters:  the
       step  size  (--step_size)  and  the  maximum  number  of  iterations  (--max_iterations).  In addition, a
       normalized starting point can be used (--normalize), which is necessary if  many  warnings  of  the  form
       'Denominator of p_i is 0!' are given. Tuning the step size can be tedious. In general, the step size
       is too large if the objective is not decreasing fairly steadily, or if zero-valued denominator
       warnings are being issued. The step size is too small if the objective is changing very slowly.
       Once a good step size is found, setting the termination condition is straightforward: either
       increase the maximum iterations to a large number and allow SGD to find a minimum, or set the
       maximum iterations to 0 (allowing infinite iterations) and set the tolerance (--tolerance) to define
       the maximum allowed difference between objectives for SGD to terminate. Be careful: relying on the
       tolerance instead of the maximum iterations can take a very long time, and SGD may never converge
       because of the properties of the optimizer. Note that a single iteration of SGD refers to a single
       point, so to take a single pass over the dataset, set --max_iterations equal to the number of points
       in the dataset.
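
       Because one SGD iteration visits one point, the iteration limit for a given number of passes is
       the number of points times the number of passes. The sketch below builds a matching argument
       list using only options documented on this page. The helper name sgd_command, the file names,
       and the dataset size are placeholders.

```python
import shlex

def sgd_command(input_file, labels_file, n_points, passes, step_size):
    """Build an mlpack_nca argument list that runs SGD for a whole
    number of passes over the data."""
    # One SGD iteration visits one point, so a pass needs n_points iterations.
    max_iterations = n_points * passes
    return [
        "mlpack_nca",
        "--input_file", input_file,
        "--labels_file", labels_file,
        "--optimizer", "sgd",
        "--step_size", str(step_size),
        "--max_iterations", str(max_iterations),
        "--output_file", "distance.csv",  # placeholder output name
    ]

# Hypothetical dataset of 1500 points, visited ten times.
argv = sgd_command("data.csv", "labels.csv", n_points=1500, passes=10,
                   step_size=0.01)
print(shlex.join(argv))
```

       The resulting list can be passed to subprocess.run, or the printed line can be pasted into a
       shell. Adding --normalize to the list is the documented remedy if denominator warnings appear.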

       The  mini-batch  SGD optimizer, specified by --optimizer "minibatch-sgd", has the same parameters as SGD,
       but the batch size may also be specified with the --batch_size (-b) option. Each iteration of  mini-batch
       SGD refers to a single mini-batch.
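
       Since an iteration here is one mini-batch, the iteration count for a pass over the data shrinks
       by roughly the batch size. The sketch below assumes a trailing partial batch counts as one
       iteration. This page does not say how mlpack treats that case, and the helper name
       minibatch_iterations is hypothetical.

```python
def minibatch_iterations(n_points, batch_size, passes=1):
    """Iterations needed for the given number of passes when one
    iteration consumes one mini-batch."""
    if n_points <= 0 or batch_size <= 0 or passes <= 0:
        raise ValueError("all arguments must be positive")
    # Integer ceiling division: a partial final batch still costs an iteration.
    batches_per_pass = -(-n_points // batch_size)
    return batches_per_pass * passes
```

       With the default --batch_size of 50, a hypothetical dataset of 1500 points needs 30 iterations
       per pass.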

       The  L-BFGS  optimizer,  specified  by --optimizer "lbfgs", uses a back-tracking line search algorithm to
       minimize a function. The following parameters are used by L-BFGS: --num_basis (specifies  the  number  of
       memory   points   used   by  L-BFGS),  --max_iterations,  --armijo_constant,  --wolfe,  --tolerance  (the
       optimization is terminated when  the  gradient  norm  is  below  this  value),  --max_line_search_trials,
       --min_step  and  --max_step (which both refer to the line search routine). For more details on the L-BFGS
       optimizer, consult either the mlpack L-BFGS documentation (in lbfgs.hpp) or the  vast  set  of  published
       literature on L-BFGS.
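
       To show what the line search parameters control, here is a generic backtracking search that
       uses only the Armijo (sufficient decrease) condition. It is a textbook sketch, not mlpack's
       routine: the Wolfe curvature check is omitted, and the halving factor shrink is an assumption.
       The keyword defaults mirror the default values listed on this page.

```python
def backtracking_line_search(f, x, grad, direction, armijo_constant=1e-4,
                             max_trials=50, min_step=1e-20, max_step=1e20,
                             shrink=0.5):
    """Return a step t with f(x + t*d) <= f(x) + c * t * (grad . d),
    or None if no such step is found within the limits."""
    # Directional derivative along the search direction.
    slope = sum(g * d for g, d in zip(grad, direction))
    if slope >= 0:
        raise ValueError("direction is not a descent direction")
    fx = f(x)
    step = min(1.0, max_step)
    for _ in range(max_trials):
        if step < min_step:
            break
        trial = [xi + step * di for xi, di in zip(x, direction)]
        # Armijo condition: accept the step if the decrease is sufficient.
        if f(trial) <= fx + armijo_constant * step * slope:
            return step
        step *= shrink
    return None
```

       For example, minimizing the sum of squares from the point (3, -4) along the negative gradient
       rejects the unit step, which overshoots, and accepts the next, halved step.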

       By default, the SGD optimizer is used.

REQUIRED INPUT OPTIONS

       --input_file (-i) [string]
              Input dataset to run NCA on.

OPTIONAL INPUT OPTIONS

       --armijo_constant (-A) [double]
              Armijo constant for L-BFGS. Default value 0.0001.

       --batch_size (-b) [int]
              Batch size for mini-batch SGD. Default value 50.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option.  Default value ''.

       --labels_file (-l) [string]
              File of labels for input dataset. Default value ''.

       --linear_scan (-L)
              Don't shuffle the order in which data points are visited for SGD or mini-batch SGD.

       --max_iterations (-n) [int]
              Maximum number of iterations for SGD or L-BFGS (0 indicates no limit). Default value 500000.

       --max_line_search_trials (-T) [int]
              Maximum number of line search trials for L-BFGS. Default value 50.

       --max_step (-M) [double]
              Maximum step of line search for L-BFGS. Default value 1e+20.

       --min_step (-m) [double]
              Minimum step of line search for L-BFGS. Default value 1e-20.

       --normalize (-N)
              Use a normalized starting point for optimization. This is useful when points are far apart, or
              when SGD is returning NaN.

       --num_basis (-B) [int]
              Number of memory points to be stored for L-BFGS.  Default value 5.

       --optimizer (-O) [string]
              Optimizer to use; 'sgd', 'minibatch-sgd', or 'lbfgs'. Default value 'sgd'.

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --step_size (-a) [double]
              Step size for stochastic gradient descent (alpha). Default value 0.01.

       --tolerance (-t) [double]
              Maximum tolerance for termination of SGD or L-BFGS. Default value 1e-07.

       --verbose (-v)
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.

       --wolfe (-w) [double]
              Wolfe condition parameter for L-BFGS. Default value 0.9.

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              Output file for learned distance matrix.  Default value ''.
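
              One way to use the learned matrix is sketched below, assuming it is the linear map A
              from the NCA literature, so that the learned distance between x and y is the Euclidean
              norm of A(x - y). Reading the matrix from the output file is not shown, and the function
              name learned_distance is hypothetical.

```python
import math

def learned_distance(transform, x, y):
    """Euclidean distance between x and y after applying the linear map
    given as a list of rows."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(t * d for t, d in zip(row, diff)) for row in transform]
    return math.sqrt(sum(v * v for v in projected))
```

              A k-nearest-neighbor classifier can then rank neighbors by this distance instead of the
              plain Euclidean distance.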

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

                                                                                    mlpack_nca(16 November 2017)