xenial (1) mlpack_nca.1.gz

Provided by: mlpack-bin_2.0.1-1_amd64

NAME

       mlpack_nca - neighborhood components analysis (nca)

SYNOPSIS

        mlpack_nca [-h] [-v] -i string -o string [-A double] [-l string] [-L] [-n int] [-T int] [-M double] [-m double] [-N] [-B int] [-O string] [-s int] [-a double] [-t double] [-V] [-w double]

DESCRIPTION

       This program implements Neighborhood Components Analysis (NCA), which is both a linear dimensionality
       reduction technique and a distance learning technique. The method seeks to improve k-nearest-neighbor
       classification on a dataset by learning a linear transformation that scales the dimensions. The method is
       nonparametric and does not require a value of k. It works by using stochastic ("soft") neighbor
       assignments and applying gradient-based optimization to the expected accuracy of those assignments.
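       As a rough illustration of the "soft" neighbor assignments, the Python sketch below computes the quantity
       that NCA maximizes in its standard formulation. Point i picks neighbor k with probability proportional to
       exp(-||A x_i - A x_k||^2). The objective is the sum over all points of p_i, which is the probability that
       point i picks a neighbor with the same label. This is a plain-Python sketch for small inputs, not mlpack's
       implementation. The function name and the list-of-rows layout for A are hypothetical, and an optimizer
       that minimizes would use the negative of this value.

```python
import math


def nca_objective(A, points, labels):
    """Sum over points of p_i, the chance of picking a same-label soft neighbor.

    Hypothetical helper for illustration, not mlpack's code.
    A      -- linear transformation as a list of rows (assumed layout)
    points -- list of points, each a list of floats
    labels -- one class label per point
    """
    # Project every point once: A x for each x.
    projected = [[sum(a * xi for a, xi in zip(row, x)) for row in A]
                 for x in points]
    total = 0.0
    for i, pi_vec in enumerate(projected):
        # Unnormalized weight exp(-||A x_i - A x_k||^2); a point never picks itself.
        weights = [0.0 if k == i else
                   math.exp(-sum((u - v) ** 2 for u, v in zip(pi_vec, pk)))
                   for k, pk in enumerate(projected)]
        denom = sum(weights)
        if denom == 0.0:
            # Every weight underflowed: the "Denominator of p_i is 0" case.
            continue
        same = sum(w for w, lab in zip(weights, labels) if lab == labels[i])
        total += same / denom
    return total
```

       With the identity transformation and two well-separated classes of two points each, the value is close
       to 4, the number of points. With the all-zero transformation, every neighbor is equally likely, and each
       p_i falls to 1/3.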

       This algorithm requires labeled data. The labels can be given as the last row of the input dataset
       (--input_file) or, alternatively, in a separate file (--labels_file).
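       The "labels in the last row" layout can be sketched as follows. The sketch assumes the dataset is held
       as a list of rows, where each row is one dimension, each column is one point, and the class labels are
       integers. The helper is hypothetical and only mirrors the split that is needed when no --labels_file is
       given.

```python
def split_labels(matrix):
    """Split off the last row of a dataset as integer labels (hypothetical helper).

    matrix -- list of rows; each row is one dimension and each column is one
              point (assumed layout), with the labels stored in the last row.
    Returns (data_rows, labels).
    """
    if len(matrix) < 2:
        raise ValueError("need at least one data row plus a label row")
    *data_rows, label_row = matrix
    # Every data row must have one value per point (one per label).
    if any(len(row) != len(label_row) for row in data_rows):
        raise ValueError("all rows must have the same number of points")
    labels = []
    for value in label_row:
        if value != int(value):
            raise ValueError(f"label {value!r} is not an integer")
        labels.append(int(value))
    return data_rows, labels
```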

       This implementation of NCA uses either stochastic gradient descent or the L-BFGS optimizer. Neither
       optimizer guarantees global convergence for a nonconvex objective function, and NCA's objective function
       is nonconvex, so the final results may depend on the random seed or on other optimizer parameters.
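       The dependence on the starting point can be seen even in one dimension. The sketch below runs plain
       gradient descent on the nonconvex function f(x) = x^4 - 3x^2 + x, chosen only for illustration. This
       function has two local minima, near x = -1.30 and x = 1.13. Depending on the randomly drawn starting
       point, the same procedure ends in different minima with different objective values. The function and
       helper names are hypothetical and unrelated to NCA's actual objective.

```python
import random


def f(x):
    # Illustrative nonconvex function with two local minima (not NCA's objective).
    return x ** 4 - 3 * x ** 2 + x


def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x0, step_size=0.01, iterations=2000):
    """Fixed-step gradient descent from x0 (hypothetical helper)."""
    x = x0
    for _ in range(iterations):
        x -= step_size * grad_f(x)
    return x


def run_with_seed(seed):
    # The seed determines the starting point, much like a random initial matrix.
    x0 = random.Random(seed).uniform(-2.0, 2.0)
    return gradient_descent(x0)
```

       Starting at 2.0 ends near 1.13, while starting at -2.0 ends near -1.30, which has the lower objective.
       Neither run can tell from local information alone that a better minimum exists.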

       Stochastic gradient descent, specified by --optimizer "sgd", depends primarily on two parameters: the step
       size (--step_size) and the maximum number of iterations (--max_iterations). In addition, a normalized
       starting point can be used (--normalize), which is necessary if many warnings of the form
       'Denominator of p_i is 0!' are given. Tuning the step size can be tedious. In general, the step size is
       too large if the objective does not decrease mostly monotonically, or if zero-valued denominator warnings
       are being issued. The step size is too small if the objective changes very slowly. Once a good step size
       has been found, the termination condition is easy to set. Either increase the maximum number of
       iterations to a large number and allow SGD to find a minimum, or set the maximum number of iterations to
       0 (no limit) and set the tolerance (--tolerance) to the maximum allowed difference between successive
       objective values at which SGD terminates. Be careful: relying on the tolerance instead of the maximum
       number of iterations can take a very long time, and SGD may never converge at all because of the
       properties of the SGD optimizer.
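       The step size and termination rules described above can be sketched as a generic SGD loop in Python.
       This illustration rests on stated assumptions and is not mlpack's SGD code. The helper name and signature
       are hypothetical. Setting shuffle=False plays the role of --linear_scan, and max_iterations=0 means there
       is no limit. The sketch compares the objective against tolerance once per full pass over the data. The
       default values mirror the documented option defaults.

```python
import random


def sgd(objective, point_gradient, x, num_points, step_size=0.01,
        max_iterations=500000, tolerance=1e-7, shuffle=True, seed=0):
    """Minimize sum_i f_i(x) one term at a time (hypothetical helper).

    objective(x)         -> full objective value
    point_gradient(x, i) -> gradient of the i-th term, a list like x
    max_iterations == 0 means no iteration limit, like --max_iterations 0.
    """
    rng = random.Random(seed)
    order = list(range(num_points))
    last_objective = objective(x)
    iteration = 0
    while max_iterations == 0 or iteration < max_iterations:
        position = iteration % num_points
        if position == 0 and shuffle:
            # Start of a pass: visit the points in a new random order.
            rng.shuffle(order)
        g = point_gradient(x, order[position])
        # Take one step against the gradient of a single term.
        x = [xj - step_size * gj for xj, gj in zip(x, g)]
        iteration += 1
        if iteration % num_points == 0:
            # End of a pass: stop if the objective barely changed.
            current = objective(x)
            if abs(last_objective - current) < tolerance:
                break
            last_objective = current
    return x, iteration
```

       As an example, consider minimizing the sum of (x - c)^2 over c in {1, 2, 3} with a linear scan. The
       loop stops near x = 2 well before the iteration limit. With a constant step size, SGD settles near the
       minimum rather than exactly on it.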

       The L-BFGS optimizer, specified by --optimizer "lbfgs", uses a backtracking line search algorithm to
       minimize a function. L-BFGS uses the following parameters:
         --num_basis (the number of memory points stored by L-BFGS)
         --max_iterations
         --armijo_constant
         --wolfe
         --tolerance (optimization terminates when the gradient norm falls below this value)
         --max_line_search_trials
         --min_step and --max_step (both of which apply to the line search routine)
       For more details on the L-BFGS optimizer, consult either the mlpack L-BFGS documentation (in lbfgs.hpp)
       or the extensive published literature on L-BFGS.
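       To show how --armijo_constant, --wolfe, --min_step, --max_step and --max_line_search_trials interact,
       here is a generic bisection line search for the weak Wolfe conditions, written in Python. This is a
       textbook-style sketch and not mlpack's line search routine, whose step-adjustment rules may differ. The
       function name and arguments are hypothetical, although the defaults mirror the documented option
       defaults.

```python
import math


def line_search(f, grad, x, direction, armijo_constant=1e-4, wolfe=0.9,
                min_step=1e-20, max_step=1e20, max_trials=50, step=1.0):
    """Bisection search for a step satisfying the weak Wolfe conditions.

    Hypothetical helper, not mlpack's implementation. Returns the accepted
    step length, or None if none is found within the allowed trials/range.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    f0 = f(x)
    slope0 = dot(grad(x), direction)
    if slope0 >= 0:
        raise ValueError("direction is not a descent direction")
    lo, hi = 0.0, math.inf
    for _ in range(max_trials):
        if step < min_step or step > max_step:
            return None  # the step left the allowed range
        trial = [xi + step * di for xi, di in zip(x, direction)]
        if f(trial) > f0 + armijo_constant * step * slope0:
            hi = step  # Armijo (sufficient decrease) fails: step too long
        elif dot(grad(trial), direction) < wolfe * slope0:
            lo = step  # curvature condition fails: step too short
        else:
            return step  # both conditions hold
        # Double the step until an upper bound is known, then bisect.
        step = 2.0 * lo if math.isinf(hi) else 0.5 * (lo + hi)
    return None
```

       Consider f(x) = (x - 3)^2 starting at x = 0 with the steepest-descent direction 6. The first trial step
       of 1.0 overshoots, and the search then accepts 0.5, which lands exactly on the minimum.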

       By default, the SGD optimizer is used.

REQUIRED OPTIONS

       --input_file (-i) [string]
              Input dataset to run NCA on.

       --output_file (-o) [string]
              Output file for learned distance matrix.
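       A learned matrix is typically used to transform points before Euclidean distances are computed. The
       Python sketch below assumes that the output file holds a linear transformation A, already loaded as a
       list of rows, so that the learned distance is ||A x - A y||. That interpretation, the helper names, and
       the list-based layout are all assumptions made for illustration.

```python
import math


def learned_distance(A, x, y):
    """Euclidean distance after applying the (assumed) transformation A."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    projected = [sum(a * d for a, d in zip(row, diff)) for row in A]
    return math.sqrt(sum(p * p for p in projected))


def nearest_neighbor_label(A, train_points, train_labels, query):
    """1-nearest-neighbor classification under the learned distance (hypothetical helper)."""
    best = min(range(len(train_points)),
               key=lambda k: learned_distance(A, train_points[k], query))
    return train_labels[best]
```

       With A = [[1, 0], [0, 0]], the second dimension is ignored entirely. This can change which training
       point is nearest to a query.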

OPTIONS

       --armijo_constant (-A) [double]
              Armijo constant for L-BFGS. Default value 0.0001.

       --help (-h)
              Display help information.

       --info [string]
              Get help on a specific module or option.  Default value ''.

       --labels_file (-l) [string]
              File of labels for input dataset. Default value ''.

       --linear_scan (-L)
              Don't shuffle the order in which data points are visited for SGD.

       --max_iterations (-n) [int]
              Maximum number of iterations for SGD or L-BFGS (0 indicates no limit). Default value 500000.

       --max_line_search_trials (-T) [int]
              Maximum number of line search trials for L-BFGS. Default value 50.

       --max_step (-M) [double]
              Maximum step of line search for L-BFGS. Default value 1e+20.

       --min_step (-m) [double]
              Minimum step of line search for L-BFGS. Default value 1e-20.

       --normalize (-N)
              Use a normalized starting point for optimization. This is useful when points are far apart, or
              when SGD is returning NaN.

       --num_basis (-B) [int]
              Number of memory points to be stored for L-BFGS. Default value 5.

       --optimizer (-O) [string]
              Optimizer to use; "sgd" or "lbfgs". Default value 'sgd'.

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --step_size (-a) [double]
              Step size for stochastic gradient descent (alpha). Default value 0.01.

       --tolerance (-t) [double]
              Maximum tolerance for termination of SGD or L-BFGS. Default value 1e-07.

       --verbose (-v)
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.

       --wolfe (-w) [double]
              Wolfe condition parameter for L-BFGS. Default value 0.9.
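       When calling the program from a script, it can help to build the argument list explicitly. The Python
       sketch below uses only options documented above. The helper names are hypothetical, and so are file
       names such as dataset.csv. The program runs only when run_nca is called.

```python
import subprocess


def nca_command(input_file, output_file, labels_file=None, optimizer="sgd",
                seed=None, normalize=False, verbose=False):
    """Assemble an mlpack_nca argument list (hypothetical helper)."""
    if optimizer not in ("sgd", "lbfgs"):
        raise ValueError('optimizer must be "sgd" or "lbfgs"')
    args = ["mlpack_nca", "--input_file", input_file,
            "--output_file", output_file, "--optimizer", optimizer]
    if labels_file is not None:
        args += ["--labels_file", labels_file]
    if seed is not None:
        args += ["--seed", str(seed)]
    if normalize:
        args.append("--normalize")
    if verbose:
        args.append("--verbose")
    return args


def run_nca(args):
    """Run the assembled command; raises CalledProcessError on failure."""
    return subprocess.run(args, check=True)
```

       For example, run_nca(nca_command("dataset.csv", "distance.csv", labels_file="labels.csv",
       optimizer="lbfgs", seed=42)) runs L-BFGS with a fixed random seed. Repeated runs therefore use the same
       seed rather than one derived from the current time.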

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

                                                                                                   mlpack_nca(1)