Ubuntu Manpage: mlpack_logistic_regression - l2-regularized logistic regression and prediction

Provided by: mlpack-bin_2.2.5-1build1_amd64

NAME

       mlpack_logistic_regression - l2-regularized logistic regression and prediction

SYNOPSIS

        mlpack_logistic_regression [-h] [-v]

DESCRIPTION

       An  implementation of L2-regularized logistic regression using either the L-BFGS optimizer
       or SGD (stochastic gradient descent). This solves the regression problem

         y = 1 / (1 + e^-(X * b))

       where y takes values 0 or 1.
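
       The model above can be sketched in a few lines of Python. This is only an illustration
       of the formula, not mlpack's code; the function names are hypothetical, and the sketch
       assumes any intercept is folded into b with a matching constant 1 in each point.

```python
import math

def logistic(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) so that
    # math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(x, b):
    """Return the modeled probability that the response for point x is 1.

    x and b are equal-length sequences of numbers (hypothetical layout).
    """
    z = sum(xi * bi for xi, bi in zip(x, b))
    return logistic(z)
```

       A value of X * b equal to 0 gives a probability of exactly 0.5; large positive values
       approach 1 and large negative values approach 0.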

       This program can load a logistic regression model from a file (-m), train a logistic
       regression model on given training data (-t), or do both at once. In addition, it can
       classify a test dataset (-T) and save the classification results to the given output
       file (-o). The logistic regression model itself may be saved to a file specified with
       the -M option.

       The training data given with the -t option should have class labels as its last  dimension
       (so,  if  the  training  data  is  in  CSV  format,  labels  should  be  the last column).
       Alternately, the -l (--labels_file) option may be used  to  specify  a  separate  file  of
       labels.
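
       As an illustration of the expected layout, the following Python sketch splits CSV
       training data into predictors and labels by taking the last column as the label. The
       helper name and the sample data are hypothetical; this is not how mlpack itself loads
       files.

```python
import csv
import io

def split_features_and_labels(csv_text):
    """Split CSV rows into (features, labels); the last column is the label."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        *values, label = row
        label = int(float(label))
        if label not in (0, 1):
            raise ValueError("labels must be 0 or 1, got %r" % label)
        features.append([float(v) for v in values])
        labels.append(label)
    return features, labels

# Hypothetical three-point training set: two predictors, then a 0/1 label.
sample = "0.5,1.2,0\n1.5,0.3,1\n2.0,2.2,1\n"
X, y = split_features_and_labels(sample)
```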

       When a model is being trained, several options apply. L2 regularization (to prevent
       overfitting) can be set with the -L (--lambda) option, and the optimizer used to train
       the model can be chosen with the --optimizer option. Available optimizers are 'sgd'
       (stochastic gradient descent), 'lbfgs' (the L-BFGS optimizer), and 'minibatch-sgd'
       (mini-batch stochastic gradient descent). There are also various parameters for the
       optimizer: the --max_iterations parameter specifies the maximum number of allowed
       iterations, and the --tolerance (-e) parameter specifies the tolerance for convergence.
       For the SGD and mini-batch SGD optimizers, the --step_size parameter controls the step
       size taken at each iteration by the optimizer. The batch size for mini-batch SGD is
       controlled with the --batch_size (-b) parameter. If the objective function for your data
       is oscillating between Inf and 0, the step size is probably too large. The optimizers
       have more parameters, but these are accessible only through the C++ interface.
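
       To make the roles of the step size and the L2 parameter concrete, here is a minimal
       SGD sketch in plain Python. It is not mlpack's implementation: the function names are
       hypothetical, the penalty is assumed to be (lambda / 2) * ||b||^2 applied to every
       coefficient, points are visited in order without shuffling, and no convergence
       tolerance is checked.

```python
import math

def sigmoid(z):
    # Clamp z so that math.exp cannot overflow for extreme inputs.
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logistic(points, labels, step_size=0.01, lam=0.0, max_iterations=10000):
    """Plain SGD on an L2-regularized logistic loss; one iteration is one point."""
    b = [0.0] * len(points[0])
    for it in range(max_iterations):
        x = points[it % len(points)]
        y = labels[it % len(points)]
        error = sigmoid(sum(xi * bi for xi, bi in zip(x, b))) - y
        # Per-point loss gradient plus the gradient of (lam / 2) * ||b||^2.
        b = [bi - step_size * (error * xi + lam * bi) for xi, bi in zip(x, b)]
    return b

# Hypothetical 1-D dataset; the leading 1.0 in each point acts as an intercept.
points = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
labels = [0, 0, 1, 1]
b_plain = sgd_logistic(points, labels, step_size=0.1, max_iterations=2000)
b_reg = sgd_logistic(points, labels, step_size=0.1, lam=1.0, max_iterations=2000)
```

       In this sketch a larger lam pulls the coefficients toward zero, and step_size scales
       every update, which is why an overly large step can make the objective jump around
       instead of settling.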

       For SGD, an iteration refers to a single point,  and  for  mini-batch  SGD,  an  iteration
       refers  to  a  single  batch.  So  to  take  a  single  pass  over  the  dataset with SGD,
       --max_iterations should be set to the number of points in the dataset.
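
       The iteration accounting above can be written as a small helper. This is a sketch
       under stated assumptions: the function name is hypothetical, and rounding up so that
       a partial final batch counts as one iteration is an assumption, not something this
       page specifies.

```python
import math

def iterations_for_passes(num_points, passes=1, optimizer="sgd", batch_size=50):
    """Return a --max_iterations value that covers `passes` passes over the data."""
    if optimizer == "sgd":
        # One iteration per point.
        return passes * num_points
    if optimizer == "minibatch-sgd":
        # One iteration per batch; a partial last batch is assumed to count as one.
        return passes * math.ceil(num_points / batch_size)
    raise ValueError("only 'sgd' and 'minibatch-sgd' count iterations this way")
```

       For example, three passes over 1000 points with mini-batch SGD and the default batch
       size of 50 correspond to 60 iterations, while a single pass with SGD corresponds to
       1000 iterations.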

       Optionally, the model can be used to predict the responses for another matrix of data
       points, if --test_file is specified. The --test_file option can be specified without
       --training_file, so long as an existing logistic regression model is given with
       --input_model_file. The output predictions from the logistic regression model are
       stored in the file given with --output_file.
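
       A train-then-predict workflow might be scripted as in the following Python sketch,
       which only builds the argument lists and uses the long option names listed in the
       options sections of this page. The helper names and all file names (train.csv,
       model.xml, test.csv, predictions.csv) are hypothetical.

```python
import shlex

PROGRAM = "mlpack_logistic_regression"

def train_command(training_file, model_file, optimizer="lbfgs", lam=0.0):
    """Build the argument list that trains a model and saves it."""
    return [PROGRAM,
            "--training_file", training_file,
            "--output_model_file", model_file,
            "--optimizer", optimizer,
            "--lambda", str(lam)]

def predict_command(model_file, test_file, output_file):
    """Build the argument list that classifies a test set with a saved model."""
    return [PROGRAM,
            "--input_model_file", model_file,
            "--test_file", test_file,
            "--output_file", output_file]

# Print the two commands in shell-quoted form (hypothetical file names).
for cmd in (train_command("train.csv", "model.xml"),
            predict_command("model.xml", "test.csv", "predictions.csv")):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

       On a system where mlpack is installed, each list could be passed to
       subprocess.run(cmd, check=True) to execute it.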

       This implementation of logistic regression does not support the general  multi-class  case
       but instead only the two-class case. Any responses must be either 0 or 1.

OPTIONAL INPUT OPTIONS

       --batch_size (-b) [int]
              Batch size for mini-batch SGD. Default value 50.

       --decision_boundary (-d) [double]
              Decision boundary for prediction; if the logistic function for a point is less
              than the boundary, the class is taken to be 0; otherwise, the class is 1. Default
              value 0.5.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_model_file (-m) [string]
              File containing existing model (parameters). Default value ''.

       --labels_file (-l) [string]
              A file containing labels (0 or 1) for the points in the training set  (y).  Default
              value ''.

       --lambda (-L) [double]
              L2-regularization parameter for training.  Default value 0.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs', 'sgd', or 'minibatch-sgd'). Default value
              'lbfgs'.

       --step_size (-s) [double]
              Step size for SGD and mini-batch SGD optimizers.  Default value 0.01.

       --test_file (-T) [string]
              File containing test dataset. Default value ''.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [string]
              A file containing the training set (the matrix of predictors, X). Default value
              ''.

       --verbose (-v)
              Display  informational  messages  and the full list of parameters and timers at the
              end of execution.

       --version (-V)
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              If --test_file is specified, this file is where the predictions for the test set
              will be saved. Default value ''.

       --output_model_file (-M) [string]
              File to save trained logistic regression model to. Default value ''.

       --output_probabilities_file (-p) [string]
              If --test_file is specified, this file is where the class probabilities for the
              test set will be saved. Default value ''.
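
       The relationship between probabilities and predicted classes, as described for the
       --decision_boundary option, can be sketched as follows. The function name is
       hypothetical, and the input is assumed to be a plain list of class-1 probabilities,
       which may not match the exact layout of the probabilities file.

```python
def classify(probabilities, decision_boundary=0.5):
    """Map class-1 probabilities to labels: below the boundary is 0, otherwise 1."""
    return [0 if p < decision_boundary else 1 for p in probabilities]
```

       With the default boundary of 0.5, a probability of exactly 0.5 is assigned to class 1,
       since only values strictly less than the boundary map to class 0.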

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of
       mlpack.

                                                     mlpack_logistic_regression(16 November 2017)