Ubuntu Manpage: mlpack_logistic_regression - l2-regularized logistic regression and prediction

Provided by: mlpack-bin_2.2.5-1build1_amd64

NAME

       mlpack_logistic_regression - l2-regularized logistic regression and prediction

SYNOPSIS

        mlpack_logistic_regression [-h] [-v]

DESCRIPTION

       An  implementation  of  L2-regularized  logistic  regression  using  either  the  L-BFGS optimizer or SGD
       (stochastic gradient descent). This solves the regression problem

         y = 1 / (1 + e^-(X * b))

       where y takes values 0 or 1.
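
       As an illustration of the formula above, the following minimal Python sketch evaluates the logistic
       function for a single point. The names logistic, x, and b are hypothetical and are not part of mlpack;
       the example values are chosen only so that the result is easy to check by hand.

```python
import math

def logistic(x, b):
    """Return 1 / (1 + e^-(x . b)) for one point x and parameter vector b."""
    z = sum(xi * bi for xi, bi in zip(x, b))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical point and parameters, chosen only for illustration.
print(logistic([1.0, 2.0], [0.5, -0.25]))  # x . b = 0, so this prints 0.5
```

       The returned value lies strictly between 0 and 1 and can be read as the estimated probability that the
       point belongs to class 1.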

       This program can load an existing logistic regression model from a file (-m), train a logistic
       regression model on given training data (-t), or do both at once. In addition, it can classify a test
       dataset (-T) and save the classification results to the given output file (-o). The logistic regression
       model itself may be saved to a file specified with the -M option.

       The training data given with the -t option should have class labels as its last dimension (so, if the
       training data is in CSV format, labels should be the last column). Alternatively, the --labels_file (-l)
       option may be used to specify a separate file of labels.

       When a model is being trained, several options control the optimization. The L2-regularization parameter
       (to prevent overfitting) can be specified with the --lambda (-L) option, and the optimizer used to train
       the model can be specified with the --optimizer option. Available optimizers are 'sgd' (stochastic
       gradient descent), 'lbfgs' (the L-BFGS optimizer), and 'minibatch-sgd' (mini-batch stochastic gradient
       descent). There are also various parameters for the optimizer: the --max_iterations parameter specifies
       the maximum number of allowed iterations, and the --tolerance (-e) parameter specifies the tolerance for
       convergence. For the SGD and mini-batch SGD optimizers, the --step_size parameter controls the step size
       taken at each iteration by the optimizer. The batch size for mini-batch SGD is controlled with the
       --batch_size (-b) parameter. If the objective function for your data is oscillating between Inf and 0,
       the step size is probably too large. The optimizers have further parameters, but these can only be
       accessed through the C++ interface.
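
       To make the roles of --lambda, --step_size, and --max_iterations concrete, here is a minimal Python
       sketch of plain SGD on an L2-regularized logistic loss. It is not mlpack's implementation: the function
       sgd_logistic, the toy data, the cyclic visiting order, and the exact form of the penalty are all
       assumptions made for illustration, and the convergence tolerance is omitted.

```python
import math

def sgd_logistic(points, labels, lam=0.0, step_size=0.01, max_iterations=10000):
    """Plain SGD on an L2-regularized logistic loss; one point per iteration."""
    b = [0.0] * len(points[0])
    for it in range(max_iterations):
        # Visit the points cyclically (an assumption of this sketch).
        x, y = points[it % len(points)], labels[it % len(points)]
        z = sum(xi * bi for xi, bi in zip(x, b))
        p = 1.0 / (1.0 + math.exp(-z))
        # Gradient of the log loss for this point plus the L2 penalty term.
        b = [bi - step_size * ((p - y) * xi + lam * bi) for xi, bi in zip(x, b)]
    return b

# Hypothetical toy data; the leading 1.0 in each point acts as an intercept term.
points = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
labels = [0, 0, 1, 1]
print(sgd_logistic(points, labels, lam=0.0))
print(sgd_logistic(points, labels, lam=1.0))  # the penalty keeps these weights smaller
```

       In this sketch a larger lam shrinks the learned parameters toward zero, and a larger step_size makes
       each update more aggressive, which is the kind of setting that can lead to the oscillation described
       above.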

       For  SGD,  an iteration refers to a single point, and for mini-batch SGD, an iteration refers to a single
       batch. So to take a single pass over the dataset with SGD, --max_iterations should be set to  the  number
       of points in the dataset.
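
       The arithmetic above can be sketched in Python as follows. The helper iterations_for_passes is
       hypothetical, and rounding the number of batches up (so that a final partial batch counts as one
       iteration) is an assumption rather than documented mlpack behavior.

```python
import math

def iterations_for_passes(num_points, passes=1, batch_size=None):
    """Iterations needed for `passes` sweeps over the data.

    batch_size=None models SGD (one point per iteration); an integer models
    mini-batch SGD (one batch per iteration, last batch possibly partial).
    """
    if batch_size is None:
        per_pass = num_points
    else:
        per_pass = math.ceil(num_points / batch_size)
    return passes * per_pass

print(iterations_for_passes(1000))                 # 1000
print(iterations_for_passes(1000, batch_size=50))  # 20
```

       The resulting number is the value one would pass to --max_iterations for the desired number of passes.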

       Optionally, the model can be used to predict the responses for another matrix of data points, if
       --test_file is specified. The --test_file option can be specified without --training_file, so long as an
       existing logistic regression model is given with --input_model_file. The output predictions from the
       logistic regression model are stored in the file given with --output_file.
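
       Predictions are derived from the logistic function by comparing it against the --decision_boundary
       value (default 0.5): a value below the boundary gives class 0, anything else gives class 1. A minimal
       Python sketch of that rule follows; the function classify and the sample probabilities are hypothetical.

```python
def classify(probability, decision_boundary=0.5):
    """Return 0 if probability is below the boundary, otherwise 1."""
    return 0 if probability < decision_boundary else 1

# Hypothetical logistic-function values for three test points.
probabilities = [0.10, 0.50, 0.73]
print([classify(p) for p in probabilities])       # [0, 1, 1]
print([classify(p, 0.8) for p in probabilities])  # [0, 0, 0]
```

       Raising the boundary makes class 1 predictions rarer; a value exactly equal to the boundary is assigned
       to class 1.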

       This implementation of logistic regression does not support the general multi-class case but instead only
       the two-class case. Any responses must be either 0 or 1.

OPTIONAL INPUT OPTIONS

       --batch_size (-b) [int]
              Batch size for mini-batch SGD. Default value 50.

       --decision_boundary (-d) [double]
              Decision boundary for prediction; if the logistic function for a point is less than the boundary,
              the class is taken to be 0; otherwise, the class is 1. Default value 0.5.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_model_file (-m) [string]
              File containing existing model (parameters). Default value ''.

       --labels_file (-l) [string]
              A file containing labels (0 or 1) for the points in the training set (y). Default value ''.

       --lambda (-L) [double]
              L2-regularization parameter for training.  Default value 0.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs' or 'sgd'). Default value 'lbfgs'.

       --step_size (-s) [double]
              Step size for SGD and mini-batch SGD optimizers.  Default value 0.01.

       --test_file (-T) [string]
              File containing test dataset. Default value ''.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [string]
              A file containing the training set (the matrix of predictors, X). Default value ''.

       --verbose (-v)
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              If --test_file is specified, this file is where the predictions for the test set will be saved.
              Default value ''.

       --output_model_file (-M) [string]
              File to save trained logistic regression model to. Default value ''.

       --output_probabilities_file (-p) [string]
              If --test_file is specified, this file is where the class probabilities for the test set will be
              saved. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

                                                                    mlpack_logistic_regression(16 November 2017)