Ubuntu Manpage: mlpack_logistic_regression - L2-regularized logistic regression and prediction

Provided by: mlpack-bin_3.4.2-5ubuntu1_amd64

NAME

       mlpack_logistic_regression - L2-regularized logistic regression and prediction

SYNOPSIS

        mlpack_logistic_regression [-b int] [-d double] [-m unknown] [-l string] [-L double] [-n int] [-O string] [-s double] [-T string] [-e double] [-t string] [-V bool] [-o string] [-M unknown] [-x string] [-P string] [-p string] [-h -v]

DESCRIPTION

       An  implementation  of  L2-regularized  logistic  regression  using  either  the  L-BFGS optimizer or SGD
       (stochastic gradient descent). This solves the regression problem

         y = 1 / (1 + e^-(X * b))

       where y takes values 0 or 1.
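
       As a minimal sketch of the logistic function above (not mlpack's own implementation), the value for a
       single point can be computed in plain Python. The names logistic, x, and b are illustrative
       assumptions, and the sketch leaves out any intercept term.

```python
import math

def logistic(x, b):
    """Return 1 / (1 + e^-(x . b)) for one data point x and parameter vector b."""
    z = sum(xi * bi for xi, bi in zip(x, b))
    # Branch on the sign of z so exp() is never called on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical point and parameters, chosen only for illustration.
x = [1.0, 2.0]
b = [0.5, -0.25]
y = logistic(x, b)  # x . b = 0 here, so y is 0.5
```

       The result is always between 0 and 1 and can be read as the estimated probability of class 1.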

       This program allows loading a logistic regression model (via the '--input_model_file (-m)' parameter),
       training a logistic regression model given training data (specified with the '--training_file (-t)'
       parameter), or both at once. In addition, this program allows classification on a test
       dataset (specified with the '--test_file (-T)' parameter), and the classification results may be saved
       with the '--predictions_file (-P)' output parameter. The trained logistic regression model may be saved
       using the '--output_model_file (-M)' output parameter.

       The training data, if specified, may have class labels as its last dimension. Alternatively, the
       '--labels_file (-l)' parameter may be used to specify a separate matrix of labels.

       When a model is being trained, several options are available. L2 regularization (to prevent overfitting) can be
       specified with the '--lambda (-L)' option, and the optimizer used to train the model can be specified
       with the '--optimizer (-O)' parameter. Available options are 'sgd' (stochastic gradient descent) and
       'lbfgs' (the L-BFGS optimizer). There are also various parameters for the optimizer; the
       '--max_iterations  (-n)'  parameter  specifies  the  maximum  number  of  allowed  iterations,  and   the
       '--tolerance  (-e)'  parameter  specifies  the  tolerance  for  convergence.  For  the SGD optimizer, the
       '--step_size (-s)' parameter controls the step size taken at each iteration by the optimizer.  The  batch
       size  for  SGD  is controlled with the '--batch_size (-b)' parameter.  If the objective function for your
       data is oscillating between Inf and 0, the step size is probably too large. There are more parameters for
       the optimizers, but the C++ interface must be used to access these.
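
       The effect of the step size can be illustrated with a small sketch. This is a toy single-point SGD on a
       one-dimensional model written in plain Python, not mlpack's optimizer; the data, the function names,
       and the two step sizes are assumptions chosen only to make the behaviour visible.

```python
import math

def sigmoid(z):
    # Branch on the sign of z so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_loss(w, xs, ys):
    """Total negative log-likelihood of a 1-D logistic model with weight w."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = w * x
        m = z if y == 1 else -z  # margin of the correct class
        # log(1 + e^-m), computed without overflow
        total += max(0.0, -m) + math.log1p(math.exp(-abs(m)))
    return total

def sgd_losses(step_size, xs, ys, passes=100):
    """Run single-point SGD from w = 0; return the loss after every update."""
    w, losses = 0.0, []
    for _ in range(passes):
        for x, y in zip(xs, ys):
            w -= step_size * (sigmoid(w * x) - y) * x
            losses.append(log_loss(w, xs, ys))
    return losses

# Hypothetical, non-separable toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 1, 0, 1]

initial = log_loss(0.0, xs, ys)
small = sgd_losses(0.01, xs, ys)   # loss ends below its starting value
large = sgd_losses(100.0, xs, ys)  # loss jumps far above its starting value
```

       With the small step the loss settles below its initial value, while with the large step each update
       overshoots and the loss swings to values far above where it started. In that situation, lowering
       '--step_size (-s)' is the first thing to try.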

       For SGD, an iteration refers to a single point. So to take a single  pass  over  the  dataset  with  SGD,
       '--max_iterations (-n)' should be set to the number of points in the dataset.
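
       The arithmetic above can be sketched as a small helper that turns a number of passes into a value for
       '--max_iterations (-n)' and assembles a matching argument list. The helper name, the dataset size, and
       the file name are hypothetical; the flags are the ones documented on this page, and the sketch only
       builds the command without running it.

```python
def sgd_max_iterations(num_points, passes=1):
    """Value for --max_iterations giving `passes` full passes of SGD,
    given that one SGD iteration visits one point."""
    if num_points <= 0 or passes <= 0:
        # 0 would mean "no limit" for --max_iterations, so reject it here.
        raise ValueError("num_points and passes must be positive")
    return num_points * passes

# Hypothetical dataset with 1000 points, trained for 5 passes.
n = sgd_max_iterations(1000, passes=5)  # 5000

command = [
    "mlpack_logistic_regression",
    "--training_file", "data.csv",
    "--optimizer", "sgd",
    "--max_iterations", str(n),
]
```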

       Optionally,  the  model  can  be  used  to  predict  the  responses for another matrix of data points, if
       '--test_file (-T)'  is  specified.  The  '--test_file  (-T)'  parameter  can  be  specified  without  the
       '--training_file  (-t)'  parameter,  so  long  as an existing logistic regression model is given with the
       '--input_model_file (-m)' parameter. The output predictions from the logistic regression model may be
       saved with the '--predictions_file (-P)' parameter.
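
       As a rough sketch of how class predictions relate to class probabilities, the rule documented for
       '--decision_boundary (-d)' can be written in a few lines of Python. The function name and the sample
       probabilities are illustrative assumptions, not part of mlpack.

```python
def classify(probabilities, decision_boundary=0.5):
    """Class 0 if the probability is below the boundary, otherwise class 1."""
    return [0 if p < decision_boundary else 1 for p in probabilities]

# Hypothetical class-1 probabilities for three test points.
probs = [0.1, 0.5, 0.9]

default_labels = classify(probs)        # [0, 1, 1]
strict_labels = classify(probs, 0.95)   # [0, 0, 0]
```

       Raising the boundary makes class 1 predictions rarer; a probability exactly equal to the boundary is
       assigned to class 1.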

       Note: The following parameters are deprecated and will be removed in mlpack 4: '--output_file (-o)'
       and '--output_probabilities_file (-x)'. Use '--predictions_file (-P)' instead of '--output_file (-o)',
       and '--probabilities_file (-p)' instead of '--output_probabilities_file (-x)'.

       This implementation of logistic regression does not support the general multi-class case but instead only
       the  two-class  case.  Any  labels  must  be  either 0 or 1. For more classes, see the softmax_regression
       program.

       As an example, to train a logistic regression model on the data 'data.csv' with labels 'labels.csv'
       and L2 regularization of 0.1, saving the model to 'lr_model.bin', the following command may be used:

       $ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv --lambda 0.1 \
           --output_model_file lr_model.bin

       Then, to use that model to predict classes for the dataset 'test.csv', storing the output predictions
       in 'predictions.csv', the following command may be used:

       $ mlpack_logistic_regression --input_model_file lr_model.bin --test_file test.csv \
           --predictions_file predictions.csv

OPTIONAL INPUT OPTIONS

       --batch_size (-b) [int]
              Batch size for SGD. Default value 64.

       --decision_boundary (-d) [double]
              Decision boundary for prediction; if the logistic function for a point is less than the  boundary,
              the class is taken to be 0; otherwise, the class is 1. Default value 0.5.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Existing model (parameters).

       --labels_file (-l) [string]
              A matrix containing labels (0 or 1) for the points in the training set (y).

       --lambda (-L) [double]
              L2-regularization parameter for training.  Default value 0.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs' or 'sgd'). Default value 'lbfgs'.

       --step_size (-s) [double]
              Step size for SGD optimizer. Default value 0.01.

       --test_file (-T) [string]
              Matrix containing test dataset.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [string]
              A matrix containing the training set (the matrix of predictors, X).

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              If test data is specified, this matrix is where the predictions for the test set will be saved.

       --output_model_file (-M) [unknown]
              Output for trained logistic regression model.

       --output_probabilities_file (-x) [string]
              If  test  data is specified, this matrix is where the class probabilities for the test set will be
              saved.

       --predictions_file (-P) [string]
              If test data is specified, this matrix is where the predictions for the test set will be saved.

       --probabilities_file (-p) [string]
              If test data is specified, this matrix is where the class probabilities for the test set  will  be
              saved.

ADDITIONAL INFORMATION

       For  further  information,  including  relevant  papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack-3.4.2                                      11 April 2022                    mlpack_logistic_regression(1)