Ubuntu Manpage: mlpack_logistic_regression - l2-regularized logistic regression and prediction

NAME

       mlpack_logistic_regression - l2-regularized logistic regression and prediction

SYNOPSIS

        mlpack_logistic_regression [-b int] [-d double] [-m unknown] [-l string] [-L double] [-n int] [-O string] [-s double] [-T string] [-e double] [-t string] [-V bool] [-o string] [-M unknown] [-x string] [-P string] [-p string] [-h -v]

DESCRIPTION

       An  implementation of L2-regularized logistic regression using either the L-BFGS optimizer
       or SGD (stochastic gradient descent). This solves the regression problem

         y = 1 / (1 + e^-(X * b))

       where y takes values 0 or 1.
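
       As a minimal sketch of the formula above (not mlpack's own implementation), the
       logistic function can be evaluated for a single point as follows. The names 'logistic'
       and 'predict_probability' are illustrative assumptions and are not part of mlpack; the
       branch on the sign of z is only there to avoid floating-point overflow.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function 1 / (1 + e^-z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, b):
    """Value of the logistic function for one point x and coefficients b.

    x and b are plain sequences of floats of equal length (hypothetical inputs).
    """
    z = sum(xi * bi for xi, bi in zip(x, b))  # the X * b term for one point
    return logistic(z)


# Example: X * b = 1.0 * 0.5 + 2.0 * (-0.25) = 0, so the logistic value is 0.5.
p = predict_probability([1.0, 2.0], [0.5, -0.25])
print(p)
```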

       This program allows loading a logistic regression model (via the '--input_model_file (-m)'
       parameter), training a logistic regression model given training data (specified with the
       '--training_file (-t)' parameter), or both at once. In addition, this program allows
       classification on a test dataset (specified with the '--test_file (-T)' parameter); the
       classification results may be saved with the '--predictions_file (-P)' output parameter.
       The trained logistic regression model may be saved using the '--output_model_file (-M)'
       output parameter.

       The  training  data,  if  specified,  may  have  class  labels  as  its  last   dimension.
       Alternately,  the  '--labels_file (-l)' parameter may be used to specify a separate matrix
       of labels.

       When a model is being trained, there are  many  options.  L2  regularization  (to  prevent
       overfitting)  can  be specified with the '--lambda (-L)' option, and the optimizer used to
       train the model can be specified with the '--optimizer (-O)' parameter. Available  options
       are 'sgd' (stochastic gradient descent) and 'lbfgs' (the L-BFGS optimizer). There are also
       various parameters for the optimizer; the '--max_iterations (-n)' parameter specifies  the
       maximum  number  of allowed iterations, and the '--tolerance (-e)' parameter specifies the
       tolerance for convergence.  For  the  SGD  optimizer,  the  '--step_size  (-s)'  parameter
       controls  the  step size taken at each iteration by the optimizer.  The batch size for SGD
       is controlled with the '--batch_size (-b)' parameter.  If the objective function for  your
       data is oscillating between Inf and 0, the step size is probably too large. There are more
       parameters for the optimizers, but the C++ interface must be used to access these.

       For SGD, an iteration refers to a single point. So to take a single pass over the  dataset
       with SGD, '--max_iterations (-n)' should be set to the number of points in the dataset.
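
       The rule above can be sketched as a small calculation. This is an illustration only:
       the helper names are hypothetical, and it assumes a headerless CSV file in which each
       non-empty line holds one point.

```python
def count_points(lines):
    """Count data points, assuming one point per non-empty line (no header row)."""
    return sum(1 for line in lines if line.strip())


def max_iterations_for(num_points, passes=1):
    """Value for --max_iterations (-n) giving `passes` full passes with SGD,
    following the statement that one SGD iteration refers to a single point."""
    if num_points < 0 or passes < 0:
        raise ValueError("num_points and passes must be non-negative")
    return num_points * passes


# Example with three points held in memory instead of a file.
rows = ["0.1,0.2,0\n", "0.3,0.4,1\n", "0.5,0.6,1\n"]
n = count_points(rows)
print(max_iterations_for(n, passes=5))
```

       For a real file, the lines could come from open("data.csv"). Note that a result of 0
       would mean "no limit" according to the '--max_iterations (-n)' description.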

       Optionally,  the  model  can  be  used to predict the responses for another matrix of data
       points, if '--test_file (-T)' is  specified.  The  '--test_file  (-T)'  parameter  can  be
       specified  without  the  '--training_file (-t)' parameter, so long as an existing logistic
       regression model is  given  with  the  '--input_model_file  (-m)'  parameter.  The  output
       predictions  from  the logistic regression model may be saved with the '--predictions_file
       (-P)' parameter.

       Note: The following parameters are deprecated and will be removed in mlpack 4:
       '--output_file (-o)' and '--output_probabilities_file (-x)'. Use '--predictions_file (-P)'
       instead of '--output_file (-o)', and use '--probabilities_file (-p)' instead of
       '--output_probabilities_file (-x)'.

       This  implementation  of logistic regression does not support the general multi-class case
       but instead only the two-class case. Any labels must be either 0 or 1. For  more  classes,
       see the softmax_regression program.

       As  an  example, to train a logistic regression model on the data ''data.csv'' with labels
       ''labels.csv'' with L2 regularization of 0.1, saving the model  to  ''lr_model.bin'',  the
       following command may be used:

       $  mlpack_logistic_regression  --training_file  data.csv --labels_file labels.csv --lambda
       0.1 --output_model_file lr_model.bin

       Then, to use that model to predict classes  for  the  dataset  ''test.csv'',  storing  the
       output predictions in ''predictions.csv'', the following command may be used:

       $   mlpack_logistic_regression   --input_model_file   lr_model.bin   --test_file  test.csv
       --predictions_file predictions.csv
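
       When the prediction step is driven from a script, the argument list can be assembled
       programmatically. The sketch below is an assumption-laden illustration:
       'build_predict_command' is a hypothetical helper, it uses only options documented on this
       page, and it prefers the non-deprecated '--predictions_file (-P)' and
       '--probabilities_file (-p)' parameters.

```python
import shlex


def build_predict_command(model_file, test_file, predictions_file,
                          probabilities_file=None, decision_boundary=None):
    """Build the argument list for a prediction-only run (hypothetical helper)."""
    cmd = [
        "mlpack_logistic_regression",
        "--input_model_file", model_file,
        "--test_file", test_file,
        "--predictions_file", predictions_file,
    ]
    if probabilities_file is not None:
        cmd += ["--probabilities_file", probabilities_file]
    if decision_boundary is not None:
        cmd += ["--decision_boundary", str(decision_boundary)]
    return cmd


cmd = build_predict_command("lr_model.bin", "test.csv", "predictions.csv")
print(shlex.join(cmd))  # shell-quoted form of the command

# To actually run it (requires mlpack to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```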

OPTIONAL INPUT OPTIONS

       --batch_size (-b) [int]
              Batch size for SGD. Default value 64.

       --decision_boundary (-d) [double]
              Decision boundary for prediction; if the logistic function for a point is less than
              the  boundary, the class is taken to be 0; otherwise, the class is 1. Default value
              0.5.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Existing model (parameters).

       --labels_file (-l) [string]
              A matrix containing labels (0 or 1) for the points in the training set (y).

       --lambda (-L) [double]
              L2-regularization parameter for training.  Default value 0.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs' or 'sgd'). Default value 'lbfgs'.

       --step_size (-s) [double]
              Step size for SGD optimizer. Default value 0.01.

       --test_file (-T) [string]
              Matrix containing test dataset.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [string]
              A matrix containing the training set (the matrix of predictors, X).

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and  timers  at  the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
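
       The '--decision_boundary (-d)' rule described above can be sketched as follows. This is
       an illustration of the documented rule, not mlpack's code; 'classify' is a hypothetical
       name, and 'p' stands for the value of the logistic function for one point.

```python
def classify(p, decision_boundary=0.5):
    """Return class 0 if p is less than the boundary, otherwise class 1."""
    return 0 if p < decision_boundary else 1


# A value exactly on the boundary is assigned to class 1.
labels = [classify(p) for p in (0.2, 0.5, 0.9)]
print(labels)

# Raising the boundary makes class 1 harder to predict.
print(classify(0.7, decision_boundary=0.8))
```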

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              Deprecated; use '--predictions_file (-P)' instead. If test data is specified,  this
              matrix is where the predictions for the test set will be saved.

       --output_model_file (-M) [unknown]
              Output for trained logistic regression model.

       --output_probabilities_file (-x) [string]
              Deprecated; use '--probabilities_file (-p)' instead. If test data is specified, this
              matrix is where the class probabilities for the test set will be saved.

       --predictions_file (-P) [string]
              If  test  data  is specified, this matrix is where the predictions for the test set
              will be saved.

       --probabilities_file (-p) [string]
              If test data is specified, this matrix is where the  class  probabilities  for  the
              test set will be saved.

ADDITIONAL INFORMATION

       For  further  information,  including  relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.