Provided by: mlpack-bin_2.2.5-1build1_amd64 

NAME
mlpack_logistic_regression - L2-regularized logistic regression and prediction
SYNOPSIS
mlpack_logistic_regression [-h] [-v]
DESCRIPTION
An implementation of L2-regularized logistic regression using either the L-BFGS optimizer or SGD
(stochastic gradient descent). This solves the regression problem
$y = \frac{1}{1 + e^{-X b}}$
where the right-hand side is the probability that a point belongs to class 1, and the labels y take values 0 or 1.
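As a quick numerical check of the model above, the following Python sketch evaluates the logistic function for one point x and coefficient vector b. It is an illustration only, not mlpack's code; the helper name `logistic` and the example coefficients are hypothetical.

```python
import math


def logistic(x, b):
    """Probability that y = 1 for point x: 1 / (1 + e^(-x . b))."""
    z = sum(xi * bi for xi, bi in zip(x, b))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as e^z / (1 + e^z) so exp() cannot overflow.
    e = math.exp(z)
    return e / (1.0 + e)


b = [0.5, -1.0]                  # hypothetical coefficients
print(logistic([0.0, 0.0], b))   # x . b = 0, so this prints 0.5
print(logistic([4.0, 1.0], b))   # x . b = 1, roughly 0.731
```

The two branches compute the same function; splitting on the sign of x . b only keeps very large or very small scores from overflowing.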
This program allows loading a logistic regression model from a file (-m) or training a logistic
regression model given training data (-t), or both at once. In addition, this program allows
classification on a test dataset (-T) and will save the classification results to the given output file
(-o). The logistic regression model itself may be saved to a file specified with the -M option.
The training data given with the -t option should have class labels as its last dimension (so, if the
training data is in CSV format, labels should be the last column). Alternatively, the -l (--labels_file)
option may be used to specify a separate file of labels.
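The "labels in the last column" convention can be illustrated with a short Python sketch that splits CSV text into a predictor matrix X and a label vector y. This mimics only the file layout described above; it is not how mlpack parses files, and `split_labels` is a hypothetical helper.

```python
import csv
import io


def split_labels(csv_text):
    """Split CSV rows into predictors X and labels y, assuming the label is
    the last column of each row (the layout expected by -t without -l)."""
    X, y = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        *features, label = row
        X.append([float(v) for v in features])
        y.append(int(float(label)))
    return X, y


X, y = split_labels("1.0,2.0,0\n3.5,0.5,1\n")
print(X)  # [[1.0, 2.0], [3.5, 0.5]]
print(y)  # [0, 1]
```

When labels are instead kept in a separate file passed with -l, the training file holds only the predictor columns.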
When a model is being trained, there are many options. L2 regularization (to prevent overfitting) can be
specified with the -L (--lambda) option, and the optimizer used to train the model can be specified with the
--optimizer option. Available options are 'sgd' (stochastic gradient descent), 'lbfgs' (the L-BFGS
optimizer), and 'minibatch-sgd' (minibatch stochastic gradient descent). There are also various
parameters for the optimizer; the --max_iterations parameter specifies the maximum number of allowed
iterations, and the --tolerance (-e) parameter specifies the tolerance for convergence. For the SGD and
mini-batch SGD optimizers, the --step_size parameter controls the step size taken at each iteration by
the optimizer. The batch size for mini-batch SGD is controlled with the --batch_size (-b) parameter. If
the objective function for your data is oscillating between Inf and 0, the step size is probably too
large. There are more parameters for the optimizers, but the C++ interface must be used to access these.
For SGD, an iteration refers to a single point, and for mini-batch SGD, an iteration refers to a single
batch. So to take a single pass over the dataset with SGD, --max_iterations should be set to the number
of points in the dataset.
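To make the training options above concrete, here is a minimal pure-Python sketch of L2-regularized logistic regression trained with mini-batch SGD. It is not mlpack's implementation: the penalty form (lam / 2) * ||b||^2 applied to every coefficient, the reshuffling on each pass, and the names `sigmoid`, `dot`, `objective`, and `minibatch_sgd` are assumptions for illustration. The parameters `lam`, `step_size`, `batch_size`, and `max_iterations` loosely mirror --lambda, --step_size, --batch_size, and --max_iterations; the convergence tolerance (--tolerance) and the L-BFGS optimizer are not modeled.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective(X, y, b, lam):
    """Negative log-likelihood plus an assumed L2 penalty (lam / 2) * ||b||^2."""
    total = 0.0
    for x, yi in zip(X, y):
        p = min(max(sigmoid(dot(x, b)), 1e-15), 1.0 - 1e-15)  # avoid log(0)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total + 0.5 * lam * dot(b, b)


def minibatch_sgd(X, y, lam=0.0, step_size=0.01, batch_size=50,
                  max_iterations=10000, seed=0):
    """Fit coefficients b by mini-batch SGD; one iteration processes one batch.

    batch_size=1 is plain SGD (one iteration = one point). Unlike the
    --max_iterations option, max_iterations=0 here means no updates at all,
    and no convergence tolerance is checked.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    b = [0.0] * d
    order = list(range(n))
    pos = n  # triggers a shuffle before the first batch
    for _ in range(max_iterations):
        if pos >= n:  # start a new pass over the data in a fresh random order
            rng.shuffle(order)
            pos = 0
        batch = order[pos:pos + batch_size]
        pos += batch_size
        grad = [0.0] * d
        for i in batch:
            err = sigmoid(dot(X[i], b)) - y[i]  # derivative of the log-loss w.r.t. x . b
            for j in range(d):
                grad[j] += err * X[i][j]
        share = len(batch) / n  # this batch's share of the full L2 penalty gradient
        b = [bj - step_size * (gj + lam * share * bj) for bj, gj in zip(b, grad)]
    return b


# Toy data (hypothetical); the constant first column plays the role of an intercept.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
b = minibatch_sgd(X, y, lam=0.1, step_size=0.1, batch_size=2, max_iterations=500)
print(objective(X, y, [0.0, 0.0], 0.1), objective(X, y, b, 0.1))
```

With batch_size=1 the loop reduces to plain SGD, so one pass over n points takes n iterations, which is why the text above suggests setting --max_iterations to the number of points. If the step size is too large, the updates to b overshoot and the objective can blow up, matching the warning about oscillation between Inf and 0.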
Optionally, the model can be used to predict the classes of another matrix of data points, if
--test_file (-T) is specified. The --test_file option can be specified without --training_file, so long as
an existing logistic regression model is given with --input_model_file (-m). The output predictions from the
logistic regression model are stored in the file given with --output_file (-o).
This implementation of logistic regression does not support the general multi-class case but instead only
the two-class case. Any responses must be either 0 or 1.
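Because only two classes are supported, it can help to check labels before training. The Python sketch below is a hypothetical pre-processing step, not part of mlpack: `check_binary_labels` rejects anything other than 0 or 1, and `to_zero_one` maps two arbitrary class codes (for example -1 and 1) onto 0 and 1 in sorted order.

```python
def check_binary_labels(labels):
    """Raise ValueError unless every label is 0 or 1 (the two-class restriction)."""
    bad = sorted({label for label in labels if label not in (0, 1)})
    if bad:
        raise ValueError(f"labels must be 0 or 1; found {bad}")


def to_zero_one(labels):
    """Map at most two distinct class codes to 0 and 1, smallest code first."""
    classes = sorted(set(labels))
    if len(classes) > 2:
        raise ValueError(f"only two classes are supported; found {len(classes)}")
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[label] for label in labels]


labels = to_zero_one([-1, 1, 1, -1])
check_binary_labels(labels)  # passes silently
print(labels)  # [0, 1, 1, 0]
```

Data with more than two classes would need a different approach, such as a multi-class model, rather than this program.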
OPTIONAL INPUT OPTIONS
--batch_size (-b) [int]
Batch size for mini-batch SGD. Default value 50.
--decision_boundary (-d) [double]
Decision boundary for prediction; if the logistic function for a point is less than the boundary, the
class is taken to be 0; otherwise, the class is 1. Default value 0.5.
--help (-h)
Default help info.
--info [string]
Get help on a specific module or option. Default value ''.
--input_model_file (-m) [string]
File containing existing model (parameters). Default value ''.
--labels_file (-l) [string]
A file containing labels (0 or 1) for the points in the training set (y). Default value ''.
--lambda (-L) [double]
L2-regularization parameter for training. Default value 0.
--max_iterations (-n) [int]
Maximum iterations for optimizer (0 indicates no limit). Default value 10000.
--optimizer (-O) [string]
Optimizer to use for training ('lbfgs', 'sgd', or 'minibatch-sgd'). Default value 'lbfgs'.
--step_size (-s) [double]
Step size for SGD and mini-batch SGD optimizers. Default value 0.01.
--test_file (-T) [string]
File containing test dataset. Default value ''.
--tolerance (-e) [double]
Convergence tolerance for optimizer. Default value 1e-10.
--training_file (-t) [string]
A file containing the training set (the matrix of predictors, X). Default value ''.
--verbose (-v)
Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V)
Display the version of mlpack.
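The rule described for --decision_boundary can be sketched in a few lines of Python. `classify` is a hypothetical helper and the probabilities are made-up values, not output from mlpack; the point is that a value exactly equal to the boundary goes to class 1, and raising the boundary makes class 1 harder to reach.

```python
def classify(probabilities, decision_boundary=0.5):
    """Class 0 if the logistic value is below the boundary, otherwise class 1."""
    return [0 if p < decision_boundary else 1 for p in probabilities]


probs = [0.1, 0.5, 0.7, 0.49]
print(classify(probs))        # [0, 1, 1, 0]
print(classify(probs, 0.75))  # [0, 0, 0, 0]
```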
OPTIONAL OUTPUT OPTIONS
--output_file (-o) [string]
If --test_file is specified, this file is where the predictions for the test set will be saved.
Default value ''.
--output_model_file (-M) [string]
File to save trained logistic regression model to. Default value ''.
--output_probabilities_file (-p) [string]
If --test_file is specified, this file is where the class probabilities for the test set will be saved.
Default value ''.
ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and theory, consult the documentation found
at http://www.mlpack.org or included with your distribution of mlpack.
mlpack_logistic_regression(16 November 2017)