Ubuntu Manpage: mlpack_decision_stump

NAME

       mlpack_decision_stump - decision stump

SYNOPSIS

        mlpack_decision_stump [-b int] [-m unknown] [-l string] [-T string] [-t string] [-V bool] [-M unknown] [-p string] [-h -v]

DESCRIPTION

This program implements a decision stump, which is a single-level decision tree. The
decision stump splits on one dimension of the input data and divides that dimension into
multiple buckets (also called bins). The dimension and bins are selected by maximizing the
information gain of the split. Optionally, the minimum number of training points in each
bin can be specified with the '--bucket_size (-b)' parameter.
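
As a rough illustration of the selection criterion, the sketch below computes the
information gain of a candidate split on one dimension. It is not mlpack's implementation:
the function names, the toy data, and the choice of base-2 entropy are assumptions made
for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, split_points):
    """Entropy reduction from cutting one dimension at sorted split points."""
    # Bin i receives the points whose value is >= exactly i of the split points.
    bins = [[] for _ in range(len(split_points) + 1)]
    for value, label in zip(values, labels):
        index = sum(1 for s in split_points if value >= s)
        bins[index].append(label)
    # Weighted entropy that remains once the points are separated into bins.
    remainder = sum(len(b) / len(labels) * entropy(b) for b in bins)
    return entropy(labels) - remainder


# Toy data (illustrative): dimension 0 separates the classes, dimension 1 does not.
dim0 = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
dim1 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
labels = [0, 0, 0, 1, 1, 1]

gain0 = information_gain(dim0, labels, [6.5])
gain1 = information_gain(dim1, labels, [3.5])
```

A stump built on this toy data would prefer dimension 0, because its gain is the larger of
the two.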

The decision stump is parameterized by a splitting dimension and a vector of values that
denote the splitting values of each bin.
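
The parameterization above can be pictured with a small sketch. The class and field names
below are hypothetical and are not taken from mlpack; the per-bin label list and the rule
for values that fall exactly on a boundary are also assumptions of this example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class Stump:
    """Hypothetical container; the field names are illustrative only."""
    split_dimension: int       # index of the dimension the stump tests
    split_values: List[float]  # sorted boundaries between adjacent bins
    bin_labels: List[int]      # one label per bin: len(split_values) + 1 entries

    def classify(self, point: List[float]) -> int:
        # The number of boundaries <= the tested value is the bin index.
        index = bisect_right(self.split_values, point[self.split_dimension])
        return self.bin_labels[index]


stump = Stump(split_dimension=1, split_values=[2.5, 7.0], bin_labels=[0, 1, 2])
predictions = [stump.classify(p) for p in ([9.9, 1.0], [0.0, 2.5], [3.3, 8.1])]
```

Only the value in the splitting dimension influences the prediction; every other
dimension of the point is ignored.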

This program enables several applications: a decision stump may be trained or loaded, and
then that decision stump may be used to classify a given set of test points. The decision
stump may also be saved to a file for later usage.

To train a decision stump, training data should be passed with the '--training_file (-t)'
parameter, and the corresponding labels should be passed with the '--labels_file (-l)'
parameter. If '--labels_file (-l)' is not specified, the labels are assumed to be the last
dimension of the training dataset. The '--bucket_size (-b)' parameter controls the minimum
number of training points in each decision stump bucket.
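
The fallback for missing labels can be sketched as follows. This is only an illustration
of the convention, assuming each point is held as a list whose final entry is its label;
the function name and the data are hypothetical, and the way mlpack stores data
internally is not shown.

```python
def split_features_and_labels(points):
    """Treat the final entry of every point as its class label."""
    features = [list(p[:-1]) for p in points]
    labels = [int(p[-1]) for p in points]
    return features, labels


# Three points with two feature dimensions each, followed by the label.
dataset = [
    [0.5, 1.2, 0],
    [0.7, 0.9, 0],
    [3.1, 4.8, 1],
]
features, labels = split_features_and_labels(dataset)
```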

For classifying a test set, a decision stump may be loaded with the '--input_model_file
(-m)' parameter (useful for the situation where a stump has already been trained), and a
test set may be specified with the '--test_file (-T)' parameter. The predicted labels can
be saved with the '--predictions_file (-p)' output parameter.

Because decision stumps are trained in batch, an existing stump cannot be retrained
incrementally, so it is not possible to pass both '--training_file (-t)' and
'--input_model_file (-m)'; instead, simply build a new decision stump with the training
data.
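
A wrapper script that wants to reject this combination before invoking the program could
mirror the rule as sketched below. The wrapper name 'stump_wrapper' and the helper
functions are hypothetical; only the option names come from this page, and the check is
an approximation of the rule rather than the program's own validation.

```python
import argparse


def build_parser():
    # "stump_wrapper" is a made-up name for an illustrative wrapper script.
    parser = argparse.ArgumentParser(prog="stump_wrapper")
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-t", "--training_file")
    source.add_argument("-m", "--input_model_file")
    parser.add_argument("-T", "--test_file")
    return parser


def accepts(argv):
    """Return True when argv does not combine training data with a loaded model."""
    try:
        build_parser().parse_args(argv)
        return True
    except SystemExit:
        # argparse reports the conflict on stderr and exits; treat that as rejection.
        return False


train_only = accepts(["-t", "train.csv"])
load_and_test = accepts(["-m", "stump.bin", "-T", "test.csv"])
both = accepts(["-t", "train.csv", "-m", "stump.bin"])
```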

After training, a decision stump can be saved with the '--output_model_file (-M)' output
parameter. That stump may later be re-used in subsequent calls to this program (or
others).
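
The train, save, and re-use workflow might be scripted along the following lines. The
option names are the ones documented on this page; the helper functions and every file
name (including the model file extension, which may depend on the installed mlpack
version) are placeholders chosen for this sketch. The code only builds the command lines
and does not execute them.

```python
import shlex

PROGRAM = "mlpack_decision_stump"


def train_command(training_file, model_file, labels_file=None, bucket_size=None):
    """Command line that trains a stump and saves it to model_file."""
    argv = [PROGRAM, "--training_file", training_file,
            "--output_model_file", model_file]
    if labels_file is not None:
        argv += ["--labels_file", labels_file]
    if bucket_size is not None:
        argv += ["--bucket_size", str(bucket_size)]
    return argv


def predict_command(model_file, test_file, predictions_file):
    """Command line that loads a saved stump and classifies test_file."""
    return [PROGRAM, "--input_model_file", model_file,
            "--test_file", test_file,
            "--predictions_file", predictions_file]


# Placeholder file names; substitute real paths before running anything.
train = train_command("train.csv", "stump.bin", bucket_size=10)
predict = predict_command("stump.bin", "test.csv", "predictions.csv")
print(shlex.join(train))
print(shlex.join(predict))
```

Each list can be passed to subprocess.run(..., check=True) once the program is installed
and the paths point at real files.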

OPTIONAL INPUT OPTIONS

       --bucket_size (-b) [int]
              The minimum number of training points in each decision stump bucket. Default value
              6.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Decision stump model to load.

       --labels_file (-l) [string]
              Labels for the training set. If not specified, the labels are assumed to be the
              last dimension of the training data.

       --test_file (-T) [string]
              A dataset to calculate predictions for.

       --training_file (-t) [string]
              The dataset to train on.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output decision stump model to save.

       --predictions_file (-p) [string]
              The output matrix that will hold the predicted labels for the test set.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.