Ubuntu Manpage: mlpack_hoeffding_tree


NAME

       mlpack_hoeffding_tree - Hoeffding trees

SYNOPSIS

        mlpack_hoeffding_tree [-h] [-v] [-b] [-B int] [-c double] [-i] [-m string] [-l string] [-n int] [-I int] [-N string] [-o int] [-M string] [-s int] [-p string] [-P string] [-T string] [-L string] [-t string] [-V]

DESCRIPTION

This program implements Hoeffding trees, a form of streaming decision tree best suited to large (or
streaming) datasets. It supports both categorical and numeric data stored in the ARFF format. Given an
input dataset, it can train the tree with numerous training options and save the model to a file. It can
also use a newly trained model, or a model loaded from a file, to predict classes for a given test set.

The training file and associated labels are specified with the --training_file (-t) and --labels_file (-l)
options, respectively. The training file must be in ARFF format. Training may be performed in batch mode
(like a typical decision tree algorithm) by specifying the --batch_mode (-b) option. Batch mode generally
gives better trees but costs more memory and runtime, so it may not be the best option for large datasets.
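As a minimal sketch of the two training modes described above, the following script builds one streaming and one batch-mode invocation. The file names train.arff and train_labels.csv are hypothetical placeholders, not names used by mlpack; only options documented on this page appear. The script runs the commands only if the program and the data file are found, and prints them otherwise.

```shell
# Hypothetical file names (assumptions); replace with your own ARFF data and labels.
train_file="train.arff"
labels_file="train_labels.csv"

# Default streaming training, and the same run in batch mode.
stream_cmd="mlpack_hoeffding_tree --training_file $train_file --labels_file $labels_file"
batch_cmd="$stream_cmd --batch_mode"

# Execute only when the program and the data are present; otherwise print the commands.
# The unquoted expansions rely on the file names containing no spaces.
if command -v mlpack_hoeffding_tree >/dev/null 2>&1 && [ -f "$train_file" ]; then
    $stream_cmd
    $batch_cmd
else
    echo "$stream_cmd"
    echo "$batch_cmd"
fi
```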

When a model is trained, it may be saved to a file with the --output_model_file (-M) option. A model may
be loaded from file for further training or testing with the --input_model_file (-m) option.
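The save-and-reload workflow above might look like the following sketch, in which a tree is trained on one chunk of data, saved, and later reloaded for further training on a second chunk. All file names (chunk1.arff, tree.bin, and so on) are hypothetical; the commands are executed only if the program and the first data file exist, and are printed otherwise.

```shell
# Hypothetical file names (assumptions, not taken from the manual).
model_in="tree.bin"
model_out="tree_updated.bin"

# First run: train on the first chunk and save the tree with -M.
first_cmd="mlpack_hoeffding_tree -t chunk1.arff -l chunk1_labels.csv -M $model_in"
# Later run: reload the tree with -m, keep training on new data, save the result.
resume_cmd="mlpack_hoeffding_tree -m $model_in -t chunk2.arff -l chunk2_labels.csv -M $model_out"

# Execute only when the program and the data are present; otherwise print the commands.
# The unquoted expansions rely on the file names containing no spaces.
if command -v mlpack_hoeffding_tree >/dev/null 2>&1 && [ -f chunk1.arff ]; then
    $first_cmd && $resume_cmd
else
    echo "$first_cmd"
    echo "$resume_cmd"
fi
```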

A test file may be specified with the --test_file (-T) option, and if performance numbers are desired for
that test set, labels may be specified with the --test_labels_file (-L) option. Predictions for each test
point will be stored in the file specified by --predictions_file (-p), and probabilities for each
prediction will be stored in the file specified by the --probabilities_file (-P) option.
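A testing run that uses the options above could be sketched as follows. It assumes a previously saved model named tree.bin and hypothetical test and output file names; none of these names come from mlpack itself. The command is executed only if the program, the model, and the test file are found, and is printed otherwise.

```shell
# Hypothetical file names (assumptions); tree.bin stands for a previously saved model.
model="tree.bin"
test_file="test.arff"

# Load the model, classify the test set, and write labels and probabilities.
# --test_labels_file is included so that performance numbers can be reported.
test_cmd="mlpack_hoeffding_tree --input_model_file $model --test_file $test_file"
test_cmd="$test_cmd --test_labels_file test_labels.csv"
test_cmd="$test_cmd --predictions_file predictions.csv --probabilities_file probabilities.csv"

# Execute only when the program, model and test data exist; otherwise print the command.
# The unquoted expansion relies on the file names containing no spaces.
if command -v mlpack_hoeffding_tree >/dev/null 2>&1 && [ -f "$model" ] && [ -f "$test_file" ]; then
    $test_cmd
else
    echo "$test_cmd"
fi
```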

OPTIONS

       --batch_mode (-b)
              If true, samples will be considered in batch instead of as a stream.  This  generally  results  in
              better trees but at the cost of memory usage and runtime.

       --bins (-B) [int]
              If  the  'domingos'  split  strategy  is  used, this specifies the number of bins for each numeric
              split. Default value 10.

       --confidence (-c) [double]
              Confidence before splitting (between 0 and 1).  Default value 0.95.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option.  Default value ''.

       --info_gain (-i)
              If set, information gain is used instead of Gini impurity for calculating Hoeffding bounds.

       --input_model_file (-m) [string]
              File to load trained tree from. Default value ''.

       --labels_file (-l) [string]
              Labels for training dataset. Default value ''.

       --max_samples (-n) [int]
              Maximum number of samples before splitting.  Default value 5000.

       --min_samples (-I) [int]
              Minimum number of samples before splitting. Default value 100.

       --numeric_split_strategy (-N) [string]
              The splitting strategy to use for numeric features: 'domingos' or 'binary'. Default value
              'binary'.

       --observations_before_binning (-o) [int]
              If the 'domingos' split strategy is used, this specifies the number of samples observed
              before binning is performed. Default value 100.

       --output_model_file (-M) [string]
              File to save trained tree to. Default value ''.

       --passes (-s) [int]
              Number of passes to take over the dataset. Default value 1.

       --predictions_file (-p) [string]
              File to output label predictions for test data into. Default value ''.

       --probabilities_file (-P) [string]
              In addition to predicting labels, provide prediction probabilities in this file. Default
              value ''.

       --test_file (-T) [string]
              File of testing data. Default value ''.

       --test_labels_file (-L) [string]
              Labels of test data. Default value ''.

       --training_file (-t) [string]
              Training dataset file. Default value ''.

       --verbose (-v)
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.
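
       The --confidence, --min_samples and --max_samples options refer to the Hoeffding bound. As a rough
       illustration, the sketch below evaluates the standard bound epsilon = sqrt(R^2 * ln(1/delta) / (2n))
       at the default sample limits. It assumes that delta equals one minus the --confidence value and
       that the split metric has range R = 1; both are assumptions made for illustration and are not
       stated on this page. The function name hoeffding_epsilon is hypothetical.

```shell
# Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).
# Assumptions (not from the manual): delta = 1 - confidence, metric range R = 1.
hoeffding_epsilon() {
    # $1 = confidence, $2 = number of samples seen, $3 = range R of the metric
    awk -v c="$1" -v n="$2" -v r="$3" \
        'BEGIN { printf "%.4f\n", sqrt(r * r * log(1 / (1 - c)) / (2 * n)) }'
}

eps_min=$(hoeffding_epsilon 0.95 100 1)     # at the default --min_samples
eps_max=$(hoeffding_epsilon 0.95 5000 1)    # at the default --max_samples
echo "epsilon after 100 samples:  $eps_min"
echo "epsilon after 5000 samples: $eps_max"
```

       The bound shrinks as more samples are seen, so under these assumptions a smaller gap between the
       two best candidate splits becomes sufficient to choose between them at the stated confidence.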

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.

                                                                                        mlpack_hoeffding_tree(1)