Ubuntu Manpage: mlpack_hoeffding_tree

NAME

       mlpack_hoeffding_tree - Hoeffding trees

SYNOPSIS

        mlpack_hoeffding_tree [-b bool] [-B int] [-c double] [-i bool] [-m unknown] [-l string] [-n int] [-I int] [-N string] [-o int] [-s int] [-T string] [-L string] [-t string] [-V bool] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION

       This  program  implements  Hoeffding  trees,  a form of streaming decision tree best suited to large (or
       streaming) datasets. This program supports both categorical and numeric data.  Given  an  input  dataset,
       this program is able to train the tree with numerous training options, and save the model to a file.  The
       program is also able to use a trained model or a model from file in order to predict classes for a  given
       test set.

       The  training file and associated labels are specified with the '--training_file (-t)' and '--labels_file
       (-l)' parameters, respectively.  If '--labels_file (-l)' is not specified, the labels are assumed  to  be
       the last dimension of the training dataset.
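
       The two label layouts described above can be sketched as follows. This is  a  minimal  illustration,  not
       part  of  mlpack: the file names and the toy data are made up, and the CSV layout (one point per row, the
       label in the last column) is an assumption. Training is only attempted if the program is installed.

```shell
#!/bin/sh
# Toy data, invented for illustration: two numeric features, then the class label.
cat > dataset.csv <<'EOF'
1.0,2.0,0
1.5,1.8,0
5.0,8.0,1
6.0,9.0,1
EOF

# Layout A: labels stay in the last column, so only --training_file is needed.
# Layout B: split the last column off into a separate labels file.
sed 's/.*,//' dataset.csv > labels.csv          # keep only the last field
sed 's/,[^,]*$//' dataset.csv > features.csv    # drop the last field

if command -v mlpack_hoeffding_tree >/dev/null 2>&1; then
  # Layout A
  mlpack_hoeffding_tree --training_file dataset.csv \
    --output_model_file tree_a.bin
  # Layout B
  mlpack_hoeffding_tree --training_file features.csv \
    --labels_file labels.csv --output_model_file tree_b.bin
else
  echo "mlpack_hoeffding_tree not found; skipping training"
fi
```

       Both invocations should see the same points and labels; only the way the labels are supplied differs.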

       The  training  may  be performed in batch mode (like a typical decision tree algorithm) by specifying the
       '--batch_mode (-b)' option.  This generally yields better trees but uses more memory and runtime,  so  it
       may not be the best option for large datasets.

       When a model is trained, it may be saved via the '--output_model_file (-M)' output parameter. A model may
       be loaded from file for further training or testing with the '--input_model_file (-m)' parameter.
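
       Loading a saved model for further training makes it possible to feed a large dataset to the tree in
       pieces. The sketch below shows one way this might be scripted. The helper names (run,
       train_incrementally), the DRY_RUN switch, and the chunk file names are hypothetical; writing the model
       back to the file it was loaded from is an assumption, and a different output name can be used instead.
       Only options documented on this page are used.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# train_incrementally MODEL CHUNK...: train on the first chunk from scratch,
# then reload the saved model and continue training on each later chunk.
train_incrementally() {
  model=$1
  shift
  first=1
  for chunk in "$@"; do
    if [ "$first" = 1 ]; then
      run mlpack_hoeffding_tree --training_file "$chunk" \
        --output_model_file "$model"
      first=0
    else
      run mlpack_hoeffding_tree --input_model_file "$model" \
        --training_file "$chunk" --output_model_file "$model"
    fi
  done
}

train_incrementally tree.bin chunk1.csv chunk2.csv chunk3.csv
```

       Setting DRY_RUN=0 in the environment executes the commands instead of printing them.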

       Test  data  may  be  specified  with  the '--test_file (-T)' parameter, and if performance statistics are
       desired for that test set,  labels  may  be  specified  with  the  '--test_labels_file  (-L)'  parameter.
       Predictions  for  each  test  point may be saved with the '--predictions_file (-p)' output parameter, and
       class probabilities for each  prediction  may  be  saved  with  the  '--probabilities_file  (-P)'  output
       parameter.

       For  example,  to train a Hoeffding tree with confidence 0.99 with data 'dataset.arff', saving the trained
       tree to 'tree.bin', the following command may be used:

       $ mlpack_hoeffding_tree --training_file dataset.arff --confidence 0.99 --output_model_file tree.bin

       Then, this tree may be used to make predictions on the test set 'test_set.arff', saving  the  predictions
       into 'predictions.csv' and the class probabilities into 'class_probs.csv' with the following command:

       $   mlpack_hoeffding_tree   --input_model_file   tree.bin  --test_file  test_set.arff  --predictions_file
       predictions.csv --probabilities_file class_probs.csv
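
       When true labels for the test set are available, saved predictions can also be checked outside the
       program. The sketch below is not part of mlpack: the accuracy helper is hypothetical, the two files are
       toy stand-ins created by the script itself, and it assumes that both files hold one integer label per
       line.

```shell
#!/bin/sh
# accuracy PRED_FILE LABEL_FILE: fraction of lines on which both files agree.
accuracy() {
  paste -d, "$1" "$2" | awk -F, '
    { total++; if ($1 + 0 == $2 + 0) correct++ }
    END {
      if (total > 0) printf "%.4f\n", correct / total
      else print "nan"
    }'
}

# Toy stand-ins for a predictions file and the matching true labels.
printf '0\n1\n1\n0\n' > predictions.csv
printf '0\n1\n0\n0\n' > test_labels.csv

accuracy predictions.csv test_labels.csv
```

       Here three of the four toy predictions match their labels, so the script prints 0.7500.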

OPTIONAL INPUT OPTIONS

       --batch_mode (-b) [bool]
              If true, samples will be considered in batch instead of as a stream.  This  generally  results  in
              better trees but at the cost of memory usage and runtime.

       --bins (-B) [int]
              If  the  'domingos'  split  strategy  is  used, this specifies the number of bins for each numeric
              split. Default value 10.

       --confidence (-c) [double]
              Confidence before splitting (between 0 and 1).  Default value 0.95.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --info_gain (-i) [bool]
              If set, information gain is used instead of Gini impurity for calculating Hoeffding bounds.

       --input_model_file (-m) [unknown]
              Input trained Hoeffding tree model.

       --labels_file (-l) [string]
              Labels for training dataset.

       --max_samples (-n) [int]
              Maximum number of samples before splitting.  Default value 5000.

       --min_samples (-I) [int]
              Minimum number of samples before splitting.  Default value 100.

       --numeric_split_strategy (-N) [string]
              The splitting strategy to  use  for  numeric  features:  'domingos'  or  'binary'.  Default  value
              'binary'.

       --observations_before_binning (-o) [int]
              If  the  'domingos'  split  strategy is used, this specifies the number of samples observed before
              binning is performed. Default value 100.

       --passes (-s) [int]
              Number of passes to take over the dataset.  Default value 1.

       --test_file (-T) [string]
              Testing dataset (may be categorical).

       --test_labels_file (-L) [string]
              Labels of test data.

       --training_file (-t) [string]
              Training dataset (may be categorical).

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
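
       To give a feel for how '--confidence (-c)' interacts with the sample counts above, the sketch below
       evaluates the Hoeffding bound in its usual textbook form, epsilon = sqrt(R^2 * ln(1 / (1 - confidence))
       / (2 * n)), where n is the number of samples seen and R is the range of the split criterion. Whether
       mlpack applies exactly this form is an assumption, and the helper name and the default R = 1 are
       illustrative only.

```shell
#!/bin/sh
# hoeffding_epsilon CONFIDENCE SAMPLES [RANGE]
# Prints the textbook Hoeffding bound; RANGE defaults to 1 (illustrative).
hoeffding_epsilon() {
  awk -v c="$1" -v n="$2" -v r="${3:-1}" 'BEGIN {
    printf "%.4f\n", sqrt(r * r * log(1 / (1 - c)) / (2 * n))
  }'
}

hoeffding_epsilon 0.95 100     # default confidence at the default --min_samples
hoeffding_epsilon 0.95 5000    # default confidence at the default --max_samples
```

       Under these assumptions the bound shrinks from about 0.1224 at 100 samples to about 0.0173 at 5000
       samples. A higher confidence gives a larger bound for the same number of samples.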

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output for trained Hoeffding tree model.

       --predictions_file (-p) [string]
              Matrix to output label predictions for test data into.

       --probabilities_file (-P) [string]
              In addition to predicting labels, provide prediction probabilities in this matrix.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations,  and  theory,  consult  the  documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.