Ubuntu Manpage: mlpack_hoeffding_tree

Provided by: mlpack-bin_3.4.2-5ubuntu1_amd64

NAME

       mlpack_hoeffding_tree - Hoeffding trees

SYNOPSIS

        mlpack_hoeffding_tree [-b bool] [-B int] [-c double] [-i bool] [-m unknown] [-l string] [-n int] [-I int] [-N string] [-o int] [-s int] [-T string] [-L string] [-t string] [-V bool] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION

       This program implements Hoeffding trees, a form of streaming decision tree best suited to
       large (or streaming) datasets. It supports both categorical and numeric data. Given an
       input dataset, the program can train the tree with numerous training options and save the
       model to a file. It can also use a freshly trained model, or a model loaded from a file,
       to predict classes for a given test set.

       The training file and associated labels are specified with the '--training_file (-t)' and
       '--labels_file (-l)' parameters, respectively. If '--labels_file (-l)' is not specified,
       the labels are assumed to be the last dimension of the training dataset.
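
       As an illustration of the "last dimension" convention, the following Python sketch
       separates a label column from feature columns. It assumes a CSV layout with one point
       per row and the label in the last column; the helper name split_labels and the sample
       data are hypothetical and are not part of mlpack.

```python
import csv
import io


def split_labels(rows):
    """Split each row into (features, label), taking the label from the last column."""
    features, labels = [], []
    for row in rows:
        if not row:  # skip blank lines
            continue
        features.append(row[:-1])
        labels.append(row[-1])
    return features, labels


# Hypothetical three-point dataset: two numeric features followed by a class label.
sample = "1.0,2.0,0\n3.5,0.5,1\n2.2,1.1,0\n"
features, labels = split_labels(csv.reader(io.StringIO(sample)))
print(features)
print(labels)
```

       Writing the two lists to separate files would give inputs suitable for
       '--training_file (-t)' and '--labels_file (-l)'.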

       Training may be performed in batch mode (like a typical decision tree algorithm) by
       specifying the '--batch_mode (-b)' option. This may not be the best choice for large
       datasets, because batch mode costs additional memory and runtime.

       When  a  model  is  trained,  it  may  be  saved via the '--output_model_file (-M)' output
       parameter. A model may be loaded from file  for  further  training  or  testing  with  the
       '--input_model_file (-m)' parameter.

       Test  data  may  be  specified  with  the '--test_file (-T)' parameter, and if performance
       statistics  are  desired  for  that  test  set,  labels  may   be   specified   with   the
       '--test_labels_file (-L)' parameter. Predictions for each test point may be saved with the
       '--predictions_file (-p)' output parameter, and class probabilities  for  each  prediction
       may be saved with the '--probabilities_file (-P)' output parameter.
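
       If the saved predictions need to be scored separately, a sketch along the following
       lines may help. It assumes that the predictions file and the test labels file each
       hold one integer class label per line; that layout, and the helper names parse_labels
       and accuracy, are assumptions made for illustration rather than documented behaviour.

```python
def parse_labels(text):
    """Parse one class label per non-empty line (assumed file layout)."""
    return [int(float(line.split(",")[0])) for line in text.splitlines() if line.strip()]


def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("no labels given")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical contents of predictions.csv and of a test labels file.
predicted = parse_labels("0\n1\n1\n0\n")
actual = parse_labels("0\n1\n0\n0\n")
print(accuracy(predicted, actual))
```

       In practice the two strings would be read from the files named by
       '--predictions_file (-p)' and '--test_labels_file (-L)'.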

       For example, to train a Hoeffding tree with confidence 0.99 on the data 'dataset.arff',
       saving the trained tree to 'tree.bin', the following command may be used:

       $ mlpack_hoeffding_tree --training_file dataset.arff --confidence 0.99 --output_model_file
       tree.bin

       This tree may then be used to make predictions on the test set 'test_set.arff', saving
       the predictions into 'predictions.csv' and the class probabilities into
       'class_probs.csv', with the following command:

       $ mlpack_hoeffding_tree --input_model_file tree.bin --test_file test_set.arff
       --predictions_file predictions.csv --probabilities_file class_probs.csv
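
       The two commands above can also be assembled from a script. The following Python
       sketch only builds and prints the argument lists; it uses the option names documented
       on this page, while the helper names train_command, predict_command and run are
       hypothetical.

```python
import shutil
import subprocess


def train_command(training_file, model_file, confidence=0.95):
    """Build the argument list for training and saving a model."""
    return [
        "mlpack_hoeffding_tree",
        "--training_file", training_file,
        "--confidence", str(confidence),
        "--output_model_file", model_file,
    ]


def predict_command(model_file, test_file, predictions_file, probabilities_file):
    """Build the argument list for predicting with a saved model."""
    return [
        "mlpack_hoeffding_tree",
        "--input_model_file", model_file,
        "--test_file", test_file,
        "--predictions_file", predictions_file,
        "--probabilities_file", probabilities_file,
    ]


def run(cmd):
    """Run cmd if the program is on PATH; otherwise only report what would be run."""
    if shutil.which(cmd[0]) is None:
        print("not installed; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


train = train_command("dataset.arff", "tree.bin", confidence=0.99)
predict = predict_command("tree.bin", "test_set.arff", "predictions.csv", "class_probs.csv")
print(" ".join(train))
print(" ".join(predict))
```

       Passing either list to run() would execute it when mlpack_hoeffding_tree is installed
       and the input files exist.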

OPTIONAL INPUT OPTIONS

       --batch_mode (-b) [bool]
              If true, samples will be considered in batch instead of as a stream. This generally
              results in better trees but at the cost of memory usage and runtime.

       --bins (-B) [int]
              If  the  'domingos'  split  strategy is used, this specifies the number of bins for
              each numeric split. Default value 10.

       --confidence (-c) [double]
              Confidence before splitting (between 0 and 1).  Default value 0.95.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --info_gain (-i) [bool]
              If set, information gain is used instead of Gini impurity for calculating Hoeffding
              bounds.

       --input_model_file (-m) [unknown]
              Input trained Hoeffding tree model.

       --labels_file (-l) [string]
              Labels for training dataset.

       --max_samples (-n) [int]
              Maximum number of samples before splitting.  Default value 5000.

       --min_samples (-I) [int]
              Minimum number of samples before splitting.  Default value 100.

       --numeric_split_strategy (-N) [string]
              The splitting strategy to use for numeric features: 'domingos' or 'binary'. Default
              value 'binary'.

       --observations_before_binning (-o) [int]
              If the 'domingos' split strategy is used, this  specifies  the  number  of  samples
              observed before binning is performed. Default value 100.

       --passes (-s) [int]
              Number of passes to take over the dataset.  Default value 1.

       --test_file (-T) [string]
              Testing dataset (may be categorical).

       --test_labels_file (-L) [string]
              Labels of test data.

       --training_file (-t) [string]
              Training dataset (may be categorical).

       --verbose (-v) [bool]
              Display  informational  messages  and the full list of parameters and timers at the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
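
       To give a feel for how '--confidence (-c)', '--min_samples (-I)' and '--max_samples
       (-n)' interact, the following Python sketch applies the textbook Hoeffding bound,
       epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n)) with delta = 1 - confidence. It is an
       illustration under stated assumptions, not mlpack's implementation: the function
       names, the value_range argument R, and the way the sample limits are applied are all
       assumptions based on the option descriptions above.

```python
import math


def hoeffding_epsilon(value_range, confidence, n_samples):
    """Textbook Hoeffding bound for n_samples observations of a quantity with range R."""
    delta = 1.0 - confidence  # allowed probability of choosing the wrong split
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n_samples))


def should_split(best_gain, second_gain, value_range, confidence, n_samples,
                 min_samples=100, max_samples=5000):
    """Decide whether to split a node (assumed reading of the options above)."""
    if n_samples < min_samples:
        return False  # too few samples seen to consider a split
    if n_samples >= max_samples:
        return True   # sample cap reached: split regardless of the bound
    # Split once the best candidate beats the runner-up by more than epsilon.
    return (best_gain - second_gain) > hoeffding_epsilon(value_range, confidence, n_samples)


# The bound tightens as more samples arrive, so smaller gain differences suffice.
for n in (100, 1000, 5000):
    print(n, round(hoeffding_epsilon(1.0, 0.95, n), 4))
```

       A higher confidence gives a larger epsilon for the same number of samples, so a node
       waits for more evidence before it splits.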

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output for trained Hoeffding tree model.

       --predictions_file (-p) [string]
              Matrix to output label predictions for test data into.

       --probabilities_file (-P) [string]
              In addition to predicting labels, provide prediction probabilities in this matrix.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations,  and  theory,  consult  the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.