
Provided by: mlpack-bin_3.4.2-7ubuntu1_amd64

NAME

       mlpack_decision_tree - decision tree

SYNOPSIS

        mlpack_decision_tree [-m unknown] [-l string] [-D int] [-g double] [-n int] [-a bool] [-e bool] [-T string] [-L string] [-t string] [-V bool] [-w string] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION

       Train  and  evaluate  using  a  decision  tree.  Given  a  dataset  containing  numeric or
       categorical features, and associated labels for each point in the  dataset,  this  program
       can train a decision tree on that data.

       The training set and associated labels are specified with the '--training_file (-t)' and
       '--labels_file (-l)' parameters, respectively. The labels should be in the range
       [0, num_classes - 1]. If '--labels_file (-l)' is not specified, the labels are assumed to
       be the last dimension of the training dataset.
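
       The label conventions above can be sketched in a few lines of Python. This is only an
       illustration, not part of mlpack: the helper names 'split_labels' and 'remap_labels' are
       hypothetical, and the sketch assumes one point per row, so that the "last dimension" is
       the last value of each row.

```python
def split_labels(rows):
    """Treat the last value of each row as its label (the no --labels_file case)."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels


def remap_labels(labels):
    """Map arbitrary label values onto 0 .. num_classes - 1.

    Returns the remapped labels and the mapping that was used.
    """
    mapping = {value: index for index, value in enumerate(sorted(set(labels)))}
    return [mapping[value] for value in labels], mapping


# Hypothetical data: two features per point, raw class ids 3, 7 and 9 in the last column.
rows = [[5.1, 3.5, 3], [4.9, 3.0, 3], [6.2, 2.9, 7], [5.9, 3.0, 9]]
features, raw_labels = split_labels(rows)
labels, mapping = remap_labels(raw_labels)
```

       Keeping the returned mapping makes it possible to translate predicted class indices back
       to the original label values afterwards.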

       When a model is trained, the '--output_model_file (-M)' output parameter may be used to
       save the trained model. A model may be loaded for predictions with the '--input_model_file
       (-m)' parameter. The '--input_model_file (-m)' parameter may not be specified when the
       '--training_file (-t)' parameter is specified.

       The '--minimum_leaf_size (-n)' parameter specifies the minimum number of training points
       in each leaf. The '--minimum_gain_split (-g)' parameter specifies the minimum gain that is
       needed for a node to split. The '--maximum_depth (-D)' parameter specifies the maximum
       depth of the tree; 0 means no limit. If '--print_training_accuracy (-a)' is specified, the
       training accuracy will be printed. The older '--print_training_error (-e)' parameter
       prints the training error instead, but it is deprecated and will be removed in mlpack
       4.0.0.
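
       To illustrate how the three limits interact, the following Python sketch decides whether
       a candidate split is accepted. It is not mlpack's implementation: it assumes Gini
       impurity as the gain measure, reads the minimum leaf size as a lower bound on the size of
       each child, and counts depth from the node being split. The function names are
       hypothetical; the defaults mirror those listed for the corresponding options.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_gain(parent, left, right):
    """Impurity reduction obtained by splitting parent into left and right."""
    total = len(parent)
    weighted = (len(left) / total) * gini(left) + (len(right) / total) * gini(right)
    return gini(parent) - weighted


def should_split(parent, left, right, depth,
                 minimum_leaf_size=20, minimum_gain_split=1e-7, maximum_depth=0):
    """Hypothetical acceptance test combining the three limits."""
    # A maximum depth of 0 means no limit.
    if maximum_depth > 0 and depth >= maximum_depth:
        return False
    # Assumed reading: every child must keep at least minimum_leaf_size points.
    if len(left) < minimum_leaf_size or len(right) < minimum_leaf_size:
        return False
    # The split must improve purity by more than the minimum gain.
    return split_gain(parent, left, right) > minimum_gain_split


# A perfectly separable node with 30 points of each class.
parent = [0] * 30 + [1] * 30
left, right = [0] * 30, [1] * 30
accepted = should_split(parent, left, right, depth=1)
```

       Raising the minimum leaf size or the minimum gain, or lowering the maximum depth, makes
       the test stricter and therefore yields a smaller tree.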

       Test data may be specified with the  '--test_file  (-T)'  parameter,  and  if  performance
       numbers   are   desired   for   that   test   set,   labels  may  be  specified  with  the
       '--test_labels_file (-L)' parameter. Predictions for each test point may be saved via  the
       '--predictions_file (-p)' output parameter. Class probabilities for each prediction may be
       saved with the '--probabilities_file (-P)' output parameter.
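
       Once predictions and test labels are both available, a performance number can also be
       computed outside the program. The Python sketch below is an illustration only: the helper
       names are hypothetical, and it assumes that class probabilities are available as one list
       per test point, indexed by class; the on-disk layout of the probabilities file is not
       described here.

```python
def accuracy(predictions, labels):
    """Fraction of test points whose predicted class equals the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one test point is required")
    correct = sum(1 for predicted, true in zip(predictions, labels) if predicted == true)
    return correct / len(labels)


def most_likely_class(probabilities):
    """Index of the largest class probability for a single test point."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical values standing in for the contents of the prediction and label files.
predictions = [0, 1, 1, 2]
test_labels = [0, 1, 2, 2]
score = accuracy(predictions, test_labels)
```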

       For example, to train a decision tree with a minimum leaf size of 20 and a minimum gain of
       0.001 on the dataset contained in 'data.arff' with labels 'labels.csv', saving the output
       model to 'tree.bin' and printing the training accuracy, one could call

       $ mlpack_decision_tree --training_file data.arff --labels_file labels.csv \
             --output_model_file tree.bin --minimum_leaf_size 20 --minimum_gain_split 0.001 \
             --print_training_accuracy

       Then, to use that model to classify the points in 'test_set.arff', report performance
       numbers against the labels in 'test_labels.csv', and save the prediction for each point to
       'predictions.csv', one could call

       $ mlpack_decision_tree --input_model_file tree.bin --test_file test_set.arff \
             --test_labels_file test_labels.csv --predictions_file predictions.csv
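
       When the program is driven from a script, the argument list can be assembled
       programmatically. The Python sketch below is a hypothetical helper, not part of mlpack: it
       only turns keyword arguments into long options and enforces the rule that
       '--input_model_file' may not be combined with '--training_file'. It does not check that
       the option names exist.

```python
def build_command(**options):
    """Build an argument list for mlpack_decision_tree from long option names."""
    if options.get("training_file") and options.get("input_model_file"):
        raise ValueError("--input_model_file may not be combined with --training_file")
    argv = ["mlpack_decision_tree"]
    for name, value in options.items():
        # Skip options that are unset or switched off.
        if value is None or value is False:
            continue
        argv.append("--" + name)
        # Boolean flags take no value; everything else is passed as a string.
        if value is not True:
            argv.append(str(value))
    return argv


train_argv = build_command(
    training_file="data.arff",
    labels_file="labels.csv",
    output_model_file="tree.bin",
    minimum_leaf_size=20,
    minimum_gain_split=0.001,
    print_training_accuracy=True,
)
```

       On a system where mlpack is installed, the resulting list could be handed to
       subprocess.run(train_argv, check=True).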

OPTIONAL INPUT OPTIONS

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Pre-trained decision tree, to be used with test points.

       --labels_file (-l) [string]
              Training labels.

       --maximum_depth (-D) [int]
              Maximum depth of the tree (0 means no limit).  Default value 0.

       --minimum_gain_split (-g) [double]
              Minimum gain for node splitting. Default value 1e-07.

       --minimum_leaf_size (-n) [int]
              Minimum number of points in a leaf. Default value 20.

       --print_training_accuracy (-a) [bool]
              Print the training accuracy.

       --print_training_error (-e) [bool]
              Print the training error (deprecated; will be removed in mlpack 4.0.0).

       --test_file (-T) [string]
              Testing dataset (may be categorical).

       --test_labels_file (-L) [string]
              Test point labels, if accuracy calculation is desired.

       --training_file (-t) [string]
              Training dataset (may be categorical).

       --verbose (-v) [bool]
              Display  informational  messages  and the full list of parameters and timers at the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

       --weights_file (-w) [string]
              The weight of labels.

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output for trained decision tree.

       --predictions_file (-p) [string]
              Class predictions for each test point.

       --probabilities_file (-P) [string]
              Class probabilities for each test point.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations,  and  theory,  consult  the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.