Ubuntu Manpage: mlpack_linear_svm - linear svm is an l2-regularized support vector machine.

NAME

       mlpack_linear_svm - linear svm is an l2-regularized support vector machine.

SYNOPSIS

        mlpack_linear_svm [-d double] [-E int] [-m unknown] [-l string] [-r double] [-n int] [-N bool] [-c int] [-O string] [-s int] [-S bool] [-a double] [-T string] [-L string] [-e double] [-t string] [-V bool] [-M unknown] [-P string] [-p string] [-h -v]

DESCRIPTION

       An  implementation  of  linear  SVMs  that  uses either L-BFGS or parallel SGD (stochastic
       gradient descent) to train the model.

       This program allows  loading  a  linear  SVM  model  (via  the  '--input_model_file  (-m)'
       parameter)  or  training  a  linear  SVM  model  given  training  data (specified with the
       '--training_file (-t)' parameter), or both those things at once. In addition, this program
       allows  classification on a test dataset (specified with the '--test_file (-T)' parameter)
       and the classification results may be saved  with  the  '--predictions_file  (-P)'  output
       parameter.  The trained linear SVM model may be saved using the '--output_model_file (-M)'
       output parameter.

       The  training  data,  if  specified,  may  have  class  labels  as  its  last   dimension.
       Alternately,  the  '--labels_file (-l)' parameter may be used to specify a separate vector
       of labels.
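
       As a rough illustration of the first case, the sketch below trains without
       '--labels_file (-l)', so the labels are read from the last dimension of the training
       data. The file name ''data_with_labels.csv'' is hypothetical, and the guard that prints
       the command when the tool or the file is missing is only there to make the sketch safe
       to run anywhere.

```shell
# Sketch: labels are stored as the last dimension of the training file.
# 'data_with_labels.csv' is a hypothetical file name used for illustration.
train_file=data_with_labels.csv
model_file=lsvm_model.bin

# No --labels_file is passed, so labels come from the training data itself.
set -- --training_file "$train_file" --output_model_file "$model_file"

# Run only if the tool and the input exist; otherwise just print the command.
if command -v mlpack_linear_svm >/dev/null 2>&1 && [ -f "$train_file" ]; then
  mlpack_linear_svm "$@"
else
  echo "mlpack_linear_svm $*"
fi
```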

       When a model is being trained, there are many options. L2 regularization (to prevent
       overfitting) can be specified with the '--lambda (-r)' option, and the number of classes
       can be manually specified with the '--num_classes (-c)' option. If an intercept term is
       not desired in the model, the '--no_intercept (-N)' parameter can be specified. The
       margin of difference between the correct class and other classes can be specified with
       the '--delta (-d)' option.

       The optimizer used to train the model can be specified with the '--optimizer (-O)'
       parameter. Available options are 'psgd' (parallel stochastic gradient descent) and
       'lbfgs' (the L-BFGS optimizer). There are also various parameters for the optimizer; the
       '--max_iterations (-n)' parameter specifies the maximum number of allowed iterations, and
       the '--tolerance (-e)' parameter specifies the tolerance for convergence. For the parallel
       SGD optimizer, the '--step_size (-a)' parameter controls the step size taken at each
       iteration by the optimizer, and the '--epochs (-E)' parameter sets the maximum number of
       epochs. If the objective function for your data is oscillating between Inf and 0, the
       step size is probably too large. There are more parameters for the optimizers, but the
       C++ interface must be used to access these.
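
       The optimizer options described above could be combined as in the following sketch,
       which selects parallel SGD instead of the default L-BFGS. The step size of 0.001 and the
       20 epochs are illustrative values only, not tuned recommendations, and the input file
       names are assumed to match the example files used elsewhere on this page.

```shell
# Sketch: train with the parallel SGD optimizer instead of the default L-BFGS.
# The values 0.001 and 20 are illustrative assumptions, not recommendations.
set -- --training_file data.csv --labels_file labels.csv \
       --optimizer psgd --step_size 0.001 --epochs 20 \
       --output_model_file lsvm_model.bin

# Run only if the tool and both inputs exist; otherwise just print the command.
if command -v mlpack_linear_svm >/dev/null 2>&1 && [ -f data.csv ] && [ -f labels.csv ]; then
  mlpack_linear_svm "$@"
else
  echo "mlpack_linear_svm $*"
fi
```

       If the objective oscillates during training, lowering the value passed to '--step_size
       (-a)' is the first thing to try.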

       Optionally, the model can be used to predict the labels for another matrix of data points,
       if  '--test_file  (-T)'  is  specified.  The '--test_file (-T)' parameter can be specified
       without the '--training_file (-t)' parameter, so long as an existing linear SVM  model  is
       given with the '--input_model_file (-m)' parameter. The output predictions from the linear
       SVM model may be saved with the '--predictions_file (-P)' parameter.

       As an example, to train a LinearSVM on the data ''data.csv'' with labels ''labels.csv''
       with L2 regularization of 0.1, saving the model to ''lsvm_model.bin'', the following
       command may be used:

       $ mlpack_linear_svm --training_file data.csv --labels_file labels.csv --lambda 0.1 \
           --delta 1 --num_classes 0 --output_model_file lsvm_model.bin

       Then,  to  use  that  model  to  predict classes for the dataset ''test.csv'', storing the
       output predictions in ''predictions.csv'', the following command may be used:

       $ mlpack_linear_svm --input_model_file lsvm_model.bin --test_file test.csv \
           --predictions_file predictions.csv
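
       Training and classification can also be done in a single invocation, since the program
       accepts '--training_file (-t)' and '--test_file (-T)' together. The sketch below assumes
       the same example files as above and skips saving the model; it prints the command
       instead of running it when the tool or the inputs are not available.

```shell
# Sketch: train a model and classify a test set in one run (no model is saved).
# File names reuse the examples on this page and are assumed to exist.
set -- --training_file data.csv --labels_file labels.csv \
       --test_file test.csv --predictions_file predictions.csv

# Run only if the tool and all inputs exist; otherwise just print the command.
if command -v mlpack_linear_svm >/dev/null 2>&1 \
   && [ -f data.csv ] && [ -f labels.csv ] && [ -f test.csv ]; then
  mlpack_linear_svm "$@"
else
  echo "mlpack_linear_svm $*"
fi
```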

OPTIONAL INPUT OPTIONS

       --delta (-d) [double]
              Margin of difference between correct class and other classes. Default value 1.

       --epochs (-E) [int]
              Maximum number of full epochs over dataset for psgd. Default value 50.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Existing model (parameters).

       --labels_file (-l) [string]
              A matrix containing labels (0 or 1) for the points in the training set (y).

       --lambda (-r) [double]
              L2-regularization parameter for training.  Default value 0.0001.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

       --no_intercept (-N) [bool]
              Do not add the intercept term to the model.

       --num_classes (-c) [int]
              Number  of classes for classification; if unspecified (or 0), the number of classes
              found in the labels will be used. Default value 0.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs' or 'psgd'). Default value 'lbfgs'.

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --shuffle (-S) [bool]
              Don't shuffle the order in which data points are visited for parallel SGD.

       --step_size (-a) [double]
              Step size for parallel SGD optimizer. Default value 0.01.

       --test_file (-T) [string]
              Matrix containing test dataset.

       --test_labels_file (-L) [string]
              Matrix containing test labels.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [string]
              A matrix containing the training set (the matrix of predictors, X).

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and  timers  at  the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output for trained linear SVM model.

       --predictions_file (-P) [string]
              If  test  data  is specified, this matrix is where the predictions for the test set
              will be saved.

       --probabilities_file (-p) [string]
              If test data is specified, this matrix is where the  class  probabilities  for  the
              test set will be saved.

ADDITIONAL INFORMATION

       For  further  information,  including  relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.