NAME

       mlpack_linear_svm - linear svm is an l2-regularized support vector machine.

SYNOPSIS

        mlpack_linear_svm [-d double] [-E int] [-m unknown] [-l string] [-r double] [-n int] [-N bool] [-c int] [-O string] [-s int] [-S bool] [-a double] [-T string] [-L string] [-e double] [-t string] [-V bool] [-M unknown] [-P string] [-p string] [-h -v]

DESCRIPTION

       An implementation of linear SVMs that uses either L-BFGS or parallel SGD (stochastic gradient descent) to
       train the model.

       This program allows loading a linear SVM model (via the '--input_model_file (-m)' parameter), training
       a linear SVM model given training data (specified with the '--training_file (-t)' parameter), or both
       at once. In addition, this program allows classification on a test dataset (specified with the
       '--test_file (-T)' parameter); the classification results may be saved with the
       '--predictions_file (-P)' output parameter. The trained linear SVM model may be saved using the
       '--output_model_file (-M)' output parameter.

       The training data, if specified, may have class labels as its last dimension. Alternatively, the
       '--labels_file (-l)' parameter may be used to specify a separate vector of labels.
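
       The two label layouts described above can be illustrated with a short Python sketch. It assumes a
       CSV file with one point per line and the class label as the last value on each line; the file
       contents are hypothetical and no mlpack code is involved.

```python
import csv
import io

# Hypothetical training file whose last column holds the class label.
combined = io.StringIO("0.5,1.2,0\n1.5,0.3,1\n0.1,0.9,0\n")

features, labels = [], []
for row in csv.reader(combined):
    values = [float(v) for v in row]
    features.append(values[:-1])    # every column except the last: predictors
    labels.append(int(values[-1]))  # last column: class label
```

       Writing 'features' and 'labels' to two separate files would correspond to the layout expected when
       both '--training_file (-t)' and '--labels_file (-l)' are given.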

       When a model is being trained, there are many options. L2 regularization (to prevent overfitting) can be
       specified with the '--lambda (-r)' option, and the number of classes can be manually specified with the
       '--num_classes (-c)' option. If an intercept term is not desired in the model, the '--no_intercept (-N)'
       parameter can be specified. The margin of difference between the correct class and the other classes can
       be specified with the '--delta (-d)' option.

       The optimizer used to train the model can be specified with the '--optimizer (-O)' parameter. Available
       options are 'psgd' (parallel stochastic gradient descent) and 'lbfgs' (the L-BFGS optimizer). There are
       also various parameters for the optimizer: the '--max_iterations (-n)' parameter specifies the maximum
       number of allowed iterations, and the '--tolerance (-e)' parameter specifies the tolerance for
       convergence. For the parallel SGD optimizer, the '--step_size (-a)' parameter controls the step size
       taken at each iteration, and the '--epochs (-E)' parameter sets the maximum number of epochs. If the
       objective function for your data is oscillating between Inf and 0, the step size is probably too large.
       There are more parameters for the optimizers, but the C++ interface must be used to access these.
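
       To make the roles of '--delta (-d)' and '--lambda (-r)' more concrete, the following Python sketch
       evaluates a multiclass hinge loss with an L2 penalty. It assumes a common textbook formulation (for
       each point, the sum over wrong classes of max(0, s_j - s_y + delta), averaged over the points, plus
       lambda / 2 times the squared weight norm). This page does not state the exact objective used by
       mlpack, so the formula, the function name and the data are all illustrative assumptions.

```python
def multiclass_hinge_loss(weights, points, labels, delta=1.0, lam=0.0001):
    """weights[c] is the weight vector of class c; points[i] is one data point."""
    total = 0.0
    for x, y in zip(points, labels):
        # Score of every class for this point (no intercept term in this sketch).
        scores = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in weights]
        for c, s in enumerate(scores):
            if c != y:
                # Penalize a wrong class whose score is within delta of the correct one.
                total += max(0.0, s - scores[y] + delta)
    data_term = total / len(points)
    # L2 penalty: (lambda / 2) * sum of squared weights (assumed scaling).
    penalty = 0.5 * lam * sum(w_k * w_k for w in weights for w_k in w)
    return data_term + penalty


# Hypothetical two-class model and data.
weights = [[1.0, 0.0], [0.0, 1.0]]
points = [[2.0, 0.0], [0.0, 0.5]]
labels = [0, 1]
loss = multiclass_hinge_loss(weights, points, labels, delta=1.0, lam=0.1)
```

       In this sketch the second point is classified correctly but lies inside the margin, so it still
       contributes to the loss; with delta set to 0 only the penalty term remains.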

       Optionally, the model can be  used  to  predict  the  labels  for  another  matrix  of  data  points,  if
       '--test_file  (-T)'  is  specified.  The  '--test_file  (-T)'  parameter  can  be  specified  without the
       '--training_file  (-t)'  parameter,  so  long  as  an  existing  linear  SVM  model  is  given  with  the
       '--input_model_file  (-m)'  parameter. The output predictions from the linear SVM model may be saved with
       the '--predictions_file (-P)' parameter.
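
       As a rough illustration of what prediction with a linear model involves, the Python sketch below
       assigns each test point to the class with the highest linear score. This is the generic rule for a
       linear multiclass classifier and is only assumed to match the behavior of this program; the
       function name, weights and test points are hypothetical.

```python
def predict(weights, intercepts, point):
    """Return the index of the class with the highest score for one point."""
    scores = [
        sum(w_k * x_k for w_k, x_k in zip(w, point)) + b
        for w, b in zip(weights, intercepts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical three-class model; all-zero intercepts would model the no-intercept case.
weights = [[1.0, -1.0], [-1.0, 1.0], [0.2, 0.2]]
intercepts = [0.0, 0.0, 0.5]
test_points = [[2.0, 0.0], [0.0, 2.0], [0.1, 0.1]]

predictions = [predict(weights, intercepts, p) for p in test_points]
```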

       As an example, to train a linear SVM on the data ''data.csv'' with labels ''labels.csv'' with L2
       regularization of 0.1, saving the model to ''lsvm_model.bin'', the following command may be used:

       $ mlpack_linear_svm --training_file data.csv --labels_file labels.csv --lambda 0.1 --delta 1 \
           --num_classes 0 --output_model_file lsvm_model.bin

       Then, to use that model to predict classes for the dataset ''test.csv'', storing the  output  predictions
       in ''predictions.csv'', the following command may be used:

       $ mlpack_linear_svm --input_model_file lsvm_model.bin --test_file test.csv \
           --predictions_file predictions.csv
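
       Once predictions have been saved, they can be compared against known labels for the test points.
       The Python sketch below computes the fraction of correct predictions. It assumes each file holds
       one label per line; the file contents shown are hypothetical, and the actual layout of the saved
       predictions should be checked before relying on this.

```python
import io

# Hypothetical stand-ins for the saved predictions and for known test labels.
predictions_csv = io.StringIO("0\n1\n1\n0\n")
true_labels_csv = io.StringIO("0\n1\n0\n0\n")


def read_labels(handle):
    """Read one label per non-empty line; accepts values written as 1 or 1.0."""
    return [int(float(line)) for line in handle if line.strip()]


predicted = read_labels(predictions_csv)
expected = read_labels(true_labels_csv)

# Fraction of test points whose predicted label matches the known label.
correct = sum(p == e for p, e in zip(predicted, expected))
accuracy = correct / len(expected)
```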

OPTIONAL INPUT OPTIONS

       --delta (-d) [double]
              Margin of difference between correct class and other classes. Default value 1.

       --epochs (-E) [int]
              Maximum number of full epochs over the dataset for psgd. Default value 50.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Existing model (parameters).

       --labels_file (-l) [string]
              A matrix containing labels (0 or 1) for the points in the training set (y).

       --lambda (-r) [double]
              L2-regularization parameter for training.  Default value 0.0001.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

       --no_intercept (-N) [bool]
              Do not add the intercept term to the model.

       --num_classes (-c) [int]
              Number of classes for classification; if unspecified (or 0), the number of classes  found  in  the
              labels will be used. Default value 0.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs' or 'psgd'). Default value 'lbfgs'.

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --shuffle (-S) [bool]
              Don't shuffle the order in which data points are visited for parallel SGD.

       --step_size (-a) [double]
              Step size for parallel SGD optimizer. Default value 0.01.

       --test_file (-T) [string]
              Matrix containing test dataset.

       --test_labels_file (-L) [string]
              Matrix containing test labels.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [string]
              A matrix containing the training set (the matrix of predictors, X).

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output for trained linear SVM model.

       --predictions_file (-P) [string]
              If test data is specified, this matrix is where the predictions for the test set will be saved.

       --probabilities_file (-p) [string]
              If  test  data is specified, this matrix is where the class probabilities for the test set will be
              saved.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations,  and  theory,  consult  the  documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack-3.2.2                                    21 February 2020                            mlpack_linear_svm(1)