Provided by: mlpack-bin_3.4.2-7ubuntu1_amd64

NAME

       mlpack_gmm_train - Gaussian mixture model (GMM) training

SYNOPSIS

        mlpack_gmm_train -g int -i string [-d bool] [-m unknown] [-k int] [-n int] [-P bool] [-N double] [-p double] [-r bool] [-S int] [-s int] [-T double] [-t int] [-V bool] [-M unknown] [-h -v]

DESCRIPTION

       This program trains a Gaussian mixture model (GMM), using the EM algorithm to find the
       maximum likelihood estimate of its parameters. The model may be saved and reused by
       other mlpack GMM tools.

       The  input  data to train on must be specified with the '--input_file (-i)' parameter, and
       the number of Gaussians in the  model  must  be  specified  with  the  '--gaussians  (-g)'
       parameter.  Optionally,  many trials with different random initializations may be run, and
       the result with highest log-likelihood on the training data will be taken. The  number  of
       trials  to run is specified with the '--trials (-t)' parameter. By default, only one trial
       is run.
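
       As a sketch of the multi-trial workflow described above, the following runs several
       trials with a fixed, nonzero '--seed (-s)' so that the run should be repeatable (a seed
       of 0 uses the current time instead). The file names, trial count, and seed value are
       illustrative assumptions, and the guard only exists so the snippet does nothing harmful
       when mlpack or the data file is missing.

```shell
DATA=data.csv     # assumed training data file
MODEL=gmm.bin     # assumed output model file
TRIALS=5          # illustrative number of random restarts
SEED=42           # any nonzero value fixes the random seed

if command -v mlpack_gmm_train >/dev/null 2>&1 && [ -f "$DATA" ]; then
  # The trial with the highest training log-likelihood is the one saved.
  mlpack_gmm_train --input_file "$DATA" --gaussians 6 \
      --trials "$TRIALS" --seed "$SEED" --output_model_file "$MODEL"
else
  echo "skipping: mlpack_gmm_train or $DATA not available" >&2
fi
```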

       The tolerance for convergence and maximum number of iterations of  the  EM  algorithm  are
       specified   with   the   '--tolerance   (-T)'   and  '--max_iterations  (-n)'  parameters,
       respectively. The GMM may be initialized for training with another model,  specified  with
       the  '--input_model_file  (-m)' parameter.  Otherwise, the model is initialized by running
       k-means on the data. The k-means clustering initialization  can  be  controlled  with  the
       '--kmeans_max_iterations   (-k)',   '--refined_start   (-r)',   '--samplings   (-S)',  and
       '--percentage (-p)' parameters. If '--refined_start (-r)' is specified, then the  Bradley-
       Fayyad refined start initialization will be used. This can often lead to better clustering
       results.
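
       The convergence and initialization parameters above can be combined in one call. The
       sketch below is only an illustration: the tolerance, iteration limits, sampling count,
       and percentage are assumed values chosen to show the syntax, not recommendations, and
       the file names are placeholders.

```shell
DATA=data.csv       # assumed training data file
TOLERANCE=1e-6      # looser than the documented default of 1e-10
MAX_ITER=500        # EM iteration cap (documented default is 250)
SAMPLINGS=50        # refined-start samplings (documented default is 100)
PERCENTAGE=0.05     # fraction of the dataset per sampling, between 0.0 and 1.0

if command -v mlpack_gmm_train >/dev/null 2>&1 && [ -f "$DATA" ]; then
  # --refined_start enables the Bradley-Fayyad initialization for k-means;
  # --samplings and --percentage only take effect together with it.
  mlpack_gmm_train --input_file "$DATA" --gaussians 6 \
      --tolerance "$TOLERANCE" --max_iterations "$MAX_ITER" \
      --kmeans_max_iterations 200 \
      --refined_start --samplings "$SAMPLINGS" --percentage "$PERCENTAGE" \
      --output_model_file gmm.bin
else
  echo "skipping: mlpack_gmm_train or $DATA not available" >&2
fi
```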

       The '--diagonal_covariance (-d)' flag will cause the learned covariances to be diagonal
       matrices. This significantly simplifies the model itself and causes training to be
       faster, but the model can then no longer capture correlations between dimensions within
       a single Gaussian, which restricts the ability to fit more complex GMMs.
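
       To make the simplification concrete: a full symmetric covariance matrix in d dimensions
       has d(d+1)/2 free parameters per Gaussian, while a diagonal one has only d. The sketch
       below works this out for an assumed dimensionality of 10 and then shows the
       corresponding training call; the file names are placeholders.

```shell
DIM=10                                   # assumed data dimensionality
FULL_PARAMS=$(( DIM * (DIM + 1) / 2 ))   # free parameters, full covariance
DIAG_PARAMS=$DIM                         # free parameters, diagonal covariance
echo "per-Gaussian covariance parameters: full=$FULL_PARAMS diagonal=$DIAG_PARAMS"

DATA=data.csv                            # assumed training data file
if command -v mlpack_gmm_train >/dev/null 2>&1 && [ -f "$DATA" ]; then
  mlpack_gmm_train --input_file "$DATA" --gaussians 6 \
      --diagonal_covariance --output_model_file gmm_diag.bin
else
  echo "skipping: mlpack_gmm_train or $DATA not available" >&2
fi
```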

       If GMM training fails with an error indicating that  a  covariance  matrix  could  not  be
       inverted,  make  sure  that  the  '--no_force_positive  (-P)'  parameter is not specified.
       Alternatively, adding a small amount of Gaussian noise (using the '--noise (-N)' parameter)
       to  the  entire  dataset  may  help  prevent  Gaussians with zero variance in a particular
       dimension, which is usually the cause of non-invertible covariance matrices.
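
       One way to apply this advice is to train without noise first and retry with a small
       noise variance only if the first attempt fails. This sketch assumes that a failed
       training run exits with a nonzero status, which is not stated in this page; the noise
       value and file names are likewise illustrative and should be chosen relative to the
       scale of the data.

```shell
DATA=data.csv     # assumed training data file
NOISE=0.001       # assumed noise variance; tune to the scale of the data

if command -v mlpack_gmm_train >/dev/null 2>&1 && [ -f "$DATA" ]; then
  # Note that --no_force_positive is deliberately not passed in either attempt.
  mlpack_gmm_train --input_file "$DATA" --gaussians 6 \
      --output_model_file gmm.bin ||
  mlpack_gmm_train --input_file "$DATA" --gaussians 6 \
      --noise "$NOISE" --output_model_file gmm.bin
else
  echo "skipping: mlpack_gmm_train or $DATA not available" >&2
fi
```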

       The '--no_force_positive (-P)' parameter, if set, skips the checks after each iteration
       of the EM algorithm that ensure the covariance matrices are positive definite. Specifying
       the flag can reduce runtime, but may also produce non-positive definite covariance
       matrices, which will cause the program to crash.

       As  an  example, to train a 6-Gaussian GMM on the data in 'data.csv' with a maximum of 100
       iterations of EM and 3 trials, saving the trained GMM to 'gmm.bin', the following  command
       can be used:

       $  mlpack_gmm_train  --input_file  data.csv  --gaussians  6  --max_iterations 100 --trials 3
       --output_model_file gmm.bin

       To re-train that GMM on another set of data 'data2.csv',  the  following  command  may  be
       used:

       $   mlpack_gmm_train  --input_model_file  gmm.bin  --input_file  data2.csv  --gaussians  6
       --output_model_file new_gmm.bin

REQUIRED INPUT OPTIONS

       --gaussians (-g) [int]
              Number of Gaussians in the GMM.

       --input_file (-i) [string]
              The training data on which the model will be fit.

OPTIONAL INPUT OPTIONS

       --diagonal_covariance (-d) [bool]
              Force the covariance of the Gaussians to be diagonal. This can accelerate  training
              time significantly.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Initial input GMM model to start training with.

       --kmeans_max_iterations (-k) [int]
              Maximum  number  of  iterations  for the k-means algorithm (used to initialize EM).
              Default value 1000.

       --max_iterations (-n) [int]
              Maximum number of iterations of the EM algorithm (passing 0 will run until
              convergence). Default value 250.

       --no_force_positive (-P) [bool]
              Do not force the covariance matrices to be positive definite.

       --noise (-N) [double]
              Variance of zero-mean Gaussian noise to add to data. Default value 0.

       --percentage (-p) [double]
              If  using  --refined_start,  specify  the  percentage  of the dataset used for each
              sampling (should be between 0.0 and 1.0). Default value 0.02.

       --refined_start (-r) [bool]
              During the initialization, use refined initial  positions  for  k-means  clustering
              (Bradley and Fayyad, 1998).

       --samplings (-S) [int]
              If  using --refined_start, specify the number of samplings used for initial points.
              Default value 100.

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --tolerance (-T) [double]
              Tolerance for convergence of EM. Default value 1e-10.

       --trials (-t) [int]
              Number of trials to perform in training GMM.  Default value 1.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and  timers  at  the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output for trained GMM model.

ADDITIONAL INFORMATION

       For  further  information,  including  relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.