Provided by: mlpack-bin_2.2.5-1build1_amd64

NAME

       mlpack_preprocess_split - split data

SYNOPSIS

        mlpack_preprocess_split [-h] [-v]

DESCRIPTION

       This utility takes a dataset and, optionally, labels, and splits them into a training set
       and a test set. Before the split, the points in the dataset are randomly reordered. The
       fraction of the dataset to be used as the test set can be specified with the --test_ratio
       (-r) option; the default is 0.2 (20%).
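
       The shuffle-then-split behaviour described above can be sketched in a few lines of
       Python. This is only an illustration, not mlpack's implementation: the function name
       split_points is hypothetical, and truncating the test-set size to a whole number of
       points is an assumption about rounding.

```python
import random


def split_points(points, test_ratio=0.2, seed=None):
    """Shuffle a copy of `points`, then split it into (train, test)."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(points)  # work on a copy so the input is left untouched
    rng.shuffle(shuffled)    # random reordering before the split
    # Assumption: the test-set size is truncated to a whole number of points.
    n_test = int(len(shuffled) * test_ratio)
    n_train = len(shuffled) - n_test
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_points(range(10), test_ratio=0.2, seed=42)
```

       With ten points and a ratio of 0.2, this sketch places eight points in the training
       set and two in the test set.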

       The program does not modify the original file; instead, it writes the training and test
       sets to separate files. The names of these files must be specified with --training_file
       (-t) and --test_file (-T).

       Optionally, labels can also be split along with the data by specifying the
       --input_labels_file (-I) option. Splitting labels works the same way as splitting the
       data. The output training and test labels will be saved to the files specified by
       --training_labels_file (-l) and --test_labels_file (-L), respectively.
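
       For the split to be useful, each label has to stay with its data point. A minimal
       sketch of that idea, assuming a single shuffled index order is applied to both the
       points and the labels (the function name split_with_labels is hypothetical, and this
       is not taken from mlpack's source):

```python
import random


def split_with_labels(points, labels, test_ratio=0.2, seed=None):
    """Split points and labels using one shared random ordering."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # one permutation for both sequences
    # Assumption: the test-set size is truncated to a whole number of points.
    n_test = int(len(order) * test_ratio)
    n_train = len(order) - n_test
    train_idx, test_idx = order[:n_train], order[n_train:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(points, train_idx), pick(labels, train_idx),
            pick(points, test_idx), pick(labels, test_idx))


points = [[i, i * i] for i in range(10)]
labels = [i % 2 for i in range(10)]
train_x, train_y, test_x, test_y = split_with_labels(points, labels, 0.4, seed=1)
```

       Because the same index order selects from both sequences, every point in train_x and
       test_x keeps the label it had in the input.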

       For example, to split dataset.csv into train.csv and test.csv with 60% of the data in
       the training set and 40% of the data in the test set, we could run

       $ mlpack_preprocess_split -i dataset.csv -t train.csv -T test.csv -r 0.4

       If we had a dataset in dataset.csv and associated labels in labels.csv, and we  wanted  to
       split these into training_set.csv, training_labels.csv, test_set.csv, and test_labels.csv,
       with 30% of the data in the test set, we could run

       $ mlpack_preprocess_split -i dataset.csv -I labels.csv -r 0.3 -t training_set.csv \
             -l training_labels.csv -T test_set.csv -L test_labels.csv

REQUIRED INPUT OPTIONS

       --input_file (-i) [string]
              File containing data.

OPTIONAL INPUT OPTIONS

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_labels_file (-I) [string]
              File containing labels. Default value ''.

       --seed (-s) [int]
              Random seed (0 for std::time(NULL)). Default value 0.

       --test_ratio (-r) [double]
              Ratio of test set. Default value 0.2.

       --verbose (-v)
              Display informational messages and the full list of parameters and  timers  at  the
              end of execution.

       --version (-V)
              Display the version of mlpack.
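
       The --seed convention above (a fixed non-zero seed gives a repeatable shuffle, while 0
       falls back to the current time) can be illustrated with a short Python sketch. The
       helper names make_rng and shuffled_indices are hypothetical, and Python's random
       module stands in for whatever generator mlpack actually uses, so the concrete
       orderings will not match the tool's output.

```python
import random
import time


def make_rng(seed=0):
    """Return a generator; seed 0 means 'seed from the current time'."""
    # Assumption: this mirrors the documented "0 for std::time(NULL)" rule.
    actual_seed = int(time.time()) if seed == 0 else seed
    return random.Random(actual_seed)


def shuffled_indices(n, seed=0):
    """Return the indices 0..n-1 in a seed-dependent random order."""
    order = list(range(n))
    make_rng(seed).shuffle(order)
    return order


first = shuffled_indices(8, seed=7)
second = shuffled_indices(8, seed=7)  # same non-zero seed, same ordering
```

       Passing the same non-zero seed twice reproduces the ordering, which is what makes a
       split repeatable between runs.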

OPTIONAL OUTPUT OPTIONS

       --test_file (-T) [string]
              File name to save test data. Default value ''.

       --test_labels_file (-L) [string]
              File name to save test labels. Default value ''.

       --training_file (-t) [string]
              File name to save training data. Default value ''.

       --training_labels_file (-l) [string]
              File name to save training labels. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of
       mlpack.

                                                        mlpack_preprocess_split(16 November 2017)