Provided by: mlpack-bin_2.2.5-1build1_amd64

NAME

       mlpack_preprocess_split - split data

SYNOPSIS

        mlpack_preprocess_split [-h] [-v]

DESCRIPTION

       This  utility  takes  a dataset and optionally labels and splits them into a training set and a test set.
       Before the split, the points in the dataset are randomly reordered. The percentage of the dataset  to  be
       used as the test set can be specified with the --test_ratio (-r) option; the default is 0.2 (20%).
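
       The shuffle-then-split behaviour described above can be sketched in a few lines of Python. This is only
       an illustration, not mlpack's implementation: the function name split_points is hypothetical, and
       rounding the test set size down is an assumption.

```python
import random


def split_points(points, test_ratio=0.2, seed=None):
    """Shuffle a copy of `points`, then cut it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(points)  # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)    # random reordering before the split

    # Assumption: the test set size is rounded down to a whole number of points.
    test_size = int(len(shuffled) * test_ratio)
    train_size = len(shuffled) - test_size
    return shuffled[:train_size], shuffled[train_size:]


train, test = split_points(range(10), test_ratio=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

       With ten points and the default ratio of 0.2, two points land in the test set and eight in the training
       set; which points those are depends on the shuffle.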

       The program does not modify the original file; instead, it writes the training and test sets to separate
       files. The program requires you to specify these file names with --training_file (-t) and --test_file
       (-T).

       Optionally, labels can also be split along with the data by specifying the --input_labels_file (-I)
       option. Splitting labels works the same way as splitting the data. The output training  and  test  labels
       will  be  saved  to  the  files  specified  by  --training_labels_file  (-l) and --test_labels_file (-L),
       respectively.
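
       For the labels to stay useful, each label has to follow its data point into the same output set. A
       minimal Python sketch of that idea is shown here; it shuffles a shared list of indices and applies it to
       both sequences. The function name split_with_labels is hypothetical, and this is not a description of
       mlpack's internals.

```python
import random


def split_with_labels(points, labels, test_ratio=0.2, seed=None):
    """Split points and labels with one shared random ordering."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # one permutation for both sequences

    # Assumption: the test set size is rounded down.
    test_size = int(len(order) * test_ratio)
    train_idx = order[:len(order) - test_size]
    test_idx = order[len(order) - test_size:]

    train_x = [points[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [points[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, train_y, test_x, test_y


data = [[0.0], [1.0], [2.0], [3.0], [4.0]]
labels = ["a", "b", "c", "d", "e"]
train_x, train_y, test_x, test_y = split_with_labels(data, labels, 0.4, seed=3)
print(len(train_x), len(test_x))  # 3 2
```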

       For example, to split dataset.csv into train.csv and test.csv with 60% of the data in the training set
       and 40% of the data in the test set, we could run

       $ mlpack_preprocess_split -i dataset.csv -t train.csv -T test.csv -r 0.4

       If we had a dataset in dataset.csv and associated labels in labels.csv, and we wanted to split these into
       training_set.csv, training_labels.csv, test_set.csv, and test_labels.csv, with 30% of  the  data  in  the
       test set, we could run

       $ mlpack_preprocess_split -i dataset.csv -I labels.csv -r 0.3 -t training_set.csv \
           -l training_labels.csv -T test_set.csv -L test_labels.csv

REQUIRED INPUT OPTIONS

       --input_file (-i) [string]
              File containing data.

OPTIONAL INPUT OPTIONS

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_labels_file (-I) [string]
              File containing labels. Default value ''.

       --seed (-s) [int]
              Random seed (0 for std::time(NULL)). Default value 0.

       --test_ratio (-r) [double]
              Ratio of the dataset to use as the test set. Default value 0.2.

       --verbose (-v)
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.
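
       The --seed (-s) convention listed above, where 0 means "seed from the current time", can be sketched in
       Python as follows. The helper name make_rng is hypothetical, and Python's random module stands in for
       whatever generator mlpack uses internally.

```python
import random
import time


def make_rng(seed=0):
    """Return a generator; a seed of 0 is replaced by the current time."""
    # Mirrors the documented rule "0 for std::time(NULL)".
    effective_seed = int(time.time()) if seed == 0 else seed
    return random.Random(effective_seed)


# A fixed non-zero seed gives the same shuffle on every run.
first = list(range(5))
second = list(range(5))
make_rng(7).shuffle(first)
make_rng(7).shuffle(second)
print(first == second)  # True
```

       Passing a non-zero seed is therefore the way to make a split repeatable; with the default of 0, the
       result can change from one run to the next.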

OPTIONAL OUTPUT OPTIONS

       --test_file (-T) [string]
              File name to save test data. Default value ''.

       --test_labels_file (-L) [string]
              File name to save test labels. Default value ''.

       --training_file (-t) [string]
              File name to save training data. Default value ''.

       --training_labels_file (-l) [string]
              File name to save training labels. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

                                                                       mlpack_preprocess_split(16 November 2017)