Ubuntu Manpage: tpot - Automated Machine Learning tool

Provided by: python3-tpot_0.11.7+dfsg-3ubuntu1_all

NAME

       tpot - Automated Machine Learning tool

DESCRIPTION

usage: tpot [-h] [-is INPUT_SEPARATOR] [-target TARGET_NAME]
[-mode {classification,regression}] [-o OUTPUT_FILE] [-g GENERATIONS] [-p
POPULATION_SIZE] [-os OFFSPRING_SIZE] [-mr MUTATION_RATE] [-xr CROSSOVER_RATE]
[-scoring SCORING_FN] [-cv NUM_CV_FOLDS] [-sub SUBSAMPLE] [-njobs NUM_JOBS]
[-maxtime MAX_TIME_MINS] [-maxeval MAX_EVAL_MINS] [-s RANDOM_STATE] [-config
CONFIG_FILE] [-template TEMPLATE] [-memory MEMORY] [-cf CHECKPOINT_FOLDER] [-es
EARLY_STOP] [-v {0,1,2,3}] [-log LOG] [--version] INPUT_FILE

A Python tool that automatically creates and optimizes machine learning pipelines using
genetic programming.

positional arguments:
INPUT_FILE
Data file to use in the TPOT optimization process. Ensure that the class label
column is labeled as "class".

options:
-h, --help
Show this help message and exit.

-is INPUT_SEPARATOR
Character used to separate columns in the input file.

-target TARGET_NAME
Name of the target column in the input file.

-mode {classification,regression}
Whether TPOT is being used for a supervised classification or regression problem.

-o OUTPUT_FILE
File to export the code for the final optimized pipeline.

-g GENERATIONS
Number of iterations to run the pipeline optimization process. It must be a
positive number or None. If None, the parameter max_time_mins must be defined as
the runtime limit. Generally, TPOT will work better when you give it more
generations (and therefore time) to optimize the pipeline. TPOT will evaluate
POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE pipelines in total.

-p POPULATION_SIZE
Number of individuals to retain in the GP population every generation. Generally,
TPOT will work better when you give it more individuals (and therefore time) to
optimize the pipeline. TPOT will evaluate POPULATION_SIZE + GENERATIONS x
OFFSPRING_SIZE pipelines in total.

-os OFFSPRING_SIZE
Number of offspring to produce in each GP generation. By default, OFFSPRING_SIZE =
POPULATION_SIZE.
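
As a rough illustration of the evaluation budget stated under -g and -p, the following sketch computes POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE and applies the default OFFSPRING_SIZE = POPULATION_SIZE. The function name total_pipelines is hypothetical and is not part of TPOT.

```python
def total_pipelines(generations, population_size, offspring_size=None):
    """Return POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE."""
    # Per the -os description, the offspring size defaults to the population size.
    if offspring_size is None:
        offspring_size = population_size
    return population_size + generations * offspring_size


# Example: 5 generations with a population of 20 gives 20 + 5 * 20 pipelines.
print(total_pipelines(generations=5, population_size=20))
```

Doubling either GENERATIONS or OFFSPRING_SIZE roughly doubles the number of pipelines evaluated, which is worth keeping in mind when choosing -maxtime.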

-mr MUTATION_RATE
GP mutation rate in the range [0.0, 1.0]. This tells the GP algorithm how many
pipelines to apply random changes to every generation. We recommend using the
default parameter unless you understand how the mutation rate affects GP
algorithms.

-xr CROSSOVER_RATE
GP crossover rate in the range [0.0, 1.0]. This tells the GP algorithm how many
pipelines to "breed" every generation. We recommend using the default parameter
unless you understand how the crossover rate affects GP algorithms.

-scoring SCORING_FN
Function used to evaluate the quality of a given pipeline for the problem. By
default, accuracy is used for classification problems and mean squared error (mse)
is used for regression problems. Note: If you wrote your own function, set this
argument to mymodule.myfunction and TPOT will import your module and take the
function from there. TPOT will assume the module can be imported from the current
workdir. TPOT assumes that any function with "error" or "loss" in the name is meant
to be minimized, whereas any other functions will be maximized. Offers the same
options as cross_val_score: accuracy, adjusted_rand_score, average_precision, f1,
f1_macro, f1_micro, f1_samples, f1_weighted, neg_log_loss, neg_mean_absolute_error,
neg_mean_squared_error, neg_median_absolute_error, precision, precision_macro,
precision_micro, precision_samples, precision_weighted, r2, recall, recall_macro,
recall_micro, recall_samples, recall_weighted, roc_auc.
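
The lookup of a dotted name such as mymodule.myfunction can be pictured with the sketch below. It is an illustration of the behavior described above, not TPOT's own code; the helper name load_dotted is hypothetical, and a standard-library function stands in for a user-written scoring function.

```python
import importlib
import os
import sys


def load_dotted(name):
    """Resolve a 'module.function' string to the function object."""
    module_name, func_name = name.rsplit(".", 1)
    # Make the current working directory importable for the duration of the
    # import, matching the stated assumption about the current workdir.
    sys.path.insert(0, os.getcwd())
    try:
        module = importlib.import_module(module_name)
    finally:
        sys.path.pop(0)
    return getattr(module, func_name)


# Stand-in example: math.sqrt plays the role of mymodule.myfunction.
sqrt = load_dotted("math.sqrt")
print(sqrt(9.0))
```

With a real scoring module, the file mymodule.py would need to sit in the directory from which tpot is started.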

-cv NUM_CV_FOLDS
Number of folds to evaluate each pipeline over in stratified k-fold
cross-validation during the TPOT optimization process.

-sub SUBSAMPLE
Subsample ratio of the training instances. Setting it to 0.5 means that TPOT will
use a random subsample of half of the training data for the pipeline optimization
process.
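
To make the ratio concrete, here is a minimal sketch of drawing a seeded random subsample from a list of rows. It only illustrates the idea of a 0.5 ratio; how TPOT actually draws its subsample is not described here, and the function name subsample is hypothetical.

```python
import random


def subsample(rows, ratio, seed=None):
    """Return a random subset containing int(len(rows) * ratio) rows."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    k = int(len(rows) * ratio)
    return rng.sample(rows, k)


rows = list(range(100))
half = subsample(rows, 0.5, seed=42)
print(len(half))
```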

-njobs NUM_JOBS
Number of CPUs for evaluating pipelines in parallel during the TPOT optimization
process. Assigning this to -1 will use as many cores as available on the computer.
For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs
but one are used.
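
The rule for negative values can be written out as a small sketch. The function name effective_jobs is hypothetical, and two details are assumptions not stated above: the result is clamped to at least 1, and 0 is rejected.

```python
import os


def effective_jobs(n_jobs, n_cpus=None):
    """Translate an n_jobs setting into a concrete worker count."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        # Assumption: 0 has no meaning as a worker count.
        raise ValueError("n_jobs must not be 0")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on.
        # Assumption: never go below one worker.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


print(effective_jobs(-2, n_cpus=8))
```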

-maxtime MAX_TIME_MINS
How many minutes TPOT has to optimize the pipeline. If not None, this setting will
allow TPOT to run until max_time_mins minutes have elapsed and then stop. TPOT will stop
earlier if generations is set and all generations are already evaluated.

-maxeval MAX_EVAL_MINS
How many minutes TPOT has to evaluate a single pipeline. Setting this parameter to
higher values will allow TPOT to explore more complex pipelines but will also allow
TPOT to run longer.

-s RANDOM_STATE
Random number generator seed for reproducibility. Set this seed if you want your
TPOT run to be reproducible with the same seed and data set in the future.

-config CONFIG_FILE
Configuration file for customizing the operators and parameters that TPOT uses in
the optimization process. Must be a Python module containing a dict export named
"tpot_config" or the name of built-in configuration.

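A configuration module for -config might look like the sketch below. The file name my_tpot_config.py is hypothetical, and the layout (operator import path mapped to a dict of candidate parameter values) is assumed from TPOT's documented custom configuration format; the operators and value ranges are only examples.

```python
# my_tpot_config.py (hypothetical file name)
# The dict must be exported under the name "tpot_config".
tpot_config = {
    # An operator with no tunable parameters.
    "sklearn.naive_bayes.GaussianNB": {},
    # An operator with a list or range of candidate values per parameter.
    "sklearn.neighbors.KNeighborsClassifier": {
        "n_neighbors": range(1, 11),
        "weights": ["uniform", "distance"],
        "p": [1, 2],
    },
}
```

The module would then be passed by path, for example -config my_tpot_config.py.
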
-template TEMPLATE
Template of predefined pipeline structure. This option specifies a desired
structure for the machine learning pipeline evaluated in TPOT. So far this option
only supports linear pipeline structures. Each step in the pipeline should be a main
class of operators (Selector, Transformer, Classifier or Regressor) or a specific
operator (e.g. SelectPercentile) defined in the TPOT operator configuration. If a
step is a main class, TPOT will randomly assign all subclass operators (subclasses
of SelectorMixin, TransformerMixin, ClassifierMixin or RegressorMixin in
scikit-learn) to that step. Steps in the template are delimited by "-", e.g.
"SelectPercentile-Transformer-Classifier". By default the template is None and
TPOT generates tree-based pipelines randomly.
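
The template syntax can be illustrated with a short sketch that splits a template on "-" and labels each step as a main class or a specific operator. This is not TPOT's parser; the names MAIN_CLASSES and parse_template are hypothetical.

```python
# The four main operator classes named in the -template description.
MAIN_CLASSES = {"Selector", "Transformer", "Classifier", "Regressor"}


def parse_template(template):
    """Split a template string into (step name, kind) pairs."""
    if template is None:
        # No template: the pipeline structure is left unconstrained.
        return None
    steps = []
    for name in template.split("-"):
        kind = "main class" if name in MAIN_CLASSES else "specific operator"
        steps.append((name, kind))
    return steps


for step in parse_template("SelectPercentile-Transformer-Classifier"):
    print(step)
```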

-memory MEMORY
Path of a directory for pipeline caching or "auto" for using a temporary caching
directory during the optimization process. If supplied, pipelines will cache each
transformer after fitting it. This feature avoids repeated computation
by transformers within a pipeline when the parameters and input data are identical
to those of another pipeline fitted during the optimization process.

-cf CHECKPOINT_FOLDER
If supplied, a folder in which TPOT will periodically save the best pipeline so far
while optimizing. This is useful in multiple cases: sudden death before TPOT could
save an optimized pipeline, progress tracking, grabbing a pipeline while it's still
optimizing, etc.

-es EARLY_STOP
Number of generations TPOT waits without improvement before stopping. The
optimization process ends if there is no improvement within the set number of
generations.
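
The stopping rule can be sketched as follows, under the assumptions that one best score is recorded per generation and that higher scores are better. The function name stop_generation is hypothetical, and this is not TPOT's implementation.

```python
def stop_generation(best_scores, early_stop):
    """Return the 1-based generation at which the run stops, or None."""
    best = float("-inf")
    stale = 0  # consecutive generations without improvement
    for generation, score in enumerate(best_scores, start=1):
        if score > best:
            best = score
            stale = 0
        else:
            stale += 1
        if stale >= early_stop:
            return generation
    return None  # the run used all generations


# With -es 2, two generations in a row without improvement end the run.
print(stop_generation([0.80, 0.85, 0.85, 0.85, 0.90], early_stop=2))
```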

-v {0,1,2,3}
How much information TPOT communicates while it is running: 0 = none, 1 = minimal,
2 = high, 3 = all. A setting of 2 or higher will add a progress bar during the
optimization procedure.

-log LOG
Save progress content to a file.

--version
Show the TPOT version number and exit.