Provided by: vowpal-wabbit_8.6.1.dfsg1-1build3_amd64
NAME
vw - Vowpal Wabbit -- fast online learning tool
DESCRIPTION
VW options:
  --ring_size arg    size of example ring
  --onethread    Disable parse thread

Update options:
  -l [ --learning_rate ] arg    Set learning rate
  --power_t arg    t power value
  --decay_learning_rate arg    Set decay factor for learning_rate between passes
  --initial_t arg    initial t value
  --feature_mask arg    Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

Weight options:
  -i [ --initial_regressor ] arg    Initial regressor(s)
  --initial_weight arg    Set all weights to an initial value of arg.
  --random_weights arg    make initial weights random
  --normal_weights arg    make initial weights normal
  --truncated_normal_weights arg    make initial weights truncated normal
  --sparse_weights    Use a sparse datastructure for weights
  --input_feature_regularizer arg    Per feature regularization input file

Parallelization options:
  --span_server arg    Location of server for setting up spanning tree
  --threads    Enable multi-threading
  --unique_id arg (=0)    unique id used for cluster parallel jobs
  --total arg (=1)    total number of nodes used in cluster parallel job
  --node arg (=0)    node number in cluster parallel job

Diagnostic options:
  --version    Version information
  -a [ --audit ]    print weights of features
  -P [ --progress ] arg    Progress update frequency. int: additive, float: multiplicative
  --quiet    Don't output diagnostics and progress updates
  -h [ --help ]    Look here: http://hunch.net/~vw/ and click on Tutorial.

Random Seed option:
  --random_seed arg    seed random number generator

Feature options:
  --hash arg    how to hash the features. Available options: strings, all
  --hash_seed arg (=0)    seed for hash function
  --ignore arg    ignore namespaces beginning with character <arg>
  --ignore_linear arg    ignore namespaces beginning with character <arg> for linear terms only
  --keep arg    keep namespaces beginning with character <arg>
  --redefine arg    redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form 'N:=S' where := is operator. Empty N or S are treated as default namespace. Use ':' as a wildcard in S.
  -b [ --bit_precision ] arg    number of bits in the feature table
  --noconstant    Don't add a constant feature
  -C [ --constant ] arg    Set initial value of constant
  --ngram arg    Generate N grams. To generate N grams for a single namespace 'foo', arg should be fN.
  --skips arg    Generate skips in N grams. This, in conjunction with the ngram option, can be used to generate generalized n-skip-k-grams. To generate n-skips for a single namespace 'foo', arg should be fN.
  --feature_limit arg    limit to N features. To apply to a single namespace 'foo', arg should be fN
  --affix arg    generate prefixes/suffixes of features; argument '+2a,-3b,+1' means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1-char prefixes for the default namespace
  --spelling arg    compute spelling features for a given namespace (use '_' for default namespace)
  --dictionary arg    read a dictionary for additional features (arg either 'x:file' or just 'file')
  --dictionary_path arg    look in this directory for dictionaries; defaults to current directory or env{PATH}
  --interactions arg    Create feature interactions of any level between namespaces.
  --permutations    Use permutations instead of combinations for feature interactions of the same namespace.
  --leave_duplicate_interactions    Don't remove interactions with duplicate combinations of namespaces. For example, '-q ab -q ba' is a duplicate, and there are many more in '-q ::'.
  -q [ --quadratic ] arg    Create and use quadratic features
  --q: arg    ':' corresponds to a wildcard for all printable characters
  --cubic arg    Create and use cubic features
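As an illustrative sketch only (train.dat and model.vw are hypothetical file names, and the namespaces a and b are assumed to exist in the input data), a basic run combining the update and feature options above might look like:

    vw -d train.dat -b 24 -l 0.5 --ngram 2 --skips 1 -q ab -f model.vw

The -d, -f and related input/output flags are documented in later option groups on this page.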
Example options:
  -t [ --testonly ]    Ignore label information and just test
  --holdout_off    no holdout data in multiple passes
  --holdout_period arg (=10)    holdout period for test only
  --holdout_after arg    holdout after n training examples, default off (disables holdout_period)
  --early_terminate arg (=3)    Specify the number of passes tolerated when holdout loss doesn't decrease before early termination
  --passes arg    Number of Training Passes
  --initial_pass_length arg    initial number of examples per pass
  --examples arg    number of examples to parse
  --min_prediction arg    Smallest prediction to output
  --max_prediction arg    Largest prediction to output
  --sort_features    turn this on to disregard the order in which features have been defined. This will lead to smaller cache sizes
  --loss_function arg (=squared)    Specify the loss function to be used; uses squared by default. Currently available ones are squared, classic, hinge, logistic, quantile and poisson.
  --quantile_tau arg (=0.5)    Parameter \tau associated with Quantile loss. Defaults to 0.5
  --l1 arg    l_1 lambda
  --l2 arg    l_2 lambda
  --no_bias_regularization arg    no bias in regularization
  --named_labels arg    use names for labels (multiclass, etc.) rather than integers; argument specifies all possible labels, comma-separated, e.g. "--named_labels Noun,Verb,Adj,Punc"

Output model:
  -f [ --final_regressor ] arg    Final regressor
  --readable_model arg    Output human-readable final regressor with numeric features
  --invert_hash arg    Output human-readable final regressor with feature names. Computationally expensive.
  --save_resume    save extra state so learning can be resumed later with new data
  --preserve_performance_counters    reset performance counters when warmstarting
  --save_per_pass    Save the model after every pass over data
  --output_feature_regularizer_binary arg    Per feature regularization output file
  --output_feature_regularizer_text arg    Per feature regularization output file, in text
  --id arg    User supplied ID embedded into the final regressor

Output options:
  -p [ --predictions ] arg    File to output predictions to
  -r [ --raw_predictions ] arg    File to output unnormalized predictions to

Audit Regressor:
  --audit_regressor arg    stores feature names and their regressor values. The same dataset must be used for both regressor training and this mode.
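As a hedged example of the pass, output-model and prediction options above (train.dat, test.dat, model.vw and preds.txt are placeholder names), training with multiple passes over a cache and then testing the saved model might look like:

    vw -d train.dat -c --passes 10 --loss_function logistic -f model.vw
    vw -d test.dat -t -i model.vw -p preds.txt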
Search options:
  --search arg    Use learning to search, argument=maximum action id or 0 for LDF
  --search_task arg    the search task (use "--search_task list" to get a list of available tasks)
  --search_metatask arg    the search metatask (use "--search_metatask list" to get a list of available metatasks)
  --search_interpolation arg    at what level should interpolation happen? [*data|policy]
  --search_rollout arg    how should rollouts be executed? [policy|oracle|*mix_per_state|mix_per_roll|none]
  --search_rollin arg    how should past trajectories be generated? [policy|oracle|*mix_per_state|mix_per_roll]
  --search_passes_per_policy arg (=1)    number of passes per policy (only valid for search_interpolation=policy)
  --search_beta arg (=0.5)    interpolation rate for policies (only valid for search_interpolation=policy)
  --search_alpha arg (=1.00000001e-10)    annealed beta = 1-(1-alpha)^t (only valid for search_interpolation=data)
  --search_total_nb_policies arg    if we are going to train the policies through multiple separate calls to vw, we need to specify this parameter and tell vw how many policies are eventually going to be trained
  --search_trained_nb_policies arg    the number of trained policies in a file
  --search_allowed_transitions arg    read file of allowed transitions [def: all transitions are allowed]
  --search_subsample_time arg    instead of training at all timesteps, use a subset. if value in (0,1), train on a random v%. if v>=1, train on precisely v steps per example; if v<=-1, use active learning
  --search_neighbor_features arg    copy features from neighboring lines. argument looks like '-1:a,+2', meaning copy the previous line's namespace a and the next-next line from namespace _unnamed_, where ',' separates them
  --search_rollout_num_steps arg    how many calls of "loss" before we stop really predicting on rollouts and switch to oracle (default means "infinite")
  --search_history_length arg (=1)    some tasks allow you to specify how much history they depend on; specify that here
  --search_no_caching    turn off the built-in caching ability (makes things slower, but technically more safe)
  --search_xv    train two separate policies, alternating prediction/learning
  --search_perturb_oracle arg (=0)    perturb the oracle on rollin with this probability
  --search_linear_ordering    insist on generating examples in linear order (def: hoopla permutation)
  --search_active_verify arg    verify that active learning is doing the right thing (arg = multiplier, should be = cost_range * range_c)
  --search_save_every_k_runs arg    save model every k runs

Experience Replay:
  --replay_c arg    use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
  --replay_c_count arg (=1)    how many times (in expectation) should each example be played (default: 1 = permuting)

Explore evaluation:
  --explore_eval    Evaluate explore_eval adf policies
  --multiplier arg    Multiplier used to make all rejection sample probabilities <= 1

Make Multiclass into Contextual Bandit:
  --cbify arg    Convert multiclass on <k> classes into a contextual bandit problem
  --cbify_cs    consume cost-sensitive classification examples instead of multiclass
  --loss0 arg (=0)    loss for correct label
  --loss1 arg (=1)    loss for incorrect label
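As a sketch of the multiclass-to-bandit reduction above (multiclass.dat is a hypothetical multiclass data file, and the epsilon-greedy flag is taken from the exploration options that follow), one might run:

    vw -d multiclass.dat --cbify 10 --epsilon 0.05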
Contextual Bandit Exploration with Action Dependent Features:
  --cb_explore_adf    Online explore-exploit for a contextual bandit problem with multiline action dependent features
  --first arg    tau-first exploration
  --epsilon arg    epsilon-greedy exploration
  --bag arg    bagging-based exploration
  --cover arg    Online cover based exploration
  --psi arg (=1)    disagreement parameter for cover
  --nounif    do not explore uniformly on zero-probability actions in cover
  --softmax    softmax exploration
  --regcb    RegCB-elim exploration
  --regcbopt    RegCB optimistic exploration
  --mellowness arg (=0.100000001)    RegCB mellowness parameter c_0. Default 0.1
  --greedify    always update first policy once in bagging
  --cb_min_cost arg (=0)    lower bound on cost
  --cb_max_cost arg (=1)    upper bound on cost
  --first_only    Only explore the first action in a tie-breaking event
  --lambda arg (=-1)    parameter for softmax

Contextual Bandit Exploration:
  --cb_explore arg    Online explore-exploit for a <k> action contextual bandit problem
  --first arg    tau-first exploration
  --epsilon arg (=0.0500000007)    epsilon-greedy exploration
  --bag arg    bagging-based exploration
  --cover arg    Online cover based exploration
  --psi arg (=1)    disagreement parameter for cover

Multiworld Testing Options:
  --multiworld_test arg    Evaluate features as policies
  --learn arg    Do Contextual Bandit learning on <n> classes.
  --exclude_eval    Discard mwt policy features before learning

Contextual Bandit with Action Dependent Features:
  --cb_adf    Do Contextual Bandit learning with multiline action dependent features.
  --rank_all    Return actions sorted by score order
  --no_predict    Do not do a prediction when training
  --cb_type arg (=ips)    contextual bandit method to use in {ips,dm,dr,mtr}

Contextual Bandit Options:
  --cb arg    Use contextual bandit learning with <k> costs
  --cb_type arg (=dr)    contextual bandit method to use in {ips,dm,dr}
  --eval    Evaluate a policy rather than optimizing.

Cost Sensitive One Against All with Label Dependent Features:
  --csoaa_ldf arg    Use one-against-all multiclass learning with label dependent features.
  --ldf_override arg    Override singleline or multiline from csoaa_ldf or wap_ldf, e.g. if stored in file
  --csoaa_rank    Return actions sorted by score order
  --probabilities    predict probabilities of all classes
  --wap_ldf arg    Use weighted all-pairs multiclass learning with label dependent features. Specify singleline or multiline.

Interact via elementwise multiplication:
  --interact arg    Put weights on feature products from namespaces <n1> and <n2>

Cost Sensitive One Against All:
  --csoaa arg    One-against-all multiclass with <k> costs

Cost-sensitive Active Learning:
  --cs_active arg    Cost-sensitive active learning with <k> costs
  --simulation    cost-sensitive active learning simulation mode
  --baseline    cost-sensitive active learning baseline
  --domination    cost-sensitive active learning use domination. Default 1
  --mellowness arg (=0.100000001)    mellowness parameter c_0. Default 0.1.
  --range_c arg (=0.5)    parameter controlling the threshold for per-label cost uncertainty. Default 0.5.
  --max_labels arg (=18446744073709551615)    maximum number of label queries.
  --min_labels arg (=18446744073709551615)    minimum number of label queries.
  --cost_max arg (=1)    cost upper bound. Default 1.
  --cost_min arg (=0)    cost lower bound. Default 0.
  --csa_debug    print debug stuff for cs_active
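A hedged illustration of the contextual bandit options above (cb_train.dat, cb_model.vw and action_probs.txt are placeholder names): train a policy over 4 actions with the doubly robust estimator, or run epsilon-greedy exploration and write the action probabilities:

    vw -d cb_train.dat --cb 4 --cb_type dr -f cb_model.vw
    vw -d cb_train.dat --cb_explore 4 --epsilon 0.05 -p action_probs.txt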
Multilabel One Against All:
  --multilabel_oaa arg    One-against-all multilabel with <k> labels

Importance weight classes:
  --classweight arg    importance weight multiplier for class

Recall Tree:
  --recall_tree arg    Use online tree for multiclass
  --max_candidates arg    maximum number of labels per leaf in the tree
  --bern_hyper arg (=1)    recall tree depth penalty
  --max_depth arg    maximum depth of the tree, default log_2 (#classes)
  --node_only arg (=0)    only use node features, not full path features
  --randomized_routing arg (=0)    randomized routing

Logarithmic Time Multiclass Tree:
  --log_multi arg    Use online tree for multiclass
  --no_progress    disable progressive validation
  --swap_resistance arg (=4)    higher = more resistance to swap, default=4

Error Correcting Tournament Options:
  --ect arg    Error correcting tournament with <k> labels
  --error arg (=0)    errors allowed by ECT

Boosting:
  --boosting arg    Online boosting with <N> weak learners
  --gamma arg (=0.100000001)    weak learner's edge (=0.1), used only by online BBM
  --alg arg (=BBM)    specify the boosting algorithm: BBM (default), logistic (AdaBoost.OL.W), adaptive (AdaBoost.OL)

One Against All Options:
  --oaa arg    One-against-all multiclass with <k> labels
  --oaa_subsample arg    subsample this number of negative examples when learning
  --probabilities    predict probabilities of all classes
  --scores    output raw scores per class

Top K:
  --top arg    top k recommendation

Experience Replay:
  --replay_m arg    use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
  --replay_m_count arg (=1)    how many times (in expectation) should each example be played (default: 1 = permuting)

Binary loss:
  --binary    report loss as binary classification on -1,1

Bootstrap:
  --bootstrap arg    k-way bootstrap by online importance resampling
  --bs_type arg    prediction type {mean,vote}

Scorer options:
  --link arg (=identity)    Specify the link function: identity, logistic, glf1 or poisson

Stagewise polynomial options:
  --stage_poly    use stagewise polynomial feature learning
  --sched_exponent arg (=1)    exponent controlling quantity of included features
  --batch_sz arg (=1000)    multiplier on batch size before including more features
  --batch_sz_no_doubling    batch_sz does not double

Low Rank Quadratics FA:
  --lrqfa arg    use low rank quadratic features with field aware weights

Low Rank Quadratics:
  --lrq arg    use low rank quadratic features
  --lrqdropout    use dropout training for low rank quadratic features

Autolink:
  --autolink arg    create link function with polynomial d

Marginal:
  --marginal arg    substitute marginal label estimates for ids
  --initial_denominator arg (=1)    initial denominator
  --initial_numerator arg (=0.5)    initial numerator
  --compete    enable competition with marginal features
  --update_before_learn arg (=0)    update marginal values before learning
  --unweighted_marginals arg (=0)    ignore importance weights when computing marginals
  --decay arg (=0)    decay multiplier per event (1e-3 for example)

Matrix Factorization Reduction:
  --new_mf arg    rank for reduction-based matrix factorization

Neural Network:
  --nn arg    Sigmoidal feedforward network with <k> hidden units
  --inpass    Train or test sigmoidal feedforward network with input passthrough.
  --multitask    Share hidden layer across all reduced tasks.
  --dropout    Train or test sigmoidal feedforward network using dropout.
  --meanfield    Train or test sigmoidal feedforward network using mean field.
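As a sketch of the multiclass reductions above (letters.dat is a hypothetical dataset with 26 classes), one-against-all and error correcting tournament runs might look like:

    vw -d letters.dat --oaa 26 -f oaa_model.vw
    vw -d letters.dat --ect 26 --loss_function logistic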
Confidence:
  --confidence    Get confidence for binary predictions
  --confidence_after_training    Confidence after training

Active Learning with Cover:
  --active_cover    enable active learning with cover
  --mellowness arg (=8)    active learning mellowness parameter c_0. Default 8.
  --alpha arg (=1)    active learning variance upper bound parameter alpha. Default 1.
  --beta_scale arg (=3.1622777)    active learning variance upper bound parameter beta_scale. Default sqrt(10).
  --cover arg (=12)    cover size. Default 12.
  --oracular    Use Oracular-CAL style query or not. Default false.

Active Learning:
  --active    enable active learning
  --simulation    active learning simulation mode
  --mellowness arg (=8)    active learning mellowness parameter c_0. Default 8

Experience Replay:
  --replay_b arg    use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
  --replay_b_count arg (=1)    how many times (in expectation) should each example be played (default: 1 = permuting)

Baseline options:
  --baseline    Learn an additive baseline (from constant features) and a residual separately in regression.
  --lr_multiplier arg    learning rate multiplier for baseline model
  --global_only    use separate example with only global constant for baseline predictions
  --check_enabled    only use baseline when the example contains enabled flag

OjaNewton options:
  --OjaNewton    Online Newton with Oja's Sketch
  --sketch_size arg (=10)    size of sketch
  --epoch_size arg (=1)    size of epoch
  --alpha arg (=1)    multiplicative constant for identity
  --alpha_inverse arg    one over alpha, similar to learning rate
  --learning_rate_cnt arg (=2)    constant for the learning rate 1/t
  --normalize arg (=1)    normalize the features or not
  --random_init arg (=1)    randomize initialization of Oja or not

LBFGS and Conjugate Gradient options:
  --conjugate_gradient    use conjugate gradient based optimization
  --bfgs    use bfgs optimization
  --hessian_on    use second derivative in line search
  --mem arg (=15)    memory in bfgs
  --termination arg (=0.00100000005)    Termination threshold

Latent Dirichlet Allocation:
  --lda arg    Run lda with <int> topics
  --lda_alpha arg (=0.100000001)    Prior on sparsity of per-document topic weights
  --lda_rho arg (=0.100000001)    Prior on sparsity of topic distributions
  --lda_D arg (=10000)    Number of documents
  --lda_epsilon arg (=0.00100000005)    Loop convergence threshold
  --minibatch arg (=1)    Minibatch size, for LDA
  --math-mode arg (=0)    Math mode: simd, accuracy, fast-approx
  --metrics arg (=0)    Compute metrics

Noop Learner:
  --noop    do no learning

Print pseudolearner:
  --print    print examples

Gradient Descent Matrix Factorization:
  --rank arg    rank for matrix factorization.
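A hedged example of the batch optimizer and LDA options above (docs.dat and train.dat are placeholder input files); both modes typically use a cache and multiple passes:

    vw -d docs.dat --lda 20 --lda_D 10000 --minibatch 256 --passes 2 -c
    vw -d train.dat --bfgs --mem 15 --passes 20 -c -f bfgs_model.vw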
Network sending:
  --sendto arg    send examples to <host>

Stochastic Variance Reduced Gradient:
  --svrg    Streaming Stochastic Variance Reduced Gradient
  --stage_size arg (=1)    Number of passes per SVRG stage

Follow the Regularized Leader:
  --ftrl    FTRL: Follow the Proximal Regularized Leader
  --ftrl_alpha arg (=0.00499999989)    Learning rate for FTRL optimization
  --ftrl_beta arg (=0.100000001)    FTRL beta parameter
  --pistol    FTRL: Parameter-free Stochastic Learning
  --ftrl_alpha arg (=1)    Learning rate for FTRL optimization
  --ftrl_beta arg (=0.5)    FTRL beta parameter

Kernel SVM:
  --ksvm    kernel svm
  --reprocess arg (=1)    number of reprocess steps for LASVM
  --pool_greedy    use greedy selection on mini pools
  --para_active    do parallel active learning
  --pool_size arg (=1)    size of pools for active learning
  --subsample arg (=1)    number of items to subsample from the pool
  --kernel arg (=linear)    type of kernel (rbf or linear (default))
  --bandwidth arg (=1)    bandwidth of rbf kernel
  --degree arg (=2)    degree of poly kernel
  --lambda arg    saving regularization for test time

Gradient Descent options:
  --sgd    use regular stochastic gradient descent update.
  --adaptive    use adaptive, individual learning rates.
  --adax    use adaptive learning rates with x^2 instead of g^2x^2
  --invariant    use safe/importance aware updates.
  --normalized    use per feature normalized updates
  --sparse_l2 arg (=0)    use per feature normalized updates
  --l1_state arg (=0)    use per feature normalized updates
  --l2_state arg (=1)    use per feature normalized updates

Input options:
  -d [ --data ] arg    Example Set
  --daemon    persistent daemon mode on port 26542
  --foreground    in persistent daemon mode, do not run in the background
  --port arg    port to listen on; use 0 to pick unused port
  --num_children arg    number of children for persistent daemon mode
  --pid_file arg    Write pid file in persistent daemon mode
  --port_file arg    Write port used in persistent daemon mode
  -c [ --cache ]    Use a cache. The default is <data>.cache
  --cache_file arg    The location(s) of cache_file.
  --json    Enable JSON parsing.
  --dsjson    Enable Decision Service JSON parsing.
  -k [ --kill_cache ]    do not reuse existing cache: create a new one always
  --compressed    use gzip format whenever possible. If a cache file is being created, this option creates a compressed cache file. A mixture of raw-text & compressed inputs are supported with autodetection.
  --no_stdin    do not default to reading from stdin
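As an illustrative sketch of the input options above (train.dat.gz and model.vw are placeholder names), building a compressed cache and then serving the trained model in persistent daemon mode might look like:

    vw -d train.dat.gz --compressed -c -k --passes 3 -f model.vw
    vw --daemon --port 26542 --num_children 4 -i model.vw -t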