Ubuntu Manpage: mlpack_krann - k-rank-approximate-nearest-neighbors (krann)

NAME

       mlpack_krann - k-rank-approximate-nearest-neighbors (krann)

SYNOPSIS

        mlpack_krann [-a double] [-X bool] [-m unknown] [-k int] [-l int] [-N bool] [-q string] [-R bool] [-r string] [-L bool] [-s int] [-S bool] [-z int] [-T double] [-t string] [-V bool] [-d string] [-n string] [-M unknown] [-h -v]

DESCRIPTION

       This program calculates the k rank-approximate nearest neighbors of a set of points.
       You may specify separate sets of reference and query points, or just a reference set,
       which is then used as both the reference and query set. You must specify the rank
       approximation as a percentage (--tau), and may optionally specify the success
       probability (--alpha).

       For example, the following returns 5 neighbors from the top 0.1% of the data (with
       probability 0.95) for each point in 'input.csv', storing the distances in
       'distances.csv' and the neighbors in 'neighbors.csv':

       $ mlpack_krann --reference_file input.csv --k 5 --distances_file distances.csv \
           --neighbors_file neighbors.csv --tau 0.1

       Note that tau must be set such that the number of points in the corresponding percentile
       of the data is greater than k. For example, with tau = 0.1, a dataset of 1000 points,
       and k = 5, the program would have to choose 5 nearest neighbors from only the single
       closest point (0.1% of 1000). This is invalid, and the program will terminate with an
       error message.
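
       The arithmetic behind this constraint can be sketched in shell as follows. The
       function names are hypothetical, and rounding the percentile size down is an
       assumption; mlpack's own check may round or compare differently.

```shell
# Hypothetical helper: number of reference points in the top tau percent.
# Rounding down is an assumption, not taken from mlpack itself.
points_in_percentile() {
  # $1 = tau (percent), $2 = number of reference points
  awk -v tau="$1" -v n="$2" 'BEGIN { printf "%d\n", int(tau * n / 100 + 1e-9) }'
}

# Hypothetical helper: succeeds only if the percentile holds more than k points.
tau_is_valid() {
  # $1 = tau (percent), $2 = number of reference points, $3 = k
  [ "$(points_in_percentile "$1" "$2")" -gt "$3" ]
}
```

       With these definitions, 'tau_is_valid 0.1 1000 5' fails because only 1 point lies
       in the top 0.1% of 1000 points, while 'tau_is_valid 5 1000 5' succeeds.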

       The output matrices are organized such that the entry at row i and column j of the
       neighbors output file is the index of the point in the reference set that is the i'th
       nearest neighbor of the point in the query set with index j. The entry at row i and
       column j of the distances output file is the distance between those two points.
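
       A single entry can be read from either output file with a small helper such as the
       one below. This is a sketch that takes the layout described above at face value;
       the helper name is hypothetical, and the 0-based indexing and on-disk orientation
       of the CSV are assumptions that should be checked against real output.

```shell
# Hypothetical helper: print the entry at 0-based row i, column j of a CSV file.
csv_entry() {
  # $1 = CSV file, $2 = row index i, $3 = column index j
  awk -F',' -v i="$2" -v j="$3" 'NR == i + 1 { print $(j + 1) }' "$1"
}
```

       Under those assumptions, 'csv_entry neighbors.csv 0 3' prints the reference index of
       the nearest neighbor found for query point 3, and 'csv_entry distances.csv 0 3'
       prints the corresponding distance.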

OPTIONAL INPUT OPTIONS

       --alpha (-a) [double]
              The desired success probability. Default value 0.95.

       --first_leaf_exact (-X) [bool]
              The flag to trigger sampling only after exactly exploring the first leaf.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Pre-trained kNN model.

       --k (-k) [int]
              Number of nearest neighbors to find. Default value 0.

       --leaf_size (-l) [int]
              Leaf  size  for  tree  building  (used for kd-trees, UB trees, R trees, R* trees, X
              trees, Hilbert R trees, R+ trees, R++ trees, and octrees).  Default value 20.

       --naive (-N) [bool]
              If true, sampling will be done without using a tree.

       --query_file (-q) [string]
              Matrix containing query points (optional).

       --random_basis (-R) [bool]
              Before tree-building, project the data onto a random orthogonal basis.

       --reference_file (-r) [string]
              Matrix containing the reference dataset.

       --sample_at_leaves (-L) [bool]
              The flag to trigger sampling at leaves.

       --seed (-s) [int]
              Random seed (if 0, std::time(NULL) is used).  Default value 0.

       --single_mode (-S) [bool]
              If true, single-tree search is used (as opposed to dual-tree search).

       --single_sample_limit (-z) [int]
              The limit on the maximum number of samples (and hence  the  largest  node  you  can
              approximate).  Default value 20.

       --tau (-T) [double]
              The allowed rank-error in terms of the percentile of the data. Default value 5.

       --tree_type (-t) [string]
              Type of tree to use: 'kd', 'ub', 'cover', 'r', 'x', 'r-star', 'hilbert-r',
              'r-plus', 'r-plus-plus', 'oct'. Default value 'kd'.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and  timers  at  the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --distances_file (-d) [string]
              Matrix to output distances into.

       --neighbors_file (-n) [string]
              Matrix to output neighbors into.

       --output_model_file (-M) [unknown]
              If specified, the kNN model will be output here.
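
       The model options can be combined to build a model once and reuse it for later
       queries. The wrapper below is a sketch using only the options listed above. The
       function name and the output file names are hypothetical, and it is an assumption
       that a model can be built without --k and that --tau is honored when a saved model
       is loaded.

```shell
# Hypothetical wrapper: build a model from the reference set, then reuse it
# to search a separate query set.
krann_train_then_query() {
  # $1 = reference CSV, $2 = query CSV, $3 = k, $4 = tau, $5 = model file
  mlpack_krann --reference_file "$1" --output_model_file "$5" &&
  mlpack_krann --input_model_file "$5" --query_file "$2" --k "$3" --tau "$4" \
    --neighbors_file neighbors.csv --distances_file distances.csv
}
```

       For example, 'krann_train_then_query input.csv queries.csv 5 1 krann_model.bin' runs
       the two steps in order and stops if the first one fails. The model file name and
       extension here are placeholders.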

ADDITIONAL INFORMATION

       For  further  information,  including  relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.