Ubuntu Manpage: mlpack_krann - k-rank-approximate-nearest-neighbors (krann)

Provided by: mlpack-bin_4.5.1-1build2_amd64

NAME

       mlpack_krann - k-rank-approximate-nearest-neighbors (krann)

SYNOPSIS

        mlpack_krann [-a double] [-X bool] [-m unknown] [-k int] [-l int] [-N bool] [-q unknown] [-R bool] [-r unknown] [-L bool] [-s int] [-S bool] [-z int] [-T double] [-t string] [-v bool] [-d unknown] [-n unknown] [-M unknown] [-h -V]

DESCRIPTION

       This  program will calculate the k rank-approximate-nearest-neighbors of a set of points. You may specify
       a separate set of reference points and query points, or just a reference set which will be used  as  both
       the  reference  and query set. You must specify the rank approximation as a percentage (--tau); the
       success probability (--alpha) is optional.

       For example, the following will return 5 neighbors from the top 0.1% of the data (with probability  0.95)
       for  each  point  in  'input.csv'  and  store  the  distances  in  'distances.csv'  and  the neighbors in
       'neighbors.csv':

       $  mlpack_krann  --reference_file  input.csv  --k  5  --distances_file   distances.csv   --neighbors_file
       neighbors.csv --tau 0.1
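
       A variant with a separate query set and an explicit success probability could look like  the  sketch
       below.  It  uses  only  options documented on this page; the file names and the run_krann wrapper are
       hypothetical, and the wrapper only prints the command unless RUN=1 is set.

```shell
# Placeholder file names (hypothetical); replace them with real data.
ref=reference.csv
query=queries.csv

# Hypothetical wrapper: prints the command unless RUN=1 is set, so the
# sketch can be tried on a machine without mlpack installed.
run_krann() {
  if [ "${RUN:-0}" = 1 ]; then
    mlpack_krann "$@"
  else
    echo "mlpack_krann $*"
  fi
}

# For each point in the query set, find 5 neighbors from the top 1% of the
# reference set, with success probability 0.99.
run_krann --reference_file "$ref" --query_file "$query" --k 5 \
  --tau 1 --alpha 0.99 \
  --neighbors_file neighbors.csv --distances_file distances.csv
```

       With tau = 1 and k = 5, the reference set needs more than 500 points for the request to be valid.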

       Note  that  tau must be set such that the number of points in the corresponding percentile of the data is
       greater than k. Thus, if we choose tau = 0.1 with a dataset of 1000  points  and  k  =  5,  then  we  are
       attempting  to  choose  5 nearest neighbors out of the closest 1 point -- this is invalid and the program
       will terminate with an error message.
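
       The arithmetic behind this constraint can be sketched in shell. The helper names are hypothetical and
       not part of mlpack, and rounding the percentile size down is an assumption; mlpack's  own  rounding
       may differ.

```shell
# Hypothetical helpers, not part of mlpack.

# Number of points in the top tau percent of an n-point dataset.
# Assumption: the count is rounded down.
krann_tau_count() {  # usage: krann_tau_count n tau
  awk -v n="$1" -v tau="$2" 'BEGIN { printf "%d\n", int(n * tau / 100) }'
}

# Succeeds only if that percentile holds more than k points.
krann_tau_ok() {  # usage: krann_tau_ok n tau k
  [ "$(krann_tau_count "$1" "$2")" -gt "$3" ]
}

krann_tau_count 1000 0.1   # prints 1: a single point, too few for k = 5
```

       Under these assumptions, krann_tau_ok 1000 0.1 5 fails, while krann_tau_ok 1000 5 5 succeeds  because
       the top 5% of 1000 points holds 50 points.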

       The output matrices are organized such that the entry at row i and column j of the neighbors  output
       file is the index of the point in the reference set that is the i'th nearest neighbor of the point in
       the query set with index j. The entry at row i and column j of the distances output file is the
       distance between those two points.

OPTIONAL INPUT OPTIONS

       --alpha (-a) [double]
              The desired success probability. Default value 0.95.

       --first_leaf_exact (-X) [bool]
              The flag to trigger sampling only after exactly exploring the first leaf.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Pre-trained kNN model.

       --k (-k) [int]
              Number of nearest neighbors to find. Default value 0.

       --leaf_size (-l) [int]
              Leaf  size  for  tree building (used for kd-trees, UB trees, R trees, R* trees, X trees, Hilbert R
              trees, R+ trees, R++ trees, and octrees).  Default value 20.

       --naive (-N) [bool]
              If true, sampling will be done without using a tree.

       --query_file (-q) [unknown]
              Matrix containing query points (optional).

       --random_basis (-R) [bool]
              Before tree-building, project the data onto a random orthogonal basis.

       --reference_file (-r) [unknown]
              Matrix containing the reference dataset.

       --sample_at_leaves (-L) [bool]
              The flag to trigger sampling at leaves.

       --seed (-s) [int]
              Random seed (if 0, std::time(NULL) is used).  Default value 0.

       --single_mode (-S) [bool]
              If true, single-tree search is used (as opposed to dual-tree search).

       --single_sample_limit (-z) [int]
              The limit on the maximum number of samples (and hence  the  largest  node  you  can  approximate).
              Default value 20.

       --tau (-T) [double]
              The allowed rank-error in terms of the percentile of the data. Default value 5.

       --tree_type (-t) [string]
              Type  of  tree  to  use:  'kd', 'ub', 'cover', 'r', 'x', 'r-star', 'hilbert-r', 'r-plus',
              'r-plus-plus', 'oct'. Default value 'kd'.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --distances_file (-d) [unknown]
              Matrix to output distances into.

       --neighbors_file (-n) [unknown]
              Matrix to output neighbors into.

       --output_model_file (-M) [unknown]
              If specified, the kNN model will be output here.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations,  and  theory,  consult  the  documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack-4.5.1                                     29 January 2025                                 mlpack_krann(1)