Ubuntu Manpage: mlpack_allkrann - all k-rank-approximate-nearest-neighbors

NAME

       mlpack_allkrann - all k-rank-approximate-nearest-neighbors

SYNOPSIS

        mlpack_allkrann [-h] [-v] [-a double] [-d string] [-X] [-m string] [-k int] [-l int] [-N] [-n string] [-M string] [-q string] [-R] [-r string] [-L] [--seed int] [-s] [-S int] [-t double] [--tree_type string] [-V]

DESCRIPTION

       This  program will calculate the k rank-approximate-nearest-neighbors of a set of points. You may specify
       a separate set of reference points and query points, or just a reference set which will be used  as  both
       the reference and query set. You must specify the rank approximation (in %) with --tau, and may
       optionally specify the success probability with --alpha.

       For example, the following will return 5 neighbors from the top 0.1% of the data (with probability  0.95)
       for  each  point  in 'input.csv' and store the distances in 'distances.csv' and the neighbors in the file
       'neighbors.csv':

       $ mlpack_allkrann -k 5 -r input.csv -d distances.csv -n neighbors.csv --tau 0.1

       Note that tau must be set such that the number of points in the corresponding percentile of the  data  is
       greater  than  k.  Thus,  if  we  choose  tau  = 0.1 with a dataset of 1000 points and k = 5, then we are
       attempting to choose 5 nearest neighbors out of the closest 1 point -- this is invalid  and  the  program
       will terminate with an error message.
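
       The constraint on tau can be checked before running the program. The sketch below is an illustration
       only: the function name tau_is_valid is hypothetical, and it compares the unrounded percentile size
       against k, whereas the rounding the program itself applies is not stated here.

```python
def tau_is_valid(n_points: int, k: int, tau_percent: float) -> bool:
    """Return True if the top tau_percent of the data holds more than k points."""
    # Size of the percentile, left unrounded (the program's own rounding
    # is an assumption and may differ from this sketch).
    candidates = n_points * tau_percent / 100.0
    return candidates > k


# The invalid case from the text: 0.1% of 1000 points is 1 point, fewer than k = 5.
invalid_case = tau_is_valid(1000, 5, 0.1)
# The default tau of 5 leaves 50 candidate points for the same dataset.
default_case = tau_is_valid(1000, 5, 5.0)
```

       With these inputs, invalid_case is False and default_case is True.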

       The output files are organized such that the entry at row i and column j of the neighbors output file
       is the index of the point in the reference set which is the i'th nearest neighbor of the point in the
       query set with index j. The entry at row i and column j of the distances output file is the distance
       between those two points.
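
       The layout described above can be read back as follows. This is a minimal sketch, assuming the CSV
       files are laid out exactly as described (row i, column j); the helper name neighbors_of and the
       sample values are made up for illustration and are not produced by the program.

```python
import csv
import io


def neighbors_of(neighbors_csv, distances_csv, query_index):
    """Return [(reference_index, distance), ...] for one query point."""
    neighbor_rows = list(csv.reader(neighbors_csv))
    distance_rows = list(csv.reader(distances_csv))
    # Row i holds the i'th nearest neighbor; column j selects the query point.
    return [
        (int(float(n_row[query_index])), float(d_row[query_index]))
        for n_row, d_row in zip(neighbor_rows, distance_rows)
    ]


# Made-up data: k = 2 neighbors (rows) for 3 query points (columns).
neighbors_text = "1,0,1\n2,2,0\n"
distances_text = "0.5,0.5,0.7\n0.9,0.7,0.9\n"

result = neighbors_of(io.StringIO(neighbors_text), io.StringIO(distances_text), 1)
```

       Here result lists the neighbors of the query point in column 1, in the order of the rows. For real
       output, pass open file objects for the neighbors and distances files instead of the StringIO objects.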

OPTIONS

       --alpha (-a) [double]
              The desired success probability. Default value 0.95.

       --distances_file (-d) [string]
              File to output distances into. Default value ''.

       --first_leaf_exact (-X)
              The flag to trigger sampling only after exactly exploring the first leaf.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_model_file (-m) [string]
              File containing pre-trained kNN model. Default value ''.

       --k (-k) [int]
              Number of nearest neighbors to find. Default value 0.

       --leaf_size (-l) [int]
              Leaf size for tree building (used for kd-trees, R trees, and R* trees). Default value 20.

       --naive (-N)
              If true, sampling will be done without using a tree.

       --neighbors_file (-n) [string]
              File to output neighbors into. Default value ''.

       --output_model_file (-M) [string]
              If specified, the kNN model will be saved to the given file. Default value ''.

       --query_file (-q) [string]
              File containing query points (optional).  Default value ''.

       --random_basis (-R)
              Before tree-building, project the data onto a random orthogonal basis.

       --reference_file (-r) [string]
              File containing the reference dataset. Default value ''.

       --sample_at_leaves (-L)
              The flag to trigger sampling at leaves.

       --seed [int]
              Random seed (if 0, std::time(NULL) is used).  Default value 0.

       --single_mode (-s)
              If true, single-tree search is used (as opposed to dual-tree search).

       --single_sample_limit (-S) [int]
              The limit on the maximum number of samples (and hence the largest node you can approximate).
              Default value 20.

       --tau (-t) [double]
              The allowed rank-error in terms of the percentile of the data. Default value 5.

       --tree_type [string]
              Type of tree to use: 'kd', 'cover', 'r', or 'r-star'. Default value 'kd'.

       --verbose (-v)
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

                                                                                              mlpack_allkrann(1)