
Provided by: mlpack-bin_3.4.2-7ubuntu1_amd64

NAME

       mlpack_lsh - k-approximate-nearest-neighbor search with LSH

SYNOPSIS

        mlpack_lsh [-B int] [-H double] [-m unknown] [-k int] [-T int] [-K int] [-q string] [-r string] [-S int] [-s int] [-L int] [-t string] [-V bool] [-d string] [-n string] [-M unknown] [-h -v]

DESCRIPTION

       This  program  will calculate the k approximate-nearest-neighbors of a set of points using
       locality-sensitive hashing. You may specify a separate set of reference points  and  query
       points, or just a reference set which will be used as both the reference and query set.

       For  example,  the  following  will  return  5  neighbors  from the data for each point in
       'input.csv'  and  store  the  distances  in   'distances.csv'   and   the   neighbors   in
       'neighbors.csv':

       $ mlpack_lsh --k 5 --reference_file input.csv --distances_file distances.csv \
             --neighbors_file neighbors.csv
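
       The same invocation can be scripted. The following is a minimal sketch, not part of
       mlpack: the helper name build_lsh_command is hypothetical, and it only assembles options
       that are documented on this page. The search itself is run only if the mlpack_lsh binary
       and the input file are both present.

```python
import os
import shutil
import subprocess


def build_lsh_command(reference_file, k, neighbors_file, distances_file,
                      query_file=None, seed=None):
    """Build an argv list for mlpack_lsh (hypothetical helper, documented options only)."""
    cmd = [
        "mlpack_lsh",
        "--reference_file", reference_file,
        "--k", str(k),
        "--neighbors_file", neighbors_file,
        "--distances_file", distances_file,
    ]
    # The query set is optional; without it the reference set is also the query set.
    if query_file is not None:
        cmd += ["--query_file", query_file]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


cmd = build_lsh_command("input.csv", 5, "neighbors.csv", "distances.csv", seed=42)
print(" ".join(cmd))

# Run the search only when the binary is installed and the input file exists.
if shutil.which("mlpack_lsh") and os.path.exists("input.csv"):
    subprocess.run(cmd, check=True)
```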

       The output is organized such that the entry at row i and column j of the neighbors output
       is the index of the point in the reference set which is the j'th nearest neighbor of the
       point in the query set with index i. The entry at row i and column j of the distances
       output is the distance between those two points.
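
       As an illustration of this layout, the sketch below pairs up the two outputs. It assumes
       that both files are CSV with one row per query point and one column per neighbor rank;
       the file contents shown are invented for the example, and read_matrix is a hypothetical
       helper.

```python
import csv
import io

# Invented contents for 2 query points and k = 3 (not real mlpack output).
neighbors_csv = "4,1,7\n0,2,5\n"
distances_csv = "0.5,0.9,1.3\n0.2,0.4,2.0\n"


def read_matrix(text, cast):
    """Parse CSV text into a list of rows, converting every field with cast."""
    return [[cast(field) for field in row]
            for row in csv.reader(io.StringIO(text)) if row]


# int(float(...)) tolerates indices written as "4" or as "4.0".
neighbors = read_matrix(neighbors_csv, lambda s: int(float(s)))
distances = read_matrix(distances_csv, float)

# Row i is query point i; column j is its j'th nearest neighbor (counted from 0 here).
for i, (n_row, d_row) in enumerate(zip(neighbors, distances)):
    for j, (ref_index, dist) in enumerate(zip(n_row, d_row)):
        print(f"query {i}, rank {j}: reference point {ref_index} at distance {dist}")
```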

       Because this is approximate-nearest-neighbors search, results may differ from run to run.
       Setting the '--seed (-s)' parameter to a nonzero value fixes the random seed, so that
       repeated runs start from the same random state.
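
       To see why the seed matters, consider the sketch below. It assumes the textbook
       random-projection scheme, in which each hash function maps a point x to
       floor((a . x + b) / w) for a random vector a and a random offset b; this page does not
       confirm that mlpack uses exactly this form. The names make_hash, hash_width and
       num_projections are illustrative and only loosely mirror the '--hash_width' and
       '--projections' options.

```python
import math
import random


def make_hash(dim, hash_width, num_projections, seed):
    """Return a function mapping a point to a tuple of num_projections bucket ids."""
    rng = random.Random(seed)
    # One Gaussian projection vector and one uniform offset per hash function.
    projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                   for _ in range(num_projections)]
    offsets = [rng.uniform(0.0, hash_width) for _ in range(num_projections)]

    def hash_point(point):
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, point)) + b) / hash_width)
            for proj, b in zip(projections, offsets)
        )

    return hash_point


point = [0.3, -1.2, 2.5]
key_a = make_hash(3, 4.0, 10, seed=42)(point)
key_b = make_hash(3, 4.0, 10, seed=42)(point)
key_c = make_hash(3, 4.0, 10, seed=7)(point)

print(key_a == key_b)  # same seed, same projections, same key
print(key_a == key_c)  # a different seed will usually give a different key
```

       With the same seed the projections are identical, so the same point always lands in the
       same bucket; with a different seed the candidate set, and therefore the returned
       neighbors, can change.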

       This program also has  many  other  parameters  to  control  its  functionality;  see  the
       parameter-specific documentation for more information.

OPTIONAL INPUT OPTIONS

       --bucket_size (-B) [int]
              The size of a bucket in the second level hash.  Default value 500.

       --hash_width (-H) [double]
              The  hash  width  for the first-level hashing in the LSH preprocessing. By default,
              the LSH class automatically estimates a hash width for its use. Default value 0.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Input LSH model.

       --k (-k) [int]
              Number of nearest neighbors to find. Default value 0.

       --num_probes (-T) [int]
              Number of additional probes for multiprobe LSH; if  0,  traditional  LSH  is  used.
              Default value 0.

       --projections (-K) [int]
              The number of hash functions for each table. Default value 10.

       --query_file (-q) [string]
              Matrix containing query points (optional).

       --reference_file (-r) [string]
              Matrix containing the reference dataset.

       --second_hash_size (-S) [int]
              The size of the second level hash table.  Default value 99901.

       --seed (-s) [int]
              Random seed. If 0, 'std::time(NULL)' is used.  Default value 0.

       --tables (-L) [int]
              The number of hash tables to be used. Default value 30.

       --true_neighbors_file (-t) [string]
              Matrix  of  true neighbors to compute recall with (the recall is printed when -v is
              specified).

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and  timers  at  the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
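
       The recall reported with '--true_neighbors_file' can be understood through the sketch
       below. It uses one common definition, the fraction of true neighbors that also appear
       among the neighbors found for the same query point; the exact formula used by mlpack is
       not stated on this page, and the index lists are invented for the example.

```python
def recall(found, true):
    """Fraction of true neighbors that also appear in the found neighbors, row by row."""
    if len(found) != len(true):
        raise ValueError("found and true must have one row per query point")
    hits = 0
    total = 0
    for found_row, true_row in zip(found, true):
        # Order within a row is ignored; only membership counts.
        hits += len(set(found_row) & set(true_row))
        total += len(true_row)
    return hits / total if total else 0.0


# Invented neighbor indices for 2 query points and k = 3.
true_neighbors = [[4, 1, 7], [0, 2, 5]]
found_neighbors = [[4, 7, 3], [0, 2, 5]]

# 5 of the 6 true neighbors were recovered.
print(recall(found_neighbors, true_neighbors))
```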

OPTIONAL OUTPUT OPTIONS

       --distances_file (-d) [string]
              Matrix to output distances into.

       --neighbors_file (-n) [string]
              Matrix to output neighbors into.

       --output_model_file (-M) [unknown]
              Output for trained LSH model.

ADDITIONAL INFORMATION

       For  further  information,  including  relevant papers, citations, and theory, consult the
       documentation found at http://www.mlpack.org or included with your distribution of mlpack.