Provided by: mlpack-bin_3.0.4-1_amd64


       mlpack_approx_kfn - approximate furthest neighbor search


        mlpack_approx_kfn [-a string] [-e bool] [-x string] [-m unknown] [-k int] [-p int] [-t int] [-q string] [-r string] [-V bool] [-d string] [-n string] [-M unknown] [-h -v]


       This program implements two strategies for furthest neighbor search. These strategies are:

              ·  The 'qdafn' algorithm from "Approximate Furthest Neighbor in High Dimensions" by
                 R. Pagh, F. Silvestri, J. Sivertsen, and M.  Skala,  in  Similarity  Search  and
                 Applications 2015 (SISAP).

              ·  The 'DrusillaSelect' algorithm (selected as 'ds') from "Fast approximate furthest
                 neighbors with data-dependent candidate selection", by R.R. Curtin and A.B.
                 Gardner, in Similarity Search and Applications 2016 (SISAP).

       These two strategies give approximate results for the furthest neighbor search problem and
       can be used as fast replacements for other furthest  neighbor  techniques  such  as  those
       found  in  the  mlpack_kfn  program.  Note that typically, the 'ds' algorithm requires far
       fewer tables and projections than the 'qdafn' algorithm.

       Specify a reference set (the set to search in) with '--reference_file (-r)', specify a
       query set with '--query_file (-q)', and specify algorithm parameters with '--num_tables
       (-t)' and '--num_projections (-p)'; if these are omitted, the defaults are used. The
       algorithm to use (either 'ds', the default, or 'qdafn') may be specified with
       '--algorithm (-a)'. Also specify the number of neighbors to search for with '--k (-k)'.
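
       As a minimal sketch of how these options combine, the helper below assembles a command
       line and prints it rather than running it. The function name build_approx_kfn_cmd, the
       CSV file names, and the values 10 and 20 are illustrative assumptions, not part of
       mlpack.

```shell
# Hypothetical helper: assemble an mlpack_approx_kfn command line.
# Arguments: algorithm, k, number of tables, number of projections.
build_approx_kfn_cmd() {
  algorithm=${1:-ds}     # 'ds' (documented default) or 'qdafn'
  k=${2:-1}              # number of furthest neighbors to search for
  tables=${3:-5}         # --num_tables, documented default 5
  projections=${4:-5}    # --num_projections, documented default 5
  printf 'mlpack_approx_kfn --reference_file %s --query_file %s' \
    reference_set.csv query_set.csv
  printf ' --algorithm %s --k %s --num_tables %s --num_projections %s\n' \
    "$algorithm" "$k" "$tables" "$projections"
}

# Build a 'qdafn' search for 5 neighbors with 10 tables and 20 projections.
cmd=$(build_approx_kfn_cmd qdafn 5 10 20)
echo "$cmd"
# Once the CSV files exist, the command can be executed with: sh -c "$cmd"
```

       Calling the helper with no arguments falls back to the documented defaults for the
       algorithm, table count, and projection count; an output option such as
       '--neighbors_file (-n)' would normally be appended as well.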

       If no query set is specified, the reference set will  be  used  as  the  query  set.   The
       '--output_model_file  (-M)'  output parameter may be used to store the built model, and an
       input  model  may  be  loaded  instead  of   specifying   a   reference   set   with   the
       '--input_model_file (-m)' option.
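
       The save-then-reload workflow described above might look like the following sketch,
       written with the short option forms from the synopsis. The file names are placeholders,
       and building the model as part of a first search (rather than on its own) is an
       assumption made to keep the example conservative.

```shell
# Sketch only: model.bin and the CSV names are hypothetical placeholders.
# Step 1 builds the model during a first search and saves it with -M.
save_cmd='mlpack_approx_kfn -r reference_set.csv -k 1 -d distances.csv -M model.bin'
# Step 2 loads the saved model with -m instead of passing a reference set.
load_cmd='mlpack_approx_kfn -m model.bin -q new_query_set.csv -k 3 -n neighbors.csv'

# Execute only when the binary and the input files are present;
# otherwise just print the two commands.
if command -v mlpack_approx_kfn >/dev/null 2>&1 \
   && [ -f reference_set.csv ] && [ -f new_query_set.csv ]; then
  sh -c "$save_cmd" && sh -c "$load_cmd"
else
  printf '%s\n' "$save_cmd" "$load_cmd"
fi
```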

       Results for each query point can be stored with the '--neighbors_file (-n)' and
       '--distances_file (-d)' output parameters. Each row of these output matrices holds the
       k neighbor indices or the k distances for one query point.
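
       To illustrate that layout, the sketch below checks the shape of a neighbors file with
       awk. The file neighbors_sample.csv and its contents are made up for the example and
       stand in for a file written with '--neighbors_file' using k=2 on three query points.

```shell
# Hypothetical stand-in for a neighbors file (k=2, three query points);
# a real file would be written by --neighbors_file.
cat > neighbors_sample.csv <<'EOF'
4,1
0,3
2,0
EOF

k=2
# One row per query point: count the rows.
rows=$(awk 'END { print NR }' neighbors_sample.csv)
# Every row should hold exactly k comma-separated entries.
bad=$(awk -F, -v k="$k" 'NF != k { n++ } END { print n + 0 }' neighbors_sample.csv)
echo "query points: $rows, rows with wrong width: $bad"
```

       The same check applies to a distances file, since both outputs are described as having
       one row per query point and k entries per row.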

       For example, to find the 5 approximate furthest neighbors with 'reference_set.csv' as the
       reference set and 'query_set.csv' as the query set using DrusillaSelect, storing the
       furthest neighbor indices to 'neighbors.csv' and the furthest neighbor distances to
       'distances.csv', one could call

       $ mlpack_approx_kfn --query_file query_set.csv --reference_file reference_set.csv --k 5
       --algorithm ds --neighbors_file neighbors.csv --distances_file distances.csv

       and to perform approximate all-furthest-neighbors search with k=1 on the set 'data.csv',
       storing only the furthest neighbor distances to 'distances.csv', one could call

       $ mlpack_approx_kfn --reference_file data.csv --k 1 --distances_file distances.csv

       A trained model can be re-used. If a model has been previously saved to 'model.bin', then
       we may find 3 approximate furthest neighbors on a query set 'new_query_set.csv' using that
       model and store the furthest neighbor indices into 'neighbors.csv' by calling

       $ mlpack_approx_kfn --input_model_file model.bin --query_file new_query_set.csv --k 3
       --neighbors_file neighbors.csv


       --algorithm (-a) [string]
              Algorithm to use: 'ds' or 'qdafn'. Default value 'ds'.

       --calculate_error (-e) [bool]
              If set, calculate the average distance error for the first furthest neighbor only.

       --exact_distances_file (-x) [string]
              Matrix containing exact distances to furthest neighbors; this can be used to avoid
              calculation when --calculate_error is set.  Default value ''.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Get help on a specific module or option.  Default value ''.

       --input_model_file (-m) [unknown]
              File containing input model. Default value ''.

       --k (-k) [int]
              Number of furthest neighbors to search for.  Default value 0.

       --num_projections (-p) [int]
              Number of projections to use in each hash table. Default value 5.

       --num_tables (-t) [int]
              Number of hash tables to use. Default value 5.

       --query_file (-q) [string]
              Matrix containing query points. Default value ''.

       --reference_file (-r) [string]
              Matrix containing the reference dataset.  Default value ''.

       --verbose (-v) [bool]
              Display  informational  messages  and the full list of parameters and timers at the
              end of execution.

       --version (-V) [bool]
              Display the version of mlpack.


       --distances_file (-d) [string]
              Matrix to save furthest neighbor distances to.  Default value ''.

       --neighbors_file (-n) [string]
              Matrix to save neighbor indices to. Default value ''.

       --output_model_file (-M) [unknown]
              File to save output model to. Default value ''.
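
       The '--calculate_error' and '--exact_distances_file' options can be combined with the
       exact search in mlpack_kfn, as in the sketch below. It assumes that mlpack_kfn accepts
       the same '--reference_file', '--query_file', '--k', and '--distances_file' options as
       this program, and that the error is reported as an informational message (hence
       '--verbose'); the file names are placeholders.

```shell
# Sketch only: file names are hypothetical; mlpack_kfn options are assumed
# to mirror the ones documented for mlpack_approx_kfn.
# Step 1: compute exact furthest-neighbor distances once.
exact_cmd='mlpack_kfn --reference_file reference_set.csv --query_file query_set.csv --k 1 --distances_file exact_distances.csv'
# Step 2: run the approximate search and compare against the stored distances.
error_cmd='mlpack_approx_kfn --reference_file reference_set.csv --query_file query_set.csv --k 1 --calculate_error --exact_distances_file exact_distances.csv --verbose'

# Execute only when both binaries and the input files are present;
# otherwise just print the two commands.
if command -v mlpack_kfn >/dev/null 2>&1 \
   && command -v mlpack_approx_kfn >/dev/null 2>&1 \
   && [ -f reference_set.csv ] && [ -f query_set.csv ]; then
  sh -c "$exact_cmd" && sh -c "$error_cmd"
else
  printf '%s\n' "$exact_cmd" "$error_cmd"
fi
```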


       For further information, including relevant papers, citations,  and  theory,  consult  the
       documentation found at or included with your distribution of mlpack.