Provided by: mlpack-bin_3.2.2-3_amd64

NAME

       mlpack_approx_kfn - approximate furthest neighbor search

SYNOPSIS

        mlpack_approx_kfn [-a string] [-e bool] [-x string] [-m unknown] [-k int] [-p int] [-t int] [-q string] [-r string] [-V bool] [-d string] [-n string] [-M unknown] [-h -v]

DESCRIPTION

       This program implements two strategies for furthest neighbor search. These strategies are:

              •  The  'qdafn'  algorithm  from "Approximate Furthest Neighbor in High Dimensions" by R. Pagh, F.
                 Silvestri, J. Sivertsen, and M. Skala, in Similarity Search and Applications 2015 (SISAP).

              •  The 'DrusillaSelect' algorithm from "Fast approximate furthest  neighbors  with  data-dependent
                 candidate  selection",  by  R.R. Curtin and A.B. Gardner, in Similarity Search and Applications
                 2016 (SISAP).

       These two strategies give approximate results for the furthest neighbor search problem and can be used as
       fast  replacements  for other furthest neighbor techniques such as those found in the mlpack_kfn program.
       Note that the 'ds' (DrusillaSelect) algorithm typically requires far fewer tables and projections than
       the 'qdafn' algorithm.

       Specify a reference set (the set to search in) with '--reference_file (-r)' and a query set with
       '--query_file (-q)'. Algorithm parameters may be set with '--num_tables (-t)' and '--num_projections
       (-p)'; if they are omitted, the defaults are used. The algorithm to be used (either 'ds', the default,
       or 'qdafn') may be specified with '--algorithm (-a)'. Specify the number of neighbors to search for
       with '--k (-k)'.
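
       As a sketch of how these options combine, the following runs the 'qdafn' algorithm with explicit
       table and projection counts. The file names and the values 10 and 20 are illustrative assumptions,
       not tuned recommendations, and the search is skipped unless the program and both input files are
       present.

```shell
# Illustrative settings (assumptions, not recommended values).
ALGO=qdafn
TABLES=10
PROJECTIONS=20
REF=reference_set.csv
QUERY=query_set.csv

# Run only when the program is installed and both input files exist.
if command -v mlpack_approx_kfn >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$QUERY" ]; then
  mlpack_approx_kfn --algorithm "$ALGO" \
    --num_tables "$TABLES" --num_projections "$PROJECTIONS" \
    --reference_file "$REF" --query_file "$QUERY" --k 5 \
    --neighbors_file neighbors.csv --distances_file distances.csv
else
  echo "skipped: mlpack_approx_kfn or input files not found"
fi
```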

       If  no query set is specified, the reference set will be used as the query set.  The '--output_model_file
       (-M)' output parameter may be used to store the built model, and an input model may be loaded instead  of
       specifying a reference set with the '--input_model_file (-m)' option.
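
       A minimal sketch of storing the built model while searching, so that it can later be loaded with
       '--input_model_file (-m)'. The file names are assumptions chosen for illustration, and the call is
       skipped unless the program and the reference file are present.

```shell
# Illustrative file names (assumptions).
REF=reference_set.csv
MODEL=model.bin

# Search the reference set against itself and keep the built model.
if command -v mlpack_approx_kfn >/dev/null 2>&1 && [ -f "$REF" ]; then
  mlpack_approx_kfn --reference_file "$REF" --k 5 \
    --neighbors_file neighbors.csv \
    --output_model_file "$MODEL"
else
  echo "skipped: mlpack_approx_kfn or $REF not found"
fi
```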

       Results  for  each query point can be stored with the '--neighbors_file (-n)' and '--distances_file (-d)'
       output parameters. Each row of these output matrices holds the k distances or k neighbor indices for
       one query point.
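
       The layout described above can be checked after a run. The helper below is a hypothetical
       convenience, not part of mlpack; it assumes the output was saved as comma-separated text with no
       header line, and that the query file also has one point per line.

```shell
# check_shape FILE ROWS K
# Succeeds only if FILE has exactly ROWS lines, each with K comma-separated fields.
check_shape() {
  awk -F, -v rows="$2" -v k="$3" '
    NF != k { bad = 1 }
    END { exit (bad || NR != rows) ? 1 : 0 }
  ' "$1"
}

# Example use: one row per query point, 5 neighbor indices per row.
if [ -f neighbors.csv ] && [ -f query_set.csv ]; then
  check_shape neighbors.csv "$(wc -l < query_set.csv)" 5 && echo "neighbors.csv: shape ok"
fi
```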

       For  example,  to find the 5 approximate furthest neighbors with 'reference_set.csv' as the reference set
       and 'query_set.csv' as the query set using DrusillaSelect,  storing  the  furthest  neighbor  indices  to
       'neighbors.csv' and the furthest neighbor distances to 'distances.csv', one could call

       $  mlpack_approx_kfn  --query_file  query_set.csv --reference_file reference_set.csv --k 5 --algorithm ds
       --neighbors_file neighbors.csv --distances_file distances.csv

       and to perform approximate all-furthest-neighbors search with k=1 on the set 'data.csv', storing only the
       furthest neighbor distances to 'distances.csv', one could call

       $ mlpack_approx_kfn --reference_file data.csv --k 1 --distances_file distances.csv

       A trained model can be re-used. If a model has been previously saved to 'model.bin', then we may find 3
       approximate furthest neighbors on a query set 'new_query_set.csv' using that model and store the furthest
       neighbor indices into 'neighbors.csv' by calling

       $  mlpack_approx_kfn  --input_model_file  model.bin --query_file new_query_set.csv --k 3 --neighbors_file
       neighbors.csv
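
       The error calculation can reuse exact distances computed beforehand. The sketch below obtains them
       from the mlpack_kfn program; its options are assumed to mirror the ones documented here, and
       '--verbose' is passed on the assumption that the error is reported among the informational
       messages. File names are illustrative.

```shell
# Illustrative file names (assumptions).
REF=reference_set.csv
QUERY=query_set.csv
EXACT=exact_distances.csv

if command -v mlpack_kfn >/dev/null 2>&1 \
   && command -v mlpack_approx_kfn >/dev/null 2>&1 \
   && [ -f "$REF" ] && [ -f "$QUERY" ]; then
  # Exact distance to the single furthest neighbor of each query point.
  mlpack_kfn --reference_file "$REF" --query_file "$QUERY" --k 1 \
    --distances_file "$EXACT"

  # Approximate search; the exact distances avoid recomputing them for the error.
  mlpack_approx_kfn --reference_file "$REF" --query_file "$QUERY" --k 1 \
    --calculate_error --exact_distances_file "$EXACT" \
    --distances_file distances.csv --verbose
else
  echo "skipped: mlpack programs or input files not found"
fi
```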

OPTIONAL INPUT OPTIONS

       --algorithm (-a) [string]
              Algorithm to use: 'ds' or 'qdafn'. Default value 'ds'.

       --calculate_error (-e) [bool]
              If set, calculate the average distance error for the first furthest neighbor only.

       --exact_distances_file (-x) [string]
              Matrix containing exact distances to furthest neighbors;  this  can  be  used  to  avoid  explicit
              calculation when --calculate_error is set.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              File containing input model.

       --k (-k) [int]
              Number of furthest neighbors to search for. Default value 0.

       --num_projections (-p) [int]
              Number of projections to use in each hash table. Default value 5.

       --num_tables (-t) [int]
              Number of hash tables to use. Default value 5.

       --query_file (-q) [string]
              Matrix containing query points.

       --reference_file (-r) [string]
              Matrix containing the reference dataset.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --distances_file (-d) [string]
              Matrix to save furthest neighbor distances to.

       --neighbors_file (-n) [string]
              Matrix to save neighbor indices to.

       --output_model_file (-M) [unknown]
              File to save output model to.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations,  and  theory,  consult  the  documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.