Provided by: mlpack-bin_3.4.2-7ubuntu1_amd64
NAME
mlpack_dbscan - DBSCAN clustering
SYNOPSIS
mlpack_dbscan -i string [-e double] [-m int] [-N bool] [-s string] [-S bool] [-t string] [-V bool] [-a string] [-C string] [-h -v]
DESCRIPTION
This program implements the DBSCAN algorithm for clustering using accelerated tree-based range search. The type of tree used may be parameterized, or brute-force range search may be used instead.

The input dataset to be clustered is specified with the '--input_file (-i)' parameter. The radius of each range search is specified with the '--epsilon (-e)' parameter, and the minimum number of points in a cluster is specified with the '--min_size (-m)' parameter.

The '--assignments_file (-a)' and '--centroids_file (-C)' output parameters may be used to save the output of the clustering. '--assignments_file (-a)' contains the cluster assignment of each point, and '--centroids_file (-C)' contains the centroid of each cluster.

The range search may be controlled with the '--tree_type (-t)', '--single_mode (-S)', and '--naive (-N)' parameters. '--tree_type (-t)' selects the type of tree used for range search and accepts one of the following values: 'kd', 'r', 'r-star', 'x', 'hilbert-r', 'r-plus', 'r-plus-plus', 'cover', 'ball'. The '--single_mode (-S)' parameter forces single-tree search (as opposed to the default dual-tree search), and '--naive (-N)' forces brute-force range search.

An example usage to run DBSCAN on the dataset in 'input.csv' with a radius of 0.5 and a minimum cluster size of 5 is given below:

$ mlpack_dbscan --input_file input.csv --epsilon 0.5 --min_size 5
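To make the roles of '--epsilon (-e)' and '--min_size (-m)' concrete, the following is a minimal Python sketch of DBSCAN built on brute-force range search, the strategy that '--naive (-N)' selects. It is an illustration only and not mlpack's implementation: the names dbscan, range_search, and NOISE are hypothetical, the noise label -1 is a choice made for this sketch, and the sketch assumes that a point counts itself among its own neighbors when compared against min_size.

```python
import math

NOISE = -1        # label this sketch gives to points that join no cluster
UNVISITED = None  # marker for points not yet examined


def range_search(points, i, epsilon):
    """Brute-force range search: indices of all points within epsilon of points[i]."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= epsilon]


def dbscan(points, epsilon=1.0, min_size=5):
    """Return one cluster label per point (NOISE for unclustered points)."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = range_search(points, i, epsilon)
        if len(neighbors) < min_size:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = range_search(points, j, epsilon)
            if len(j_neighbors) >= min_size:
                frontier.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels


# Two tight groups of four points and one isolated point.
points = [(0, 0), (0, 0.3), (0.3, 0), (0.3, 0.3),
          (5, 5), (5, 5.3), (5.3, 5), (5.3, 5.3),
          (10, 0)]
labels = dbscan(points, epsilon=0.5, min_size=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Every call to range_search here scans the whole dataset, which is the cost that tree-based single-tree or dual-tree search is meant to reduce.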
REQUIRED INPUT OPTIONS
--input_file (-i) [string] Input dataset to cluster.
OPTIONAL INPUT OPTIONS
--epsilon (-e) [double]
    Radius of each range search. Default value 1.
--help (-h) [bool]
    Default help info.
--info [string]
    Print help on a specific option. Default value ''.
--min_size (-m) [int]
    Minimum number of points for a cluster. Default value 5.
--naive (-N) [bool]
    If set, brute-force range search (not tree-based) will be used.
--selection_type (-s) [string]
    If using point selection policy, the type of selection to use ('ordered', 'random'). Default value 'ordered'.
--single_mode (-S) [bool]
    If set, single-tree range search (not dual-tree) will be used.
--tree_type (-t) [string]
    If using single-tree or dual-tree search, the type of tree to use ('kd', 'r', 'r-star', 'x', 'hilbert-r', 'r-plus', 'r-plus-plus', 'cover', 'ball'). Default value 'kd'.
--verbose (-v) [bool]
    Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V) [bool]
    Display the version of mlpack.
OPTIONAL OUTPUT OPTIONS
--assignments_file (-a) [string]
    Output matrix for assignments of each point.
--centroids_file (-C) [string]
    Matrix to save output centroids to.
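The relationship between the two outputs can be sketched in Python: given the points and their cluster assignments, a centroid is taken here to be the arithmetic mean of a cluster's members. This is an assumption about the usual definition, not a description of mlpack's code or file format. The function name centroids and the noise label -1 are hypothetical and used only for this illustration.

```python
def centroids(points, assignments, noise=-1):
    """Return {cluster label: mean coordinate vector}, skipping noise points."""
    sums, counts = {}, {}
    for p, label in zip(points, assignments):
        if label == noise:
            continue  # unclustered points contribute to no centroid
        acc = sums.setdefault(label, [0.0] * len(p))
        for d, x in enumerate(p):
            acc[d] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


sample_points = [(0, 0), (2, 0), (10, 10), (10, 12), (50, 50)]
sample_assignments = [0, 0, 1, 1, -1]
print(centroids(sample_points, sample_assignments))
# {0: [1.0, 0.0], 1: [10.0, 11.0]}
```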
ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.