Provided by: seqan-raptor_3.0.1+ds-3build1_amd64

NAME

       Raptor-layout  - A fast and space-efficient pre-filter for querying very large collections
       of nucleotide sequences.

DESCRIPTION

       Computes an HIBF (hierarchical interleaved Bloom filter) layout that tries to minimize the
       disk space consumption of the resulting index. The space is estimated from the k-mer count
       of each user bin, which represents the potential density of a technical bin in an
       interleaved Bloom filter. You can pass the resulting layout to raptor
       (https://github.com/seqan/raptor) to build the index and conduct queries.

OPTIONS

   Main options:
       --input-file (std::filesystem::path)
              The input must be a file containing paths to sequence data you  wish  to  estimate;
              one  filepath  per  line. If your file contains auxiliary information (e.g. species
              IDs), your file must be tab-separated.

              Example file:

              ```
              /absolute/path/to/file1.fasta
              /absolute/path/to/file2.fa.gz
              ```
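
       The input format described for --input-file can be produced with a few lines of Python.
       This is a minimal sketch: the helper name write_layout_input and its parameters are
       hypothetical and not part of Raptor. It writes one path per line and, if auxiliary
       information is given, appends it after a tab.

```python
from pathlib import Path


def write_layout_input(paths, out_file, extra_columns=None):
    """Write one file path per line; optional extra info is tab-separated.

    Hypothetical helper for illustration only, not part of Raptor.
    """
    lines = []
    for index, path in enumerate(paths):
        fields = [str(path)]
        if extra_columns is not None:
            # Auxiliary information (e.g. a species ID) follows the path after a tab.
            fields.append(str(extra_columns[index]))
        lines.append("\t".join(fields))
    Path(out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

       Calling write_layout_input(["/absolute/path/to/file1.fasta"], "all_paths.txt") would
       produce a file with a single line containing that path.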

       --kmer-size (unsigned 8 bit integer)
              The k-mer size influences the size estimates of the input. Choosing a k-mer size
              that is too small for your data will make files appear more similar than they
              really are. Conversely, a k-mer size that is too large might miss certain
              similarities. For DNA sequences, a k-mer size in the range [16,32] has proven to
              work well. Default: 19.
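
       The effect of the k-mer size on apparent similarity can be illustrated with a toy
       example. The sketch below is an assumption-laden illustration, not Raptor code: it uses
       exact k-mer sets and the Jaccard index on two short made-up sequences that differ by a
       single base, with very small values of k so that the effect is visible.

```python
def kmers(sequence, k):
    """Return the set of all k-mers (substrings of length k) in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b, k):
    """Jaccard similarity of the k-mer sets of two sequences."""
    set_a, set_b = kmers(a, k), kmers(b, k)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Two made-up sequences that differ by a single base in the middle.
seq_a = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
seq_b = "ACGTTGCAAGCTAAGGCTAACGTTAGC"

for k in (2, 4, 8):
    # Small k: almost every k-mer occurs in both sequences.
    # Larger k: every k-mer overlapping the changed base differs.
    print(k, round(jaccard(seq_a, seq_b, k), 3))
```

       With k = 2 the two sequences have identical k-mer sets, while with k = 8 every k-mer
       that overlaps the changed base is unique to one sequence, so the similarity drops.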

       --num-hash-functions (unsigned 64 bit integer)
              The number of hash functions to use when  building  the  HIBF  from  the  resulting
              layout.  This  parameter  is  needed  to  correctly  estimate  the  index size when
              computing the layout. Default: 2.

       --false-positive-rate (double)
              The false positive rate you aim for when  building  the  HIBF  from  the  resulting
              layout.  This  parameter  is  needed  to  correctly  estimate  the  index size when
              computing the layout. Default: 0.05.
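
       Why the number of hash functions and the false positive rate affect the size estimate
       can be seen from the textbook Bloom filter relation p = (1 - exp(-h*n/m))^h, where n is
       the number of stored k-mers, h the number of hash functions, m the number of bits and p
       the false positive rate. The sketch below solves this relation for m. It is an
       illustration under that assumption; the exact computation inside Raptor may differ, and
       the function name is hypothetical.

```python
import math


def bloom_filter_bits(num_kmers, num_hash_functions=2, false_positive_rate=0.05):
    """Bits needed by a single Bloom filter to hold num_kmers at the target rate.

    Uses the textbook relation p = (1 - exp(-h * n / m)) ** h solved for m.
    Hypothetical helper for illustration; not Raptor's implementation.
    """
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in (0, 1)")
    if num_hash_functions < 1:
        raise ValueError("num_hash_functions must be at least 1")
    per_hash = false_positive_rate ** (1.0 / num_hash_functions)
    return math.ceil(-num_hash_functions * num_kmers / math.log(1.0 - per_hash))


# Defaults from this page: 2 hash functions, false positive rate 0.05.
print(bloom_filter_bits(1_000_000))
# A stricter false positive rate needs more bits for the same content.
print(bloom_filter_bits(1_000_000, false_positive_rate=0.01))
```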

       --output-filename (std::filesystem::path)
              A file name for the resulting layout. Default: "binning.out".

       --threads (unsigned 64 bit integer)
              The number of threads to use. Currently, only merging of sketches is  parallelized,
              so  if  the  flag  --disable-rearrangement  is  set, --threads will have no effect.
              Default: 1. Value must be in range [1,18446744073709551615].
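
       The main options above can be combined into a single invocation. The sketch below only
       assembles and prints a command line; it does not run anything. It assumes that the tool
       is invoked as the subcommand "raptor layout" and that all_paths.txt is an input file in
       the format described for --input-file; both are assumptions, so check the output of
       --help on your installation.

```python
import shlex


def layout_command(input_file, output_filename="binning.out", kmer_size=19,
                   num_hash_functions=2, false_positive_rate=0.05, threads=1):
    """Assemble an argument list using only options documented on this page."""
    return [
        "raptor", "layout",  # assumption: layout is a subcommand of the raptor executable
        "--input-file", str(input_file),
        "--output-filename", str(output_filename),
        "--kmer-size", str(kmer_size),
        "--num-hash-functions", str(num_hash_functions),
        "--false-positive-rate", str(false_positive_rate),
        "--threads", str(threads),
    ]


# all_paths.txt is a hypothetical file name.
print(shlex.join(layout_command("all_paths.txt", threads=4)))
```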

   HyperLogLog Sketches:
       To improve the layout, you  can  estimate  the  sequence  similarities  using  HyperLogLog
       sketches.

       --disable-estimate-union
              The sketches are used to estimate the sequence similarity among a set of user bins.
              This improves the layout computation because merging user bins that do not increase
              technical bin sizes is preferred. Estimating unions may use more RAM and can be
              disabled in RAM-critical environments. Attention: this option also disables
              rearrangement, which depends on union estimates.
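
       How a sketch can estimate the size of a union without storing the k-mers themselves is
       illustrated below with a minimal HyperLogLog in Python. This is a sketch of the general
       technique only: the hash function (SHA-1), the register count and the class name are
       choices made for this example and do not reflect Raptor's implementation. The union of
       two sketches is obtained by taking the element-wise maximum of their registers.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**bits registers (illustration only)."""

    def __init__(self, bits=12):
        if bits < 7:
            raise ValueError("bits must be >= 7 for the bias constant used here")
        self.bits = bits
        self.m = 1 << bits
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash value taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - self.bits)             # leading bits select a register
        rest = h & ((1 << (64 - self.bits)) - 1)  # remaining bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.bits) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Return the sketch of the union: element-wise maximum of registers."""
        if self.bits != other.bits:
            raise ValueError("sketches must use the same number of bits")
        merged = HyperLogLog(self.bits)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1.0 + 1.079 / self.m)  # bias correction for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Two toy "user bins" sharing half of their (made-up) k-mers.
bin_a, bin_b = HyperLogLog(), HyperLogLog()
for i in range(20000):
    bin_a.add(f"kmer{i}")
for i in range(10000, 30000):
    bin_b.add(f"kmer{i}")
print(round(bin_a.estimate()), round(bin_b.estimate()), round(bin_a.merge(bin_b).estimate()))
```

       The true cardinalities in this toy setup are 20000, 20000 and 30000 for the union. The
       estimates are approximate; with 4096 registers the expected relative error is on the
       order of a few percent.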

       --disable-rearrangement
              As a preprocessing step, rearranging the order of the given user bins based on
              their sequence similarity may lead to favourably small unions and thus a smaller
              index. Depending on the number of input samples (user bins), this may be
              time-consuming and can thus be disabled if a suboptimal layout is sufficient.
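
       The idea of ordering user bins by similarity can be illustrated with a simple greedy
       chain: start with one bin and repeatedly append the most similar bin that has not been
       placed yet. This is not Raptor's algorithm; it is a toy sketch that uses exact k-mer
       sets and the Jaccard index, and all names in it are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_order(bins):
    """Order bins so that each one follows its most similar unplaced neighbour."""
    if not bins:
        return []
    remaining = list(range(len(bins)))
    order = [remaining.pop(0)]  # start with the first bin
    while remaining:
        last = bins[order[-1]]
        # Ties are resolved in favour of the earliest remaining bin.
        best = max(remaining, key=lambda i: jaccard(last, bins[i]))
        remaining.remove(best)
        order.append(best)
    return order


# Four made-up user bins, given as sets of 3-mers.
user_bins = [
    {"AAA", "AAC", "ACG"},
    {"TTT", "TTG", "TGC"},
    {"AAA", "AAC", "ACC"},
    {"TTT", "TTG", "TGA"},
]
print(greedy_order(user_bins))  # prints [0, 2, 1, 3]
```

       In the resulting order, similar bins are adjacent, so merging neighbours produces
       unions that are smaller than those of dissimilar bins.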

   Parameter Tweaking:
   Special options

REFERENCES

       [1] Philippe Flajolet, Éric Fusy, Olivier Gandouet, Frédéric Meunier. HyperLogLog: the
       analysis of a near-optimal cardinality estimation algorithm. AofA: Analysis of Algorithms,
       Jun 2007, Juan les Pins, France. pp.137-156. hal-00406166v2,
       https://doi.org/10.46298/dmtcs.3545

   Common options
       -h, --help
              Prints the help page.

       -hh, --advanced-help
              Prints the help page including advanced options.

       --version
              Prints the version information.

       --copyright
              Prints the copyright/license information.

       --export-help (std::string)
              Export the help page information. Value must be one of [html, man, ctd, cwl].

VERSION

       Last update: Unavailable
       Raptor-layout version: 3.0.1 (commit unavailable)
       Sharg version: 1.1.1
       SeqAn version: 3.3.0-rc.2

URL

       https://github.com/seqan/raptor

LEGAL

       Raptor-layout Copyright: BSD 3-Clause License
       Author: Svenja Mehringer
       Contact: svenja.mehringer@fu-berlin.de
       SeqAn Copyright: 2006-2023 Knut Reinert, FU-Berlin; released under the 3-clause BSDL.
       In  your  academic  works  please  cite: Raptor: A fast and space-efficient pre-filter for
       querying very large collections of nucleotide sequences; Enrico Seiler, Svenja  Mehringer,
       Mitra  Darvish,  Etienne  Turc,  and  Knut  Reinert;  iScience  2021  24 (7): 102782. doi:
       https://doi.org/10.1016/j.isci.2021.102782
       For full copyright and/or warranty information see --copyright.