Ubuntu Manpage: makehmmerdb - build nhmmer database from a sequence file

NAME

       makehmmerdb - build nhmmer database from a sequence file

SYNOPSIS

       makehmmerdb [options] seqfile binaryfile

DESCRIPTION

       makehmmerdb creates a binary file from a DNA sequence file. This binary file may be used
       as a target database for the DNA search tool nhmmer. With nhmmer's default settings,
       searching the binary database is roughly 10 times faster than searching the sequence
       file directly, with a small loss of sensitivity on benchmarks.
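
       As an illustration of the synopsis above, the following is a minimal Python sketch that
       assembles a makehmmerdb command line from the options documented on this page. The helper
       name build_makehmmerdb_cmd and the file names genome.fa and genome.fm are hypothetical;
       only the flags come from this page.

```python
import shlex


def build_makehmmerdb_cmd(seqfile, binaryfile, informat=None,
                          bin_length=None, sa_freq=None, block_size=None):
    """Return the argv list for: makehmmerdb [options] seqfile binaryfile."""
    cmd = ["makehmmerdb"]
    if informat is not None:
        cmd += ["--informat", informat]
    if bin_length is not None:
        cmd += ["--bin_length", str(bin_length)]
    if sa_freq is not None:
        # This page states that --sa_freq must be a power of 2.
        if sa_freq < 1 or sa_freq & (sa_freq - 1):
            raise ValueError("--sa_freq must be a power of 2")
        cmd += ["--sa_freq", str(sa_freq)]
    if block_size is not None:
        cmd += ["--block_size", str(block_size)]
    # Positional arguments come last: input sequence file, then output binary file.
    cmd += [seqfile, binaryfile]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_makehmmerdb_cmd("genome.fa", "genome.fm", informat="fasta", sa_freq=8)
print(shlex.join(cmd))
```

       With HMMER installed, the resulting list could be passed to subprocess.run(cmd,
       check=True) to build the database.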

OPTIONS

       -h     Help; print a brief reminder of command line usage and all available options.

OTHER OPTIONS

       --informat <s>
              Assert that the input seqfile is in format <s>, bypassing format autodetection.
              Common choices for <s> include: fasta, embl, genbank. Alignment formats also work;
              common choices include: stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see the main
              documentation. The string <s> is case-insensitive (fasta or FASTA both work).

       --bin_length <n>
              Bin length. The binary file depends on a data structure called the FM index, which
              organizes a permuted copy of the sequence in bins of length <n>. A longer bin
              length leads to smaller files (because data is stored once per bin, so fewer bins
              mean less stored data) and possibly slower query time. The default is 256. Values
              much larger than 512 may noticeably reduce speed.
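
              The size/speed trade-off can be illustrated with a toy FM-index-style structure.
              This is a minimal sketch, assuming the permuted copy is a Burrows-Wheeler
              transform and that each bin stores cumulative letter counts, as in a textbook FM
              index. It is not makehmmerdb's actual file layout, and all function names are
              hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for tiny inputs)."""
    text += "$"  # terminator; sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_checkpoints(bw, bin_length):
    """Record cumulative letter counts at the start of every bin."""
    running, checkpoints = {}, []
    for i, ch in enumerate(bw):
        if i % bin_length == 0:
            checkpoints.append(dict(running))
        running[ch] = running.get(ch, 0) + 1
    return checkpoints


def occ(bw, checkpoints, bin_length, ch, i):
    """Count ch in bw[:i]: one checkpoint lookup plus a scan within the bin."""
    b = min(i // bin_length, len(checkpoints) - 1)
    start = b * bin_length
    return checkpoints[b].get(ch, 0) + bw[start:i].count(ch)


bw = bwt("ACGTTGCAACGTACGT")
for bin_length in (2, 4, 8):
    cps = build_checkpoints(bw, bin_length)
    print(bin_length, "letters per bin ->", len(cps), "checkpoints stored")
```

              Longer bins store fewer checkpoints, but each occ() query may scan up to one
              full bin of letters, which is the source of the slower query time.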

       --sa_freq <n>
              Suffix array sample rate. The FM index structure also samples from the underlying
              suffix array for the sequence database. More frequent sampling (a smaller value
              for <n>) yields a larger file and faster search (until the file becomes large
              enough for I/O to be a bottleneck). The default value is 8. <n> must be a power
              of 2.
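
              A toy example may help show why sparser sampling is smaller but slower. This is
              a minimal sketch of a textbook sampled suffix array, assuming samples are kept at
              text positions that are multiples of the sample rate. The man page does not state
              which sampling scheme makehmmerdb uses, and all names here are hypothetical.

```python
def build_index(text, sa_freq):
    """Build a BWT, a sampled suffix array, and first-row offsets for a tiny text."""
    text += "$"  # terminator; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bw = "".join(text[i - 1] for i in sa)  # text[-1] is "$" when i == 0
    # Keep only rows whose text position is a multiple of sa_freq.
    samples = {row: pos for row, pos in enumerate(sa) if pos % sa_freq == 0}
    # first_row[ch]: number of letters in the text that sort before ch.
    first_row, total = {}, 0
    for ch in sorted(set(bw)):
        first_row[ch] = total
        total += bw.count(ch)
    return sa, bw, samples, first_row


def locate(row, bw, samples, first_row):
    """Recover the text position for a row by stepping back to a sampled row."""
    steps = 0
    while row not in samples:
        ch = bw[row]
        # LF-mapping: move to the row of the suffix starting one letter earlier.
        row = first_row[ch] + bw[:row].count(ch)
        steps += 1
    return samples[row] + steps


for sa_freq in (1, 2, 4, 8):
    sa, bw, samples, first_row = build_index("ACGTTGCAACGT", sa_freq)
    ok = all(locate(r, bw, samples, first_row) == sa[r] for r in range(len(sa)))
    print("sa_freq", sa_freq, "->", len(samples), "samples stored, correct:", ok)
```

              Each unsampled row costs extra backward steps at query time, so a smaller <n>
              trades file size for fewer steps per lookup.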

       --block_size <n>
              The input sequence is broken into blocks of <n> million letters. An FM index is
              built for each block, rather than one FM index for the entire sequence database.
              The default is 50. Larger blocks do not seem to yield a substantial speed
              increase.
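
              The arithmetic behind the block count can be sketched as follows. This is a
              simplified illustration that assumes blocks are plain consecutive ranges of
              letters; how makehmmerdb treats block or sequence boundaries is not described
              here. The function name block_ranges is hypothetical.

```python
def block_ranges(total_letters, block_size_millions=50):
    """Split [0, total_letters) into consecutive blocks of <n> million letters."""
    block_len = block_size_millions * 1_000_000
    return [(start, min(start + block_len, total_letters))
            for start in range(0, total_letters, block_len)]


# A hypothetical 120-million-letter database with the default block size.
for start, end in block_ranges(120_000_000):
    print(f"block covers letters {start}..{end - 1} ({end - start} letters)")
```

              Under these assumptions, the default setting gives two full blocks and one
              partial block of 20 million letters, each of which would get its own FM index.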

COPYRIGHT

       Copyright (C) 2019 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For  additional  information  on copyright and licensing, see the file called COPYRIGHT in
       your HMMER source distribution, or see the HMMER web page (http://hmmer.org/).

AUTHOR

       http://eddylab.org
