NAME

       hmm2search - search a sequence database with a profile HMM

SYNOPSIS

       hmm2search [options] hmmfile seqfile

DESCRIPTION

       hmm2search reads an HMM from hmmfile and searches seqfile for significantly similar sequence matches.

       seqfile  will  be  looked  for  first  in the current working directory, then in a directory named by the
       environment variable BLASTDB.  This lets users use existing BLAST databases, if BLAST has been configured
       for the site.
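
       The lookup order described above can be sketched in Python. This is only an illustration, not
       HMMER's own code: the function name resolve_seqfile is hypothetical, and it assumes that BLASTDB
       names a single directory.

```python
import os


def resolve_seqfile(seqfile, cwd=None, environ=None):
    """Return the first existing candidate path for seqfile, or None.

    Mirrors the documented order: the current working directory first,
    then the directory named by BLASTDB (assumed to be a single directory).
    """
    cwd = os.getcwd() if cwd is None else cwd
    environ = os.environ if environ is None else environ

    candidates = [os.path.join(cwd, seqfile)]
    blastdb = environ.get("BLASTDB")
    if blastdb:
        candidates.append(os.path.join(blastdb, seqfile))

    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

       A copy of seqfile in the working directory therefore shadows a file of the same name under
       BLASTDB.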

       hmm2search may take minutes or even hours to run, depending on the size of the sequence database. It is a
       good idea to redirect the output to a file.
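
       For scripted runs, one way to send the output to a file is sketched below. The helper
       run_to_file and the file names in the commented call are hypothetical; the call itself needs
       hmm2search on the PATH.

```python
import subprocess


def run_to_file(argv, out_path):
    """Run argv, write its standard output to out_path, return the exit status."""
    with open(out_path, "w") as out:
        completed = subprocess.run(argv, stdout=out, check=False)
    return completed.returncode


# Hypothetical file names; uncomment on a machine where hmm2search is installed.
# status = run_to_file(["hmm2search", "globin.hmm", "proteins.fa"], "globin.search.out")
```

       This corresponds to the shell form "hmm2search hmmfile seqfile > outfile".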

       The output consists of four sections: a ranked list of the best scoring sequences, a ranked list  of  the
       best  scoring  domains,  alignments  for  all the best scoring domains, and a histogram of the scores.  A
       sequence score may be higher than a domain score for the same sequence if there is more than  one  domain
       in  the sequence; the sequence score takes into account all the domains.  All sequences scoring above the
       -E and -T cutoffs are shown in the first list, then every domain found in  this  list  is  shown  in  the
       second  list  of  domain  hits.   If desired, E-value and bit score thresholds may also be applied to the
       domain list using the --domE and --domT options.
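
       The relationship between the two lists can be sketched as a two-stage filter. This is an
       illustration only: the SeqHit and Domain records and the report function are hypothetical, both
       lists are assumed to be ranked by bit score, and hits exactly at a cutoff are assumed to pass.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Domain:
    score: float   # per-domain bit score
    evalue: float  # per-domain E-value


@dataclass
class SeqHit:
    name: str
    score: float   # per-sequence bit score (accounts for all domains)
    evalue: float  # per-sequence E-value
    domains: list = field(default_factory=list)


def report(hits, E=10.0, T=-math.inf, domE=math.inf, domT=-math.inf):
    """Return (per-sequence list, per-domain list) after applying the cutoffs."""
    # Stage 1: -E and -T select the sequences shown in the first list.
    seq_list = [h for h in hits if h.evalue <= E and h.score >= T]
    seq_list.sort(key=lambda h: h.score, reverse=True)

    # Stage 2: only domains of those sequences are candidates for the second
    # list; --domE and --domT optionally filter them further.
    dom_list = [(h.name, d) for h in seq_list for d in h.domains
                if d.evalue <= domE and d.score >= domT]
    dom_list.sort(key=lambda pair: pair[1].score, reverse=True)
    return seq_list, dom_list
```

       With the default domain cutoffs, every domain of a reported sequence appears in the second
       list, so the two lists stay consistent.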

OPTIONS

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -A <n> Limits the alignment output to the <n> best scoring domains.  -A0 shuts off the  alignment  output
              and can be used to reduce the size of output files.

       -E <x> Set  the  E-value cutoff for the per-sequence ranked hit list to <x>, where <x> is a positive real
              number. The default is 10.0. Hits with E-values better than (less than)  this  threshold  will  be
              shown.

       -T <x> Set  the bit score cutoff for the per-sequence ranked hit list to <x>, where <x> is a real number.
              The default is negative infinity; by default, the threshold is controlled by E-value  and  not  by
              bit score.  Hits with bit scores better than (greater than) this threshold will be shown.

       -Z <n> Calculate  the  E-value scores as if we had seen a sequence database of <n> sequences. The default
              is the number of sequences seen in your database file <seqfile>.
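
       The effect of -Z can be sketched as follows, assuming that an E-value is the P-value of a
       score multiplied by the number of sequences in the database, so that it scales linearly with
       database size. The function rescale_evalue is a hypothetical helper, not part of HMMER.

```python
def rescale_evalue(evalue, z_searched, z_target):
    """Convert an E-value computed for z_searched sequences to z_target sequences.

    Assumes E = P * Z, where P is the P-value of the score and Z is the
    number of sequences the search is treated as having seen.
    """
    if z_searched <= 0 or z_target <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * z_target / z_searched
```

       Under that assumption, a hit reported with E = 0.02 in a 1,000-sequence search would have
       E = 2 if the same search were run with -Z 100000.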

EXPERT OPTIONS

       --compat
              Use the output format of HMMER 2.1.1, the 1998-2001 public release; provided so 2.1.1 parsers
              don't have to be rewritten.

       --cpu <n>
              Sets the maximum number of CPUs that the program will run on. The default is to use all CPUs in
              the machine. Overrides the HMMER_NCPU environment variable. Only affects threaded versions of
              HMMER (the default on most systems).

       --cut_ga
              Use Pfam GA (gathering threshold) score cutoffs. Equivalent to -T <GA1> --domT <GA2>, but the
              GA1 and GA2 cutoffs are read from the HMM file. hmm2build puts these cutoffs there if the
              alignment file was annotated in a Pfam-friendly alignment format (extended SELEX or Stockholm
              format) and the optional GA annotation line was present. If these cutoffs are not set in the
              HMM file, --cut_ga cannot be used.

       --cut_tc
              Use Pfam TC (trusted cutoff) score cutoffs. Equivalent to -T <TC1> --domT <TC2>, but the TC1
              and TC2 cutoffs are read from the HMM file. hmm2build puts these cutoffs there if the alignment
              file was annotated in a Pfam-friendly alignment format (extended SELEX or Stockholm format) and
              the optional TC annotation line was present. If these cutoffs are not set in the HMM file,
              --cut_tc cannot be used.

       --cut_nc
              Use Pfam NC (noise cutoff) score cutoffs. Equivalent to -T <NC1> --domT <NC2>, but the NC1
              and NC2 cutoffs are read from the HMM file. hmm2build puts these cutoffs there if the alignment
              file was annotated in a Pfam-friendly alignment format (extended SELEX or Stockholm format) and
              the optional NC annotation line was present. If these cutoffs are not set in the HMM file,
              --cut_nc cannot be used.

       --domE <x>
              Set the E-value cutoff for the per-domain ranked hit list to <x>, where <x> is a positive real
              number. The default is infinity; by default, all domains in the sequences that passed the first
              threshold will be reported in the second list, so that the number of domains reported in the
              per-sequence list is consistent with the number that appear in the per-domain list.

       --domT <x>
              Set the bit score cutoff for the per-domain ranked hit list to <x>, where <x> is a real number.
              The default is negative infinity; by default, all domains in the sequences that passed the first
              threshold will be reported in the second list, so that the number of domains reported in the
              per-sequence list is consistent with the number that appear in the per-domain list. Important
              note: only one domain in a sequence is absolutely controlled by this parameter, or by --domE.
              The second and subsequent domains in a sequence have a de facto bit score threshold of 0 because
              of the details of how HMMER works. HMMER requires at least one pass through the main model per
              sequence; to do more than one pass (more than one domain) the multidomain alignment must have a
              better score than the single domain alignment, and hence the extra domains must contribute
              positive score. See the User's Guide for more detail.

       --forward
              Use the Forward algorithm instead of the Viterbi algorithm to determine the per-sequence scores.
              Per-domain scores are still determined by the Viterbi algorithm. Some have argued that Forward is
              a more sensitive algorithm for detecting remote sequence homologues; my experiments with HMMER
              have not confirmed this, however.

       --informat <s>
              Assert that the input seqfile is in format <s>; do not run Babelfish format autodetection. This
              increases the reliability of the program somewhat, because the Babelfish can make mistakes;
              particularly recommended for unattended, high-throughput runs of HMMER. Valid format strings
              include FASTA, GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF, CLUSTAL, and PHYLIP. See the
              User's Guide for a complete list.

       --null2
              Turn off the post hoc second null model. By default, each alignment is rescored by a
              postprocessing step that takes into account possible biased composition in either the HMM or the
              target sequence. This is almost essential in database searches, especially with local alignment
              models. There is a very small chance that this postprocessing might remove real matches, and in
              these cases --null2 may improve sensitivity at the expense of reducing specificity by letting
              biased composition hits through.

       --pvm  Run on a Parallel Virtual Machine (PVM). The PVM must already be running. The client program
              hmm2search-pvm must be installed on all the PVM nodes. Optional PVM support must have been
              compiled into HMMER.

       --xnu  Turn on XNU filtering of target protein sequences. Has no effect on nucleic acid sequences. In
              trial experiments, --xnu appears to perform less well than the default post hoc null2 model.
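
       For unattended, high-throughput runs, a command line combining the options above might be
       assembled as in the sketch below. The helper build_hmm2search_argv and the file names in the
       usage note are hypothetical; only flags documented on this page are used, and the format string
       is passed through without validation.

```python
def build_hmm2search_argv(hmmfile, seqfile, informat=None, cpu=None,
                          cut_ga=False, max_alignments=None):
    """Build an argument list for hmm2search without running it."""
    argv = ["hmm2search"]
    if max_alignments is not None:
        # -A <n> limits alignment output; 0 shuts it off entirely.
        argv += ["-A", str(max_alignments)]
    if informat is not None:
        # --informat <s> asserts the sequence file format, skipping autodetection.
        argv += ["--informat", informat]
    if cpu is not None:
        # --cpu <n> caps the number of CPUs used by threaded builds.
        argv += ["--cpu", str(cpu)]
    if cut_ga:
        # --cut_ga takes the score cutoffs from the HMM file.
        argv.append("--cut_ga")
    argv += [hmmfile, seqfile]
    return argv
```

       For example, build_hmm2search_argv("globin.hmm", "proteins.fa", informat="FASTA", cpu=4)
       yields the arguments for "hmm2search --informat FASTA --cpu 4 globin.hmm proteins.fa".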

COPYRIGHT

       Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL).
       See the file COPYING in your distribution for details on redistribution conditions.

AUTHOR

       Sean Eddy
       HHMI/Dept. of Genetics
       Washington Univ. School of Medicine
       4566 Scott Ave.
       St Louis, MO 63110 USA
       http://www.genetics.wustl.edu/eddy/

HMMER 2.3.2                                         Oct 2003                                       hmm2search(1)