Ubuntu Manpage: hmm2pfam - search one or more sequences against an HMM database

NAME

       hmm2pfam - search one or more sequences against an HMM database

SYNOPSIS

       hmm2pfam [options] hmmfile seqfile

DESCRIPTION

       hmm2pfam  reads  a  sequence file seqfile and compares each sequence in it, one at a time,
       against all the HMMs in hmmfile looking for significantly similar sequence matches.

       hmm2pfam looks for hmmfile first in the current working directory, then in the directory
       named by the environment variable HMMERDB.  This lets administrators install HMM
       libraries such as Pfam in a common location.
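
       The lookup order described above can be sketched as follows. This is an illustration
       only, not HMMER's own code: the function name find_hmmfile is hypothetical, and the sketch
       assumes HMMERDB names a single directory, as the text states.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_hmmfile(hmmfile: str, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first existing location of hmmfile, or None if it is not found.

    Hypothetical helper: mirrors the documented search order only.
    """
    env = os.environ if env is None else env

    # 1. Try the name as given, relative to the current working directory.
    candidate = Path(hmmfile)
    if candidate.is_file():
        return candidate

    # 2. Fall back to the directory named by HMMERDB, if that variable is set.
    hmmerdb = env.get("HMMERDB")
    if hmmerdb:
        candidate = Path(hmmerdb) / hmmfile
        if candidate.is_file():
            return candidate

    return None
```

       Passing the environment as a parameter keeps the sketch easy to exercise without changing
       the real process environment.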

       There is a separate output report for each sequence in seqfile.  This report  consists  of
       three sections: a ranked list of the best scoring HMMs, a list of the best scoring domains
       in order of their occurrence in the sequence, and alignments  for  all  the  best  scoring
       domains.   A  sequence  score  may  be higher than a domain score for the same sequence if
       there is more than one domain in the sequence; the sequence score takes into  account  all
       the domains.  All HMMs passing the -E and -T cutoffs are shown in the first list, then
       every domain found for the HMMs in this list is shown in the second list of domain hits.
       If desired, E-value and bit score thresholds may also be applied to the domain list using
       the --domE and --domT options.
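
       The two-stage filtering described above can be sketched as follows. The ModelHit and
       DomainHit classes and the select_hits function are hypothetical names used only for
       illustration; the default values are the ones documented under OPTIONS and EXPERT OPTIONS,
       and treating a score exactly equal to a cutoff as passing is an assumption.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DomainHit:
    start: int      # first residue of the domain in the sequence
    score: float    # per-domain bit score
    evalue: float   # per-domain E-value


@dataclass
class ModelHit:
    name: str
    score: float    # per-sequence bit score (takes all domains into account)
    evalue: float   # per-sequence E-value
    domains: List[DomainHit] = field(default_factory=list)


def select_hits(
    hits: List[ModelHit],
    E: float = 10.0,           # -E default
    T: float = -math.inf,      # -T default
    domE: float = math.inf,    # --domE default
    domT: float = -math.inf,   # --domT default
) -> Tuple[List[ModelHit], List[Tuple[str, DomainHit]]]:
    """Return (ranked HMM list, domain list) after applying the cutoffs."""
    # First list: HMMs that pass -E and -T, best bit score first.
    kept = [h for h in hits if h.evalue <= E and h.score >= T]
    kept.sort(key=lambda h: h.score, reverse=True)

    # Second list: domains of the kept HMMs only, filtered by --domE and
    # --domT, in order of occurrence along the sequence.
    domains = [
        (h.name, d)
        for h in kept
        for d in h.domains
        if d.evalue <= domE and d.score >= domT
    ]
    domains.sort(key=lambda pair: pair[1].start)
    return kept, domains
```

       With the defaults, every domain of an HMM that passes the first stage reaches the second
       list, which is why the two lists stay consistent unless --domE or --domT is set.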

OPTIONS

       -h     Print brief help; includes version number and summary  of  all  options,  including
              expert options.

       -n     Specify  that  models  and  sequence  are  nucleic  acid, not protein.  Other HMMER
              programs autodetect this; but because of the order in which hmm2pfam accesses data,
              it can't reliably determine the correct "alphabet" by itself.

       -A <n> Limits  the  alignment  output  to the <n> best scoring domains.  -A0 shuts off the
              alignment output and can be used to reduce the size of output files.

       -E <x> Set the E-value cutoff for the per-sequence ranked hit list to <x>, where <x> is  a
              positive  real  number.  The  default is 10.0. Hits with E-values better than (less
              than) this threshold will be shown.

       -T <x> Set the bit score cutoff for the per-sequence ranked hit list to <x>, where <x>  is
              a  real  number.   The  default  is negative infinity; by default, the threshold is
              controlled by E-value and not by bit score.   Hits  with  bit  scores  better  than
              (greater than) this threshold will be shown.

       -Z <n> Calculate  the  E-value  scores  as  if  we  had  seen  a  sequence database of <n>
              sequences. The default is arbitrarily set to 59021, the size of Swissprot 34.
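
       To illustrate the effect of -Z, the sketch below rescales an E-value from one assumed
       database size to another. It assumes that an E-value is the per-comparison P-value
       multiplied by the database size, so E-values scale linearly with <n>; the function name
       rescale_evalue is hypothetical.

```python
DEFAULT_Z = 59021  # default database size documented for -Z


def rescale_evalue(evalue: float, old_z: int = DEFAULT_Z, new_z: int = DEFAULT_Z) -> float:
    """Rescale an E-value computed for old_z sequences to a database of new_z.

    Assumption: E-value = P-value * Z, so the E-value is proportional to Z.
    """
    if old_z <= 0 or new_z <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * new_z / old_z
```

       Under this assumption, doubling <n> doubles every reported E-value, so a hit that just
       passes the -E cutoff at the default size may fail it for a larger <n>.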

EXPERT OPTIONS

       --acc  Report HMM accessions instead of names in the output reports. Useful for high-
              throughput annotation, where the data are being parsed for storage in a relational
              database.

       --compat
              Use the output format of HMMER 2.1.1, the 1998-2001 public release; provided so
              2.1.1 parsers don't have to be rewritten.

       --cpu <n>
              Sets the maximum number of CPUs that the program will run on. The default is to use
              all CPUs in the machine. Overrides the HMMER_NCPU environment variable. Only
              affects threaded versions of HMMER (the default on most systems).

       --cut_ga
              Use Pfam GA (gathering threshold) score cutoffs. Equivalent to -T <GA1>
              --domT <GA2>, but the GA1 and GA2 cutoffs are read from each HMM in hmmfile
              individually. hmm2build puts these cutoffs there if the alignment file was
              annotated in a Pfam-friendly alignment format (extended SELEX or Stockholm format)
              and the optional GA annotation line was present. If these cutoffs are not set in
              the HMM file, --cut_ga doesn't work.

       --cut_tc
              Use Pfam TC (trusted cutoff) score cutoffs. Equivalent to -T <TC1> --domT
              <TC2>, but the TC1 and TC2 cutoffs are read from each HMM in hmmfile individually.
              hmm2build puts these cutoffs there if the alignment file was annotated in a Pfam-
              friendly alignment format (extended SELEX or Stockholm format) and the optional TC
              annotation line was present. If these cutoffs are not set in the HMM file, --cut_tc
              doesn't work.

       --cut_nc
              Use Pfam NC (noise cutoff) score cutoffs. Equivalent to -T <NC1> --domT <NC2>,
              but the NC1 and NC2 cutoffs are read from each HMM in hmmfile individually.
              hmm2build puts these cutoffs there if the alignment file was annotated in a Pfam-
              friendly alignment format (extended SELEX or Stockholm format) and the optional NC
              annotation line was present. If these cutoffs are not set in the HMM file, --cut_nc
              doesn't work.

       --domE <x>
              Set the E-value cutoff for the per-domain ranked hit list to <x>, where <x> is a
              positive real number. The default is infinity; by default, all domains in the
              sequences that passed the first threshold will be reported in the second list, so
              that the number of domains reported in the per-sequence list is consistent with the
              number that appear in the per-domain list.

       --domT <x>
              Set the bit score cutoff for the per-domain ranked hit list to <x>, where <x> is a
              real number. The default is negative infinity; by default, all domains in the
              sequences that passed the first threshold will be reported in the second list, so
              that the number of domains reported in the per-sequence list is consistent with the
              number that appear in the per-domain list. Important note: only one domain in a
              sequence is absolutely controlled by this parameter, or by --domE. The second and
              subsequent domains in a sequence have a de facto bit score threshold of 0 because
              of the details of how HMMER works. HMMER requires at least one pass through the
              main model per sequence; to do more than one pass (more than one domain) the
              multidomain alignment must have a better score than the single domain alignment,
              and hence the extra domains must contribute positive score. See the User's Guide
              for more detail.

       --forward
              Use the Forward algorithm instead of the Viterbi algorithm to determine the per-
              sequence scores. Per-domain scores are still determined by the Viterbi algorithm.
              Some have argued that Forward is a more sensitive algorithm for detecting remote
              sequence homologues; my experiments with HMMER have not confirmed this, however.

       --informat <s>
              Assert that the input seqfile is in format <s>; do not run Babelfish format
              autodetection. This increases the reliability of the program somewhat, because the
              Babelfish can make mistakes; particularly recommended for unattended, high-
              throughput runs of HMMER. Valid format strings include FASTA, GENBANK, EMBL, GCG,
              PIR, STOCKHOLM, SELEX, MSF, CLUSTAL, and PHYLIP. See the User's Guide for a
              complete list.

       --null2
              Turn off the post hoc second null model. By default, each alignment is rescored by
              a postprocessing step that takes into account possible biased composition in either
              the HMM or the target sequence. This is almost essential in database searches,
              especially with local alignment models. There is a very small chance that this
              postprocessing might remove real matches, and in these cases --null2 may improve
              sensitivity at the expense of reducing specificity by letting biased composition
              hits through.

       --pvm  Run on a Parallel Virtual Machine (PVM). The PVM must already be running. The
              client program hmm2pfam-pvm must be installed on all the PVM nodes. The HMM
              database hmmfile and an associated GSI index file hmmfile.gsi must also be
              installed on all the PVM nodes. (The GSI index is produced by the program
              hmm2index.) Because the PVM implementation is I/O bound, it is highly recommended
              that each node have a local copy of hmmfile rather than NFS mounting a shared copy.
              Optional PVM support must have been compiled into HMMER for --pvm to function.

       --xnu  Turn on XNU filtering of target protein sequences. Has no effect on nucleic acid
              sequences. In trial experiments, --xnu appears to perform less well than the
              default post hoc null2 model.
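
       The --cut_ga, --cut_tc, and --cut_nc options described above share one pattern: for each
       HMM, a pair of bit score cutoffs stored in the HMM file becomes the per-sequence and
       per-domain thresholds for that HMM. A minimal sketch of that selection follows. The
       dictionary layout and the name thresholds_for are hypothetical and do not reflect the HMM
       file format or HMMER's internals.

```python
from typing import Dict, Tuple

# Hypothetical in-memory view of one HMM's optional Pfam cutoff annotation:
# cutoff kind -> (per-sequence bit score cutoff, per-domain bit score cutoff).
Cutoffs = Dict[str, Tuple[float, float]]


def thresholds_for(model_name: str, cutoffs: Cutoffs, kind: str) -> Tuple[float, float]:
    """Return (per-sequence, per-domain) thresholds for one HMM.

    kind is "GA", "TC", or "NC", matching --cut_ga, --cut_tc, and --cut_nc.
    """
    if kind not in ("GA", "TC", "NC"):
        raise ValueError(f"unknown cutoff kind: {kind}")
    if kind not in cutoffs:
        # The text says the option does not work when the cutoffs are absent
        # from the HMM file; this sketch models that as an error.
        raise LookupError(f"{model_name}: no {kind} cutoffs in HMM file")
    return cutoffs[kind]
```

       Because the thresholds are looked up per HMM, a single run can apply a different cutoff
       pair to every model in hmmfile, which a fixed bit score threshold cannot do.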

COPYRIGHT

       Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL).
       See the file COPYING in your distribution for details on redistribution conditions.

AUTHOR

       Sean Eddy
       HHMI/Dept. of Genetics
       Washington Univ. School of Medicine
       4566 Scott Ave.
       St Louis, MO 63110 USA
       http://www.genetics.wustl.edu/eddy/