Provided by: hmmer2_2.3.2+dfsg-6_amd64

NAME

       hmm2pfam - search one or more sequences against an HMM database

SYNOPSIS

       hmm2pfam [options] hmmfile seqfile

DESCRIPTION

       hmm2pfam  reads  a  sequence file seqfile and compares each sequence in it, one at a time,
       against all the HMMs in hmmfile looking for significantly similar sequence matches.

       hmmfile is looked for first in the current working directory, then in the directory
       named by the environment variable HMMERDB. This lets administrators install HMM
       libraries such as Pfam in a common location.
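
       The lookup order described above can be sketched in Python. This is an illustration
       only, not HMMER's own code: resolve_hmmfile is a hypothetical helper, and HMMERDB is
       treated as naming a single directory, as the text states.

```python
import tempfile
from pathlib import Path


def resolve_hmmfile(name, cwd, environ):
    """Return the first existing candidate path for `name`, or None.

    Hypothetical helper: checks `cwd` first, then the directory named by
    HMMERDB (assumed here to be a single directory).
    """
    candidates = [Path(cwd) / name]
    hmmerdb = environ.get("HMMERDB")
    if hmmerdb:
        candidates.append(Path(hmmerdb) / name)
    for path in candidates:
        if path.is_file():
            return path
    return None


# Demonstration with temporary directories standing in for the working
# directory and a shared HMM library location.
with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as shared:
    (Path(shared) / "Pfam").write_text("placeholder, not a real HMM file\n")
    env = {"HMMERDB": shared}
    found_shared = resolve_hmmfile("Pfam", work, env)   # only the shared copy exists
    (Path(work) / "Pfam").write_text("placeholder, local copy\n")
    found_local = resolve_hmmfile("Pfam", work, env)    # local copy now wins
    missing = resolve_hmmfile("NoSuchDb", work, env)
    demo = (found_shared.parent == Path(shared),
            found_local.parent == Path(work),
            missing is None)
```

       A copy in the working directory shadows one of the same name under HMMERDB, which is
       worth remembering when a stale local file is present.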

       There is a separate output report for each sequence in seqfile. This report consists of
       three sections: a ranked list of the best scoring HMMs, a list of the best scoring domains
       in order of their occurrence in the sequence, and alignments for all the best scoring
       domains. A sequence score may be higher than a domain score for the same sequence if
       there is more than one domain in the sequence; the sequence score takes into account all
       the domains. All HMMs whose hits pass the -E and -T cutoffs are shown in the first
       list, then every domain found for the HMMs in this list is shown in the second list of
       domain hits. If desired, E-value and bit score thresholds may also be applied to the
       domain list using the --domE and --domT options.
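
       The two-stage filtering described above can be sketched as follows. This is a
       simplified model, not HMMER's implementation: the class and function names are
       hypothetical, the example scores are invented, and treating a hit exactly at a
       cutoff as passing is an assumption of the sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class HmmHit:
    name: str
    score: float    # per-sequence bit score
    evalue: float


@dataclass
class DomainHit:
    hmm: str
    start: int      # start position of the domain in the query sequence
    score: float    # per-domain bit score
    evalue: float


def passes(score, evalue, max_evalue, min_score):
    # Inclusive boundaries are an assumption of this sketch.
    return evalue <= max_evalue and score >= min_score


def build_lists(hmm_hits, domain_hits,
                E=10.0, T=-math.inf, domE=math.inf, domT=-math.inf):
    """Return (ranked HMM list, domain list in order of occurrence)."""
    first = sorted((h for h in hmm_hits if passes(h.score, h.evalue, E, T)),
                   key=lambda h: h.score, reverse=True)
    shown = {h.name for h in first}
    second = sorted((d for d in domain_hits
                     if d.hmm in shown and passes(d.score, d.evalue, domE, domT)),
                    key=lambda d: d.start)
    return first, second


# Invented example values; the fn3 sequence score (12.5) is the sum of
# its two domain scores, so it exceeds either domain score alone.
hmm_hits = [HmmHit("fn3", 12.5, 0.3), HmmHit("SH2", 85.0, 1e-22),
            HmmHit("zf-C2H2", -40.0, 55.0)]
domain_hits = [DomainHit("fn3", 10, 7.5, 1.2), DomainHit("SH2", 120, 85.0, 1e-22),
               DomainHit("fn3", 200, 5.0, 4.0), DomainHit("zf-C2H2", 50, -40.0, 55.0)]

first, second = build_lists(hmm_hits, domain_hits)
_, second_strict = build_lists(hmm_hits, domain_hits, domT=6.0)
```

       With the default thresholds every domain of a listed HMM appears in the second list;
       setting domT=6.0 in this sketch drops the weaker fn3 domain.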

OPTIONS

       -h     Print brief help; includes version number and summary  of  all  options,  including
              expert options.

       -n     Specify  that  models  and  sequence  are  nucleic  acid, not protein.  Other HMMER
              programs autodetect this; but because of the order in which hmm2pfam accesses data,
              it can't reliably determine the correct "alphabet" by itself.

       -A <n> Limits  the  alignment  output  to the <n> best scoring domains.  -A0 shuts off the
              alignment output and can be used to reduce the size of output files.

       -E <x> Set the E-value cutoff for the per-sequence ranked hit list to <x>, where <x> is  a
              positive  real  number.  The  default is 10.0. Hits with E-values better than (less
              than) this threshold will be shown.

       -T <x> Set the bit score cutoff for the per-sequence ranked hit list to <x>, where <x>  is
              a  real  number.   The  default  is negative infinity; by default, the threshold is
              controlled by E-value and not by bit score.   Hits  with  bit  scores  better  than
              (greater than) this threshold will be shown.

       -Z <n> Calculate  the  E-value  scores  as  if  we  had  seen  a  sequence database of <n>
              sequences. The default is arbitrarily set to 59021, the size of Swissprot 34.
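
              As a rough illustration of what -Z changes, the sketch below assumes that an
              E-value is the P-value of a score multiplied by the database size, so that
              E-values scale linearly with <n>. The function name is hypothetical.

```python
DEFAULT_Z = 59021  # default stated above (size of Swissprot 34)


def rescale_evalue(evalue, old_z, new_z):
    """Convert an E-value computed for database size old_z to size new_z.

    Assumes E = Z * P, so the E-value is proportional to Z.
    """
    if old_z <= 0 or new_z <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * new_z / old_z


# A hit reported at E = 10.0 under the default Z, re-expressed for a
# database twice as large.
doubled = rescale_evalue(10.0, DEFAULT_Z, 2 * DEFAULT_Z)
```

              Under this assumption, a hit that just passes the default -E cutoff with the
              default <n> would no longer pass it if <n> were doubled.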

EXPERT OPTIONS

       --acc  Report HMM accessions instead of names in the output  reports.   Useful  for  high-
              throughput  annotation, where the data are being parsed for storage in a relational
              database.

       --compat
              Use the output format of HMMER 2.1.1, the 1998-2001  public  release;  provided  so
              2.1.1 parsers don't have to be rewritten.

       --cpu <n>
              Sets the maximum number of CPUs that the program will run on. The default is to use
              all CPUs in the  machine.  Overrides  the  HMMER_NCPU  environment  variable.  Only
              affects threaded versions of HMMER (the default on most systems).
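
              The precedence stated above (the --cpu option, then HMMER_NCPU, then all CPUs
              in the machine) can be sketched as below. The function is a hypothetical
              illustration and does not reproduce how HMMER parses these values.

```python
def effective_cpu_limit(cpu_option, environ, machine_cpus):
    """Pick the CPU limit: --cpu value, else HMMER_NCPU, else all CPUs."""
    if cpu_option is not None:
        return cpu_option
    env_value = environ.get("HMMER_NCPU")
    if env_value is not None:
        return int(env_value)
    return machine_cpus


from_option = effective_cpu_limit(2, {"HMMER_NCPU": "4"}, 8)
from_env = effective_cpu_limit(None, {"HMMER_NCPU": "4"}, 8)
from_default = effective_cpu_limit(None, {}, 8)
```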

       --cut_ga
              Use Pfam GA (gathering threshold) score cutoffs. Equivalent to -T <GA1>
              --domT <GA2>, but the GA1 and GA2 cutoffs are read from each HMM in hmmfile
              individually.  hmm2build  puts  these  cutoffs  there  if  the  alignment  file was
              annotated in a Pfam-friendly alignment format (extended SELEX or Stockholm  format)
              and  the  optional  GA annotation line was present. If these cutoffs are not set in
              the HMM file, --cut_ga doesn't work.

       --cut_tc
              Use Pfam TC (trusted cutoff) score cutoffs. Equivalent to -T <TC1> --domT
              <TC2>, but the TC1 and TC2 cutoffs are read from each HMM in hmmfile individually.
              hmm2build puts these cutoffs there if the alignment file was annotated in  a  Pfam-
              friendly  alignment format (extended SELEX or Stockholm format) and the optional TC
              annotation line was present. If these cutoffs are not set in the HMM file, --cut_tc
              doesn't work.

       --cut_nc
              Use Pfam NC (noise cutoff) score cutoffs. Equivalent to -T <NC1> --domT <NC2>,
              but the NC1 and NC2 cutoffs are read from each HMM in hmmfile individually.
              hmm2build  puts  these cutoffs there if the alignment file was annotated in a Pfam-
              friendly alignment format (extended SELEX or Stockholm format) and the optional  NC
              annotation line was present. If these cutoffs are not set in the HMM file, --cut_nc
              doesn't work.
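
              The three --cut_* options share one pattern: the first stored cutoff acts as
              the per-sequence bit score threshold and the second as the per-domain bit
              score threshold, model by model. A minimal sketch follows; the dictionary
              layout is hypothetical (it is not the HMM file format) and the numbers are
              invented, not real Pfam values.

```python
def cutoff_thresholds(model, kind):
    """Return per-model thresholds for kind in {"ga", "tc", "nc"}.

    `model` is a hypothetical dict such as {"name": ..., "ga": (GA1, GA2)}.
    Raises ValueError when the requested cutoffs were never stored.
    """
    if kind not in ("ga", "tc", "nc"):
        raise ValueError("unknown cutoff kind: " + kind)
    pair = model.get(kind)
    if pair is None:
        raise ValueError(
            "model %s has no %s cutoffs" % (model.get("name"), kind.upper()))
    per_sequence, per_domain = pair
    return {"T": per_sequence, "domT": per_domain}


annotated = {"name": "example_model", "ga": (25.0, 20.0), "tc": (27.5, 21.0)}
ga = cutoff_thresholds(annotated, "ga")

try:
    cutoff_thresholds(annotated, "nc")   # NC was never annotated
    nc_error = None
except ValueError as exc:
    nc_error = str(exc)
```

              Because the cutoffs are read per model, one missing annotation is enough to
              make the corresponding option unusable for that HMM file.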

       --domE <x>
              Set the E-value cutoff for the per-domain ranked hit list to <x>, where  <x>  is  a
              positive  real  number.   The  default  is infinity; by default, all domains in the
              sequences that passed the first threshold will be reported in the second  list,  so
              that the number of domains reported in the per-sequence list is consistent with the
              number that appear in the per-domain list.

       --domT <x>
              Set the bit score cutoff for the per-domain ranked hit list to <x>, where <x> is  a
              real  number.  The  default  is  negative  infinity; by default, all domains in the
              sequences that passed the first threshold will be reported in the second  list,  so
              that the number of domains reported in the per-sequence list is consistent with the
              number that appear in the per-domain list. Important note: only one domain in a
              sequence is absolutely controlled by this parameter, or by --domE. The second and
              subsequent domains in a sequence have a de facto bit score threshold of 0 because
              of the details of how HMMER works. HMMER requires at least one pass through the
              main model per sequence; to do more than one pass (more than one domain) the
              multidomain alignment must have a better score than the single domain alignment,
              and hence the extra domains must contribute positive score. See the User's Guide
              for more detail.

       --forward
              Use  the  Forward  algorithm instead of the Viterbi algorithm to determine the per-
              sequence scores. Per-domain scores are still determined by the  Viterbi  algorithm.
              Some  have  argued  that Forward is a more sensitive algorithm for detecting remote
              sequence homologues; my experiments with HMMER have not confirmed this, however.

       --informat <s>
              Assert that the input seqfile is in format <s>; do not run Babelfish format
              autodetection. This increases the reliability of the program somewhat, because the
              Babelfish  can  make  mistakes;  particularly  recommended  for  unattended,  high-
              throughput  runs  of HMMER. Valid format strings include FASTA, GENBANK, EMBL, GCG,
              PIR, STOCKHOLM, SELEX, MSF, CLUSTAL,  and  PHYLIP.  See  the  User's  Guide  for  a
              complete list.
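
              For unattended runs it can help to assemble the command line in a script and
              check the format string before launching anything. The sketch below builds an
              argument list only and does not run hmm2pfam. build_argv is a hypothetical
              helper, and its format check covers just the names listed above, which the
              text says is not the complete set.

```python
# Format names taken from the list above; the User's Guide has more.
LISTED_FORMATS = ("FASTA", "GENBANK", "EMBL", "GCG", "PIR",
                  "STOCKHOLM", "SELEX", "MSF", "CLUSTAL", "PHYLIP")


def build_argv(hmmfile, seqfile, informat=None, evalue=None, acc=False):
    """Assemble an hmm2pfam argument list from options documented here."""
    argv = ["hmm2pfam"]
    if informat is not None:
        fmt = informat.upper()
        if fmt not in LISTED_FORMATS:
            raise ValueError("format not in the listed set: " + informat)
        argv += ["--informat", fmt]
    if evalue is not None:
        argv += ["-E", str(evalue)]
    if acc:
        argv.append("--acc")
    # Options come first, then hmmfile and seqfile, as in the SYNOPSIS.
    argv += [hmmfile, seqfile]
    return argv


argv = build_argv("Pfam", "query.fa", informat="fasta", evalue=0.001, acc=True)
```

              Where hmm2pfam is installed, a list built this way could be passed to
              subprocess.run(argv, check=True).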

       --null2
              Turn  off the post hoc second null model. By default, each alignment is rescored by
              a postprocessing step that takes into account possible biased composition in either
              the  HMM  or  the  target sequence.  This is almost essential in database searches,
              especially with local alignment models. There is a  very  small  chance  that  this
              postprocessing  might  remove  real matches, and in these cases --null2 may improve
              sensitivity at the expense of reducing specificity by  letting  biased  composition
              hits through.

       --pvm  Run  on  a  Parallel  Virtual  Machine  (PVM). The PVM must already be running. The
              client program hmm2pfam-pvm must be installed  on  all  the  PVM  nodes.   The  HMM
              database  hmmfile  and  an  associated  GSI  index  file  hmmfile.gsi  must also be
              installed on all the PVM  nodes.   (The  GSI  index  is  produced  by  the  program
              hmm2index.)   Because the PVM implementation is I/O bound, it is highly recommended
              that each node have a local copy of hmmfile rather than NFS mounting a shared copy.
              Optional PVM support must have been compiled into HMMER for --pvm to function.

       --xnu  Turn  on  XNU  filtering of target protein sequences. Has no effect on nucleic acid
              sequences. In trial experiments, --xnu  appears  to  perform  less  well  than  the
              default post hoc null2 model.

SEE ALSO

       Master man page, with full list of and guide to the individual man pages: see hmmer2(1).

       For complete documentation, see the User's Guide
       (ftp://selab.janelia.org/pub/software/hmmer/2.3.2/Userguide.pdf); or see the HMMER web
       page, http://hmmer.janelia.org/.

       Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL).
       See the file COPYING in your distribution for details on redistribution conditions.

AUTHOR

       Sean Eddy
       HHMI/Dept. of Genetics
       Washington Univ. School of Medicine
       4566 Scott Ave.
       St Louis, MO 63110 USA
       http://www.genetics.wustl.edu/eddy/