Provided by: hmmer2_2.3.2+dfsg-8_amd64

NAME

       hmm2build - build a profile HMM from an alignment

SYNOPSIS

       hmm2build [options] hmmfile alignfile

DESCRIPTION

       hmm2build reads a multiple sequence alignment file alignfile, builds a new profile HMM,
       and saves the HMM in hmmfile.

       alignfile may be in ClustalW, GCG  MSF,  SELEX,  Stockholm,  or  aligned  FASTA  alignment
       format. The format is automatically detected.

       By  default,  the model is configured to find one or more nonoverlapping alignments to the
       complete model: multiple global alignments with respect  to  the  model,  and  local  with
       respect  to the sequence.  This is analogous to the behavior of the hmmls program of HMMER
       1.  To configure the model for multiple local alignments with respect  to  the  model  and
       local  with  respect  to  the  sequence, a la the old program hmmfs, use the -f (fragment)
       option. More rarely, you may want to configure the model for a single global alignment
       (global with respect to both model and sequence) with the -g option, or for a single
       local/local alignment (a la standard Smith/Waterman, or the old hmmsw program) with
       the -s option.

OPTIONS

       -f     Configure  the  model  for finding multiple domains per sequence, where each domain
              can be a local (fragmentary) alignment. This is analogous to the old hmmfs  program
              of HMMER 1.

       -g     Configure  the  model  for  finding a single global alignment to a target sequence,
              analogous to the old hmms program of HMMER 1.

       -h     Print brief help; includes version number and summary  of  all  options,  including
              expert options.

       -n <s> Name this HMM <s>. <s> can be any string of non-whitespace characters (i.e. one
              "word").  There is no length limit (at least not one imposed by HMMER;  your  shell
              will complain about command line lengths first).

       -o <f> Re-save the starting alignment to <f>, in Stockholm format.  The columns which were
              assigned to match states will be marked with x's in an #=RF  annotation  line.   If
              either  the  --hand  or  --fast construction options were chosen, the alignment may
              have been slightly altered to be compatible with Plan 7 transitions, so saving  the
              final  alignment  and  comparing  to  the starting alignment can let you view these
              alterations.  See the User's Guide for more information on this arcane side effect.

       -s     Configure the model for finding a single local alignment per target sequence.  This
              is analogous to the standard Smith/Waterman algorithm or the hmmsw program of HMMER
              1.

       -A     Append this model to an existing hmmfile rather than creating hmmfile.  Useful  for
              building HMM libraries (like Pfam).

       -F     Force  overwriting  of an existing hmmfile.  Otherwise HMMER will refuse to clobber
              your existing HMM files, for safety's sake.
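
       As an illustration of how the options above combine with the synopsis
       "hmm2build [options] hmmfile alignfile", here is a minimal Python sketch that
       assembles an argument list. The helper name hmm2build_argv, the mode names, and the
       file names are hypothetical; only the flags themselves come from this page.

```python
# Alignment-mode flags from the OPTIONS section; the default mode
# (multiple global hits per sequence) needs no flag.
MODE_FLAGS = {"default": None, "fragment": "-f", "global": "-g", "local": "-s"}


def hmm2build_argv(hmmfile, alignfile, mode="default", name=None,
                   append=False, force=False):
    """Assemble an argument list for 'hmm2build [options] hmmfile alignfile'."""
    argv = ["hmm2build"]
    flag = MODE_FLAGS[mode]
    if flag:
        argv.append(flag)
    if name is not None:
        # -n takes a string of non-whitespace characters.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError("-n takes a single non-whitespace word")
        argv += ["-n", name]
    if append:
        argv.append("-A")   # append to an existing hmmfile
    if force:
        argv.append("-F")   # allow overwriting an existing hmmfile
    return argv + [hmmfile, alignfile]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(hmm2build_argv("globin.hmm", "globins.sto",
                                  mode="fragment", name="globin")))
```

       The resulting list can be passed to subprocess.run() on a system where hmm2build
       is installed.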

EXPERT OPTIONS

       --amino
              Force the sequence alignment to be interpreted as amino  acid  sequences.  Normally
              HMMER autodetects whether the alignment is protein or DNA, but sometimes alignments
              are so small that autodetection is ambiguous. See --nucleic.

       --archpri <x>
              Set the "architecture prior" used by MAP architecture construction  to  <x>,  where
              <x>  is  a  probability  between  0 and 1. This parameter governs a geometric prior
              distribution over model lengths. As <x> increases,  longer  models  are  favored  a
              priori.  As <x> decreases, it takes more residue conservation in a column to make a
              column a "consensus" match column in the model architecture.  The 0.85 default  has
              been chosen empirically as a reasonable setting.

       --binary
              Write the HMM to hmmfile in HMMER binary format instead of readable ASCII text.

       --cfile <f>
              Save  the observed emission and transition counts to <f> after the architecture has
              been determined (i.e. after residues/gaps have been assigned to match, delete, and
              insert states).  This option is used in HMMER development for generating data files
              useful for training new Dirichlet priors. The format of count files  is  documented
              in the User's Guide.

       --fast Quickly  and heuristically determine the architecture of the model by assigning all
              columns with more than a certain fraction of gap characters to insert states. By
              default this fraction is 0.5, and it can be changed using the --gapmax option.  The
              default construction algorithm is a maximum a posteriori (MAP) algorithm, which  is
              slower.

       --gapmax <x>
              Controls  the --fast model construction algorithm, but if --fast is not being used,
              has no effect.  If a column has more than a fraction <x> of gap symbols in  it,  it
              gets  assigned to an insert column.  <x> is a frequency from 0 to 1, and by default
              is set to 0.5. Higher values of <x> mean more columns get  assigned  to  consensus,
              and  models  get  longer;  smaller values of <x> mean fewer columns get assigned to
              consensus, and models get shorter.
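
              The gap-fraction rule described for --fast and --gapmax can be sketched in
              Python as follows. This is an illustration, not HMMER's code: the function
              name, the set of gap symbols, and the use of unweighted sequence counts are
              all assumptions.

```python
GAP_CHARS = set("-._~")  # assumed gap symbols


def assign_columns(alignment, gapmax=0.5):
    """Label each alignment column 'M' (consensus/match) or 'I' (insert).

    A column becomes an insert column when its gap fraction is strictly
    greater than gapmax ("more than a fraction <x> of gap symbols").
    """
    if not alignment:
        return ""
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for col in range(ncols):
        gaps = sum(1 for row in alignment if row[col] in GAP_CHARS)
        labels.append("I" if gaps / len(alignment) > gapmax else "M")
    return "".join(labels)


if __name__ == "__main__":
    toy = ["AC-GT", "A--GT", "ACCG-", "A--GT"]
    print(assign_columns(toy, gapmax=0.5))  # MMIMM
    print(assign_columns(toy, gapmax=0.2))  # MIIMI
```

              On this toy alignment, lowering the threshold from 0.5 to 0.2 turns two more
              columns into insert columns, so the model gets shorter.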

       --hand Specify the architecture of the model by hand: the alignment file must be in  SELEX
              or  Stockholm  format, and the reference annotation line (#=RF in SELEX, #=GC RF in
              Stockholm) is used to specify the architecture. Any column marked  with  a  non-gap
              symbol  (such as an 'x', for instance) is assigned as a consensus (match) column in
              the model.
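
              A small Python sketch of how a reference annotation line selects consensus
              columns under --hand. The function name and the exact set of characters
              treated as gaps are assumptions made for illustration.

```python
RF_GAP_CHARS = set(" .-_~")  # assumed gap symbols on the reference line


def match_columns_from_rf(rf_line):
    """Return 0-based indices of columns marked with a non-gap symbol."""
    return [i for i, ch in enumerate(rf_line) if ch not in RF_GAP_CHARS]


if __name__ == "__main__":
    print(match_columns_from_rf("xx..x-x"))  # [0, 1, 4, 6]
```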

       --idlevel <x>
              Controls both the determination of effective sequence number and  the  behavior  of
              the  --wblosum  weighting  option.  The  sequence alignment is clustered by percent
              identity, and the number of clusters at a  cutoff  threshold  of  <x>  is  used  to
              determine  the  effective sequence number.  Higher values of <x> give more clusters
              and higher effective sequence numbers; lower values of <x> give fewer clusters  and
              lower effective sequence numbers.  <x> is a fraction from 0 to 1, and by default is
              set to 0.62 (corresponding  to  the  clustering  level  used  in  constructing  the
              BLOSUM62 substitution matrix).
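
              The relationship between <x> and the effective sequence number can be
              illustrated with a short Python sketch. Single-linkage clustering and the
              identity definition (identical residues over columns where both sequences
              have a residue) are assumptions; this page does not specify either, and all
              function names are hypothetical.

```python
GAPS = "-._~"  # assumed gap symbols


def fractional_identity(a, b):
    """Identity over columns where both sequences have a residue (assumed definition)."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in GAPS and y not in GAPS]
    if not pairs:
        return 0.0
    return sum(x.upper() == y.upper() for x, y in pairs) / len(pairs)


def cluster_by_identity(alignment, idlevel=0.62):
    """Single-linkage clustering (an assumption): join pairs with identity >= idlevel.

    Returns one cluster label per sequence.
    """
    parent = list(range(len(alignment)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(alignment)):
        for j in range(i + 1, len(alignment)):
            if fractional_identity(alignment[i], alignment[j]) >= idlevel:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(alignment))]


def effective_sequence_number(alignment, idlevel=0.62):
    """Number of clusters at the given identity cutoff."""
    return len(set(cluster_by_identity(alignment, idlevel)))


if __name__ == "__main__":
    toy = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEYGHWKV", "MNPQRSTVWY"]
    for cutoff in (0.62, 0.85, 0.95):
        print(cutoff, effective_sequence_number(toy, cutoff))
```

              On the toy alignment the cluster count rises from 2 to 3 to 4 as the cutoff
              increases, matching the behavior described above.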

       --informat <s>
              Assert that the input alignfile is in format <s>; do not run Babelfish format
              autodetection. This increases the reliability of the program somewhat, because the
              Babelfish can make mistakes; particularly recommended for unattended,
              high-throughput runs of HMMER. Valid format strings include FASTA, GENBANK, EMBL,
              GCG, PIR, STOCKHOLM, SELEX, MSF, CLUSTAL, and PHYLIP. See the User's Guide for a
              complete list.

       --noeff
              Turn off the effective sequence number calculation, and  use  the  true  number  of
              sequences  instead. This will usually reduce the sensitivity of the final model (so
              don't do it without good reason!)

       --nucleic
              Force the alignment to be interpreted as nucleic acid sequence, either RNA or  DNA.
              Normally  HMMER  autodetects whether the alignment is protein or DNA, but sometimes
              alignments are so small that autodetection is ambiguous. See --amino.

       --null <f>
              Read a null model from <f>.  The default for protein is to use average  amino  acid
              frequencies from Swissprot 34 and p1 = 350/351; for nucleic acid, the default is to
              use 0.25 for each base and p1 = 1000/1001. For documentation of the format  of  the
              null  model  file  and  further  explanation of how the null model is used, see the
              User's Guide.
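
              One way to read the default p1 values, sketched in Python. It assumes the
              null model is a single state that emits a residue and loops to itself with
              probability p1, and exits with probability 1 - p1. That reading is an
              assumption here; the User's Guide is the authority on the null model.

```python
from fractions import Fraction


def mean_null_length(p1):
    """Mean number of residues under a geometric length model.

    With P(L) = p1**L * (1 - p1), the mean length is p1 / (1 - p1).
    """
    return p1 / (1 - p1)


if __name__ == "__main__":
    print(mean_null_length(Fraction(350, 351)))    # 350
    print(mean_null_length(Fraction(1000, 1001)))  # 1000
```

              Under that assumption, the protein and nucleic acid defaults correspond to
              expected sequence lengths of 350 and 1000 residues respectively.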

       --pam <f>
              Apply a heuristic  PAM-  (substitution  matrix-)  based  prior  on  match  emission
              probabilities  instead of the default mixture Dirichlet. The substitution matrix is
              read from <f>.  See --pamwgt.

              The default  Dirichlet  state  transition  prior  and  insert  emission  prior  are
              unaffected.  Therefore  in  principle you could combine --prior with --pam but this
              isn't recommended, as it hasn't been tested. (  --pam  itself  hasn't  been  tested
              much!)

       --pamwgt <x>
              Controls  the  weight on a PAM-based prior. Only has effect if --pam option is also
              in use.  <x> is a positive real number, 20.0 by default.  <x> is the number of
              "pseudocounts" contributed by the heuristic prior. Very high values of <x> can
              force a scoring system that is entirely driven by the substitution matrix, making
              HMMER somewhat approximate Gribskov profiles.

       --pbswitch <n>
              For  alignments with a very large number of sequences, the GSC, BLOSUM, and Voronoi
              weighting schemes are slow; they're O(N^2) for N sequences. Henikoff position-based
              weights  (PB  weights) are more efficient. At or above a certain threshold sequence
              number <n> hmm2build will switch  from  GSC,  BLOSUM,  or  Voronoi  weights  to  PB
              weights. To disable this switching behavior (at the cost of compute time), set <n>
              to something larger than the number of sequences in your alignment. <n> is a
              positive integer; the default is 1000.

       --prior <f>
              Read  a  Dirichlet  prior  from  <f>, replacing the default mixture Dirichlet.  The
              format of prior files is documented in the User's Guide, and an example is given in
              the Demos directory of the HMMER distribution.

       --swentry <x>
              Controls the total probability that is distributed to local entries into the model,
              versus starting at the beginning of the model as in a global alignment.  <x>  is  a
              probability  from  0 to 1, and by default is set to 0.5.  Higher values of <x> mean
              that hits that are fragments  on  their  left  (N  or  5'-terminal)  side  will  be
              penalized  less,  but  complete  global  alignments  will be penalized more.  Lower
              values of <x> mean that fragments on the left will be penalized  more,  and  global
              alignments   on   this  side  will  be  favored.   This  option  only  affects  the
              configurations that allow local alignments, e.g.  -s and -f; unless  one  of  these
              options is also activated, this option has no effect.  You have independent control
              over local/global alignment behavior for the N/C (5'/3')  termini  of  your  target
              sequences using --swentry and --swexit.

       --swexit <x>
              Controls  the  total probability that is distributed to local exits from the model,
              versus ending an alignment at the end of the model as in a global  alignment.   <x>
              is  a  probability from 0 to 1, and by default is set to 0.5.  Higher values of <x>
              mean that hits that are fragments on their right (C or 3'-terminal)  side  will  be
              penalized  less,  but  complete  global  alignments  will be penalized more.  Lower
              values of <x> mean that fragments on the right will be penalized more,  and  global
              alignments   on   this  side  will  be  favored.   This  option  only  affects  the
              configurations that allow local alignments, e.g.  -s and -f; unless  one  of  these
              options is also activated, this option has no effect.  You have independent control
              over local/global alignment behavior for the N/C (5'/3')  termini  of  your  target
              sequences using --swentry and --swexit.

       --verbose
              Print  more  possibly useful stuff, such as the individual scores for each sequence
              in the alignment.

       --wblosum
              Use the BLOSUM filtering algorithm to weight the sequences, instead of the default.
              Cluster  the  sequences at a given percentage identity (see --idlevel); assign each
              cluster a total weight of 1.0, distributed equally  amongst  the  members  of  that
              cluster.
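
              The weight assignment described above can be sketched in Python. The
              function name is hypothetical, and the cluster labels are taken as given
              input rather than computed from an alignment.

```python
from collections import Counter


def blosum_weights(cluster_labels):
    """Give each cluster a total weight of 1.0, split equally among its members."""
    sizes = Counter(cluster_labels)
    return [1.0 / sizes[label] for label in cluster_labels]


if __name__ == "__main__":
    # Seven sequences in three clusters of sizes 2, 1, and 4.
    labels = ["a", "a", "b", "c", "c", "c", "c"]
    print(blosum_weights(labels))
```

              The weights always sum to the number of clusters, so a large family of
              near-identical sequences counts no more than a single divergent sequence.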

       --wgsc Use  the  Gerstein/Sonnhammer/Chothia  ad hoc sequence weighting algorithm. This is
              already the default, so this option has no effect (unless it follows another option
              in the --w family, in which case it overrides it).

       --wme  Use  the  Krogh/Mitchison maximum entropy algorithm to "weight" the sequences. This
              supersedes the Eddy/Mitchison/Durbin maximum discrimination algorithm, which  gives
              almost  identical weights but is less robust. ME weighting seems to give a marginal
              increase in sensitivity over the default GSC weights, but takes a  fair  amount  of
              time.

       --wnone
              Turn off all sequence weighting.

       --wpb  Use the Henikoff position-based weighting scheme.
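
              A minimal Python sketch of Henikoff position-based weighting as it is
              commonly described: in each column, a sequence receives 1 / (k * n), where k
              is the number of distinct residues in the column and n is the number of
              sequences sharing its residue. Skipping gaps and rescaling the weights to
              sum to the number of sequences are assumptions, not details taken from this
              page.

```python
from collections import Counter

GAP_CHARS = set("-._~")  # assumed gap symbols


def position_based_weights(alignment):
    """Position-based sequence weights, rescaled to sum to the number of sequences."""
    nseq = len(alignment)
    weights = [0.0] * nseq
    for column in zip(*alignment):
        counts = Counter(ch.upper() for ch in column if ch not in GAP_CHARS)
        k = len(counts)  # number of distinct residues in this column
        if k == 0:
            continue  # all-gap column contributes nothing
        for i, ch in enumerate(column):
            if ch not in GAP_CHARS:
                weights[i] += 1.0 / (k * counts[ch.upper()])
    total = sum(weights)
    if total == 0:
        return [1.0] * nseq
    return [w * nseq / total for w in weights]


if __name__ == "__main__":
    print(position_based_weights(["AAA", "AAA", "CCC"]))  # [0.75, 0.75, 1.5]
```

              Each column is processed once, so the cost grows linearly with the number of
              sequences, in contrast to the O(N^2) schemes mentioned under --pbswitch.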

       --wvoronoi
              Use  the Sibbald/Argos Voronoi sequence weighting algorithm in place of the default
              GSC weighting.

SEE ALSO

       Master man page, with full list of and guide to the individual man pages: see hmmer2(1).

       For complete documentation, see the user guide
       (ftp://selab.janelia.org/pub/software/hmmer/2.3.2/Userguide.pdf); or see the HMMER web
       page, http://hmmer.janelia.org/.

       Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL).
       See the file COPYING in your distribution for details on redistribution conditions.

AUTHOR

       Sean Eddy
       HHMI/Dept. of Genetics
       Washington Univ. School of Medicine
       4566 Scott Ave.
       St Louis, MO 63110 USA
       http://www.genetics.wustl.edu/eddy/