Provided by: hmmer_3.1b1-3_amd64

NAME

       alimask - Add mask line to a multiple sequence alignment

SYNOPSIS

       alimask [options] <msafile> <postmsafile>

DESCRIPTION

       alimask  is  used to apply a mask line to a multiple sequence alignment, based on provided
       alignment or model coordinates.  When hmmbuild receives a masked alignment  as  input,  it
       produces  a  profile model in which the emission probabilities at masked positions are set
       to match the background frequency, rather than being set based on observed frequencies  in
       the  alignment.   Position-specific  insertion and deletion rates are not altered, even in
       masked regions.  alimask autodetects input  format,  and  produces  masked  alignments  in
       Stockholm format.  <msafile> may contain only one sequence alignment.

       A  common  motivation  for  masking a region in an alignment is that the region contains a
       simple tandem repeat that is observed to cause an unacceptably high rate of false positive
       hits.

       In  the  simplest  case,  a  mask  range  is  given  in  coordinates relative to the input
       alignment, using --alirange <s>.  However, it is more often the case that the region to be
       masked  has  been  identified  in coordinates relative to the profile model (e.g. based on
       recognizing a simple repeat pattern in false hit alignments or in the HMM logo).  Not  all
       alignment columns are converted to match state positions in the profile (see the --symfrac
       flag for hmmbuild for discussion), so model positions  do  not  necessarily  match  up  to
       alignment  column  positions.   To  remove  the  burden  of  converting model positions to
       alignment positions, alimask accepts the mask range input in model  coordinates  as  well,
       using  --modelrange  <s>.   When  using  this  flag,  alimask  determines  which alignment
       positions would be identified by hmmbuild as match states, a process  that  requires  that
       all  hmmbuild flags impacting that decision be supplied to alimask.  It is for this reason
       that many of the hmmbuild flags are also used by alimask.
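
       The relationship between the two coordinate systems can be illustrated with a minimal
       Python sketch. It is not alimask's implementation: the per-column match flags are assumed
       to be known already, and the function names and the toy alignment are hypothetical. How
       alimask treats an alignment range that contains no match column is not described here,
       so returning None in that case is an assumption.

```python
def match_columns(is_match):
    """Return the 1-based alignment columns flagged as match (consensus) columns."""
    return [col for col, flag in enumerate(is_match, start=1) if flag]


def model_to_ali(model_range, is_match):
    """Map a 1-based inclusive model range to the alignment columns it spans."""
    cols = match_columns(is_match)
    start, end = model_range
    if not 1 <= start <= end <= len(cols):
        raise ValueError("model range must lie within 1..%d" % len(cols))
    return cols[start - 1], cols[end - 1]


def ali_to_model(ali_range, is_match):
    """Map a 1-based inclusive alignment range to the model positions inside it."""
    start, end = ali_range
    inside = [k for k, col in enumerate(match_columns(is_match), start=1)
              if start <= col <= end]
    # Assumption: a range covering no match column has no model equivalent.
    return (inside[0], inside[-1]) if inside else None


# Hypothetical 8-column alignment in which columns 3 and 6 are insert columns.
is_match = [True, True, False, True, True, False, True, True]
print(model_to_ali((2, 4), is_match))  # (2, 5)
print(ali_to_model((3, 6), is_match))  # (3, 4)
```

       In this toy case model positions 2-4 span alignment columns 2-5, because alignment
       column 3 is not a match column and so has no model position.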

OPTIONS

       -h     Help; print a brief reminder of command line usage and all available options.

       -o <f> Direct the summary output to file <f>, rather than to stdout.

OPTIONS FOR SPECIFYING MASK RANGE

       A single mask range is given as a dash-separated pair, such as --modelrange 10-20.
       Multiple ranges may be submitted as a comma-separated list, such as --modelrange
       10-20,30-42.
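
       As an illustration of the range syntax, the following Python sketch parses a range
       string and marks the covered columns. It is only a sketch: the helper names are
       hypothetical, positions are assumed to be 1-based and inclusive, and the use of 'm' for
       masked columns and '.' for all others is an assumption about how a mask annotation line
       is written.

```python
def parse_ranges(spec):
    """Parse a string such as '10-20,30-42' into [(10, 20), (30, 42)]."""
    ranges = []
    for part in spec.split(","):
        start_text, end_text = part.split("-")
        start, end = int(start_text), int(end_text)
        if start < 1 or end < start:
            raise ValueError("bad range: %r" % part)
        ranges.append((start, end))
    return ranges


def mask_line(ranges, alen):
    """Build a per-column mask string of length alen ('m' = masked, '.' = not)."""
    mask = ["."] * alen
    for start, end in ranges:
        if end > alen:
            raise ValueError("range %d-%d exceeds length %d" % (start, end, alen))
        for col in range(start, end + 1):
            mask[col - 1] = "m"  # convert 1-based column to 0-based index
    return "".join(mask)


print(parse_ranges("10-20,30-42"))              # [(10, 20), (30, 42)]
print(mask_line(parse_ranges("3-5,8-9"), 10))   # ..mmm..mm.
```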

       --modelrange <s>
              Supply the given range(s) in model coordinates.

       --alirange <s>
              Supply the given range(s) in alignment coordinates.

       --apendmask
              Add to the existing mask found with the alignment.  The default is to overwrite any
              existing mask.

       --model2ali <s>
              Rather than actually produce the masked alignment, simply print alignment range(s)
              corresponding to input model range(s).

       --ali2model <s>
              Rather than actually produce the masked alignment, simply print model range(s)
              corresponding to input alignment range(s).

OPTIONS FOR SPECIFYING THE ALPHABET

       The alphabet type (amino, DNA, or RNA) is autodetected  by  default,  by  looking  at  the
       composition  of  the  msafile.  Autodetection is normally quite reliable, but occasionally
       alphabet type may be ambiguous and autodetection can  fail  (for  instance,  on  tiny  toy
       alignments  of just a few residues). To avoid this, or to increase robustness in automated
       analysis pipelines, you may specify the alphabet type of msafile with these options.

       --amino
              Specify that all sequences in msafile are proteins.

       --dna  Specify that all sequences in msafile are DNAs.

       --rna  Specify that all sequences in msafile are RNAs.

OPTIONS CONTROLLING PROFILE CONSTRUCTION

       These options control how consensus columns are defined in an alignment.

       --fast Define consensus columns as those that have a fraction >= symfrac  of  residues  as
              opposed to gaps. (See below for the --symfrac option.) This is the default.

       --hand Define consensus columns using the reference annotation of the multiple
              alignment.  This allows you to define any consensus columns you like.

       --symfrac <x>
              Define the residue fraction threshold necessary to define a consensus  column  when
              using  the --fast option. The default is 0.5. The symbol fraction in each column is
              calculated after taking relative sequence weighting into account, and ignoring  gap
              characters  corresponding  to  ends  of  sequence fragments (as opposed to internal
              insertions/deletions).  Setting this to 0.0 means that every alignment column  will
              be  assigned  as  consensus,  which  may be useful in some cases. Setting it to 1.0
              means that only columns that include 0 gaps (internal insertions/deletions) will be
              assigned as consensus.

       --fragthresh <x>
              We  only  want to count terminal gaps as deletions if the aligned sequence is known
              to be full-length, not if it is a fragment (for instance, because only part  of  it
              was sequenced). HMMER uses a simple rule to infer fragments: if the sequence length
              L is less than or equal to a fraction <x> times the alignment  length  in  columns,
              then the sequence is handled as a fragment. The default is 0.5. Setting
              --fragthresh 0 will define no (nonempty) sequence as a fragment; you might want to
              do this if you know you've got a carefully curated alignment of full-length
              sequences.  Setting --fragthresh 1 will define all sequences as fragments; you might
              want  to do this if you know your alignment is entirely composed of fragments, such
              as translated short reads in metagenomic shotgun data.
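
       The interaction between --symfrac and --fragthresh can be sketched in Python as follows.
       This is a simplified illustration, not HMMER's code: it assumes uniform sequence weights
       (whereas the symbol fraction is actually computed after relative weighting), treats '-'
       and '.' as the only gap characters, and uses hypothetical function names and a toy
       alignment.

```python
GAPS = set("-.")  # assumed gap characters for this sketch


def is_fragment(aligned_seq, fragthresh=0.5):
    """Fragment rule: residue count L <= fragthresh * alignment length."""
    length = sum(1 for ch in aligned_seq if ch not in GAPS)
    return length <= fragthresh * len(aligned_seq)


def consensus_columns(msa, symfrac=0.5, fragthresh=0.5):
    """Flag columns whose residue fraction is >= symfrac (uniform weights)."""
    alen = len(msa[0])
    spans = []  # per sequence: first and last 0-based column that is counted
    for seq in msa:
        residues = [i for i, ch in enumerate(seq) if ch not in GAPS]
        if residues and is_fragment(seq, fragthresh):
            # Terminal gaps of a fragment are ignored, not counted as deletions.
            spans.append((residues[0], residues[-1]))
        else:
            spans.append((0, alen - 1))
    flags = []
    for col in range(alen):
        counted = [seq[col] for seq, (first, last) in zip(msa, spans)
                   if first <= col <= last]
        n_res = sum(1 for ch in counted if ch not in GAPS)
        flags.append(bool(counted) and n_res / len(counted) >= symfrac)
    return flags


# Toy alignment: two full-length sequences and three short fragments.
msa = ["ACDEFG", "ACDEFG", "--DE--", "--DE--", "--DE--"]
print(consensus_columns(msa))                  # every column is consensus
print(consensus_columns(msa, fragthresh=0.0))  # only columns 3 and 4 are
```

       With the default threshold, the three short sequences are treated as fragments, so
       their terminal gaps do not count against columns 1-2 and 5-6. With a threshold of 0, the
       same gaps count as deletions and those columns fall below the 0.5 residue fraction.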

OPTIONS CONTROLLING RELATIVE WEIGHTS

       HMMER uses an ad hoc sequence weighting algorithm to downweight closely related  sequences
       and  upweight  distantly related ones. This has the effect of making models less biased by
       uneven phylogenetic representation. For example, two identical sequences  would  typically
       each  receive  half  the  weight  that  one  sequence  would.  These options control which
       algorithm gets used.

       --wpb  Use the Henikoff position-based sequence weighting scheme [Henikoff  and  Henikoff,
              J. Mol. Biol. 243:574, 1994].  This is the default.

       --wgsc Use  the  Gerstein/Sonnhammer/Chothia  weighting algorithm [Gerstein et al, J. Mol.
              Biol. 235:1067, 1994].

       --wblosum
              Use the same clustering scheme that was used to weight data in  calculating  BLOSUM
              substitution matrices [Henikoff and Henikoff, Proc. Natl. Acad. Sci 89:10915, 1992].
              Sequences are single-linkage clustered at an identity threshold (default 0.62;  see
              --wid)  and  within each cluster of c sequences, each sequence gets relative weight
              1/c.

       --wnone
              No relative weights. All sequences are assigned uniform weight.

       --wid <x>
              Sets the identity threshold used by single-linkage clustering when using --wblosum.
              Invalid with any other weighting scheme. Default is 0.62.
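
       The clustering scheme described for --wblosum can be sketched in Python as follows. This
       is an illustration under stated assumptions rather than HMMER's implementation: pairwise
       identity is computed over columns where both sequences have a residue, two sequences are
       linked when their identity is greater than or equal to the threshold, and the function
       names are hypothetical.

```python
GAPS = set("-.")  # assumed gap characters for this sketch


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where both have a residue."""
    both = [(x, y) for x, y in zip(a, b) if x not in GAPS and y not in GAPS]
    if not both:
        return 0.0
    return sum(1 for x, y in both if x == y) / len(both)


def blosum_weights(msa, wid=0.62):
    """Single-linkage cluster at identity >= wid; each member gets weight 1/c."""
    n = len(msa)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if pairwise_identity(msa[i], msa[j]) >= wid:
                parent[find(i)] = find(j)  # link the two clusters

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return [1.0 / sizes[find(i)] for i in range(n)]


# Two identical sequences share one cluster; the third stands alone.
print(blosum_weights(["ACDEFG", "ACDEFG", "WYTSRK"]))  # [0.5, 0.5, 1.0]
```

       The toy result mirrors the example above: the two identical sequences each receive
       half the weight of the unrelated one.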

OTHER OPTIONS

       --informat <s>
              Declare  that  the input msafile is in format <s>.  Currently the accepted multiple
              alignment sequence file formats include Stockholm,  Aligned  FASTA,  Clustal,  NCBI
              PSI-BLAST,  PHYLIP, Selex, and UCSC SAM A2M. Default is to autodetect the format of
              the file.

       --seed <n>
              Seed the random number generator with <n>, an integer >= 0.  If <n> is nonzero, any
              stochastic  simulations  will  be reproducible; the same command will give the same
              results.  If <n> is 0, the random  number  generator  is  seeded  arbitrarily,  and
              stochastic  simulations will vary from run to run of the same command.  The default
              seed is 42.

SEE ALSO

       See hmmer(1) for a master man page with a  list  of  all  the  individual  man  pages  for
       programs in the HMMER package.

       For  complete  documentation,  see  the  user guide that came with your HMMER distribution
       (Userguide.pdf); or see the HMMER web page.

COPYRIGHT

       Copyright (C) 2013 Howard Hughes Medical Institute.
       Freely distributed under the GNU General Public License (GPLv3).

       For additional information on copyright and licensing, see the file  called  COPYRIGHT  in
       your HMMER source distribution, or see the HMMER web page.

AUTHOR

       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org