Provided by: biosquid_1.9g+cvs20050121-12_amd64

NAME

       alistat - show statistics for a multiple alignment file

SYNOPSIS

       alistat [options] alignfile

DESCRIPTION

       alistat reads a multiple sequence alignment from the file alignfile in any supported
       format (including SELEX, GCG MSF, and CLUSTAL) and shows a number of simple statistics
       about it. These statistics include the name of the format, the number of sequences, the
       total number of residues, the average and range of the sequence lengths, and the
       alignment length (i.e., including gap characters).
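
       The summary numbers above can be sketched in Python as follows. This is a minimal
       illustration, not squid's code: it assumes the alignment is already parsed into a
       list of equal-length strings, and it assumes the gap characters are space, '.', '-',
       '_', and '~'. The function name alignment_stats is hypothetical.

```python
# Assumed gap set (hypothetical; squid's own gap test may differ).
GAP_CHARS = set(" .-_~")


def alignment_stats(aligned):
    """Compute alistat-style summary numbers for a list of aligned rows."""
    if not aligned:
        raise ValueError("empty alignment")
    alen = len(aligned[0])  # alignment length, gap characters included
    if any(len(row) != alen for row in aligned):
        raise ValueError("all rows must have the same aligned length")
    # Unaligned length of each sequence: count residues, skip gap characters.
    lengths = [sum(1 for c in row if c not in GAP_CHARS) for row in aligned]
    return {
        "nseq": len(aligned),
        "total_residues": sum(lengths),
        "avg_length": sum(lengths) / len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "alignment_length": alen,
    }
```

       For example, alignment_stats(["AC-GT", "ACCG-", "A--GT"]) reports 3 sequences,
       11 residues, unaligned lengths from 3 to 4, and an alignment length of 5.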

       Also shown are several percent identity statistics. The percent identity of a pair of
       aligned sequences is defined as (idents / MIN(len1, len2)), where idents is the number
       of exact identities (aligned columns in which both sequences have the same residue) and
       len1, len2 are the unaligned lengths of the two sequences. The "average percent
       identity", "most related pair", and "most unrelated pair" of the alignment are the
       average, maximum, and minimum over all (N)(N-1)/2 pairs, respectively. The "most distant
       seq" is found by computing, for each of the N sequences, its maximum pairwise identity
       (its best relative), and then taking the minimum of these N values; this identifies the
       most outlying sequence.
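
       The percent identity definitions above can be sketched in Python as follows. This is a
       minimal sketch, not squid's implementation: it assumes gap characters are space, '.',
       '-', '_', and '~', compares residues case-sensitively, and returns 0.0 when either
       sequence has no residues (all assumptions). Values are fractions; alistat reports them
       as percentages. The names pct_identity and identity_summary are hypothetical.

```python
from itertools import combinations

# Assumed gap set (hypothetical; squid's own gap test may differ).
GAP_CHARS = set(" .-_~")


def pct_identity(s1, s2):
    """idents / MIN(len1, len2) for two aligned rows, as a fraction in [0, 1]."""
    # An identity is a column where both rows hold the same non-gap residue.
    idents = sum(1 for a, b in zip(s1, s2) if a not in GAP_CHARS and a == b)
    len1 = sum(1 for c in s1 if c not in GAP_CHARS)
    len2 = sum(1 for c in s2 if c not in GAP_CHARS)
    shorter = min(len1, len2)
    return idents / shorter if shorter else 0.0  # assumption: empty seq -> 0.0


def identity_summary(aligned):
    """Average, max, and min over all N(N-1)/2 pairs, plus the most distant seq."""
    n = len(aligned)
    if n < 2:
        raise ValueError("need at least two sequences")
    pid = {(i, j): pct_identity(aligned[i], aligned[j])
           for i, j in combinations(range(n), 2)}
    values = list(pid.values())
    # Best relative of each sequence: its highest identity to any other sequence.
    best = [max(pid[min(i, j), max(i, j)] for j in range(n) if j != i)
            for i in range(n)]
    # Most distant seq: the one whose best relative is the weakest.
    distant = min(range(n), key=best.__getitem__)
    return {
        "average": sum(values) / len(values),
        "most_related": max(values),
        "most_unrelated": min(values),
        "most_distant_seq": distant,
        "most_distant_best": best[distant],
    }
```

       For ["ACGT", "ACGA", "TTTT"] the three pairwise identities are 0.75, 0.25, and 0.0,
       so the average is 1/3, and sequence index 2 is the most distant (best relative 0.25).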

OPTIONS

       -a     Show additional verbose information: a table with one line per sequence showing its
              name, length, and highest and lowest pairwise identity. These lines are prefixed
              with a * character so they can easily be grep'ed out and sorted. For example,
              alistat -a foo.slx | grep '\*' | sort -n -k4 gives a ranked list of the most
              distant sequences in the alignment. Incompatible with the -f option.
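
              A minimal Python sketch of such a per-sequence table follows. The exact column
              layout of alistat's -a output is not specified here; this sketch assumes the
              order "* name length highest lowest", under which field 4 is the highest
              identity, and sorts on it ascending so the most distant sequences come first.
              The gap set and all function names are hypothetical.

```python
# Assumed gap set (hypothetical; squid's own gap test may differ).
GAP_CHARS = set(" .-_~")


def pct_identity(s1, s2):
    """idents / MIN(len1, len2) for two aligned rows, as a fraction."""
    idents = sum(1 for a, b in zip(s1, s2) if a not in GAP_CHARS and a == b)
    len1 = sum(1 for c in s1 if c not in GAP_CHARS)
    len2 = sum(1 for c in s2 if c not in GAP_CHARS)
    shorter = min(len1, len2)
    return idents / shorter if shorter else 0.0


def per_sequence_table(names, aligned):
    """Rows of (name, length, highest %id, lowest %id), most distant first."""
    n = len(aligned)
    if n < 2 or len(names) != n:
        raise ValueError("need >= 2 sequences and one name per sequence")
    rows = []
    for i in range(n):
        ids = [pct_identity(aligned[i], aligned[j]) for j in range(n) if j != i]
        length = sum(1 for c in aligned[i] if c not in GAP_CHARS)
        rows.append((names[i], length, 100.0 * max(ids), 100.0 * min(ids)))
    # Stable ascending sort on highest identity: weakest best relative first.
    rows.sort(key=lambda row: row[2])
    return rows


def format_rows(rows):
    """Prefix each line with '*' so it can be grep'ed (layout assumed)."""
    return [f"* {name} {length} {high:.1f} {low:.1f}"
            for name, length, high, low in rows]
```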

       -f     Fast; use a sampling method to estimate the average %id. When this option is
              chosen, alistat does not show the other three pairwise identity numbers. This
              option is useful for very large alignments, for which the full (N)(N-1)/2
              calculation over all pairs would be prohibitive (e.g. Pfam's GP120 alignment, with
              over 10,000 sequences). Incompatible with the -a option.
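
              The page does not say which sampling scheme -f uses. The Python sketch below
              assumes the simplest one, averaging the identity of uniformly chosen random pairs
              of distinct sequences; the sample size nsample, the gap set, and the function
              names are hypothetical. Its cost grows with nsample rather than with the
              (N)(N-1)/2 pair count.

```python
import random

# Assumed gap set (hypothetical; squid's own gap test may differ).
GAP_CHARS = set(" .-_~")


def pct_identity(s1, s2):
    """idents / MIN(len1, len2) for two aligned rows, as a fraction."""
    idents = sum(1 for a, b in zip(s1, s2) if a not in GAP_CHARS and a == b)
    len1 = sum(1 for c in s1 if c not in GAP_CHARS)
    len2 = sum(1 for c in s2 if c not in GAP_CHARS)
    shorter = min(len1, len2)
    return idents / shorter if shorter else 0.0


def sampled_avg_identity(aligned, nsample=1000, seed=None):
    """Estimate the mean pairwise identity from nsample random pairs."""
    n = len(aligned)
    if n < 2:
        raise ValueError("need at least two sequences")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(nsample):
        i, j = rng.sample(range(n), 2)  # two distinct sequences, uniformly
        total += pct_identity(aligned[i], aligned[j])
    return total / nsample
```

              With a fixed seed the estimate is reproducible; a larger nsample trades speed
              for a smaller sampling error.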

       -h     Print  brief  help;  includes  version number and summary of all options, including
              expert options.

       -q     Be quiet: suppress the verbose header (program name, release number and date, and
              the parameters and options in effect).

       -B     (Babelfish).  Autodetect  and  read  a  sequence file format other than the default
              (FASTA). Almost any common sequence file format is recognized  (including  Genbank,
              EMBL,  SWISS-PROT, PIR, and GCG unaligned sequence formats, and Stockholm, GCG MSF,
              and Clustal alignment formats). See the printed documentation for a  complete  list
              of supported formats.

EXPERT OPTIONS

       --informat <s>
              Specify  that  the  sequence  file  is in format <s>, rather than the default FASTA
              format.  Common examples include Genbank, EMBL, GCG, PIR, Stockholm, Clustal,  MSF,
              or  PHYLIP;  see  the  printed documentation for a complete list of accepted format
              names.  This option overrides the default  format  (FASTA)  and  the  -B  Babelfish
              autodetection option.

SEE ALSO

       afetch(1),  compalign(1),  compstruct(1),  revcomp(1), seqsplit(1), seqstat(1), sfetch(1),
       shuffle(1), sindex(1), sreformat(1), stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University
       School of Medicine. Freely distributed under the GNU General Public License (GPL). See
       COPYING in the source code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu