Provided by: biosquid_1.9g+cvs20050121-5_amd64

NAME

       alistat - show statistics for a multiple alignment file

SYNOPSIS

       alistat [options] alignfile

DESCRIPTION

       alistat  reads  a  multiple sequence alignment from the file alignfile in any supported format (including
       SELEX, GCG MSF, and CLUSTAL), and shows a number of simple statistics about it.  These statistics include
       the  name  of the format, the number of sequences, the total number of residues, the average and range of
       the sequence lengths, and the alignment length (i.e. including gap characters).

       Also shown are some percent identities. A percent pairwise alignment identity is  defined  as  (idents  /
       MIN(len1,  len2)) where idents is the number of exact identities and len1, len2 are the unaligned lengths
       of the two sequences. The "average percent identity", "most related pair", and "most unrelated  pair"  of
       the  alignment  are  the  average, maximum, and minimum of all (N)(N-1)/2 pairs, respectively.  The "most
       distant seq" is calculated by finding the maximum pairwise identity (best relative) for all N  sequences,
       then finding the minimum of these N numbers (hence, the most outlying sequence).
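
       The definitions above can be illustrated with a short Python sketch. It is not alistat's source code: the
       function names, the set of gap characters, and the case-insensitive comparison are assumptions made for
       the example. Values are fractions between 0 and 1; multiply by 100 for a percentage.

```python
from itertools import combinations

# Assumed gap symbols; this set is illustrative, not taken from alistat.
GAP_CHARS = set("-._~")


def pairwise_identity(a, b):
    """Return idents / MIN(len1, len2) for two aligned rows of equal width."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same width")
    # Count columns where both rows hold the same residue (gaps never match).
    idents = sum(
        1
        for x, y in zip(a, b)
        if x not in GAP_CHARS and y not in GAP_CHARS and x.upper() == y.upper()
    )
    # Unaligned lengths: residues only, gap characters excluded.
    len1 = sum(1 for x in a if x not in GAP_CHARS)
    len2 = sum(1 for y in b if y not in GAP_CHARS)
    shorter = min(len1, len2)
    return idents / shorter if shorter else 0.0


def summarize(alignment):
    """Compute the four identity statistics over all N(N-1)/2 pairs.

    `alignment` maps sequence names to aligned rows.
    """
    names = list(alignment)
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    best = {name: 0.0 for name in names}  # best relative of each sequence
    identities = []
    for p, q in combinations(names, 2):
        ident = pairwise_identity(alignment[p], alignment[q])
        identities.append(ident)
        best[p] = max(best[p], ident)
        best[q] = max(best[q], ident)
    return {
        "average": sum(identities) / len(identities),
        "most_related": max(identities),
        "most_unrelated": min(identities),
        # Minimum over the N per-sequence maxima: the most outlying sequence.
        "most_distant": min(best.values()),
    }
```

       For a toy alignment {"s1": "ACGT", "s2": "ACGA", "s3": "A-GT"}, the s1/s3 pair has 3 identities over a
       shorter unaligned length of 3, so its identity is 1.0, while s2's best relative is s1 at 0.75, which
       makes s2 the most distant sequence.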

OPTIONS

       -a     Show  additional verbose information: a table with one line per sequence showing name, length, and
              its highest and lowest pairwise identity. These lines are prefixed with a * character so that they
              are easy to extract with grep and sort. For example, alistat -a foo.slx | grep '^\*' | sort -n -k 4
              gives a ranked list of the most distant sequences in the alignment. Incompatible with the -f
              option.

       -f     Fast;  use  a  sampling  method  to estimate the average %id.  When this option is chosen, alistat
              doesn't show the other three pairwise identity numbers.  This option  is  useful  for  very  large
              alignments, for which the full (N)(N-1)/2 calculation of all pairs would be prohibitive (e.g. Pfam's
              GP120 alignment, with over 10,000 sequences). Incompatible with the -a option.
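
              The man page does not describe the sampling method itself. The following Python sketch shows one
              plausible approach, averaging the identity of randomly chosen pairs instead of all pairs. The
              function names, the sample count, and the gap character set are assumptions for illustration, not
              alistat's actual implementation.

```python
import random

# Assumed gap symbols; this set is illustrative, not taken from alistat.
GAP_CHARS = set("-._~")


def pairwise_identity(a, b):
    """Return idents / MIN(len1, len2) for two aligned rows of equal width."""
    idents = sum(
        1
        for x, y in zip(a, b)
        if x not in GAP_CHARS and y not in GAP_CHARS and x.upper() == y.upper()
    )
    len1 = sum(1 for x in a if x not in GAP_CHARS)
    len2 = sum(1 for y in b if y not in GAP_CHARS)
    shorter = min(len1, len2)
    return idents / shorter if shorter else 0.0


def estimate_average_identity(rows, n_samples=1000, seed=None):
    """Estimate the average pairwise identity from randomly sampled pairs.

    `rows` is a list of aligned sequences. The cost grows with n_samples,
    not with the N(N-1)/2 pairs of the full calculation.
    """
    if len(rows) < 2:
        raise ValueError("need at least two sequences")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a, b = rng.sample(rows, 2)  # two distinct rows, chosen uniformly
        total += pairwise_identity(a, b)
    return total / n_samples
```

              Because only an average is accumulated, a sketch like this cannot report the maximum or minimum
              over all pairs, which is consistent with -f omitting the other three identity numbers.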

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -q     Be quiet: suppress the verbose header (program name, release number and date, the parameters  and
              options in effect).

       -B     (Babelfish). Autodetect and read a sequence file format other than the default (FASTA). Almost any
              common sequence file format is recognized (including  Genbank,  EMBL,  SWISS-PROT,  PIR,  and  GCG
              unaligned  sequence  formats,  and  Stockholm,  GCG  MSF,  and Clustal alignment formats). See the
              printed documentation for a complete list of supported formats.

EXPERT OPTIONS

       --informat <s>
              Specify that the sequence file is in format <s>, rather than the  default  FASTA  format.   Common
              examples  include  Genbank,  EMBL,  GCG,  PIR, Stockholm, Clustal, MSF, or PHYLIP; see the printed
              documentation for a complete list of accepted format names.  This  option  overrides  the  default
              format (FASTA) and the -B Babelfish autodetection option.

SEE ALSO

       afetch(1),  compalign(1),  compstruct(1),  revcomp(1),  seqsplit(1),  seqstat(1),  sfetch(1), shuffle(1),
       sindex(1), sreformat(1), stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL). See COPYING in the source code distribution
       for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu