Provided by: pftools_3.2.12-1_amd64

NAME

       psa - biological sequence alignment file format

DESCRIPTION

       psa  is  an  output  format  used  by  the  pftools package to describe alignments between
       biological sequences (DNA or protein) and PROSITE profiles.

       psa is related to fasta, the widely used biological sequence file format. However, it
       does not merely describe a biological sequence: it is specifically used to include
       information about alignments between a motif descriptor, such as a PROSITE profile, and a
       given sequence. This information is stored in the header and reflected in the structure of
       the sequence data following the header line.

SYNTAX

       Each sequence in a psa alignment file or output must be preceded by a fasta header line.
       The general syntax of such a fasta header line is as follows:

              >seq_id [ free_text ]

       The header must start with a '>' character which is directly followed by the seq_id field.
       This  field  is interpreted by most programs as the sequence's identifier and/or accession
       number. It ends at the first encountered whitespace character.
       The pftools programs use the free_text field to add information about the match score,
       the match position and the description of the sequence or motif. Please refer to the man
       pages of the corresponding programs for further information about their output formats.
       The header must fit on a single line. All following lines, up to the next line starting
       with a '>' character or the end of the file, are interpreted as sequence data.
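       The header and record rules above can be sketched as a small reader. This is a minimal
       illustration under stated assumptions, not code from the pftools package: the names
       read_psa and PsaRecord are hypothetical, lines before the first header are simply
       ignored, and surrounding whitespace on data lines is assumed to be layout only.

```python
import re
from typing import Iterator, List, NamedTuple, Optional


class PsaRecord(NamedTuple):
    """Hypothetical container for one psa entry (not part of pftools)."""
    seq_id: str     # characters right after '>' up to the first whitespace
    free_text: str  # remainder of the header line, possibly empty
    data: str       # alignment data lines concatenated, newlines removed


# seq_id is everything up to the first whitespace; the rest is free_text.
HEADER_RE = re.compile(r"(\S*)\s*(.*)")


def _make_record(header: str, chunks: List[str]) -> PsaRecord:
    m = HEADER_RE.match(header)  # always matches: both groups may be empty
    return PsaRecord(m.group(1), m.group(2).rstrip(), "".join(chunks))


def read_psa(text: str) -> Iterator[PsaRecord]:
    """Yield one record per '>' header; later lines are data until the next '>'."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Data may span several lines of varying length (assumption: strip layout whitespace).
            chunks.append(line.strip())
        # Lines before the first header are ignored in this sketch.
    if header is not None:
        yield _make_record(header, chunks)


if __name__ == "__main__":
    example = ">YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO\nPTDPGlns\nKIAQ--\n"
    for rec in read_psa(example):
        print(rec.seq_id, "|", rec.free_text, "|", rec.data)
```

       The reader keeps free_text uninterpreted, since its content depends on the pftools
       program that produced the output.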

       The line following the header starts the alignment data between a sequence and a PROSITE
       profile. This data may span several lines of varying length.
       The data consists of upper- or lower-case characters of the corresponding sequence
       alphabet (DNA or protein). The gap characters '.' and '-' are also supported.
       The alignment is always at least as long as the matching profile. Insertions or deletions
       detected during the motif/sequence alignment step change the length of the reported data
       and can be identified using the following conventions:

              upper-case character
                     Any  upper-case  character  of  the  sequence  alphabet  identifies  a match
                     position between the sequence and the motif descriptor.

              lower-case character
                     A lower-case character of the sequence alphabet  is  used  to  symbolize  an
                     insertion in the sequence compared to the motif descriptor.

              '-' (dash) character
                     A  '-'  character in the output identifies the presence of a deletion in the
                     sequence compared to the motif descriptor.
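       The three conventions above map each alignment character to one alignment state. The
       sketch below is a hypothetical helper, not part of pftools; the functions profile_length
       and residue_count rest on an inference from these conventions (every motif position is
       either matched or deleted, every sequence residue is either matched or inserted).

```python
from collections import Counter


def classify(symbol: str) -> str:
    """Classify one psa alignment character using the conventions above."""
    if symbol in ("-", "."):
        # '-' marks a deletion; '.' is treated the same way (see NOTES (2)).
        return "deletion"
    if symbol.isupper():
        return "match"      # residue aligned to a motif position
    if symbol.islower():
        return "insertion"  # residue with no counterpart in the motif
    raise ValueError(f"not a psa alignment character: {symbol!r}")


def summarize(data: str) -> Counter:
    """Count match, insertion and deletion columns in one alignment string."""
    return Counter(classify(c) for c in data)


def profile_length(data: str) -> int:
    # Inferred: every motif position is either matched or deleted.
    counts = summarize(data)
    return counts["match"] + counts["deletion"]


def residue_count(data: str) -> int:
    # Inferred: every sequence residue is either matched or inserted.
    counts = summarize(data)
    return counts["match"] + counts["insertion"]
```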

EXAMPLES

       (1)    >YD28_SCHPO 556 pos. 291 - 332 sp|Q10256|YD28_SCHPO

              PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--
              This is an example of the output produced by pfsearch(1) with the '-x' option (i.e.
              psa output). The first line, starting with the '>' character, is the fasta header.
              It also contains the raw score of the alignment (556) and its position in the input
              sequence (291 - 332).
              The next line holds the alignment proper. Starting at position 6 of this line, the
              lower-case residues 'lns' mark an insertion in the sequence compared to the motif.
              The last two positions of the motif are not present in the sequence (i.e. they are
              deleted). This is indicated by the two '-' (dash) characters at the end of the
              alignment.
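       The example can be checked by counting characters. This worked sketch copies the
       alignment line and positions from the example above and assumes the positions in the
       header are 1-based and inclusive; the variable names are illustrative only.

```python
# Alignment line and header positions copied from the example above.
ALIGNMENT = "PTDPGlnsKIAQLVSMGFDPLEAAQALDAANGDLDVAASFLL--"
START, END = 291, 332  # from "pos. 291 - 332" in the header

matches = sum(c.isupper() for c in ALIGNMENT)     # upper case: match positions
insertions = sum(c.islower() for c in ALIGNMENT)  # lower case: inserted residues
deletions = ALIGNMENT.count("-")                  # dashes: deleted motif positions

# 1-based position of the first inserted residue on the alignment line.
first_insertion = next(i for i, c in enumerate(ALIGNMENT, start=1) if c.islower())

residues = matches + insertions         # residues present in the sequence
profile_positions = matches + deletions  # motif positions covered (inferred)

print(matches, insertions, deletions, first_insertion)  # 39 3 2 6
print(residues, END - START + 1, profile_positions)     # 42 42 41
```

       The 42 residues in the alignment agree with the span 291 - 332 given in the header, and
       the 'lns' insertion indeed starts at position 6.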

NOTES

       (1)    The xpsa(5) format defines a stricter syntax for the header line, allowing the
              exchange of information between different sequence analysis tools. It uses
              keyword=value pairs to annotate the current match between a sequence and a motif
              descriptor. This syntax can easily be parsed and extended according to the needs
              of bioinformatics tools.

       (2)    The current implementation of the pftools package does not use the '.' (dot)
              character in psa output. However, psa2msa(1) accepts it and interprets it in the
              same way as the '-' (dash) character.
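       A reader that follows note (2) can normalize '.' to '-' before further processing, and
       can recover the residues present in the sequence by dropping both gap characters. The
       helpers below are hypothetical and only mirror the behaviour described for psa2msa(1);
       they are not taken from its source.

```python
def normalize_gaps(data: str) -> str:
    """Rewrite '.' gaps as '-', the reading psa2msa(1) is documented to apply."""
    return data.replace(".", "-")


def ungapped(data: str) -> str:
    """Return only the residues present in the sequence (gap characters dropped)."""
    return "".join(c for c in data if c not in "-.")


if __name__ == "__main__":
    print(normalize_gaps("KIAQ..LVsm-"))  # KIAQ--LVsm-
    print(ungapped("KIAQ..LVsm-"))        # KIAQLVsm
```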

SEE ALSO

       xpsa(5), pfsearch(1), pfscan(1), pfw(1), pfmake(1), psa2msa(1)

AUTHOR

       This manual page was originally written by Volker Flegel.
       The pftools package was developed by Philipp Bucher.
       Any comments or suggestions should be addressed to <pftools@sib.swiss>.