Ubuntu Manpage: pp_simScore - print similarity and alignments for block-profile and protein sequence on

Provided by: augustus_3.5.0+dfsg-3build1_amd64

NAME

       pp_simScore - print similarity and alignments for block-profile and protein sequence on
       the standard output

SYNOPSIS

       pp_simScore [OPTIONS] --fasta=protein-sequence-file --prfl=protein-profile-file

DESCRIPTION

       Algorithm for calculating the similarity score and the optimal alignments of a
       block-profile and a protein sequence. The algorithm can optional take intron positions
       into account. Print to standard output.

OPTIONS

   Mandatory options
       -f, --fasta=file
           Protein sequence file in FASTA format.
           It may contain an optional [Intron] section. This section denotes the intron positions
           in the protein sequence, which are specified as list of (j, f), where j is the index
           of the amino acid after witch the intron immediately occurs. The indices range from 0
           to m - 1 if the protein sequence has a length of m.

               >protein sequence header
               XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
               XXXXXXX protein sequence XXXXXXXXXXX
               XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

               [Introns]
               # index of the position after which an intron occurs | residual nucleotides before the intron
               2 0
               5 1
               30 2
               104 1

       -p, --prfl= file
           The block profile file has to have following structure:

               [dist]
               min    max
               [block]
               B
               [intron profile]
               w
               inter-block_profile_list
               intra-block_profile_list

           This structure can be repeated in this file. The file has to end either
           in a [dist] section or a [dist] and than [intron profile] section.
           The [intron profile] sections are optional.

               [dist] explanation:
               min, max denote the distance interval of an inter-block section

               [block] explanation:
               B denotes a (20 x t) matrix for a block of t of the block-profile

               [intron profile] explanation:
               an intron profile describes the positions and frequencies of introns in and
               before the associated block
               w: number of protein family members used to build the intron profile
               inter-block_profile_list: list of (h, v),
               where   h denotes the number of introns which occurred within a family member,
                       v the number of family members which have this number of introns
               intra-block_profile_list: list (s, f, v),
               where   s denotes the index of the position in the block after which an intron occurs,
                       f denotes the number of nucleotides which are left before the intron (0,1,2)
                       v the number of family members which have an intron at that position

   Additional options
       -g, --gap_inter=float
           Gap costs for an alignment column that is a gap in an inter-block section. Default
           Value: -5.0

       -b, --gap_intra=float
           Gap costs for an alignment column that is a gap in a block. Default Value: -50.0

       -r, --gap_intron=float
           Gap costs for an gap in intron positions. Default Value: -5.0

       -e, --epsilon_intron=float
           Pseudocount parameter epsilon1, the pseudocount is added to a relative intron
           frequency v/w with (v+epsilon1)/(w+epsilon1+epsilon2). Default Value: 0.0000001

       -n, --epsilon_noIntron=float
           Pseudocount parameter epsilon2, the pseudocount is added to a relative intron
           frequency v/w with (v+epsilon1)/(w+epsilon1+epsilon2). Default Value: 0.1

       -i, --intron_weight_intra=float
           Value that is added to an intron score for a match of intron positions in a block.
           Default Value: 5.0

       -t, --intron_weight_inter=float
           Value that is added to an intron score for a match of intron positions in an
           inter-block. Default Value: 5.0

       -a, --alignment=number
           Number of optimal alignments that are computed. Default Value: 1

       -o, --out=format

           Denotes the output format, the following output options are implemented:

               score
                   output is the similarity score

               matrix
                   output are similarity matrix and similarity score

               alignment
                   output are the computed alignments as

                   •   Alignment representation of P as symbols of {AminoAcid, gap symbol or
                       number of amino acids in inter-block}

                   •   Alignment representation of argmax of B as symbols of {argmax AminoAcid
                       for aligned block column, gap symbol or inter-block length}

                   •   Frequency of amino acid of P in aligned block column of B, if alignment
                       type is a match

               matrix+alignment
                   output are similarity matrix, similarity score and the computed alignments in
                   the format described above

               db
                   output are the computed alignment as list of alignment frames, an element of
                   the list consists of:

                   •   starting position of the first amino acid of the protein sequence that is
                       included in the alignment frame

                   •   block number in which the alignment frame is located

                   •   index of the first block column that is included in the alignment frame

                   •   length of the frame (number of alignment columns)

                   •   alignment type: 'm', 's'. 'p' or '-'

               bp
                   output is a list of translations from the index of a block to the number of
                   the block in the .prfl file

               consents
                   output is the average of the argmax of the block columns for the complete
                   profile

               interblock
                   output is a list of all inter-block distance intervals

                   :
                       Default Value: score

       -h, --help
           Produce help message.

EXAMPLE

           pp_simScore --fasta=EDW03868.1.fa --prfl=EOG09150290.prfl --out=alignment

AUTHORS

       AUGUSTUS was written by M. Stanke, O. Keller, S. König, L. Gerischer, L. Romoth and
       L.Gabriel.

                                                                                   PP_SIMSCORE(1)