
Provided by: gff2aplot_2.0-8_amd64

NAME

       parseblast - Filtering High-scoring Segment Pairs (HSPs) from WU/NCBI BLAST.

SYNOPSIS

       parseblast [options] <results.from.blast>

DESCRIPTION

       This manual page briefly documents the parseblast command.

       Several output options are available; the most important ones write HSPs in GFF format (GFFv1, GFFv2 or
       APLOT). Sequences can be included in the GFF records as a comment field. The script can also output the
       alignment for each HSP in ALN, MSF or tabular format.

       NOTE - If the first line of the BLAST program output (the one stating which flavour was run: BLASTN,
       BLASTP, BLASTX, TBLASTN or TBLASTX) is missing, the program assumes that the input contains BLASTN HSP
       records, so make sure that you feed parseblast a well-formatted BLAST file. Web-Blast and Paracel-Blast
       outputs sometimes have no space between the HSP coordinates and the sequence; such records are now
       processed correctly and their HSPs are retrieved just like "standard" ones.

       WARNING - Frame fields in GFF records generated by parseblast contain BLAST frames (".","1","2","3")
       instead of the GFF standard values (".","0","1","2"). Because the frame for the reverse strand must be
       recalculated from the original sequence length, we suggest that users post-process the GFF output of this
       script with a suitable filter that fixes the frames (in case the program that will consume the GFF
       records does not work with the original BLAST frames). The command-line option "--no-frame" sets all
       frames to "." (meaning that there is no frame).
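
       A post-processing filter of the kind suggested above might look like the following Python sketch. It
       assumes tab-separated GFF records such as those written with "-G" or "-F", and a plain one-to-one mapping
       of BLAST frames 1/2/3 to GFF frames 0/1/2 on the forward strand; both points are assumptions, not
       documented behaviour. Because the sequence length is not part of the GFF record, reverse-strand frames
       are reset to "." here rather than recalculated. The name fix_frame is hypothetical.

```python
# Assumption: forward-strand BLAST frames 1/2/3 map one-to-one to GFF frames 0/1/2.
BLAST_TO_GFF_FRAME = {"1": "0", "2": "1", "3": "2"}


def fix_frame(line):
    """Return one GFF line with its frame column rewritten to GFF values."""
    body = line.rstrip("\n")
    if not body.strip() or body.startswith("#"):
        return line  # blank lines and "#" comments pass through untouched
    fields = body.split("\t")  # assumption: columns are tab-separated
    if len(fields) < 8:
        return line  # not a GFF record under that assumption
    strand, frame = fields[6], fields[7]
    if strand == "+" and frame in BLAST_TO_GFF_FRAME:
        fields[7] = BLAST_TO_GFF_FRAME[frame]
    elif strand == "-" and frame != ".":
        # The sequence length needed to recalculate this frame is unknown here.
        fields[7] = "."
    return "\t".join(fields) + line[len(body):]
```

       Applying fix_frame to every line read from standard input and writing the result to standard output
       would turn the function into a pipe filter placed after parseblast.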

OPTIONS

       parseblast prints output in "HSP" format by default (see below). It takes input from <STDIN> or from one
       or more files and writes its output to <STDOUT>, so the output can be redirected to a file or the program
       can be used as a filter within a pipe. The "-N", "-M", "-P", "-G", "-F", "-A" and "-X" options (and
       their long-name versions) are mutually exclusive; their precedence follows the order in which they are
       listed in this sentence.
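
       The mutual exclusion described above can be pictured with a small Python sketch that works out which
       output format would win when several of these options are given together. It assumes that the options
       are listed from highest to lowest precedence, which the text implies but does not state outright. The
       function name effective_format is hypothetical and the sketch does not call parseblast itself.

```python
# Listing order from the manual; treating it as "first one wins" is an assumption.
PRECEDENCE = ["-N", "-M", "-P", "-G", "-F", "-A", "-X"]

# Long option names as documented in the OPTIONS section.
LONG_TO_SHORT = {
    "--aln": "-N",
    "--msf": "-M",
    "--pairwise": "-P",
    "--gff": "-G",
    "--fullgff": "-F",
    "--aplot": "-A",
    "--expanded": "-X",
}


def effective_format(args):
    """Return the format option expected to take effect, or None for default HSP."""
    requested = {LONG_TO_SHORT.get(arg, arg) for arg in args}
    for flag in PRECEDENCE:
        if flag in requested:
            return flag
    return None  # no format option given: default "HSP" output
```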

       GFF OPTIONS:

       -G, --gff
              Prints output in GFFv1 format.

       -F, --fullgff
              Prints output in GFFv2 "alignment" format ("target").

       -A, --aplot
              Prints output in pseudo-GFF APLOT "alignment" format.

       -S, --subject
              Projects GFF output by SUBJECT (default is by QUERY).

       -Q, --sequence
              Append query and subject sequences to GFF record.

       -b, --bit-score
              Set <score> field to Bits (default Alignment Score).

       -i, --identity-score
              Set <score> field to Identities (default Alignment).

       -s, --full-scores
              Include all scores for each HSP in each GFF record.

       -u, --no-frame
              Set all frames to "." (GFF for not available frames).

       -t, --compact-tags
              Target coords+strand+frame in short form (NO GFFv2!).

       ALIGNMENT OPTIONS:

       -P, --pairwise
              Prints pairwise alignment for each HSP in TBL format.

       -M, --msf
              Prints pairwise alignment for each HSP in MSF format.

       -N, --aln
              Prints pairwise alignment for each HSP in ALN format.

       -W, --show-coords
              Adds start/end positions to alignment output.

       GENERAL OPTIONS:

       -X, --expanded
              Expanded output (producing multiline output records).

       -c, --comments
              Include parameters from blast program as comments.

       -n, --no-comments
              Do not print "#" lines (raw output without comments).

       -v, --verbose
              Warnings sent to <STDERR>.

       --version
              Prints program version and exits.

       -h, --help
              Shows this help and exits.

OUTPUT FORMATS

        "S_"  stands  for  "Subject_Sequence"  and "Q_" for "Query_Sequence". <Program> name is taken from input
       BLAST file. <Strands> are calculated from the <start> and <end> positions in the original BLAST file.
       <Frame> is obtained from the BLAST file if present; otherwise it is set to ".". <SCORE> is set to the
       Alignment Score by default; you can change it with "-b" and "-i".
        If the "-S" or "--subject" option is given, QUERY fields refer to SUBJECT and SUBJECT fields refer to
       QUERY (this is only available for GFF output records).
        Dots ("...") mean that the record description continues on the following line, but parseblast prints
       such a record as a single line.

       [HSP] <- (This is the DEFAULT OUTPUT FORMAT)
        <Program> <DataBase> : ...
          ... <IdentityMatches> <Min_Length> <IdentityScore> ...
          ... <AlignmentScore> <BitScore> <E_Value> <P_Sum> : ...
          ... <Q_Name> <Q_Start> <Q_End> <Q_Strand> <Q_Frame> : ...
          ... <S_Name> <S_Start> <S_End> <S_Strand> <S_Frame> : <S_FullDescription>
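
       As an illustration of the layout above, the following Python sketch splits one default HSP record into
       named fields. It assumes that fields are separated by whitespace, that the groups are separated by a
       colon surrounded by whitespace, and that names contain no spaces; none of this is guaranteed by the
       manual. The function name parse_hsp and the dictionary keys are hypothetical.

```python
import re

# Field names per colon-separated group, following the [HSP] template.
HSP_GROUPS = [
    ("program", "database"),
    ("identity_matches", "min_length", "identity_score",
     "alignment_score", "bit_score", "e_value", "p_sum"),
    ("q_name", "q_start", "q_end", "q_strand", "q_frame"),
    ("s_name", "s_start", "s_end", "s_strand", "s_frame"),
]


def parse_hsp(line):
    """Split one HSP record into a dict of strings (values are not converted)."""
    # At most four splits, so colons inside the description are preserved.
    parts = re.split(r"\s+:(?:\s+|$)", line.strip(), maxsplit=4)
    if len(parts) < 4:
        raise ValueError("not an HSP record")
    record = {}
    for names, chunk in zip(HSP_GROUPS, parts):
        values = chunk.split()
        if len(values) != len(names):
            raise ValueError("unexpected number of fields in: " + chunk)
        record.update(zip(names, values))
    record["s_description"] = parts[4] if len(parts) > 4 else ""
    return record
```

       A made-up record such as "BLASTX nr : 45 60 75.0 210 85.5 1e-20 1e-20 : query1 10 189 + 1 : subj1 5 64 +
       . : some description" (written on one line) would yield a dictionary in which q_name is "query1".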

       [GFF]
        <Q_Name> <Program> hsp <Q_Start> <Q_End> <SCORE> <Q_Strand> <Q_Frame> <S_Name>

       [FULL GFF] <- (GFF showing alignment data)
        <Q_Name> <Program> hsp <Q_Start> <Q_End> <SCORE> <Q_Strand> <Q_Frame> ...
          ... Target "<S_Name>" <S_Start> <S_End> ...
          ... E_value <E_Value> Strand <S_Strand> Frame <S_Frame>
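
       The target information in a FULL GFF record could be pulled out with a Python sketch like the one below.
       It assumes that the first eight columns contain no embedded whitespace and that the attribute part
       follows the template above, with or without ";" separators between the tag-value pairs (the template
       does not show any, so the separators are treated as optional). The name parse_full_gff and the
       dictionary keys are hypothetical.

```python
import re

# Attribute layout from the [FULL GFF] template; ";" separators are optional here.
TARGET_RE = re.compile(
    r'Target\s+"(?P<s_name>[^"]*)"\s+(?P<s_start>\d+)\s+(?P<s_end>\d+)\s*;?\s*'
    r'E_value\s+(?P<e_value>\S+?)\s*;?\s*'
    r'Strand\s+(?P<s_strand>\S+?)\s*;?\s*'
    r'Frame\s+(?P<s_frame>\S+?)\s*;?\s*$'
)

GFF_KEYS = ("q_name", "program", "feature", "q_start", "q_end",
            "score", "q_strand", "q_frame")


def parse_full_gff(line):
    """Return the query columns and target attributes of one record as strings."""
    fields = line.strip().split(None, 8)  # eight columns plus the attribute text
    if len(fields) < 9:
        raise ValueError("expected eight GFF columns plus attributes")
    record = dict(zip(GFF_KEYS, fields[:8]))
    match = TARGET_RE.search(fields[8])
    if match is None:
        raise ValueError("attributes do not follow the assumed layout")
    record.update(match.groupdict())
    return record
```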

       [APLOT] <- (GFF format enhanced for APLOT program)
        <Q_Name>:<S_Name> <Program> hsp <Q_Start>:<S_Start> <Q_End>:<S_End> <SCORE> ...
          ... <Q_Strand>:<S_Strand> <Q_Frame>:<S_Frame> <BitScore>:<HSP_Number> ...
          ... # E_value <E_Value>

       [EXPANDED]
        MATCH(<HSP_Number>): <Q_Name> x <S_Name>
        SCORE(<HSP_Number>): <AlignmentScore>
        BITSC(<HSP_Number>): <BitScore>
        EXPEC(<HSP_Number>): <E_Value> Psum(<P_Sum>)
        IDENT(<HSP_Number>): <IdentityMatches>/<Min_Length> : <IdentityScore> %
        T_GAP(<HSP_Number>): <TotalGaps(BothSeqs)>
        FRAME(<HSP_Number>): <Q_Frame>/<S_Frame>
        STRND(<HSP_Number>): <Q_Strand>/<S_Strand>
        MXLEN(<HSP_Number>): <Max_Length>
        QUERY(<HSP_Number>): length <Q_Length> : gaps <Q_TotalGaps> : ...
          ... <Q_Start> <Q_End> : <Q_Strand> : <Q_Frame> : <Q_FullSequence>
        SBJCT(<HSP_Number>): length <S_Length> : gaps <S_TotalGaps> : ...
          ... <S_Start> <S_End> : <S_Strand> : <S_Frame> : <S_FullSequence>
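
       Since EXPANDED output spreads each HSP over several tagged lines, a reader of that format has to group
       lines by HSP number. The Python sketch below does this under the assumptions that <HSP_Number> is an
       integer, that each tag appears once per HSP, and that HSP numbers are not reused within the input being
       read. The name parse_expanded is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "SCORE(3): 210" and captures tag, HSP number and value.
TAG_RE = re.compile(r"^\s*([A-Z_]+)\((\d+)\):\s*(.*)$")


def parse_expanded(lines):
    """Group EXPANDED output lines into {hsp_number: {tag: value}}."""
    hsps = defaultdict(dict)
    for line in lines:
        match = TAG_RE.match(line)
        if match is None:
            continue  # skip comments, blank lines and anything unrecognised
        tag, number, value = match.groups()
        hsps[int(number)][tag] = value.strip()
    return dict(hsps)
```

       Values are kept as strings; splitting the QUERY and SBJCT lines further on " : " would be a natural next
       step.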

SEE ALSO

       ali2gff(1), blat2gff(1), gff2aplot(1), sim2gff(1).

AUTHOR

       parseblast was written by Josep F. Abril <abril@imim.es>.

       This manual page was written by Nelson A. de Oliveira <naoliv@gmail.com>, for the Debian project (but may
       be used by others).

                                         Mon, 21 Mar 2005 21:44:15 -0300                           PARSEBLAST(1)