Provided by: gff2aplot_2.0-14_amd64

NAME

       parseblast - Filtering High-scoring Segment Pairs (HSPs) from WU/NCBI BLAST.

SYNOPSIS

       parseblast [options] <results.from.blast>

DESCRIPTION

       This manual page briefly documents the parseblast command.

       Several output options are available; the most important are those that write HSPs in
       GFF format (GFFv1, GFFv2 or APLOT). Sequences can be included in the GFF records as a
       comment field. The script can also output the alignment for each HSP in ALN, MSF or
       tabular (TBL) format.

       NOTE - If the first line of the BLAST output (the one naming the flavour that was run:
       BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX) is missing, the program assumes that the
       file contains BLASTN HSP records, so make sure that you feed parseblast a well-formatted
       BLAST file. Some outputs, such as those from Web-Blast or Paracel-Blast, sometimes have
       no space between the HSP coordinates and the sequence. Such records are now processed
       correctly and their HSPs are retrieved like "standard" ones.

       WARNING - Frame fields in GFF records generated by parseblast contain BLAST frames
       (".","1","2","3") instead of the GFF standard values (".","0","1","2"). Because the frame
       for the reverse strand must be recalculated from the original sequence length, we suggest
       post-processing the GFF output of this script with a suitable filter that fixes the
       frames (if the program that will read the GFF records does not work with the original
       BLAST frames). The command-line option "--no-frame" sets all frames to "." (meaning that
       there is no frame).

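       A post-processing filter of the kind suggested in the WARNING could look roughly like
       the Python sketch below. It is not part of parseblast. The function name fix_frame is
       hypothetical, and the sketch assumes tab-separated GFF records with the strand in
       column 7 and the frame in column 8, and a plain 1/2/3 to 0/1/2 mapping on the forward
       strand. Reverse-strand frames are set to "." because the sequence length needed to
       recalculate them is not present in the GFF record.

```python
def fix_frame(gff_line):
    """Rewrite the frame column (column 8) of one GFF line.

    Assumptions (not taken from parseblast itself): fields are
    tab-separated, column 7 is the strand, column 8 is the frame.
    """
    body = gff_line.rstrip("\n")
    ending = gff_line[len(body):]          # keep the original line ending
    if not body.strip() or body.startswith("#"):
        return gff_line                    # blank and comment lines pass through
    fields = body.split("\t")
    if len(fields) < 8:
        return gff_line                    # not a GFF record; leave untouched
    strand, frame = fields[6], fields[7]
    if frame in ("1", "2", "3"):
        if strand == "+":
            # Forward strand: shift BLAST frame 1/2/3 to 0/1/2.
            fields[7] = str(int(frame) - 1)
        else:
            # Reverse strand: needs the sequence length, so mark as unknown.
            fields[7] = "."
    return "\t".join(fields) + ending
```

       The filter is not idempotent: it should be applied only once, to output that still
       carries the original BLAST frames. It also leaves any "Frame" value stored in the last
       column of "--fullgff" records unchanged.
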
OPTIONS

       parseblast prints output in "HSP" format by default (see below). It reads input from
       <STDIN> or from one or more files and writes its output to <STDOUT>, so the output can
       be redirected to a file or the program can be used as a filter within a pipe. The "-N",
       "-M", "-P", "-G", "-F", "-A" and "-X" options (and their long-name versions) are
       mutually exclusive, and their precedence order is as listed above.
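
       The precedence rule can be pictured with a small Python sketch. It is an illustration
       only, not parseblast's own code: the function effective_format is hypothetical, it
       handles short option names only, and it assumes that an option listed earlier in the
       sentence above wins over one listed later.

```python
# Format options in the order given in the text (assumed: earliest wins).
PRECEDENCE = ["-N", "-M", "-P", "-G", "-F", "-A", "-X"]


def effective_format(options):
    """Return the format option that takes effect, or None for default HSP."""
    for flag in PRECEDENCE:
        if flag in options:
            return flag
    return None  # no format option given: default "HSP" output
```

       Under that assumption, passing both "-G" and "-N" would select ALN output, and passing
       only "-b" would leave the default "HSP" format in place.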

       GFF OPTIONS:

       -G, --gff
              Prints output in GFFv1 format.

       -F, --fullgff
              Prints output in GFFv2 "alignment" format ("target").

       -A, --aplot
              Prints output in pseudo-GFF APLOT "alignment" format.

       -S, --subject
              Projects GFF output by SUBJECT (default is by QUERY).

       -Q, --sequence
              Append query and subject sequences to GFF record.

       -b, --bit-score
              Set <score> field to Bits (default Alignment Score).

       -i, --identity-score
              Set <score> field to Identities (default Alignment Score).

       -s, --full-scores
              Include all scores for each HSP in each GFF record.

       -u, --no-frame
              Set all frames to "." (GFF value for unavailable frames).

       -t, --compact-tags
              Target coords+strand+frame in short form (NO GFFv2!).

       ALIGNMENT OPTIONS:

       -P, --pairwise
              Prints pairwise alignment for each HSP in TBL format.

       -M, --msf
              Prints pairwise alignment for each HSP in MSF format.

       -N, --aln
              Prints pairwise alignment for each HSP in ALN format.

       -W, --show-coords
              Adds start/end positions to alignment output.

       GENERAL OPTIONS:

       -X, --expanded
              Expanded output (producing multiline output records).

       -c, --comments
              Include parameters from blast program as comments.

       -n, --no-comments
              Do not print "#" lines (raw output without comments).

       -v, --verbose
              Warnings sent to <STDERR>.

       --version
              Prints program version and exits.

       -h, --help
              Shows this help and exits.

OUTPUT FORMATS

        "S_" stands for "Subject_Sequence" and "Q_" for "Query_Sequence". <Program> name is taken
       from the input BLAST file. <Strands> are calculated from the <start> and <end>
       positions in the original BLAST file. <Frame> is taken from the BLAST file if present;
       otherwise it is set to ".". <SCORE> is set to the Alignment Score by default; you can
       change it with "-b" and "-i".

       If the "-S" or "--subject" option is given, QUERY fields refer to SUBJECT and SUBJECT
       fields refer to QUERY (this is only available for GFF output records).

       Dots ("...") mean that the record description continues on the following line, but
       such a record is printed as a single-line record by parseblast.

       [HSP] <- (This is the DEFAULT OUTPUT FORMAT)
        <Program> <DataBase> : ...
          ... <IdentityMatches> <Min_Length> <IdentityScore> ...
          ... <AlignmentScore> <BitScore> <E_Value> <P_Sum> : ...
          ... <Q_Name> <Q_Start> <Q_End> <Q_Strand> <Q_Frame> : ...
          ... <S_Name> <S_Start> <S_End> <S_Strand> <S_Frame> : <S_FullDescription>
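
       As a rough illustration of the [HSP] layout, the Python sketch below splits one such
       record into its five groups. It assumes that fields are separated by whitespace, that
       each ":" separator is preceded by whitespace, and that sequence names contain no
       whitespace. The function and key names are hypothetical, and all values are kept as
       strings.

```python
import re

SCORE_KEYS = ["identity_matches", "min_length", "identity_score",
              "alignment_score", "bit_score", "e_value", "p_sum"]
SEQ_KEYS = ["name", "start", "end", "strand", "frame"]


def parse_hsp_line(line):
    """Split one default-format HSP record into a nested dict of strings."""
    # At most four splits, so colons inside the description are preserved.
    parts = re.split(r"\s+:\s*", line.strip(), maxsplit=4)
    if len(parts) < 4:
        raise ValueError("not an HSP record: %r" % line)
    program, database = parts[0].split(None, 1)
    return {
        "program": program,
        "database": database,
        "scores": dict(zip(SCORE_KEYS, parts[1].split())),
        "query": dict(zip(SEQ_KEYS, parts[2].split())),
        "subject": dict(zip(SEQ_KEYS, parts[3].split())),
        "description": parts[4] if len(parts) > 4 else "",
    }
```

       Lines starting with "#" are comments and should be skipped before calling the function.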

       [GFF]
        <Q_Name> <Program> hsp <Q_Start> <Q_End> <SCORE> <Q_Strand> <Q_Frame> <S_Name>

       [FULL GFF] <- (GFF showing alignment data)
        <Q_Name> <Program> hsp <Q_Start> <Q_End> <SCORE> <Q_Strand> <Q_Frame> ...
          ... Target "<S_Name>" <S_Start> <S_End> ...
          ... E_value <E_Value> Strand <S_Strand> Frame <S_Frame>

       [APLOT] <- (GFF format enhanced for APLOT program)
        <Q_Name>:<S_Name> <Program> hsp <Q_Start>:<S_Start> <Q_End>:<S_End> <SCORE> ...
          ... <Q_Strand>:<S_Strand> <Q_Frame>:<S_Frame> <BitScore>:<HSP_Number> ...
          ... # E_value <E_Value>
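
       The paired "query:subject" fields of an [APLOT] record can be separated again with a
       few lines of Python. This is a sketch under stated assumptions, not the parser used by
       gff2aplot: it assumes whitespace-separated fields, sequence names that contain neither
       ":" nor "#", and the hypothetical function name parse_aplot_line.

```python
def parse_aplot_line(line):
    """Split one APLOT pseudo-GFF record into query and subject parts."""
    record, _, comment = line.partition("#")
    fields = record.split()
    if len(fields) < 9:
        raise ValueError("not an APLOT record: %r" % line)

    def pair(value):
        # "a:b" -> ("a", "b"); assumes a single colon per paired field.
        left, _, right = value.partition(":")
        return left, right

    q_name, s_name = pair(fields[0])
    q_start, s_start = pair(fields[3])
    q_end, s_end = pair(fields[4])
    q_strand, s_strand = pair(fields[6])
    q_frame, s_frame = pair(fields[7])
    bit_score, hsp_number = pair(fields[8])
    words = comment.split()                 # expected: ["E_value", "<E_Value>"]
    e_value = words[1] if len(words) > 1 and words[0] == "E_value" else None
    return {
        "program": fields[1],
        "score": fields[5],
        "bit_score": bit_score,
        "hsp_number": hsp_number,
        "e_value": e_value,
        "query": {"name": q_name, "start": q_start, "end": q_end,
                  "strand": q_strand, "frame": q_frame},
        "subject": {"name": s_name, "start": s_start, "end": s_end,
                    "strand": s_strand, "frame": s_frame},
    }
```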

       [EXPANDED]
        MATCH(<HSP_Number>): <Q_Name> x <S_Name>
        SCORE(<HSP_Number>): <AlignmentScore>
        BITSC(<HSP_Number>): <BitScore>
        EXPEC(<HSP_Number>): <E_Value> Psum(<P_Sum>)
        IDENT(<HSP_Number>): <IdentityMatches>/<Min_Length> : <IdentityScore> %
        T_GAP(<HSP_Number>): <TotalGaps(BothSeqs)>
        FRAME(<HSP_Number>): <Q_Frame>/<S_Frame>
        STRND(<HSP_Number>): <Q_Strand>/<S_Strand>
        MXLEN(<HSP_Number>): <Max_Length>
        QUERY(<HSP_Number>): length <Q_Length> : gaps <Q_TotalGaps> : ...
          ... <Q_Start> <Q_End> : <Q_Strand> : <Q_Frame> : <Q_FullSequence>
        SBJCT(<HSP_Number>): length <S_Length> : gaps <S_TotalGaps> : ...
          ... <S_Start> <S_End> : <S_Strand> : <S_Frame> : <S_FullSequence>
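
       Because every [EXPANDED] line carries a tag and the HSP number, the records can be
       regrouped per HSP. The Python sketch below shows one way to do that. It assumes that
       <HSP_Number> is printed as an integer and that each tag appears on its own line. The
       name parse_expanded is hypothetical and the values are left unparsed.

```python
import re
from collections import defaultdict

# Matches lines such as "SCORE(3): 450": tag, HSP number, rest of the line.
TAG_RE = re.compile(r"^\s*([A-Z_]+)\((\d+)\):\s*(.*)$")


def parse_expanded(lines):
    """Group expanded-output lines into {hsp_number: {tag: value}}."""
    hsps = defaultdict(dict)
    for line in lines:
        match = TAG_RE.match(line)
        if match:                      # anything else (comments, blanks) is skipped
            tag, number, value = match.groups()
            hsps[int(number)][tag] = value.strip()
    return dict(hsps)
```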

SEE ALSO

       ali2gff(1), blat2gff(1), gff2aplot(1), sim2gff(1).

AUTHOR

       parseblast was written by Josep F. Abril <abril@imim.es>.

       This manual page was written by Nelson A. de Oliveira <naoliv@gmail.com>, for  the  Debian
       project (but may be used by others).

                                 Mon, 21 Mar 2005 21:44:15 -0300                    PARSEBLAST(1)