Ubuntu Manpage: bamfeaturecount - evaluate alignments produced by an RNA-seq aligner

Provided by: biobambam2_2.0.185+ds-2_amd64

NAME

       bamfeaturecount - evaluate alignments produced by an RNA-seq aligner

SYNOPSIS

       bamfeaturecount annotation.gtf mapped.bam [options]

DESCRIPTION

       bamfeaturecount evaluates the alignments produced by an RNA-seq aligner (e.g. STAR) and outputs the
       average coverage found for each transcript and gene. It requires an annotation file in GTF format in
       addition to a BAM file containing RNA-seq alignments. The annotation file needs to have been processed
       by the filtergtf program to ensure a sort order compatible with bamfeaturecount's expectations.

       The output contains two types of lines, one starting with [transcript] and the other starting with
       [gene]. In both cases the information is given as a tab-separated set of columns.

       The following is an example of a transcript line:

       [transcript]    chr1:C1orf159    ENST00000472741.5        (0.0618557,5.38144,97)   (0.0496454,11.305,564)
       (0.0514372,10.4357,661) [1116059,1116087)       [1091990,1092103)       [1091045,1091565)

       Transcript lines always have at least 6 columns. The second column provides the reference sequence name
       (as given in the BAM header and GTF file) and the name of the gene concerned, separated by a colon (here
       chr1 and C1orf159). The third column contains the transcript identifier (transcript_id in the GTF file).
       Columns 4, 5 and 6 each contain either a triplet of numbers (A,B,C) or the symbol *. Column 4 describes
       the regions unique to this transcript (i.e. stretches of the genome not shared with any other
       transcript). Column 5 describes the regions shared with at least one other transcript. Column 6
       describes all regions covered by the transcript (unique and non-unique). If a transcript has no
       intervals of the respective kind (i.e. every base is also covered by at least one other transcript, or
       all bases are unique to this transcript), the column contains the symbol *. Otherwise a triplet (A,B,C)
       is given, where A denotes the fraction of bases not covered by any alignment (in the example 0.0618557,
       i.e. about 6% of the bases unique to this transcript are not covered), B the average coverage (in the
       example the average sequencing depth on the unique bases of this transcript is 5.38) and C the total
       number of bases (in the example 97 bases of this transcript are not shared with any other transcript).
       The remaining columns (7 and beyond) contain the zero-based intervals of the exons of this transcript on
       the reference sequence.
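
       The column layout described above can be read back with a short script. The following is a minimal
       Python sketch, not part of biobambam2: the names Coverage, TranscriptLine and parse_transcript_line are
       hypothetical, and it assumes that the reference name and the gene name are separated at the first colon
       of column 2.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Coverage(NamedTuple):
    uncovered_fraction: float  # A: fraction of bases not covered by any alignment
    average_coverage: float    # B: average coverage
    bases: int                 # C: total number of bases


class TranscriptLine(NamedTuple):
    reference: str
    gene: str
    transcript_id: str
    unique: Optional[Coverage]  # column 4, None if the column holds "*"
    shared: Optional[Coverage]  # column 5, None if the column holds "*"
    total: Optional[Coverage]   # column 6, None if the column holds "*"
    exons: List[Tuple[int, int]]


# Exon intervals are printed as "[start,end)".
_INTERVAL = re.compile(r"^\[(\d+),(\d+)\)$")


def parse_coverage(field: str) -> Optional[Coverage]:
    """Turn "(A,B,C)" into a Coverage triplet, or "*" into None."""
    if field == "*":
        return None
    a, b, c = field.strip("()").split(",")
    return Coverage(float(a), float(b), int(c))


def parse_transcript_line(line: str) -> TranscriptLine:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 6 or cols[0] != "[transcript]":
        raise ValueError("not a [transcript] line")
    # Assumption: split at the first colon of column 2.
    reference, _, gene = cols[1].partition(":")
    exons = []
    for field in cols[6:]:
        match = _INTERVAL.match(field)
        if match is None:
            raise ValueError("unexpected exon interval: " + field)
        exons.append((int(match.group(1)), int(match.group(2))))
    return TranscriptLine(reference, gene, cols[2],
                          parse_coverage(cols[3]),
                          parse_coverage(cols[4]),
                          parse_coverage(cols[5]),
                          exons)


# The example transcript line from this page, joined with tabs.
EXAMPLE = "\t".join([
    "[transcript]", "chr1:C1orf159", "ENST00000472741.5",
    "(0.0618557,5.38144,97)", "(0.0496454,11.305,564)",
    "(0.0514372,10.4357,661)",
    "[1116059,1116087)", "[1091990,1092103)", "[1091045,1091565)",
])

record = parse_transcript_line(EXAMPLE)
```

       Here record.unique holds the triplet from column 4 and record.exons holds the three exon intervals as
       pairs of integers.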

       The gene lines should be disregarded in the current version of the program.

       The following key=value pairs can be given:

       T=<filename>: set the prefix for temporary file names

       verbose=<1>: print progress reports while processing

       threads=<1>: number of threads used for processing. Set this to 0  to  use  all  cores  detected  on  the
       machine.

       mapqmin=<255>:  Minimum  mapping  quality  allowed for alignments considered. By default this is 255 (the
       value used by STAR to mark uniquely mapped reads).

       mapqmax=<255>: Maximum mapping quality allowed for alignments considered. By default  this  is  255  (the
       value used by STAR to mark uniquely mapped reads).

       uncoveredthres=<0.1>: maximum fraction of bases allowed to be uncovered in a transcript for the
       transcript to be reported (i.e. maximum value allowed for the first number given in column 6 of the
       transcript output).

       uniqueuncoveredthres=<0.1>: maximum fraction of bases allowed to be uncovered in the unique region of a
       transcript for the transcript to be reported (i.e. maximum value allowed for the first number given in
       column 4 of the transcript output).
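
       The same two thresholds can be applied again to already written output, for example with stricter
       values. The following Python sketch is an illustration only and does not reproduce the program's own
       code. The function name passes_thresholds is hypothetical, and two points this page does not specify
       are assumed: that the comparison is inclusive, and that a column holding * (passed here as None) never
       causes a transcript to be dropped.

```python
from typing import Optional


def passes_thresholds(total_uncovered: Optional[float],
                      unique_uncovered: Optional[float],
                      uncoveredthres: float = 0.1,
                      uniqueuncoveredthres: float = 0.1) -> bool:
    """Decide whether a transcript would be kept under the given thresholds.

    total_uncovered  -- first number of column 6, or None if the column is "*"
    unique_uncovered -- first number of column 4, or None if the column is "*"
    """
    # Assumption: a missing triplet ("*") does not exclude the transcript.
    if total_uncovered is not None and total_uncovered > uncoveredthres:
        return False
    if unique_uncovered is not None and unique_uncovered > uniqueuncoveredthres:
        return False
    return True


# Values taken from the example transcript line on this page.
keeps_default = passes_thresholds(0.0514372, 0.0618557)
keeps_strict = passes_thresholds(0.0514372, 0.0618557,
                                 uncoveredthres=0.05,
                                 uniqueuncoveredthres=0.05)
```

       With the default thresholds of 0.1 the example transcript is kept. With both thresholds lowered to 0.05
       it is dropped, because 0.0514372 exceeds 0.05.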

       exclude=<SECONDARY>: Do not include reads in the output that have any of the given flags set.  The  flags
       are given separated by commas. Valid flags are:

       PAIRED:
              read was paired in sequencing

       PROPER_PAIR:
              read has been mapped as part of a proper pair

       UNMAP: read was not mapped

       MUNMAP:
              mate of read was not mapped

       REVERSE:
              read was mapped to the reverse strand

       MREVERSE:
              mate of read was mapped to the reverse strand

       READ1: read was first read of a pair during sequencing

       READ2: read was second read of a pair during sequencing

       SECONDARY:
              alignment is secondary, i.e. an alternative mapping to the primary alignment in the same file

       QCFAIL:
              read is marked as having failed quality control

       DUP:   read is marked as a duplicate of another read in the same file (see bammarkduplicates)

       SUPPLEMENTARY:
              read is marked as supplementary alignment
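
       The flag names above correspond to the bits of the FLAG field defined in the SAM format specification.
       The following Python sketch shows how a comma-separated exclude value maps to a bit mask and how an
       alignment's FLAG value can be tested against it. The bit values are those of the SAM specification. The
       helper names exclude_mask and is_excluded are hypothetical, and the sketch illustrates the idea rather
       than the program's internal implementation.

```python
# FLAG bits as defined in the SAM format specification.
SAM_FLAGS = {
    "PAIRED": 0x1,
    "PROPER_PAIR": 0x2,
    "UNMAP": 0x4,
    "MUNMAP": 0x8,
    "REVERSE": 0x10,
    "MREVERSE": 0x20,
    "READ1": 0x40,
    "READ2": 0x80,
    "SECONDARY": 0x100,
    "QCFAIL": 0x200,
    "DUP": 0x400,
    "SUPPLEMENTARY": 0x800,
}


def exclude_mask(spec: str) -> int:
    """Combine a comma-separated list of flag names into one bit mask."""
    mask = 0
    for name in spec.split(","):
        name = name.strip()
        if name not in SAM_FLAGS:
            raise ValueError("unknown flag name: " + name)
        mask |= SAM_FLAGS[name]
    return mask


def is_excluded(flag: int, mask: int) -> bool:
    """True if the alignment has any of the excluded flags set."""
    return (flag & mask) != 0


default_mask = exclude_mask("SECONDARY")
strict_mask = exclude_mask("SECONDARY,DUP,QCFAIL")
```

       For example, an alignment with FLAG 355 (PAIRED, PROPER_PAIR, MREVERSE, READ1 and SECONDARY) is
       excluded under the default mask, while one with FLAG 99 is not.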

       exportcdna=<0>: instead of counting features, generate a FastA file containing the cDNA as designated by
       the GTF annotation file. The second parameter (BAM file) still needs to be specified, but it will not be
       read. This option requires a reference FastA file matching the GTF file to be provided via the
       reference key.

       reference=<>: name of a reference FastA file. This is required for the exportcdna option.
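
       The key=value options above are appended after the two positional arguments shown in the SYNOPSIS. The
       following Python sketch assembles such an argument list for use with subprocess.run. The helper name
       build_command and the file name genome.fa are hypothetical, and the sketch only builds the command
       without running it.

```python
# Option keys documented on this page.
KNOWN_KEYS = {
    "T", "verbose", "threads", "mapqmin", "mapqmax", "uncoveredthres",
    "uniqueuncoveredthres", "exclude", "exportcdna", "reference",
}


def build_command(gtf, bam, **options):
    """Return the argument list for one bamfeaturecount invocation."""
    argv = ["bamfeaturecount", gtf, bam]
    for key, value in options.items():
        if key not in KNOWN_KEYS:
            raise ValueError("option not documented on this page: " + key)
        argv.append("{}={}".format(key, value))
    return argv


# Feature counting on all detected cores, excluding secondary and duplicate reads.
count_cmd = build_command("annotation.gtf", "mapped.bam",
                          threads=0, verbose=1, exclude="SECONDARY,DUP")

# cDNA export; genome.fa is a placeholder for a reference matching the GTF file.
cdna_cmd = build_command("annotation.gtf", "mapped.bam",
                         exportcdna=1, reference="genome.fa")

# To run a command: subprocess.run(count_cmd, check=True)
```

       Passing the arguments as a list avoids shell quoting issues with values such as the comma-separated
       exclude list.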

AUTHOR

       Written by German Tischler-Höhle.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright © 2009-2019 German Tischler-Höhle, © 2011-2014 Genome Research Limited.   License  GPLv3+:  GNU
       GPL version 3 <http://gnu.org/licenses/gpl.html>
       This  is  free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent
       permitted by law.

BIOBAMBAM                                          August 2019                                BAMFEATURECOUNT(1)