Ubuntu Manpage: bamfeaturecount - evaluate alignments produced by an RNA-seq aligner

Provided by: biobambam2_2.0.183+ds-1_amd64

NAME

       bamfeaturecount - evaluate alignments produced by an RNA-seq aligner

SYNOPSIS

       bamfeaturecount annotation.gtf mapped.bam [options]

DESCRIPTION

       bamfeaturecount evaluates the alignments produced by an RNA-seq aligner (e.g. STAR) and
       outputs the average coverage found for each transcript and gene. It requires an
       annotation file in the GTF format in addition to a BAM file containing RNA-seq alignments.
       The annotation file needs to have been processed by the filtergtf program to ensure a
       sort order that is compatible with bamfeaturecount's expectations.

       The  output  contains  two  types  of  lines,  one  starting with [transcript] and another
       starting with [gene]. In both cases the information provided is given as a  tab  separated
       set of columns.

       The following is an example of a transcript line:

       [transcript]        chr1:C1orf159       ENST00000472741.5           (0.0618557,5.38144,97)
       (0.0496454,11.305,564)  (0.0514372,10.4357,661) [1116059,1116087)        [1091990,1092103)
       [1091045,1091565)

       Transcript lines always have at least 6 columns. The first column is the [transcript]
       tag. The second column provides the reference sequence name (as given in the BAM header
       and GTF file) and the name of the gene concerned, separated by a colon (here chr1 and
       C1orf159). The third column contains the transcript identifier (transcript_id in the GTF
       file). Columns 4, 5 and 6 each contain either a triplet of numbers (A,B,C) or the symbol
       *. Column 4 describes the regions unique to this transcript (i.e. stretches of the genome
       not shared with any other transcript). Column 5 describes the regions shared with at
       least one other transcript. Column 6 describes all regions covered by the transcript
       (unique and non-unique). If a transcript has no intervals of the respective kind (i.e.
       every base is also covered by at least one other transcript, or all bases are unique to
       this transcript), the column contains the symbol *. Otherwise a triplet (A,B,C) is given,
       where A denotes the fraction of bases not covered by any alignment (in the example
       0.0618557, i.e. about 6% of the bases unique to this transcript are not covered), B the
       average coverage (in the example the average sequencing depth on the unique bases of
       this transcript is 5.38) and C the total number of bases (in the example 97 bases of
       this transcript are not shared with any other transcript). The remaining columns (7 and
       up) contain the zero-based intervals of the exons of this transcript on the reference
       sequence, written in the form [start,end).
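
       The column layout described above can be read with a few lines of Python. This is a
       minimal sketch, not part of biobambam2: the names Stats, parse_stats and
       parse_transcript_line are hypothetical, and the code assumes that the gene name contains
       no colon and that exon intervals are half-open, as the [start,end) notation suggests.

```python
from typing import List, NamedTuple, Optional, Tuple


class Stats(NamedTuple):
    """One (A,B,C) triplet from column 4, 5 or 6."""
    uncovered_fraction: float  # A: fraction of bases without any alignment
    mean_coverage: float       # B: average coverage
    bases: int                 # C: total number of bases


def parse_stats(field: str) -> Optional[Stats]:
    """Parse '(A,B,C)'; the symbol '*' (no such region) becomes None."""
    if field == "*":
        return None
    a, b, c = field.strip("()").split(",")
    return Stats(float(a), float(b), int(c))


def parse_transcript_line(line: str) -> dict:
    """Split one tab-separated [transcript] line into named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 6 or cols[0] != "[transcript]":
        raise ValueError("not a [transcript] line")
    # Assumption: the gene name has no colon, so split at the last one.
    reference, _, gene = cols[1].rpartition(":")
    exons: List[Tuple[int, int]] = []
    for interval in cols[6:]:
        start, end = interval.strip("[)").split(",")
        exons.append((int(start), int(end)))
    return {
        "reference": reference,
        "gene": gene,
        "transcript_id": cols[2],
        "unique": parse_stats(cols[3]),
        "shared": parse_stats(cols[4]),
        "total": parse_stats(cols[5]),
        "exons": exons,
    }


# The example line from this page, rebuilt with explicit tab separators.
example = "\t".join([
    "[transcript]", "chr1:C1orf159", "ENST00000472741.5",
    "(0.0618557,5.38144,97)", "(0.0496454,11.305,564)",
    "(0.0514372,10.4357,661)",
    "[1116059,1116087)", "[1091990,1092103)", "[1091045,1091565)",
])
record = parse_transcript_line(example)
exon_bases = sum(end - start for start, end in record["exons"])
print(record["transcript_id"], record["unique"].bases, exon_bases)
# prints: ENST00000472741.5 97 661
```

       In the example the exon lengths (28, 113 and 520 bases) add up to 661, which matches C
       in column 6, and the base counts of columns 4 and 5 (97 and 564) add up to the same
       value.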

       The gene lines should be disregarded in the current version of the program.

       The following key=value pairs can be given:

       T=<filename>: set the prefix for temporary file names

       verbose=<1>: print progress reports while processing

       threads=<1>:  number  of  threads  used  for  processing.  Set  this to 0 to use all cores
       detected on the machine.

       mapqmin=<255>: Minimum mapping quality allowed for alignments considered. By default  this
       is 255 (the value used by STAR to mark uniquely mapped reads).

       mapqmax=<255>:  Maximum mapping quality allowed for alignments considered. By default this
       is 255 (the value used by STAR to mark uniquely mapped reads).

       uncoveredthres=<0.1>: maximum fraction of bases allowed to be uncovered in a transcript
       for the transcript to be reported (i.e. maximum value allowed for the first number given
       in column 6 of the transcript output).

       uniqueuncoveredthres=<0.1>: maximum fraction of bases allowed to be uncovered in the
       unique region of a transcript for the transcript to be reported (i.e. maximum value
       allowed for the first number given in column 4 of the transcript output).
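
       How the two thresholds relate to the output columns can be sketched in Python. The
       function below is a hypothetical illustration and not the program's code: it assumes
       that a value equal to the threshold still passes and that a column holding the symbol *
       (passed as None) never causes a transcript to be dropped. Neither detail is stated on
       this page.

```python
from typing import Optional


def passes_thresholds(unique_uncovered: Optional[float],
                      total_uncovered: Optional[float],
                      uncoveredthres: float = 0.1,
                      uniqueuncoveredthres: float = 0.1) -> bool:
    """Return True if a transcript would be reported under the assumed rule.

    unique_uncovered is the first number of column 4, total_uncovered the
    first number of column 6; None stands for the symbol '*'.
    """
    def within(value: Optional[float], limit: float) -> bool:
        # Assumption: '*' columns are ignored and the comparison is inclusive.
        return value is None or value <= limit

    return (within(total_uncovered, uncoveredthres)
            and within(unique_uncovered, uniqueuncoveredthres))


# Values from the example transcript line: both fractions are below 0.1.
print(passes_thresholds(0.0618557, 0.0514372))
# prints: True

# With a stricter limit on the unique region the same transcript fails.
print(passes_thresholds(0.0618557, 0.0514372, uniqueuncoveredthres=0.05))
# prints: False
```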

       exclude=<SECONDARY>: Do not include reads in the output that have any of the  given  flags
       set. The flags are given separated by commas. Valid flags are:

       PAIRED:
              read was paired in sequencing

       PROPER_PAIR:
              read has been mapped as part of a proper pair

       UNMAP: read was not mapped

       MUNMAP:
              mate of read was not mapped

       REVERSE:
              read was mapped to the reverse strand

       MREVERSE:
              mate of read was mapped to the reverse strand

       READ1: read was first read of a pair during sequencing

       READ2: read was second read of a pair during sequencing

       SECONDARY:
              alignment is secondary, i.e. an alternative mapping to the primary alignment in the
              same file

       QCFAIL:
              read is marked as having failed quality control

       DUP:   read  is  marked  as  a  duplicate  of  another  read  in  the   same   file   (see
              bammarkduplicates)

       SUPPLEMENTARY:
              read is marked as supplementary alignment
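
       The flag names above correspond to the FLAG bits defined by the SAM format. As a sketch
       of what an exclude list means for a single alignment, the following Python translates a
       comma-separated list of names into a bit mask and tests a FLAG value against it. The
       helper names exclude_mask and is_excluded are hypothetical, and the bit values are
       those of the SAM specification, not taken from this page.

```python
# FLAG bit values as defined by the SAM format specification.
SAM_FLAGS = {
    "PAIRED": 0x1,
    "PROPER_PAIR": 0x2,
    "UNMAP": 0x4,
    "MUNMAP": 0x8,
    "REVERSE": 0x10,
    "MREVERSE": 0x20,
    "READ1": 0x40,
    "READ2": 0x80,
    "SECONDARY": 0x100,
    "QCFAIL": 0x200,
    "DUP": 0x400,
    "SUPPLEMENTARY": 0x800,
}


def exclude_mask(spec: str) -> int:
    """Combine a comma-separated list of flag names into one bit mask."""
    mask = 0
    for name in spec.split(","):
        mask |= SAM_FLAGS[name.strip()]  # KeyError for an unknown name
    return mask


def is_excluded(flag: int, mask: int) -> bool:
    """True if the alignment has any of the masked flags set."""
    return (flag & mask) != 0


mask = exclude_mask("SECONDARY,DUP")
print(mask)                    # prints: 1280
print(is_excluded(355, mask))  # 355 includes SECONDARY; prints: True
print(is_excluded(99, mask))   # 99 has neither flag; prints: False
```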

       exportcdna=<0>: instead of counting features, generate a FastA file containing the cDNA
       as designated by the GTF annotation file. The second parameter (BAM file) needs to be
       specified, but will not be read. This option requires a reference FastA file suitable for
       the GTF file to be provided via the reference key.

       reference=<>: name of a reference FastA file. This is required for the exportcdna option.
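
       Following the synopsis, the key=value pairs are appended after the two file arguments.
       The Python sketch below only assembles and prints such a command line; it does not run
       the program. The helper build_command and the file names annotation.filtered.gtf and
       mapped.bam are hypothetical placeholders.

```python
import shlex
from typing import List


def build_command(gtf: str, bam: str, **options) -> List[str]:
    """Assemble argv as: bamfeaturecount annotation.gtf mapped.bam key=value ..."""
    argv = ["bamfeaturecount", gtf, bam]
    argv += [f"{key}={value}" for key, value in options.items()]
    return argv


# Hypothetical input names; the GTF is assumed to be the output of filtergtf.
argv = build_command(
    "annotation.filtered.gtf",
    "mapped.bam",
    threads=0,                # 0 = use all detected cores
    mapqmin=255,
    mapqmax=255,
    exclude="SECONDARY,DUP",
)
print(shlex.join(argv))
```

       The resulting list can be passed to subprocess.run if the program should actually be
       started from Python.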

AUTHOR

       Written by German Tischler-Höhle.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright © 2009-2019 German Tischler-Höhle, © 2011-2014 Genome Research Limited.  License
       GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
       This  is free software: you are free to change and redistribute it.  There is NO WARRANTY,
       to the extent permitted by law.