Ubuntu Manpage: QTLtools quan - Quantify gene and exon expression from RNA-seq

Provided by: qtltools_1.3.1+dfsg-4_amd64

NAME

       QTLtools quan - Quantify gene and exon expression from RNA-seq

SYNOPSIS

       QTLtools  quan --bam [in.sam|in.bam|in.cram] --gtf gene_annotation.gtf --out-prefix output
       [OPTIONS]

DESCRIPTION

       This mode quantifies the expression of genes and exons in the provided --gtf file using
       the RNA-seq reads in the --bam file. The method counts the number of reads overlapping
       the exons in the --gtf file. First, all exons of a gene are converted into meta-exons,
       where overlapping exons are merged into a single exon encompassing all of them. Any
       overlap between a read and an exon is considered a match; that is, a read is not
       required to lie entirely between the start and end positions of an exon to count
       towards that exon's quantification. A split read aligning to multiple exons contributes
       to each exon it overlaps in proportion to the fraction of the read that overlaps that
       exon. Thus a split read contributes less than a single count to each of the exons it
       overlaps. Reads aligning to multiple exons (i.e. overlapping exons of multiple genes)
       count towards the quantification of all the exons that they overlap. If the --bam file
       contains paired-end reads and the two mates of a pair overlap each other (i.e. have an
       insert size < 0), then each of these reads contributes less than a single count towards
       the quantifications unless --no-merge is provided. The following diagram, with two
       genes with overlapping exons and one paired-end read where both mates are split reads
       and overlap each other, illustrates how the quantification works:

                    x             x
                   / \           / \
        +---------+   +---------+   +---------+
        | Exon1|1 |   | Exon1|2 |   | Exon1|3 |      Gene1
        +---------+   +---------+   +---------+
                                  x
                                 / \
                   +------------+   +-------------+
                   |  Exon2|1   |   |   Exon2|2   |  Gene2
                   +------------+   +-------------+
                                     x
                                    / \
                      +------------+   +----+        RNAseq Read Mate1

                      |--a-||-b-||c|   |-d--||--e-|
                                     x
                                    / \
                            +------+   +----------+  RNAseq Read Mate2

         Left Mate1  = ((b * 0.5) + a) / (a + b + d)
         Right Mate1 = (d * 0.5)/(a + b + d)
         Left Mate2  = (b * 0.5)/(b + d + e)
         Right Mate2 = ((d * 0.5) + e)/(b + d + e)

         Exon1|2 = Left Mate1 + Left Mate2
         Exon1|3 = Right Mate1 + Right Mate2
         Exon2|1 = Left Mate1 + Left Mate2
         Exon2|2 = Right Mate1 + Right Mate2

         Gene1 = Exon1|2 + Exon1|3
         Gene2 = Exon2|1 + Exon2|2
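
       The formulas above can be evaluated for concrete segment lengths. The following is a
       minimal sketch, not QTLtools code: the function name and the example lengths are
       hypothetical, and Exon2|2 is assumed to be Right Mate1 + Right Mate2, by symmetry with
       Exon1|3.

```python
def pair_contributions(a, b, d, e):
    """Fractional counts for the read pair drawn in the diagram.

    a, b, d and e are the segment lengths labelled in the diagram. Segments
    b and d are covered by both mates, so each mate is credited with half
    of them. Segment c does not appear in the formulas and is not an input.
    """
    mate1_total = a + b + d  # denominator used for Mate1 in the formulas
    mate2_total = b + d + e  # denominator used for Mate2 in the formulas

    left_mate1 = (0.5 * b + a) / mate1_total
    right_mate1 = (0.5 * d) / mate1_total
    left_mate2 = (0.5 * b) / mate2_total
    right_mate2 = (0.5 * d + e) / mate2_total

    left = left_mate1 + left_mate2
    # Assumption: Exon2|2 mirrors Exon1|3 (Right Mate1 + Right Mate2).
    right = right_mate1 + right_mate2

    return {
        "Exon1|2": left,
        "Exon1|3": right,
        "Exon2|1": left,
        "Exon2|2": right,
        "Gene1": left + right,
        "Gene2": left + right,
    }


# Hypothetical segment lengths, in bases.
counts = pair_contributions(a=30, b=20, d=20, e=30)
for name, value in counts.items():
    print(f"{name}\t{value:.4f}")
```

       With these lengths each mate contributes 50/70 of a count in total, so the pair adds
       less than two counts to each gene, as described for merged overlapping mates.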

       The quan mode in version 1.2 and above is not compatible with the quantifications
       generated by previous versions. This is due to bug fixes and slight adjustments to the
       way we quantify. DO NOT MIX QUANTIFICATIONS GENERATED BY EARLIER VERSIONS OF QTLTOOLS WITH
       QUANTIFICATIONS FROM VERSION 1.2 AND ABOVE AS THIS WILL CREATE A BIAS IN YOUR DATASET.
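
       The meta-exon construction and the "any overlap is a match" rule described at the start
       of this section can be sketched as follows. This is an illustration only: the function
       names are hypothetical, coordinates are assumed to be 1-based and inclusive as in GTF,
       and only exons that actually overlap are merged (how QTLtools treats exons that merely
       touch is not stated here).

```python
def merge_into_meta_exons(exons):
    """Merge overlapping (start, end) exon intervals of a single gene.

    Intervals are closed: two exons overlap when they share at least one base.
    """
    meta = []
    for start, end in sorted(exons):
        if meta and start <= meta[-1][1]:
            # Overlaps the previous meta-exon: extend it if needed.
            meta[-1][1] = max(meta[-1][1], end)
        else:
            meta.append([start, end])
    return [tuple(interval) for interval in meta]


def overlaps(read_start, read_end, exon_start, exon_end):
    """True if the read block shares at least one base with the exon."""
    return read_start <= exon_end and exon_start <= read_end


# Hypothetical exons of one gene: the first two overlap and are merged.
meta_exons = merge_into_meta_exons([(100, 200), (150, 300), (400, 500)])
print(meta_exons)

# A read block that starts inside the meta-exon and runs past its end still matches.
print(overlaps(290, 340, *meta_exons[0]))
```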

OPTIONS

--gtf gene_annotation.gtf
Gene annotations in GTF format. These can be obtained from
<https://www.gencodegenes.org/>. REQUIRED.

--bam [in.bam|in.sam|in.cram]
Sequence data in BAM/SAM/CRAM format sorted by chromosome and then position. One
sample per BAM file. REQUIRED.

--out-prefix output
Output prefix. REQUIRED.

--sample sample_name
The sample name of the BAM file. If not provided, the BAM file path is used as the
sample name.

--rpkm
Output RPKM values.

--tpm
Output TPM values.

--xxhash
Rather than using the GTF file name to generate a unique hash for the options used,
use the hash of the GTF file.

--no-hash
Do not include a hash signifying the options used in the quantification in the
output file names. NOT RECOMMENDED.

--gene-types gene_type ...
Only quantify these gene types. Requires gene_type attribute in GTF. It will also
use transcript_type if present.

--filter-mapping-quality integer
Minimum mapping quality for a read or read pair to be considered. Set this to only
include uniquely mapped reads. DEFAULT=10.

--filter-mismatch integer|float
Maximum mismatches allowed in a read. If between 0 and 1 taken as the fraction of
read length. Requires NM attribute in the BAM file. DEFAULT=OFF.

--filter-mismatch-total integer|float
Maximum total mismatches allowed in paired-reads. If between 0 and 1 taken as the
fraction of combined read length. Requires NM attribute in the BAM file.
DEFAULT=OFF.

--filter-min-exon integer
Minimum length of an exon for it to be quantified. Exons smaller than this will
not be printed out in the exon quantifications, but will still count towards gene
quantifications. DEFAULT=0.

--filter-remove-duplicates
Remove duplicate sequencing reads, as indicated by the aligner, in the process. NOT
RECOMMENDED.

--filter-failed-qc
Remove fastq reads that fail sequencing QC as indicated by the sequencer.

--check-proper-pairing
If provided, only reads that are properly paired according to the aligner and are in
the correct orientation will be considered. Otherwise, all pairs in the correct
orientation will be considered.

--check-consistency
If provided, checks the consistency of split reads with the annotation, rather than
pure overlap of one of the blocks of the split read.

--no-merge
If provided, overlapping mate pairs will not be merged. The default behavior is to
merge overlapping mate pairs based on the amount of overlap, such that each mate
counts for less than 1 read.

--legacy-options
Exactly replicate the Dermitzakis lab's original quantification script. DO NOT USE.

--region chr:start-end
Genomic region to be processed. E.g. chr4:12334456-16334456, or chr5.
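
The --filter-mismatch and --filter-mismatch-total options above accept either an
absolute count or a fraction of the (combined) read length. A minimal sketch of that
interpretation follows. The helper names are hypothetical, and two details are
assumptions because the text does not specify them: values strictly between 0 and 1
are treated as fractions, and the resulting limit is not rounded.

```python
def max_mismatches(threshold, read_length):
    """Turn a --filter-mismatch style value into a mismatch limit.

    Assumption: a value strictly between 0 and 1 is a fraction of the read
    length; anything else is taken as an absolute number of mismatches.
    """
    if 0 < threshold < 1:
        return threshold * read_length
    return threshold


def passes_mismatch_filter(nm, read_length, threshold):
    """nm is the mismatch count taken from the NM attribute of the alignment."""
    return nm <= max_mismatches(threshold, read_length)


# Fraction: at most 5% of a 100-base read may mismatch.
print(passes_mismatch_filter(nm=3, read_length=100, threshold=0.05))
# Absolute: at most 8 mismatches over a pair with 150 combined bases.
print(passes_mismatch_filter(nm=9, read_length=150, threshold=8))
```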

OUTPUT FILES

       Unless --no-hash is provided, all output files will include a hash value corresponding to
       the combination of the specific options used. This is done so that one does not merge
       quantifications from samples that were quantified differently, which would create a bias
       in the dataset.

       .gene.count.bed .exon.count.bed .gene.rpkm.bed .exon.rpkm.bed .gene.tpm.bed .exon.tpm.bed
        These are the quantification results files with the following columns:

        1   chr           Phenotype's chromosome
        2   start         Phenotype's start position (0-based)
        3   end           Phenotype's end position (1-based)
        4   gene|exon     The gene or exon ID.
        5   info|geneID   Information  about  the gene or the gene ID of the exon.  The gene info
                          is separated by semicolons, and  L=gene  length,  T=gene  type,  R=gene
                          positions, N=gene name
        6   strand        Phenotype's strand
        7   sample_name   The sample name of the BAM file
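
        A row of these files could be read as sketched below. This is an illustration, not
        part of QTLtools: it assumes tab-separated fields and a single sample column, the
        function name is hypothetical, and the example row (IDs, coordinates, and the exact
        form of the R= value) is made up.

```python
def parse_quan_row(line):
    """Split one quantification row into the seven columns listed above."""
    chrom, start, end, feature_id, info, strand, value = line.rstrip("\n").split("\t")[:7]
    record = {
        "chr": chrom,
        "start": int(start),  # 0-based
        "end": int(end),      # 1-based
        "id": feature_id,
        "strand": strand,
        "value": float(value),
    }
    if "=" in info:
        # Gene rows: semicolon-separated KEY=value pairs (L, T, R, N).
        record["info"] = dict(item.split("=", 1) for item in info.split(";") if item)
    else:
        # Exon rows: column 5 holds the gene ID of the exon.
        record["gene_id"] = info
    return record


# Hypothetical gene row.
row = parse_quan_row(
    "chr22\t999\t2000\tGENE_ID_EXAMPLE\t"
    "L=800;T=protein_coding;R=chr22:1000-2000;N=EXAMPLE\t+\t12.5\n"
)
print(row["info"]["T"], row["end"] - row["start"], row["value"])
```

        Because the start is 0-based and the end is 1-based, end minus start gives the span
        of the feature in bases.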

       .stats
        Details the statistics of the quantification, with the following rows:

         1   filtered_secondary_alignments_(does_not_count_towards_total_reads)
                 Number of secondary alignments
         2   total_reads
                 Number of reads in the BAM file
         3   filtered_unmapped
                 Number of unmapped reads
         4   filtered_failqc
                 Number of reads with the failed QC tag
         5   filtered_duplicate
                 Number of duplicate reads
         6   filtered_mapQ_less_than_X
                 Number of reads below the mapping quality threshold X
         7   filtered_notpaired
                 Number of pairs that were not in the correct orientation or were not
                 properly paired
         8   filtered_mismatches_greater_than_X_Y
                 Number of reads failing the mismatches per read, X, and mismatches total
                 filters, Y
         9   filtered_unmatched_mate_pairs
                 Number of reads where there was a paired-read with a missing mate
        10   total_good
                 Number of reads that passed all filters
        11   total_exonic
                 Number of reads that aligned to exons and passed all filters
        12   total_exonic_multi_counting
                 Number of reads that aligned to exons when we count reads that align to
                 multiple exons multiple times
        13   total_merged_reads
                 Number of reads where the mate pairs were overlapping and thus were merged
        14   total_exonic_multi_counting_after_merge_(used_for_rpkm)
                 Number of reads that aligned to exons when we merge overlapping mate pairs
        15   good_over_total
                 Number of good reads over the total number of reads
        16   exonic_over_total
                 Number of exonic reads over the total number of reads
        17   exonic_over_good
                 Number of exonic reads over the number of good reads
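
        The three ratio rows follow directly from the count rows. The sketch below recomputes
        them from a dictionary of counts; it deliberately does not parse the .stats file,
        since its exact layout is not specified here. The function name and the numbers are
        hypothetical.

```python
def summary_ratios(stats):
    """Recompute rows 15-17 from rows 2, 10 and 11 of the .stats table."""
    total = stats["total_reads"]
    good = stats["total_good"]
    exonic = stats["total_exonic"]
    return {
        "good_over_total": good / total,
        "exonic_over_total": exonic / total,
        "exonic_over_good": exonic / good,
    }


# Made-up counts for illustration.
ratios = summary_ratios({"total_reads": 1000, "total_good": 800, "total_exonic": 600})
for name, value in ratios.items():
    print(f"{name}\t{value:.3f}")
```

        As a sanity check, exonic_over_total equals good_over_total multiplied by
        exonic_over_good.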

EXAMPLE

       o Quantifying  a  sample  mapped  with GEM, outputting TPM and RPKM values, and taking the
         hash of the GTF file:

         QTLtools quan --bam HG00381.chr22.bam --gtf gencode.v19.annotation.chr22.gtf.gz \
           --out-prefix HG00381 --sample HG00381 --rpkm --tpm --xxhash \
           --filter-mismatch-total 8 --filter-mapping-quality 150
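
         The --rpkm and --tpm flags in this command request normalized values in addition to
         raw counts. For orientation, the textbook definitions of both measures are sketched
         below. This is not QTLtools code, and the exact normalization QTLtools applies may
         differ (for instance, the .stats file names a specific read total as the one used
         for RPKM). The function names and numbers are hypothetical.

```python
def rpkm(counts, lengths, total_reads):
    """Reads per kilobase of feature per million reads (textbook definition)."""
    return [c * 1e9 / (length * total_reads) for c, length in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    scale = sum(rates)
    return [rate / scale * 1e6 for rate in rates]


# Two made-up features with the same read density.
counts = [100, 300]
lengths = [1000, 3000]  # feature lengths in bases
print(rpkm(counts, lengths, total_reads=1_000_000))
print(tpm(counts, lengths))
```

         Both features have the same read density, so they receive equal RPKM and equal TPM
         values despite their different raw counts.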

BUGS

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Delaneau, O., Ongen, H., Brown, A. et al. A complete tool set for molecular QTL  discovery
       and analysis. Nat Commun 8, 15452 (2017).  <https://doi.org/10.1038/ncomms15452>

AUTHORS

       Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)