Provided by: qtltools_1.3.1+dfsg-4_amd64

NAME

       QTLtools - A complete tool set for molecular QTL discovery and analysis

SYNOPSIS

       QTLtools [MODE] [OPTIONS]

DESCRIPTION

       QTLtools  is  a  complete  tool set for molecular QTL discovery and analysis that is fast,
       user and cluster friendly.  QTLtools performs multiple key  tasks  such  as  checking  the
       quality  of the sequence data, checking that sequence and genotype data match, quantifying
       and stratifying individuals using molecular phenotypes,  discovering  proximal  or  distal
       molQTLs  and  integrating  them  with  functional  annotations or GWAS data, and analyzing
       allele specific expression.  It utilizes HTSlib <http://www.htslib.org/>  to  quickly  and
       efficiently  handle  common  genomics  file  types like VCF, BCF, BAM, SAM, CRAM, BED, and
       GTF, and the Eigen C++ library <http://eigen.tuxfamily.org/> for fast linear algebra.

MODES

       bamstat      QTLtools bamstat --bam [in.sam|in.bam|in.cram] --bed annotation.bed.gz  --out
                    output.txt [OPTIONS]

                    Calculate basic QC metrics for BAM/SAM.

       mbv          QTLtools  mbv  --bam  [in.sam|in.bam|in.cram] --vcf [in.vcf|in.vcf.gz|in.bcf]
                    --out output.txt [OPTIONS]

                    Match BAM to VCF

       pca          QTLtools  pca  --vcf  [in.vcf|in.vcf.gz|in.bcf]  |  --bed   in.bed.gz   --out
                    output.txt [OPTIONS]

                    Calculate principal components for a BED/VCF/BCF file.

       correct      QTLtools  correct  --vcf  [in.vcf|in.vcf.gz|in.bcf]  |  --bed in.bed.gz --cov
                    covariates.txt | --normal --out output.txt [OPTIONS]

                    Covariate correction of a BED or a VCF file.

       cis          QTLtools     cis     --vcf     [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]      --bed
                    quantifications.bed.gz  [--nominal  float  |  --permute  integer  | --mapping
                    in.txt] --out output.txt [OPTIONS]

                    cis QTL analysis.

       trans        QTLtools    trans     --vcf     [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]     --bed
                    quantifications.bed.gz  [--nominal  | --permute | --sample integer | --adjust
                    in.txt] --out output.txt [OPTIONS]

                    trans QTL analysis.

       fenrich      QTLtools  fenrich  --qtl  significant_genes.bed   --tss  gene_tss.bed   --bed
                    TFs.encode.bed.gz --out output.txt [OPTIONS]

                    Functional enrichment for QTLs.

       fdensity     QTLtools  fdensity --qtl significant_genes.bed  --bed TFs.encode.bed.gz --out
                    output.txt [OPTIONS]

                    Functional density around QTLs.

       genrich      QTLtools  genrich  --qtl  significant_genes.bed   --tss  gene_tss.bed   --vcf
                    1000kg.vcf --gwas gwas_hits.bed --out output.txt [OPTIONS]

                    GWAS enrichment for QTLs.  This mode is deprecated and not supported, use rtc
                    instead.

       rtc          QTLtools     rtc     --vcf     [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]      --bed
                    quantifications.bed.gz --hotspots hotspots_b37_hg19.bed [--gwas-cis | --gwas-
                    trans   |   --mergeQTL-cis    |    --mergeQTL-trans]    variants_external.txt
                    qtls_in_this_dataset.txt --out output.txt [OPTIONS]

                    Regulatory  Trait  Concordance  score  analysis  to  test if two colocalizing
                    variants are due to the same functional effect.

       rtc-union    QTLtools  rtc-union  --vcf  [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]  ...    --bed
                    quantifications.bed.gz   ...    --hotspots   hotspots_b37_hg19.bed  --results
                    qtl_results_files.txt ...  [OPTIONS]

                    Find the union of QTLs from independent datasets.  If there was a  QTL  in  a
                    given  recombination  interval in one dataset, then find the best QTL (may or
                    may not be genome-wide significant) in the same recombination interval in all
                    other datasets.

       extract      QTLtools  extract  [--vcf  --bed  --cov]  relevant_file  --out  output_prefix
                    [OPTIONS]

                    Data extraction mode.  Extract all the data from the provided files into  one
                    flat file.

       quan         QTLtools  quan --bam [in.sam|in.bam|in.cram] --gtf gene_annotation.gtf --out-
                    prefix output [OPTIONS]

                    Quantify gene and exon expression from RNAseq.

       ase          QTLtools ase --bam  [in.sam|in.bam|in.cram]  --vcf  [in.vcf|in.vcf.gz|in.bcf]
                    --ind sample_name_in_vcf --mapq integer --out output.txt [OPTIONS]

                    Measure  allele  specific  expression from RNAseq at transcribed heterozygous
                    SNPs

       rep          QTLtools rep  --bed  quantifications.bed.gz  --vcf  [in.vcf|in.vcf.gz|in.bcf]
                    --qtl qtls_external.txt --out output.txt [OPTIONS]

                    Replicate QTL associations in an independent dataset

       gwas         QTLtools     gwas     --vcf     [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]     --bed
                    quantifications.bed.gz --out output.txt [OPTIONS]

                    GWAS tests. Correlate all genotypes with all phenotypes.

GLOBAL OPTIONS

       QTLtools can read gzip, bgzip, and bzip2 files, and can output gzip and bzip2 files.  This
       is dependent on the input and output files' extension.  E.g., --out output.txt.gz will write
       a gzipped file.
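
       The extension rule can be illustrated with a short Python sketch.  This is not QTLtools'
       own code; the helper name open_by_extension is hypothetical and only mirrors the behavior
       described above for readers who post-process QTLtools output in their own scripts.

```python
import bz2
import gzip


def open_by_extension(path, mode="rt"):
    """Open a text file, choosing the codec from the file name's extension.

    Hypothetical helper: .gz -> gzip, .bz2 -> bzip2, anything else -> plain text.
    """
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    if path.endswith(".bz2"):
        return bz2.open(path, mode)
    return open(path, mode)
```

       Because bgzip output is a valid multi-member gzip stream, the same gzip branch can also
       read bgzip-compressed files.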

       The following options are common to all modes, although some of them have no effect in
       certain modes.

       --help Produces a description of options for a given mode.

       --seed integer
              Random   seed  for  analyses  that  utilize   randomness.   Useful  for  generating
              replicable results.  Default=15112011.

       --log file
              Dump screen output to this file.

       --silent
              Disable screen output.

       --exclude-samples file
              List of samples to exclude.  One sample name per line.

       --include-samples file
              List of samples to include.  One sample name per line.

       --exclude-sites file
              List of variants to exclude.  One variant ID per line.

       --include-sites file
              List of variants to include.  One variant ID per line.

       --exclude-positions file
              List of positions to exclude from genotypes.  One chr position per line  (separated
              by a space).

       --include-positions file
              List  of positions to include from genotypes.  One chr position per line (separated
              by a space).

       --exclude-phenotypes file
              List of phenotypes to exclude.  One phenotype ID per line.

       --include-phenotypes file
              List of phenotypes to include.  One phenotype ID per line.

       --exclude-covariates file
              List of covariates to exclude.  One covariate name per line.

       --include-covariates file
              List of covariates to include.  One covariate name per line.

FILE FORMATS

       .bcf|.vcf|.vcf.gz
              These files are  used  for  genotype  data.   The  official  VCF  specification  is
              described at <https://samtools.github.io/hts-specs/VCFv4.2.pdf>.  The VCF/BCF files
              used with QTLtools must satisfy  this  spec's  requirements.   BCF  files  must  be
              indexed             with             bcftools             index              in.bcf
              <http://samtools.github.io/bcftools/bcftools.html>.  VCF files should be compressed
              by  bgzip  <http://www.htslib.org/doc/bgzip.html>  and  indexed  with  tabix -p vcf
              in.vcf.gz <http://www.htslib.org/doc/tabix.html>.

       .bed|.bed.gz
              These files are used for phenotype data, and in certain modes they can also be used
              with  the  --vcf  option,  which can be used to correlate two molecular phenotypes.
              The   format   used   for    QTLtools    is    a    custom    UCSC    BED    format
              <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>,   which   has   6  annotation
              columns followed by sample columns.  The header line must  exist,  and  must  begin
              with  a  #  and columns must be tab separated. THIS IS A DIFFERENT FILE FORMAT THAN
              THE ONE USED FOR FASTQTL, THUS FASTQTL BED FILES ARE  INCOMPATIBLE  WITH  QTLTOOLS.
              Phenotype       BED      files      must      be      compressed      by      bgzip
              <http://www.htslib.org/doc/bgzip.html> and indexed with  tabix  -p  bed   in.bed.gz
              <http://www.htslib.org/doc/tabix.html>.   Missing  values  must  be  coded  as  NA.
              Following is an example BED file:

              #chr  start  end    pid    gid    strand  sample1  sample2
              1     9999   10000  exon1  gene1  +       15       234
              1     9999   10000  exon2  gene1  +       11       134
              1     19999  20000  exon1  gene2  -       154      284
              1     19999  20000  exon2  gene2  -       112      301

              BED file's annotation columns' descriptions:

              1   Phenotype chromosome [string]
              2   Start position of the phenotype [integer, 0-based]
              3   End position of the phenotype [integer, 1-based]
              4   Phenotype ID [string]
              5   Phenotype group ID or any type of info about the phenotype [string]
              6   Phenotype strand [+/-]

       .bam|.sam|.cram
              These files are  used  for  sequence  data.   The  official  SAM  specification  is
              described  at  <https://samtools.github.io/hts-specs/SAMv1.pdf>.   The SAM/BAM/CRAM
              files used with QTLtools must satisfy this spec's requirements.  SAM/BAM/CRAM files
              must        be        indexed        with        samtools       index        in.bam
              <http://www.htslib.org/doc/samtools.html>.

       .gtf   These files are used for gene annotation.  The file specification is  described  at
              <https://www.ensembl.org/info/website/upload/gff.html>.   The  GTF  files used must
              comply with this spec, and  should  have  the  gene_id,  transcript_id,  gene_name,
              gene_type,  and  transcript_type  attributes.   We recommend using gene annotations
              from GENCODE <https://www.gencodegenes.org/>.

       covariate files
              The covariate file contains the covariate data in simple text format.  The  missing
              values  should  be encoded as NA.  Both quantitative and qualitative covariates are
              supported.  Quantitative covariates  are  assumed  when  only  numeric  values  are
              provided.   Qualitative  covariates  are  assumed  when only non-numeric values are
              provided.  In practice, qualitative covariates with F factors are converted into F-1
              binary covariates.  Following is an example of a covariate file:

              id   sample1  sample2  sample3
              PC1  -0.02    0.14     0.16
              PC2  0.01     0.11     0.10
              PC3  0.03     0.05     0.07
              COV  A        B        C
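
              The conversion of a qualitative covariate with F factors into F-1 binary covariates
              can be sketched in Python as follows.  This is an illustration under stated
              assumptions, not QTLtools' implementation: the function names, the whitespace
              splitting, the choice of the first factor in sorted order as the reference level,
              the NAME_LEVEL column names, and the error raised for rows that mix numeric and
              non-numeric values are all assumptions.

```python
def is_number(text):
    """Return True if text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def expand_covariates(lines):
    """Return (samples, {name: values}) with qualitative rows dummy-coded."""
    rows = [line.split() for line in lines if line.strip()]
    samples = rows[0][1:]
    out = {}
    for name, *values in rows[1:]:
        if len(values) != len(samples):
            raise ValueError("covariate %s: expected %d values" % (name, len(samples)))
        present = [v for v in values if v != "NA"]
        if all(is_number(v) for v in present):
            # Quantitative covariate: keep one numeric row, NA becomes None.
            out[name] = [None if v == "NA" else float(v) for v in values]
        elif not any(is_number(v) for v in present):
            # Qualitative covariate: F levels become F-1 binary rows.
            levels = sorted(set(present))
            # Assumption: the first level in sorted order is the reference level.
            for level in levels[1:]:
                out["%s_%s" % (name, level)] = [
                    None if v == "NA" else float(v == level) for v in values
                ]
        else:
            raise ValueError("covariate %s mixes numeric and non-numeric values" % name)
    return samples, out


text = """id sample1 sample2 sample3
PC1 -0.02 0.14 0.16
COV A B C
"""
samples, covariates = expand_covariates(text.splitlines())
```

              With the three factors A, B, and C, the COV row yields two binary rows (here named
              COV_B and COV_C), while PC1 is kept as a single quantitative row.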

       include/exclude files
              The   various   --{include,exclude}-{sites,samples,phenotypes,covariates}   options
              require a simple text file which lists the IDs of the  desired  type,  one  ID  per
              line.   The include options will result in running the analyses only in this subset
              of IDs, whereas exclude options will remove these IDs from the analyses.  The  IDs
              for --{include,exclude}-sites refer to the 3rd column in VCF/BCF files, the IDs for
              --{include,exclude}-covariates refer to the 1st column in COV files, and the IDs for
              --{include,exclude}-phenotypes refer to the 4th column in BED files, or to the 5th
              column when the --grp-best option is used.  The --include-positions and
              --exclude-positions options require a text file which lists the chromosomes and
              positions (separated by a space) of genotypes to be included or excluded, one
              position per line.
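
              The include/exclude semantics described above can be sketched in Python.  The
              function names are hypothetical, and applying the exclude list after the include
              list when both are given is an assumption made for this sketch; the text above does
              not state how QTLtools combines the two.

```python
def read_id_list(lines):
    """Read one ID per line, ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in lines if line.strip()}


def apply_filters(ids, include=None, exclude=None):
    """Keep IDs found in include (when given) and drop IDs found in exclude."""
    kept = []
    for identifier in ids:
        if include is not None and identifier not in include:
            continue  # not in the requested subset
        if exclude is not None and identifier in exclude:
            continue  # explicitly removed
        kept.append(identifier)
    return kept


phenotypes = ["exon1", "exon2", "exon3"]
only_listed = apply_filters(phenotypes, include=read_id_list(["exon1", "exon3", ""]))
without_listed = apply_filters(phenotypes, exclude=read_id_list(["exon3"]))
```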

IMPORTANT NOTES

       o BED files' start position is 0-based, whereas the end position is 1-based.  Positions in
         all other files used  in  QTLtools  are  1-based.   All  positions  provided  as  option
         arguments  and  filters, even the ones referring to BED files, must be 1-based.  1-based
         means the first base of the sequence has the position 1, whereas in  0-based  the  first
         position is 0.

       o Make  sure  the chromosome names are the same across all files.  If some files have e.g.
         chr1 and another has 1 as a chromosome name then  these  will  be  considered  different
         chromosomes.

       o BED files used for FastQTL <http://fastqtl.sourceforge.net/> are not directly compatible
         with QTLtools.  To convert a FastQTL BED file to the format used in QTLtools you need to
         add 2 columns after the 4th column.

       o The  quan  mode  in  version  1.2  and  above is not compatible with the quantifications
         generated by the previous versions.  This is due to bug fixes and slight adjustments to the
         way  we  quantify.  Do not mix quantifications generated by earlier versions of QTLtools
         with quantifications from version 1.2 and above, as this will  create  a  bias  in  your
         dataset.

       o Make sure you index all your genotype, phenotype, and sequence files.

       o Use BCF and BAM files for the best performance.
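
       The FastQTL conversion mentioned above (adding 2 columns after the 4th column) could be
       sketched in Python as follows.  The function name is hypothetical, and the sketch assumes
       a FastQTL BED file with 4 annotation columns (chromosome, start, end, ID) followed by
       sample columns.  The values written into the new columns are placeholders chosen for this
       sketch: the phenotype ID is reused as the group ID and the strand defaults to +.  Real
       group IDs and strands should be taken from your annotation.

```python
def fastqtl_to_qtltools(line, strand="+"):
    """Insert group-ID and strand columns after the 4th column of one BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least 4 tab-separated columns: %r" % line)
    if line.startswith("#"):
        extra = ["gid", "strand"]    # header names for the two new columns
    else:
        extra = [fields[3], strand]  # placeholders: reuse the ID, default strand
    return "\t".join(fields[:4] + extra + fields[4:])


header = fastqtl_to_qtltools("#chr\tstart\tend\tpid\tsample1\tsample2\n")
record = fastqtl_to_qtltools("1\t9999\t10000\texon1\t15\t234\n")
```

       The converted file still needs to be compressed with bgzip and indexed with tabix before
       use, as described in FILE FORMATS.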

EXAMPLE FILES

       exons.50percent.chr22.bed.gz  <http://jungle.unige.ch/QTLtools_examples/exons.50percent.chr22.bed.gz>
       exons.50percent.chr22.bed.gz.tbi   <http://jungle.unige.ch/QTLtools_examples/exons.50percent.chr22.bed.gz.tbi>
       gencode.v19.annotation.chr22.gtf.gz     <http://jungle.unige.ch/QTLtools_examples/gencode.v19.annotation.chr22.gtf.gz>
       gencode.v19.exon.chr22.bed.gz <http://jungle.unige.ch/QTLtools_examples/gencode.v19.exon.chr22.bed.gz>
       genes.50percent.chr22.bed.gz  <http://jungle.unige.ch/QTLtools_examples/genes.50percent.chr22.bed.gz>
       genes.50percent.chr22.bed.gz.tbi   <http://jungle.unige.ch/QTLtools_examples/genes.50percent.chr22.bed.gz.tbi>
       genes.covariates.pc50.txt.gz  <http://jungle.unige.ch/QTLtools_examples/genes.covariates.pc50.txt.gz>
       genes.simulated.chr22.bed.gz  <http://jungle.unige.ch/QTLtools_examples/genes.simulated.chr22.bed.gz>
       genes.simulated.chr22.bed.gz.tbi   <http://jungle.unige.ch/QTLtools_examples/genes.simulated.chr22.bed.gz.tbi>
       genotypes.chr22.vcf.gz   <http://jungle.unige.ch/QTLtools_examples/genotypes.chr22.vcf.gz>
       genotypes.chr22.vcf.gz.tbi    <http://jungle.unige.ch/QTLtools_examples/genotypes.chr22.vcf.gz.tbi>
       GWAS.b37.txt   <http://jungle.unige.ch/QTLtools_examples/GWAS.b37.txt>
       HG00381.chr22.bam   <http://jungle.unige.ch/QTLtools_examples/HG00381.chr22.bam>
       HG00381.chr22.bam.bai    <http://jungle.unige.ch/QTLtools_examples/HG00381.chr22.bam.bai>
       hotspots_b37_hg19.bed    <http://jungle.unige.ch/QTLtools_examples/hotspots_b37_hg19.bed>
       results.genes.full.txt.gz     <http://jungle.unige.ch/QTLtools_examples/results.genes.full.txt.gz>
       TFs.encode.bed.gz   <http://jungle.unige.ch/QTLtools_examples/TFs.encode.bed.gz>

SEE ALSO

       QTLtools-bamstat(1),  QTLtools-mbv(1),  QTLtools-pca(1),  QTLtools-correct(1),   QTLtools-
       cis(1),  QTLtools-trans(1),  QTLtools-fenrich(1),  QTLtools-fdensity(1),  QTLtools-rtc(1),
       QTLtools-rtc-union(1), QTLtools-extract(1), QTLtools-quan(1),  QTLtools-ase(1),  QTLtools-
       rep(1), QTLtools-gwas(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       o Versions up to and including 1.2 suffer from a bug in reading missing genotypes in
         VCF/BCF files.  This bug affects variants that have a DS field in their genotype's FORMAT
         and a missing genotype (DS field is .) in one of the samples, in which case genotypes
         for all the samples are set to missing, effectively removing the variant from the
         analyses.  Affected modes: cis, correct, gwas, pca, rep, trans, rtc-union.
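
         To check whether a dataset would be affected, the variants in question can be located
         with a small Python sketch such as the one below.  It is an illustration only: the
         function name is hypothetical, it works on uncompressed tab-separated VCF text lines,
         and it flags only an explicit . in the DS field, as described above.

```python
def has_missing_dosage(vcf_line):
    """Return True if a VCF record has DS in FORMAT and DS is '.' in any sample."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Skip header lines and records without FORMAT and sample columns.
    if vcf_line.startswith("#") or len(fields) < 10:
        return False
    keys = fields[8].split(":")  # FORMAT is the 9th column
    if "DS" not in keys:
        return False
    ds_index = keys.index("DS")
    for sample in fields[9:]:
        values = sample.split(":")
        if ds_index < len(values) and values[ds_index] == ".":
            return True
    return False


affected = has_missing_dosage("22\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DS\t0/1:1.0\t./.:.\n")
unaffected = has_missing_dosage("22\t200\trs2\tC\tT\t.\tPASS\t.\tGT:DS\t0/1:1.0\t0/0:0.0\n")
```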

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATIONS

       Delaneau O., Ongen H., Brown A. A., et al. A complete tool set for molecular QTL discovery
       and analysis. Nat Commun 8, 15452 (2017).  <https://doi.org/10.1038/ncomms15452>

       Ongen H., Brown A. A., Delaneau O., et al. Estimating the causal tissues for complex traits
       and diseases. Nat Genet 49(12), 1676-1683 (2017).  <https://doi.org/10.1038/ng.3981>

       Fort A., Panousis N. I., Garieri M., et al. MBV: a method to solve sample mislabeling  and
       detect   technical  bias  in  large  combined  genotype  and  sequencing  assay  datasets.
       Bioinformatics 33(12), 1895 (2017).  <https://doi.org/10.1093/bioinformatics/btx074>

AUTHORS

       Olivier Delaneau (olivier.delaneau@gmail.com), Halit Ongen (halitongen@gmail.com)