Ubuntu Manpage: QTLtools trans - trans QTL analysis

Provided by: qtltools_1.3.1+dfsg-2build2_amd64

NAME

       QTLtools trans - trans QTL analysis

SYNOPSIS

       QTLtools trans --vcf [in.vcf|in.vcf.gz|in.bcf|in.bed.gz] --bed quantifications.bed.gz [--nominal |
       --permute | --sample integer | --adjust in.txt] --out output.txt [OPTIONS]

DESCRIPTION

This mode maps trans (distal) quantitative trait loci (QTLs) that affect the phenotypes, using linear
regression. The method is detailed in <https://www.nature.com/articles/ncomms15452>. We first regress
out the provided covariates from the phenotype data, then run the linear regression between the
phenotype residuals and the genotype. If --normal and --cov are provided at the same time, the
residuals after the covariate correction are rank normal transformed. This mode also incorporates an
efficient permutation scheme. You can run a nominal pass (--nominal) listing all genotype-phenotype associations
below a certain threshold, a permutation pass (--permute or --sample no_genes_to_sample) to empirically
characterize the null distribution of associations, or adjust the nominal p-values based on permutations
(--adjust).

In the full permutation scheme (--permute) we permute all phenotypes using the same random number
sequence to preserve the correlation structure. By doing so, the only association we actually break in
the data is between the genotype and the phenotype data. Then, we proceed with a standard association
scan identical to the one used in the nominal pass. In practice, we repeat this for 100 permutations of
the phenotype data. Subsequently, we can proceed with FDR correction by ranking all the nominal p-values
in ascending order and counting how many p-values in the permuted data sets are smaller. This
provides an FDR estimate: if 500 p-values in the permuted data sets are smaller than the
100th smallest nominal p-value, we expect on average 5 false positives per permutation at that
threshold, so we can assume that the FDR for the first 100 associations is around
5% (=500/(100 × 100)).
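
The counting rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the QTLtools implementation: the function name permutation_fdr and its arguments are hypothetical, and it assumes the nominal and permuted p-values have already been collected into plain lists.

```python
from bisect import bisect_left

def permutation_fdr(nominal_pvalues, permuted_pvalues, n_permutations):
    """Estimate the FDR at each rank of the sorted nominal p-values.

    For the k-th smallest nominal p-value, count the permuted p-values that
    are strictly smaller, average that count over the permutations, and
    divide by k (the number of discoveries at that threshold).
    """
    nominal = sorted(nominal_pvalues)
    permuted = sorted(permuted_pvalues)
    estimates = []
    for k, p in enumerate(nominal, start=1):
        n_smaller = bisect_left(permuted, p)          # permuted p-values below p
        expected_false = n_smaller / n_permutations   # false positives per permutation
        estimates.append((p, min(1.0, expected_false / k)))
    return estimates

# Made-up numbers reproducing the example in the text: 100 nominal p-values,
# 500 permuted p-values below the 100th smallest nominal one, 100 permutations.
nominal = [i * 1e-8 for i in range(1, 101)]
permuted = [5e-7] * 500 + [0.5] * 1000
print(permutation_fdr(nominal, permuted, 100)[-1][1])  # 0.05
```

The last entry corresponds to the 100th smallest nominal p-value, matching the 500/(100 × 100) calculation.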

To enable fast screening in trans, we also designed an approximation of the method described just above
based on what we already do in cis. To make it possible, we assume that the phenotypes are independent
and normally distributed (which can be enforced with --normal). The idea is that since all phenotypes
are normally distributed, they are effectively the same, and the cis region removed from each
phenotype is so small compared to the rest of the genome that its phenotype-specific impact is negligible.
Hence the number of variants tested, and the correlation amongst them, is approximately the same for
each phenotype. We can therefore run permutations with a small number of
phenotypes rather than all of them, which drastically decreases the computational burden, and apply
the resulting null distribution to all phenotypes. The implementation draws from the null by
permuting some randomly chosen phenotypes, testing for associations with all variants in trans and
storing the smallest p-value. We repeat this many times (typically 1000), effectively building a
null distribution of the strongest associations for a single phenotype. We then make it continuous by
fitting a beta distribution as we do in cis and use it to adjust every nominal p-value coming from the
initial pass for the number of variants being tested. To correct for the number of phenotypes being
tested, we estimate FDR as we do in cis; that is from the best adjusted p-values per phenotype (one per
phenotype). This also gives an adjusted p-value threshold that we use to identify all phenotype-variant
pairs that are whole-genome significant. In our experiments, this approach gives similar results to the
full permutation scheme both in terms of FDR estimates and number of discoveries, while running faster.
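
To make the adjustment step more concrete, here is a minimal Python sketch of how a null distribution of best p-values can be used to adjust a nominal p-value. It deliberately uses the plain empirical fraction rather than the beta distribution fit described above, so it is a simpler stand-in and not what QTLtools computes. The function name and inputs are hypothetical.

```python
def empirical_adjusted_pvalue(nominal_p, null_best_pvalues):
    """Adjust a nominal p-value against a null of best (smallest) p-values.

    Returns the fraction of null draws whose best p-value is at least as
    small as nominal_p, with the usual +1 correction so that the result is
    never exactly zero. A fitted beta distribution would replace this step
    function with a smooth curve.
    """
    n_as_extreme = sum(1 for p in null_best_pvalues if p <= nominal_p)
    return (n_as_extreme + 1) / (len(null_best_pvalues) + 1)

# Made-up null distribution with four draws, for illustration only.
null_best = [1e-7, 1e-6, 1e-5, 1e-4]
print(empirical_adjusted_pvalue(5e-6, null_best))  # 0.6
```

With only the empirical fraction, the smallest attainable adjusted p-value is limited by the number of null draws, which is one reason a continuous fit is useful.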

Since linear regression assumes normally distributed data, we highly recommend using the --normal option
to rank normal transform the phenotype quantifications in order to avoid false positive associations due
to outliers. If you are using the approximate permutation scheme (--sample) you MUST use the --normal
option or make sure that your phenotypes are normally distributed.
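
As an illustration of what a rank normal transform does to outliers, the following Python sketch maps each value to a standard normal quantile based on its rank. The rank offset of 0.5 and the handling of ties (broken by input order) are assumptions made for this example; the exact formula used by --normal may differ.

```python
from statistics import NormalDist

def rank_normal_transform(values):
    """Replace each value by the standard normal quantile of its rank.

    Ranks run from 1 to n and are mapped to (rank - 0.5) / n before being
    passed through the inverse normal CDF. Ties are broken by input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    transformed = [0.0] * n
    for rank, i in enumerate(order, start=1):
        transformed[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return transformed

# The extreme value 250.0 keeps its rank but no longer dominates the scale.
print(rank_normal_transform([0.1, 250.0, 0.3, 0.2, 0.4]))
```

Only the ordering of the phenotype values survives the transform, which is why a single extreme sample can no longer drive an association.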

OPTIONS

--vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
Genotypes in VCF/BCF format, or another molecular phenotype in BED format. If there is a DS field
in the genotype FORMAT of a variant (dosage of the genotype calculated from genotype
probabilities, e.g. after imputation), then this is used as the genotype. If there is only the GT
field in the genotype FORMAT then this is used and it is converted to a dosage. REQUIRED.

--bed quantifications.bed.gz
Molecular phenotype quantifications in BED format. REQUIRED.

--out output.txt
Output file. REQUIRED.

--cov covariates.txt
Covariates to correct the phenotype data with.

--normal
Rank normal transform the phenotype data so that each phenotype is normally distributed.
RECOMMENDED.

--window integer
Size of the cis window flanking each phenotype's start position that is excluded from the analysis. DEFAULT=5000000.

--threshold float
P-value threshold below which hits are reported. Give 1.0 to print everything, which may generate
a huge file. When --adjust is provided, this threshold applies to the adjusted p-values.
DEFAULT=1e-5.

--bins integer
Number of bins to use to categorize all p-values above --threshold. DEFAULT=1000.

--nominal
Calculate the nominal p-value for the genotype-phenotype associations and print out the ones that
pass the provided threshold. Mutually exclusive with --permute, --sample and --adjust.

--permute
Permute all phenotypes together, once. For multiple permutations you need to change the random
seed using --seed for each permutation. Mutually exclusive with --nominal, --sample and --adjust.

--sample integer
Permute randomly chosen phenotypes integer times. Mutually exclusive with --nominal, --permute,
--adjust, and --chunk.

--adjust filename
Test and adjust p-values using the null distribution in filename. Mutually exclusive with
--nominal, --permute, and --sample.

--chunk integer1 integer2
For parallelization. Divide the data into integer2 chunks and process chunk number
integer1. The number of chunks must be at least the number of chromosomes in the --bed
file.
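
A small Python helper can generate one command line per chunk for submission to a scheduler. This is a sketch under assumptions: chunk numbering is assumed to start at 1, the output naming scheme is made up, and the function name chunk_commands is hypothetical. It uses --nominal because --sample cannot be combined with --chunk.

```python
def chunk_commands(vcf, bed, out_prefix, n_chunks):
    """Build one 'QTLtools trans' nominal command per chunk.

    Chunks are assumed to be numbered 1..n_chunks. Each job writes to its
    own output prefix (hypothetical naming) so results do not collide.
    """
    commands = []
    for chunk in range(1, n_chunks + 1):
        commands.append(
            f"QTLtools trans --vcf {vcf} --bed {bed} --nominal --normal "
            f"--chunk {chunk} {n_chunks} --out {out_prefix}.chunk{chunk}"
        )
    return commands

# Print the commands so they can be piped to a job submission system.
for command in chunk_commands("genotypes.vcf.gz", "genes.bed.gz", "trans.nominal", 25):
    print(command)
```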

OUTPUT FILES

       .hits.txt.gz
        Space separated results output file detailing the variant-phenotype pairs that pass the threshold with
        the following columns:

        1   The phenotype ID
        2   The phenotype chromosome
        3   Start position of the phenotype
        4   The variant ID
        5   The variant chromosome
        6   The start position of the variant
        7   The nominal p-value of the association between the variant and the phenotype.
        8   The adjusted p-value of the association between the variant and the phenotype.  Requires --adjust
        9   Correlation coefficient
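
The nine columns above can be read into named fields with a short Python sketch. The class and function names are hypothetical, and the handling of the adjusted p-value column when --adjust was not used (any non-numeric placeholder becomes None) is an assumption about the file contents, not documented behaviour.

```python
import gzip
from typing import NamedTuple, Optional

class TransHit(NamedTuple):
    phenotype_id: str
    phenotype_chr: str
    phenotype_start: int
    variant_id: str
    variant_chr: str
    variant_start: int
    nominal_pvalue: float
    adjusted_pvalue: Optional[float]  # only meaningful with --adjust
    correlation: float

def _float_or_none(text):
    """Return float(text), or None for non-numeric placeholders (assumed)."""
    try:
        return float(text)
    except ValueError:
        return None

def parse_hit(line):
    """Parse one space separated line of a .hits.txt.gz file."""
    f = line.split()
    return TransHit(f[0], f[1], int(f[2]), f[3], f[4], int(f[5]),
                    float(f[6]), _float_or_none(f[7]), float(f[8]))

def read_hits(path):
    """Yield TransHit records from a gzip-compressed hits file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_hit(line)

# Made-up line, for illustration only.
print(parse_hit("ENSG0001 chr1 1000 rs123 chr22 5000 1e-08 NA -0.35"))
```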

       .best.txt.gz
        Space separated output file listing the most significant variant per phenotype.

        1   The phenotype ID
        2   The adjusted p-value of the association between the variant and the phenotype.  Requires --adjust
        3   The nominal p-value of the association between the variant and the phenotype.
        4   The variant ID

       .bins.txt.gz
        Space separated output file containing the binning of all associations with a p-value above the
        specified --threshold, that is, those not listed individually in the hits file.

        1   The index of the bin
        2   The lower bound of the correlation coefficient for this bin
        3   The upper bound of the correlation coefficient for this bin
        4   The upper bound of the p-value for this bin
        5   The lower bound of the p-value for this bin

FULL PERMUTATION ANALYSIS EXAMPLE

       1 Run a nominal analysis, rank normal transforming the phenotypes and outputting all associations with a
         p-value below 1e-5:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --nominal --normal \
             --out trans.nominal

       2 To run a full permutation analysis with 100 jobs on a compute cluster, run the following, making sure
         that you change the seed for each permutation iteration (replace qsub with the job submission
         command of your system [bsub, psub, etc.]):

         for j in $(seq 1 100); do
             echo "QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --permute \
                 --normal --out trans.perm$j.txt --seed $j" | qsub
         done

APPROXIMATE PERMUTATION ANALYSIS EXAMPLE

       1 Build the null distribution by randomly selecting 1000 phenotypes, rank normal transforming the
         phenotypes:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --sample 1000 --normal \
             --out trans.sample

       2 Run the nominal pass adjusting the p-values with the given null distribution, rank normal transforming
         the phenotypes, and printing out associations with an adjusted p-value less than 0.1:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --adjust \
             trans.sample.best.txt.gz --threshold 0.1 --normal --out trans.adjust

BUGS

       o Versions up to and including 1.2 suffer from a bug in reading missing genotypes in VCF/BCF files.
         This bug affects variants that have a DS field in their genotype's FORMAT and a missing genotype (DS
         field is .) in one of the samples, in which case genotypes for all the samples are set to missing,
         effectively removing this variant from the analyses.

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Delaneau, O., Ongen, H., Brown, A. et al. A complete tool set for molecular QTL discovery and analysis.
       Nat Commun 8, 15452 (2017). <https://doi.org/10.1038/ncomms15452>

AUTHORS

       Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)