lunar (1) QTLtools-trans.1.gz

Provided by: qtltools_1.3.1+dfsg-4_amd64

NAME

       QTLtools trans - trans QTL analysis

SYNOPSIS

       QTLtools  trans  --vcf  [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]  --bed  quantifications.bed.gz
       [--nominal | --permute | --sample integer | --adjust in.txt] --out output.txt [OPTIONS]

DESCRIPTION

       This mode maps trans (distal) quantitative trait loci (QTLs) that affect the phenotypes,
       using linear regression. The method is detailed in
       <https://www.nature.com/articles/ncomms15452>. We first regress out the provided
       covariates from the phenotype data. We then run the linear regression between the
       phenotype residuals and the genotype. If --normal and --cov are provided at the same
       time, the residuals left after the covariate correction are rank normal transformed. The
       mode incorporates an efficient permutation scheme. You can run any of the following:
       a nominal pass (--nominal) that lists all genotype-phenotype associations below a given
       threshold; a permutation pass (--permute or --sample no_genes_to_sample) that
       empirically characterizes the null distribution of associations; or a pass that adjusts
       the nominal p-values based on permutations (--adjust).

       In the full permutation scheme (--permute), we permute all phenotypes using the same
       random number sequence to preserve their correlation structure. By doing so, the only
       association we actually break in the data is the one between the genotype and the
       phenotype data. We then proceed with a standard association scan identical to the one
       used in the nominal pass. In practice, we repeat this for 100 permutations of the
       phenotype data. Next, we can apply FDR correction. We rank all the nominal p-values in
       ascending order and count how many p-values in the permuted data sets are smaller than
       each one. This provides an FDR estimate. Suppose 500 p-values in the permuted data sets
       are smaller than the 100th smallest nominal p-value. The permutations then yield on
       average 500/100 = 5 false positives at that cutoff, so the FDR for the first 100
       associations is around 5% (= 500/(100 × 100)).
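
       The counting rule above can be written as a short Python sketch. It is an illustration,
       not code from QTLtools. It assumes that the nominal p-values, and the p-values pooled
       from all permutation runs, have already been loaded into lists. The function name
       permutation_fdr is hypothetical, and capping the estimate at 1 is a choice made here.

```python
from bisect import bisect_left


def permutation_fdr(nominal, permuted, n_permutations):
    """Estimate the FDR of the top-k nominal associations, for every k.

    nominal        : nominal p-values from the --nominal pass
    permuted       : p-values pooled from all --permute runs
    n_permutations : number of permutation runs (e.g. 100)
    Returns (p_value, fdr) pairs in ascending p-value order.
    """
    nominal_sorted = sorted(nominal)
    permuted_sorted = sorted(permuted)
    result = []
    for k, p in enumerate(nominal_sorted, start=1):
        # Number of permuted p-values strictly smaller than the k-th nominal p-value.
        n_null = bisect_left(permuted_sorted, p)
        # Expected false positives per permutation, divided by the k discoveries.
        fdr = n_null / (n_permutations * k)
        result.append((p, min(fdr, 1.0)))
    return result
```

       With the numbers from the text (500 permuted p-values below the 100th nominal p-value,
       100 permutations), the entry for k = 100 is 0.05.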

       To enable fast screening in trans, we also designed an approximation of the method
       described above, based on what we already do in cis. To make this possible, we assume
       that the phenotypes are independent and normally distributed (which can be enforced with
       --normal). The rationale has two parts. First, since all phenotypes are normally
       distributed, they are effectively interchangeable. Second, the cis region removed from
       each phenotype is so small compared to the rest of the genome that its phenotype-specific
       impact is negligible. Hence the number of variants tested, and the correlation amongst
       them, is approximately the same for every phenotype, and each phenotype is approximately
       the same. We can therefore run permutations with a small number of phenotypes rather
       than all of them. This drastically decreases the computational burden, and the resulting
       null distribution can be applied to all phenotypes. The implementation draws from the
       null by permuting some randomly chosen phenotypes, testing for associations with all
       variants in trans, and storing the smallest p-value. Repeating this many times
       (typically 1000) builds a null distribution of the strongest association for a single
       phenotype. We then make this distribution continuous by fitting a beta distribution, as
       we do in cis. We use it to adjust every nominal p-value from the initial pass for the
       number of variants tested. To correct for the number of phenotypes tested, we estimate
       the FDR as we do in cis, that is, from the best adjusted p-value of each phenotype. This
       also gives an adjusted p-value threshold, which we use to identify all phenotype-variant
       pairs that are genome-wide significant. In our experiments, this approach gives results
       similar to the full permutation scheme, both in terms of FDR estimates and number of
       discoveries, while running faster.
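
       The beta-based adjustment can be sketched in Python as below. This is an illustration
       under stated assumptions, not the QTLtools implementation. The beta parameters are
       estimated here by the method of moments for brevity, whereas QTLtools may fit them
       differently (for example by maximum likelihood). The adjusted p-value is taken to be
       the fitted beta CDF evaluated at the nominal p-value. The names fit_beta_moments,
       regularized_incomplete_beta and adjust_pvalue are hypothetical. The incomplete beta
       function uses the standard continued-fraction evaluation.

```python
import math
from statistics import fmean, pvariance


def fit_beta_moments(min_pvalues):
    """Method-of-moments estimates (a, b) of a Beta distribution.

    A simple stand-in for the beta fit (assumption); requires var < mean * (1 - mean).
    """
    m = fmean(min_pvalues)
    v = pvariance(min_pvalues, m)
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def _betacf(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(x, a, b):
    """I_x(a, b), i.e. the CDF of a Beta(a, b) distribution at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction where it converges fastest.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def adjust_pvalue(nominal_p, a, b):
    """Probability that the best null p-value of a phenotype is <= nominal_p."""
    return regularized_incomplete_beta(nominal_p, a, b)
```

       Consider a hypothetical case where each phenotype is tested against 1000 independent
       variants. The smallest null p-value would then follow Beta(1, 1000), and
       adjust_pvalue(1e-6, 1.0, 1000.0) is about 1e-3, close to a Bonferroni correction.
       Correlated variants would typically lead to a smaller fitted b and a milder correction.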

       Since linear regression assumes normally distributed data, we strongly recommend using
       the --normal option. It rank normal transforms the phenotype quantifications and so
       avoids false positive associations driven by outliers. If you are using the approximate
       permutation scheme (--sample), you MUST use the --normal option or make sure that your
       phenotypes are normally distributed.
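
       A rank-based inverse normal transform can be sketched as below. It shows why the
       transform neutralizes outliers: only the ranks survive, so an extreme value is mapped
       to the same quantile as any other largest value. This is not the QTLtools source. The
       function name is hypothetical, and the r/(n+1) quantile offset and average ranking of
       ties are assumptions; QTLtools may use a slightly different offset.

```python
from statistics import NormalDist


def rank_normal_transform(values):
    """Map values to standard normal quantiles of their ranks.

    Tied values receive the average of their ranks; rank r of n is mapped
    to the quantile r / (n + 1) (assumed offset).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]
```

       For example, rank_normal_transform([3.0, 100.0, 1.0]) and
       rank_normal_transform([3.0, 4.0, 1.0]) return the same values, because the outlier 100.0
       only contributes its rank.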

OPTIONS

       --vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
              Genotypes in VCF/BCF format, or another molecular phenotype in BED format. If
              there is a DS field in the genotype FORMAT of a variant (dosage of the genotype
              calculated from genotype probabilities, e.g. after imputation), then it is used
              as the genotype. If only the GT field is present in the genotype FORMAT, it is
              used and converted to a dosage. REQUIRED.

       --bed quantifications.bed.gz
              Molecular phenotype quantifications in BED format.  REQUIRED.

       --out output.txt
              Output file.  REQUIRED.

       --cov covariates.txt
              Covariates to correct the phenotype data with.

       --normal
              Rank  normal  transform  the  phenotype  data  so  that  each phenotype is normally
              distributed.  RECOMMENDED.

       --window integer
              Size of the cis window flanking each phenotype's start position. Variants within
              this window are excluded from the trans scan. DEFAULT=5000000.

       --threshold float
              P-value  threshold  below  which  hits are reported.  Give 1.0 to print everything,
              which may generate a huge file.  When --adjust is provided, this threshold  applies
              to the adjusted p-values.  DEFAULT=1e-5.

       --bins integer
              Number of bins to use to categorize all p-values above --threshold.  DEFAULT=1000.

       --nominal
              Calculate the nominal p-value for the genotype-phenotype associations and print out
              the ones that pass the provided  threshold.   Mutually  exclusive  with  --permute,
              --sample and --adjust.

       --permute
              Permute  all  phenotypes  together,  once.   For  multiple permutations you need to
              change the random seed using --seed for each permutation.  Mutually exclusive  with
              --nominal, --sample and --adjust.

       --sample integer
              Permute   randomly  chosen  phenotypes  integer  times.   Mutually  exclusive  with
              --nominal, --permute, --adjust, and --chunk.

       --adjust filename
              Test and adjust  p-values  using  the  null  distribution  in  filename.   Mutually
              exclusive with --nominal, --permute, and --sample.

       --chunk integer1 integer2
              For parallelization. Divide the data into integer2 chunks and process chunk
              number integer1. The number of chunks must be at least the number of chromosomes
              in the --bed file.
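
       The --vcf option prefers DS over GT and converts GT to a dosage. Both steps can be
       sketched as follows. The sketch assumes biallelic variants and defines the GT dosage as
       the number of alternative alleles. genotype_dosage is a hypothetical helper, not part of
       QTLtools, and how QTLtools handles multiallelic sites is not covered.

```python
from typing import Optional


def genotype_dosage(format_field: str, sample_field: str) -> Optional[float]:
    """Dosage for one sample, from the VCF FORMAT column and that sample's column.

    DS is preferred when the FORMAT declares it; otherwise GT is converted by
    counting non-reference alleles (biallelic variants assumed).
    Returns None for a missing genotype.
    """
    keys = format_field.split(":")
    entry = dict(zip(keys, sample_field.split(":")))
    if "DS" in keys:
        # Trailing sample fields may be dropped in VCF; treat an absent DS as missing.
        ds = entry.get("DS", ".")
        return None if ds in (".", "") else float(ds)
    if "GT" in keys:
        alleles = entry.get("GT", ".").replace("|", "/").split("/")
        if "." in alleles:
            return None
        # Each non-zero allele index counts as one copy of the alternative allele.
        return float(sum(1 for allele in alleles if allele != "0"))
    return None
```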
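
       The --chunk constraints can be checked before jobs are submitted. The Python sketch
       below builds one QTLtools trans argument list per chunk. It rejects fewer chunks than
       chromosomes, and also --sample, which is mutually exclusive with --chunk. Three things
       are assumptions: chunks are numbered 1..integer2, the helper chunk_commands is
       hypothetical, and so is the output naming scheme <prefix>.chunk<k>.

```python
import shlex


def chunk_commands(vcf, bed, out_prefix, n_chunks, n_chromosomes,
                   mode_args=("--nominal", "--normal")):
    """One QTLtools trans command (as an argument list) per chunk.

    Chunk numbering 1..n_chunks and the "<out_prefix>.chunk<k>" names are assumptions.
    """
    if "--sample" in mode_args:
        raise ValueError("--chunk is mutually exclusive with --sample")
    if n_chunks < n_chromosomes:
        raise ValueError(
            f"need at least {n_chromosomes} chunks (one per chromosome), got {n_chunks}")
    return [
        ["QTLtools", "trans", "--vcf", vcf, "--bed", bed, *mode_args,
         "--chunk", str(k), str(n_chunks), "--out", f"{out_prefix}.chunk{k}"]
        for k in range(1, n_chunks + 1)
    ]


if __name__ == "__main__":
    # Print shell-quoted commands, e.g. to pipe each one to a job scheduler.
    for cmd in chunk_commands("genotypes.vcf.gz", "genes.bed.gz", "trans.nominal", 30, 22):
        print(shlex.join(cmd))
```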

OUTPUT FILES

       .hits.txt.gz
        Space  separated  results output file detailing the variant-phenotype pairs that pass the
        threshold with the following columns:

        1   The phenotype ID
        2   The phenotype chromosome
        3   Start position of the phenotype
        4   The variant ID
        5   The variant chromosome
        6   The start position of the variant
        7   The nominal p-value of the association between the variant and the phenotype.
        8   The adjusted p-value of the  association  between  the  variant  and  the  phenotype.
            Requires --adjust
        9   Correlation coefficient

       .best.txt.gz
        Space separated output file listing the most significant variant per phenotype.

        1   The phenotype ID
        2   The  adjusted  p-value  of  the  association  between  the variant and the phenotype.
            Requires --adjust
        3   The nominal p-value of the association between the variant and the phenotype.
        4   The variant ID

       .bins.txt.gz
        Space separated output file containing the binning of all associations with a p-value
        above the specified --threshold.

        1   The index of the bin
        2   The lower bound of the correlation coefficient for this bin
        3   The upper bound of the correlation coefficient for this bin
        4   The upper bound of the p-value for this bin
        5   The lower bound of the p-value for this bin
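
       A .hits.txt.gz file can be read with the Python sketch below. It follows the nine
       columns listed above, and the file name is the --out value followed by .hits.txt.gz
       (e.g. trans.nominal.hits.txt.gz). The sketch assumes the file has no header line and
       that a non-numeric column 8 (for example when --adjust is not used) means missing. The
       names TransHit, parse_hit_line and iter_hits are hypothetical.

```python
import gzip
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class TransHit:
    phenotype_id: str
    phenotype_chrom: str
    phenotype_start: int
    variant_id: str
    variant_chrom: str
    variant_start: int
    nominal_p: float
    adjusted_p: Optional[float]  # only meaningful with --adjust
    correlation: float


def _maybe_float(token: str) -> Optional[float]:
    try:
        return float(token)
    except ValueError:
        return None  # assumed placeholder for a missing adjusted p-value


def parse_hit_line(line: str) -> TransHit:
    f = line.split()
    if len(f) != 9:
        raise ValueError(f"expected 9 columns, got {len(f)}")
    return TransHit(f[0], f[1], int(f[2]), f[3], f[4], int(f[5]),
                    float(f[6]), _maybe_float(f[7]), float(f[8]))


def iter_hits(path: str) -> Iterator[TransHit]:
    # Assumes no header line; blank lines are skipped.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_hit_line(line)
```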
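
       Each bin carries both correlation and p-value bounds because, for a fixed number of
       samples, the two are tied one-to-one: a larger |r| gives a smaller p-value. This is why
       the lower correlation bound pairs with the upper p-value bound. The sketch below shows
       the standard two-sided t-test conversion for a Pearson correlation,
       p = I_{1-r^2}(df/2, 1/2). The residual degrees of freedom QTLtools uses (for example
       after covariate correction) are not stated here, so df is a parameter and an
       assumption. The helper names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(x, a, b):
    """I_x(a, b), the CDF of a Beta(a, b) distribution at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def correlation_pvalue(r, df):
    """Two-sided p-value of a Pearson correlation r with df residual degrees of freedom.

    Equivalent to a t-test on t = r * sqrt(df / (1 - r^2)).
    """
    if abs(r) >= 1.0:
        return 0.0
    return regularized_incomplete_beta(1.0 - r * r, df / 2.0, 0.5)
```

       With df = 2 the conversion reduces to p = 1 - |r|, which makes a convenient sanity
       check.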

FULL PERMUTATION ANALYSIS EXAMPLE

       1 Run  a  nominal  analysis,  rank  normal  transforming the phenotypes and outputting all
         associations with a p-value below 1e-5:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz \
             --nominal --normal --out trans.nominal

       2 To run a full permutation analysis as 100 jobs on a compute cluster, run the
         following. Make sure that the seed changes for each permutation iteration, and replace
         qsub with the job submission command of your scheduler [bsub, psub, etc.]:

         for j in $(seq 1 100); do
             echo "QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz \
             --permute --normal --out trans.perm$j.txt --seed $j" | qsub
         done

APPROXIMATE PERMUTATION ANALYSIS EXAMPLE

       1 Build the null distribution by permuting randomly chosen phenotypes 1000 times, rank
         normal transforming the phenotypes:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz \
             --sample 1000 --normal --out trans.sample

       2 Run the nominal pass, adjusting the p-values with the given null distribution, rank
         normal transforming the phenotypes, and printing out associations with an adjusted
         p-value below 0.1:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz \
             --adjust trans.sample.best.txt.gz --threshold 0.1 --normal --out trans.adjust

SEE ALSO

       QTLtools(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       o Versions up to and including 1.2 suffer from a bug in reading missing genotypes in
         VCF/BCF files. It affects variants that have a DS field in their genotype FORMAT and a
         missing genotype (DS field is .) in at least one sample. In that case, the genotypes
         of all samples are set to missing, effectively removing the variant from the analyses.

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Delaneau,  O., Ongen, H., Brown, A. et al. A complete tool set for molecular QTL discovery
       and analysis. Nat Commun 8, 15452 (2017).  <https://doi.org/10.1038/ncomms15452>

AUTHORS

       Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)