Provided by: qtltools_1.3.1+dfsg-2build2_amd64

NAME

       QTLtools trans - trans QTL analysis

SYNOPSIS

       QTLtools  trans  --vcf  [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]  --bed  quantifications.bed.gz  [--nominal  |
       --permute | --sample integer | --adjust in.txt] --out output.txt [OPTIONS]

DESCRIPTION

       This mode maps trans (distal) quantitative trait loci (QTLs) that affect  the  phenotypes,  using  linear
       regression.   The  method is detailed in <https://www.nature.com/articles/ncomms15452>.  We first regress
       out the provided covariates from the phenotype data, followed by running the  linear  regression  between
       the  phenotype residuals and the genotype.  If --normal and --cov are provided at the same time, then the
       residuals after the covariate correction are rank normal transformed.  The mode incorporates an efficient
       permutation scheme.  You can run a nominal pass (--nominal) listing all genotype-phenotype associations
       below a certain threshold, a permutation pass (--permute or --sample no_genes_to_sample)  to  empirically
       characterize  the null distribution of associations, or adjust the nominal p-values based on permutations
       (--adjust).

       In the full permutation scheme (--permute) we  permute  all  phenotypes  using  the  same  random  number
       sequence  to  preserve the correlation structure.  By doing so, the only association we actually break in
       the data is between the genotype and the phenotype data.  Then, we proceed with  a  standard  association
       scan  identical to the one used in the nominal pass.  In practice, we repeat this for 100 permutations of
       the phenotype data.  Subsequently, we can proceed with FDR correction by ranking all the nominal p-values
       in  ascending  order  and  by  counting  how  many  p-values in the permuted data sets are smaller.  This
       provides an FDR estimate: if we have 500 p-values in the permuted data sets that  are  smaller  than  the
       100th  smallest nominal p-value, we can then assume that the FDR for the 100 first associations is around
       5% (=500/(100 × 100)).
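
       The counting step above can be sketched in shell.  This is a minimal illustration and not part of
       QTLtools: the helper names count_below and estimate_fdr are hypothetical, and the input is assumed to be
       a plain text file with one p-value per line that you have extracted yourself from the permuted runs.

```shell
# Hypothetical helpers, not part of QTLtools.

# count_below FILE CUTOFF
# Print how many p-values (one per line) in FILE are strictly smaller than CUTOFF.
count_below() {
    awk -v c="$2" '($1 + 0) < (c + 0) { n++ } END { print n + 0 }' "$1"
}

# estimate_fdr N_PERM_SMALLER N_PERMUTATIONS N_NOMINAL
# FDR estimate for the N_NOMINAL most significant nominal associations, given
# that N_PERM_SMALLER permuted p-values (summed over N_PERMUTATIONS permuted
# data sets) are smaller than the N_NOMINAL-th smallest nominal p-value.
estimate_fdr() {
    awk -v s="$1" -v p="$2" -v n="$3" 'BEGIN { printf "%.4f\n", s / (p * n) }'
}
```

       With the numbers used in the text, estimate_fdr 500 100 100 prints 0.0500, that is, an FDR of about 5%.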

       To enable fast screening in trans, we also designed an approximation of the method described just above,
       based on what we already do in cis.  To make it possible, we assume that the phenotypes are independent
       and normally distributed (which can be enforced with --normal).  The idea is that since all phenotypes
       are normally distributed, they are effectively the same, and the cis region removed from each phenotype
       is so small compared to the rest of the genome that its phenotype-specific impact is negligible.  Hence
       the number of variants tested, and the correlation amongst them, is approximately the same for every
       phenotype.  We can therefore run permutations on a small number of phenotypes rather than on all of
       them, which drastically decreases the computational burden, and apply the resulting null distribution to
       all phenotypes.  The implementation draws from the null by permuting some randomly chosen phenotypes,
       testing for associations with all variants in trans, and storing the smallest p-value.  Repeating this
       many times (typically 1000) effectively builds a null distribution of the strongest associations for a
       single phenotype.  We then make it continuous by fitting a beta distribution, as we do in cis, and use
       it to adjust every nominal p-value coming from the initial pass for the number of variants being tested.
       To correct for the number of phenotypes being tested, we estimate FDR as we do in cis, that is, from the
       best adjusted p-values (one per phenotype).  This also gives an adjusted p-value threshold that we use
       to identify all phenotype-variant pairs that are whole-genome significant.  In our experiments, this
       approach gives similar results to the full permutation scheme, both in terms of FDR estimates and number
       of discoveries, while running faster.
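
       The adjustment for the number of variants can be illustrated with a purely empirical stand-in for the
       beta approximation.  QTLtools itself fits a beta distribution to the null; the sketch below does not, and
       simply counts how many null "best" p-values are at least as small as the nominal one.  The helper name
       empirical_adjusted_pvalue is hypothetical, and the null file is assumed to be a plain text file with one
       smallest p-value per permutation per line, which is not necessarily the format QTLtools writes.

```shell
# Hypothetical helper, not part of QTLtools.

# empirical_adjusted_pvalue NULL_FILE NOMINAL_P
# NULL_FILE: one smallest (best) p-value per permutation, one per line.
# Prints (k + 1) / (N + 1), where k is the number of null p-values that are
# less than or equal to NOMINAL_P and N is the number of permutations.
empirical_adjusted_pvalue() {
    awk -v p="$2" '($1 + 0) <= (p + 0) { k++ } END { printf "%.6g\n", (k + 1) / (NR + 1) }' "$1"
}
```

       The pseudo-count of one keeps the estimate above zero; the beta fit serves a similar purpose by giving a
       continuous null that can yield adjusted p-values smaller than one over the number of permutations.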

       Since linear regression assumes normally distributed data, we highly recommend using the --normal option
       to rank normal transform the phenotype quantifications in order to avoid false positive associations  due
       to  outliers.   If  you are using the approximate permutation scheme (--sample) you MUST use the --normal
       option or make sure that your phenotypes are normally distributed.

OPTIONS

       --vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
              Genotypes in VCF/BCF format, or another molecular phenotype in BED format.  If there is a DS field
              in   the  genotype  FORMAT  of  a  variant  (dosage  of  the  genotype  calculated  from  genotype
              probabilities, e.g. after imputation), then this is used as the genotype.  If there is only the GT
              field in the genotype FORMAT then this is used and it is converted to a dosage.  REQUIRED.

       --bed quantifications.bed.gz
              Molecular phenotype quantifications in BED format.  REQUIRED.

       --out output.txt
              Output file.  REQUIRED.

       --cov covariates.txt
              Covariates to correct the phenotype data with.

       --normal
              Rank  normal  transform  the  phenotype  data  so  that  each  phenotype  is normally distributed.
              RECOMMENDED.

       --window integer
              Size of the cis window to remove flanking each phenotype's start position.  DEFAULT=5000000.

       --threshold float
              P-value threshold below which hits are reported.  Give 1.0 to print everything, which may generate
              a  huge  file.   When  --adjust  is  provided,  this  threshold  applies to the adjusted p-values.
              DEFAULT=1e-5.

       --bins integer
              Number of bins to use to categorize all p-values above --threshold.  DEFAULT=1000.

       --nominal
              Calculate the nominal p-value for the genotype-phenotype associations and print out the ones  that
              pass the provided threshold.  Mutually exclusive with --permute, --sample and --adjust.

       --permute
              Permute  all  phenotypes  together, once.  For multiple permutations you need to change the random
              seed using --seed for each permutation.  Mutually exclusive with --nominal, --sample and --adjust.

       --sample integer
              Permute randomly chosen phenotypes integer times.  Mutually exclusive with  --nominal,  --permute,
              --adjust, and --chunk.

       --adjust filename
              Test  and  adjust  p-values  using  the  null  distribution  in filename.  Mutually exclusive with
              --nominal, --permute, and --sample.

       --chunk integer1 integer2
              For parallelization.  Divide the data into integer2 chunks and process chunk number integer1.
              The number of chunks must be at least the number of chromosomes in the --bed file.
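
       A chunked nominal pass could be scripted along the following lines.  This is a sketch under stated
       assumptions: the function name print_chunk_commands and the output names trans.nominal.chunkK are
       hypothetical, the input file names are taken from the examples in this page, and chunk numbers are
       assumed to start at 1, which this page does not state.

```shell
# Hypothetical helper, not part of QTLtools.

# print_chunk_commands N_CHUNKS
# Print one nominal-pass command per chunk, one per line.
# Assumption: chunks are numbered 1..N_CHUNKS.
print_chunk_commands() {
    n_chunks=$1
    k=1
    while [ "$k" -le "$n_chunks" ]; do
        echo "QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --nominal --normal --chunk $k $n_chunks --out trans.nominal.chunk$k"
        k=$((k + 1))
    done
}
```

       Each printed line can then be piped to the job submission command of your cluster, as in the full
       permutation example.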

OUTPUT FILES

       .hits.txt.gz
        Space separated results output file detailing the variant-phenotype pairs that pass the  threshold  with
        the following columns:

        1   The phenotype ID
        2   The phenotype chromosome
        3   Start position of the phenotype
        4   The variant ID
        5   The variant chromosome
        6   The start position of the variant
        7   The nominal p-value of the association between the variant and the phenotype.
        8   The adjusted p-value of the association between the variant and the phenotype.  Requires --adjust
        9   Correlation coefficient
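
        The column layout above lends itself to simple filtering with awk.  As an illustration only, the
        hypothetical helper below keeps the hits whose variant lies on a different chromosome than the
        phenotype (column 5 differs from column 2).  It reads uncompressed rows from standard input.

```shell
# Hypothetical helper, not part of QTLtools.

# interchromosomal_hits
# Read space separated hits rows on stdin and print those where the variant
# chromosome (column 5) differs from the phenotype chromosome (column 2).
interchromosomal_hits() {
    awk '$2 != $5'
}
```

        For example, gzip -dc on a .hits.txt.gz file piped into interchromosomal_hits prints the matching rows
        unchanged.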

       .best.txt.gz
        Space separated output file listing the most significant variant per phenotype.

        1   The phenotype ID
        2   The adjusted p-value of the association between the variant and the phenotype.  Requires --adjust
        3   The nominal p-value of the association between the variant and the phenotype.
        4   The variant ID

       .bins.txt.gz
        Space separated output file containing the binning of all hits with a p-value above the specified
        --threshold (see --bins).

        1   The index of the bin
        2   The lower bound of the correlation coefficient for this bin
        3   The upper bound of the correlation coefficient for this bin
        4   The upper bound of the p-value for this bin
        5   The lower bound of the p-value for this bin

FULL PERMUTATION ANALYSIS EXAMPLE

       1 Run a nominal analysis, rank normal transforming the phenotypes and outputting all associations with  a
         p-value below 1e-5:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --nominal --normal \
             --out trans.nominal

       2 To run a full permutation analysis with 100 jobs on a compute cluster, run the following, making sure
         that you change the seed for each permutation iteration (replace qsub with the submission command of
         your job scheduler, e.g. bsub, psub, etc.):

         for j in $(seq 1 100); do
             echo "QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --permute \
                 --normal --out trans.perm$j.txt --seed $j" | qsub
         done

APPROXIMATE PERMUTATION ANALYSIS EXAMPLE

       1 Build the null distribution by permuting 1000 randomly chosen phenotypes, rank normal transforming the
         phenotypes:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz --sample 1000 --normal \
             --out trans.sample

       2 Run  the nominal pass adjusting the p-values with the given null distribution, rank normal transforming
         the phenotypes, and printing out associations with an adjusted p-value less than 0.1:

         QTLtools trans --vcf genotypes.chr22.vcf.gz --bed genes.simulated.chr22.bed.gz \
             --adjust trans.sample.best.txt.gz --threshold 0.1 --normal --out trans.adjust

SEE ALSO

       QTLtools(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       o Versions up to and including 1.2 suffer from a bug in reading missing genotypes in VCF/BCF files.
         This bug affects variants that have a DS field in their genotype's FORMAT and a missing genotype (DS
         field is .) in one of the samples, in which case the genotypes for all samples are set to missing,
         effectively removing the variant from the analyses.

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Delaneau, O., Ongen, H., Brown, A. et al. A complete tool set for molecular QTL discovery  and  analysis.
       Nat Commun 8, 15452 (2017).  <https://doi.org/10.1038/ncomms15452>

AUTHORS

       Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)