Provided by: qtltools_1.3.1+dfsg-2build2_amd64 bug

NAME

       QTLtools rtc - Regulatory Trait Concordance score analysis

SYNOPSIS

       QTLtools   rtc   --vcf   [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]   --bed   quantifications.bed.gz  --hotspots
       hotspots_b37_hg19.bed   [--gwas-cis    |    --gwas-trans    |    --mergeQTL-cis    |    --mergeQTL-trans]
       variants_external.txt qtls_in_this_dataset.txt --out output.txt [OPTIONS]

DESCRIPTION

       The  RTC  algorithm  assesses  the likelihood of a shared functional effect between a GWAS variant and an
       molQTL by quantifying the change in the statistical significance  of  the  molQTL  after  correcting  the
       molQTL  phenotype  for the genetic effect of the GWAS variant and comparing its correction impact to that
       of all other SNPs in the interval.  The method is detailed in  <https://www.nature.com/articles/ng.3981>.
       When  assessing  tissue  specificity  of  molQTLs  we  use the same method, however in that case the GWAS
       variant becomes an molQTL in a different tissue.  The RTC method  is  as  follows:  for  a  GWAS  variant
       falling  into  the same region flanked by recombination hotspots (coldspot) with an molQTL, with N number
       of variants in a given coldspot:

       1      Correct the phenotype for each of the variants in the  region  separately  by  linear  regression,
              resulting in N number of pseudo-phenotypes (residuals).

       2      Redo the molQTL variant association with all of these pseudo-phenotypes.

       3      Sort  (decreasing) the resulting p-values and find the rank of the molQTL to GWAS-pseudo-phenotype
              among all molQTL to pseudo-phenotype associations.

       4      RTC = (N - Rank of GWAS) / N

       This results in the RTC score which ranges from 0 to 1 where higher values indicate a more likely  shared
       functional effect between the GWAS and the molQTL variants.  An RTC score greater than or equal to 0.9 is
       considered a shared functional effect.  If there are multiple independent molQTLs for a given  phenotype,
       RTC  for  each  independent  molQTL  is assessed after correcting the phenotype with all the other molQTL
       variants for that phenotype.  This correction is done using linear regression and  taking  the  residuals
       after regressing the phenotype with the other molQTLs.

       In  order  to  convert  RTC  score  into a probability of sharing, we employ two simulations per coldspot
       region, H0 and H1.  The H0 scenario is when two variants in a coldspot are tagging  different  functional
       effects.   For  a  coldspot that harbours colocalized GWAS and molQTL variants, we pick two random hidden
       causal variants.  We then find two variants (GWAS and molQTL) that are linked (default r-squared  ≥  0.5)
       to  the  hidden  causal  variants.   We  generate  a  pseudo  phenotype for molQTL based on the slope and
       intercept of the observed molQTL and randomly distributed residuals of the observed molQTL.  Subsequently
       we rerun the RTC analysis with this new pseudo-phenotype and using the GWAS and molQTL variants.

       The  H1  scenario  is  when the two variants are tagging the same functional variant.  The scheme here is
       exactly the same as the H0 scheme, except there is only one hidden  causal  variant  and  both  GWAS  and
       molQTL variants are randomly selected from variants that are linked to the same hidden causal variant.

OPTIONS

       --vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
              Genotypes in VCF/BCF format, or another molecular phenotype in BED format.  If there is a DS field
              in  the  genotype  FORMAT  of  a  variant  (dosage  of  the  genotype  calculated  from   genotype
              probabilities, e.g. after imputation), then this is used as the genotype.  If there is only the GT
              field in the genotype FORMAT then this is used and it is converted to a dosage.  REQUIRED.

       --bed quantifications.bed.gz
              Molecular phenotype quantifications in BED format.  REQUIRED.

       --out output.txt
              Output file.  REQUIRED.

       --hotspots recombination_hotspots.bed
              Recombination hotspots in BED format.  REQUIRED.

       --cov covariates.txt
              Covariates to correct the phenotype data with.

       --stats-vcf [in.vcf|in.bcf|in.vcf.gz]
              Calculate D' and r-squared from this file.  Defaults to the --vcf  file.   Needs  to  have  phased
              genotypes for D' calculations.

       --stats-vcf-include-samples samples.txt
              Samples to include from the --stats-vcf file.  One sample ID per line.

       --stats-vcf-exclude-samples samples.txt
              Samples to exclude from the --stats-vcf file.  One sample ID per line.

       --normal
              Rank  normal  transform  the  phenotype  data  so  that  each  phenotype  is normally distributed.
              RECOMMENDED.

       --conditional
              molQTLs contain independent signals so execute the conditional analysis.

       --debug
              Print out debugging info to stderr.  DON'T USE.

       --warnings
              Print all encountered individual warnings to stdout.

       --header
              Add a header to the output file when --chunk or --region is provided.

       --individual-Dprime
              Calculate D' on an individual variant basis.  If not provided D'  will  not  be  calculated  after
              first unphased genotype is encountered.

       --mem-est
              Estimate memory usage and exit.

       --mem [0|1|2|3]
              Keep  results  of  calculations that may be used multiple times in memory. 0 = nothing in mem, 1 =
              only basic, 2 = all in mem but clean after unlikely to be reused, 3 =  all  in  mem  no  cleaning.
              DEFAULT=0.  RECOMMENDED=2.

       --window integer
              Size   of   the   cis   window   flanking   each  phenotype's  start  position.   DEFAULT=1000000.
              RECOMMENDED=1000000.

       --sample integer
              Number of simulated RTC values to try to achieve for  each  coldspot,  for  converting  RTC  to  a
              probability.   At  each iteration we try to pick a unique combination of variants, thus the actual
              number of sample iterations may be less than this value, due to the number variants in  a  region.
              If you want to run this analysis, please provide at least 100.  DEFAULT=0.

       --max-sample integer
              Max  number  of  sample  iterations  trying to reach --sample before quitting.  Provide the actual
              number not the multiplier.  DEFAULT=--sample * 50.

       --R2-threshold float
              The minimum r-squared required when picking  a  variant  that  is  linked  to  the  hidden  causal
              variant(s) when running simulations using --sample.  DEFAULT=0.5.

       --D-prime-threshold float
              If  the  pairs  of variants fall into different coldspots and have a D' greater than this, the RTC
              calculation is extended to multiple coldspot regions including both variants.  Assumes D'  can  be
              calculated.  DEFAULT=OFF.  NOT RECOMMENDED.

       --grp-best
              Correct for multiple phenotypes within a phenotype group.

       --pheno-col integer
              1-based phenotype id column number.  DEFAULT=1 or 5 when --grp-best

       --geno-col integer
              1-based genotype id column number.  DEFAULT=8 or 10 when --grp-best

       --grp-col integer
              1-based phenotype group id column number.  Only relevant if --grp-best is in effect.  DEFAULT=1

       --rank-col integer
              1-based  conditional  analysis  rank  column number.  Only relevant if --conditional is in effect.
              DEFAULT=12 or 14 when --grp-best

       --best-col integer
              1-based phenotype column number Only relevant if --conditional is in  effect.   DEFAULT=21  or  23
              when --grp-best

       --gwas-cis variants_external.txt qtls_in_this_dataset.txt
              Run  RTC for GWAS and cis-molQTL colocalization analysis.  Takes two file names as arguments.  The
              first is the file with GWAS variants of interest with one variant ID per line.  These should match
              the  variants  IDs  in the --vcf file.  The second is the QTLtools output for the cis run that was
              ran using the same --vcf, --bed, and --cov files.  REQUIRED unless (and mutually  exclusive  with)
              --gwas-trans, --mergeQTL-cis, --mergeQTL-trans.

       --gwas-trans variants_external.txt qtls_in_this_dataset.txt
              Run  RTC  for  GWAS  and trans-molQTL colocalization analysis.  Takes two file names as arguments.
              The first is the file with GWAS variants of interest with one variant ID per line.   These  should
              match  the  variants  IDs  in the --vcf file.  The second is the QTLtools output for the trans run
              that was ran using the same --vcf, --bed, and --cov files.  You will need to adjust *-col options.
              REQUIRED unless (and mutually exclusive with) --gwas-cis, --mergeQTL-cis, --mergeQTL-trans.

       --mergeQTL-cis variants_external.txt qtls_in_this_dataset.txt
              Run RTC for cis-molQTL and cis-molQTL colocalization analysis.  Takes two file names as arguments.
              The first is the file with cis-molQTL variants of interest discovered in a different dataset, e.g.
              different  tissue, with one variant ID per line.  These should match the variants IDs in the --vcf
              file.  The second is the QTLtools output for the cis run that was ran using the same --vcf, --bed,
              and  --cov  files.   REQUIRED  unless  (and mutually exclusive with) --gwas-trans, --mergeQTL-cis,
              --mergeQTL-trans.

       --mergeQTL-trans variants_external.txt qtls_in_this_dataset.txt
              Run RTC for trans-molQTL and trans-molQTL  colocalization  analysis.   Takes  two  file  names  as
              arguments.  The first is the file with trans-molQTL variants of interest discovered in a different
              dataset, e.g. different tissue, with one variant ID per line.  These should match the variants IDs
              in  the  --vcf  file.   The second is the QTLtools output for the trans run that was ran using the
              same --vcf, --bed, and --cov files.  You will need to adjust *-col options.  REQUIRED unless  (and
              mutually exclusive with) --gwas-trans, --gwas-cis, --mergeQTL-cis.

       --chunk integer1 integer2
              For  parallelization.   Divide  the  data  into integer2 number of chunks and process chunk number
              integer1.  Chunk 0 will print a header.  Mutually exclusive  with  --region.   Minimum  number  of
              chunks has to be at least the same number of chromosomes in the --bed file.

       --region chr:start-end
              Genomic  region  to  be processed.  E.g. chr4:12334456-16334456, or chr5.  Mutually exclusive with
              --chunk.

OUTPUT FILE

       --out output file
        Space separated output file with the following columns.  Columns after the  22nd  are  only  printed  if
        --sample is provided.  We recommend including chunk 0 to print out a header in order to avoid confusion.

         1   other_variant                              The  variant  ID that is external to this dataset, could
                                                        be the GWAS variant or another molQTL
         2   our_variant                                The molQTL variant ID that is internal to this dataset

         3   phenotype                                  The phenotype ID
         4   phenotype_group                            The phenotype group ID
         5   other_variant_chr                          The external variant's chromosome
         6   other_variant_start                        The external variant's start position
         7   other_variant_rank                         Rank of the external  variant.   Only  relevant  if  the
                                                        external variants are part of an conditional analysis
         8   our_variant_chr                            The internal variant's chromosome
         9   our_variant_start                          The internal variant's start position
        10   our_variant_rank                           Rank  of  the  internal  variant.   Only relevant if the
                                                        internal variants are part of an conditional analysis
        11   phenotype_chr                              The phenotype's chromosome
        12   phenotype_start                            The start position of the phenotype
        13   distance_between_variants                  The distance between the two variants
        14   distance_between_other_variant_and_pheno   The  distance  between  the  external  variant  and  the
                                                        phenotype
        15   other_variant_region_index                 The region index of the external variant
        16   our_variant_region_index                   The region index of the internal variant
        17   region_start                               The start position of the region
        18   region_end                                 The end position of the region
        19   variant_count_in_region                    The number of variants in the region
        20   RTC                                        The RTC score
        21   D'                                         The  D'  of  the two variants.  Only calculated if there
                                                        are phased genotypes
        22   r^2                                        The r squared of the two variants
        22   p_value                                    The p-value of the RTC score
        23   unique_picks_H0                            The number of unique combinations of variants in the  H0
                                                        simulations
        24   unique_picks_H1                            The  number of unique combinations of variants in the H1
                                                        simulations
        25   rtc_bin_start                              Lower bound of the RTC bin, based on  the  observed  RTC
                                                        score
        26   rtc_bin_end                                Upper  bound  of  the RTC bin, based on the observed RTC
                                                        score
        27   rtc_bin_H0_proportion                      The proportion of H0 simulated values that  are  between
                                                        rtc_bin_start and rtc_bin_end
        28   rtc_bin_H1_proportion                      The  proportion  of H1 simulated values that are between
                                                        rtc_bin_start and rtc_bin_end
        29   median_r^2                                 The median r-squared in the region
        30   median_H0                                  The median RTC score in the H0 simulations
        31   median_H1                                  The median RTC score in the H1 simulations
        32   H0                                         The RTC scores observed in the H0 simulations
        33   H1                                         The RTC scores observed in the H1 simulations

EXAMPLES

       o Run RTC with  GWAS  variants  and  cis-eQTLs  correcting  for  technical  covariates  and  rank  normal
         transforming the phenotype:

         QTLtools     rtc     --vcf     genotypes.chr22.vcf.gz    --bed    genes.50percent.chr22.bed.gz    --cov
         genes.covariates.pc50.txt.gz     --hotspot      hotspots_b37_hg19.bed      --gwas-cis      GWAS.b37.txt
         permutations_all.significant.txt --normal --out rtc_results.txt

       o RTC  with GWAS variants and cis-eQTLs and simulations, correcting for technical covariates, rank normal
         transforming the phenotype, and  running  conditional  analysis  while  keeping  data  in  memory.   To
         facilitate  parallelization  on compute cluster, we developed an option to run the analysis into chunks
         of molecular phenotypes.  For instance, to run analysis on chunk 12 when splitting the example data set
         into 20 chunks, run:

         QTLtools     rtc     --vcf     genotypes.chr22.vcf.gz    --bed    genes.50percent.chr22.bed.gz    --cov
         genes.covariates.pc50.txt.gz     --hotspot      hotspots_b37_hg19.bed      --gwas-cis      GWAS.b37.txt
         conditional_all.significant.txt  --normal  --conditional  --mem  2  --chunk  12  20  --sample 200 --out
         rtc_results_12_20.txt

       o If you want to submit the whole analysis with 20 jobs on a compute cluster, just run (qsub needs to  be
         changed to the job submission system used [bsub, psub, etc...]):

         for j in $(seq 0 20); do
             echo   "QTLtools   rtc   --vcf   genotypes.chr22.vcf.gz  --bed  genes.50percent.chr22.bed.gz  --cov
             genes.covariates.pc50.txt.gz    --hotspot     hotspots_b37_hg19.bed     --gwas-cis     GWAS.b37.txt
             conditional_all.significant.txt  --normal  --conditional  --mem  2 --chunk $j 20 --sample 200 --out
             rtc_results_$j_20.txt" | qsub
         done

SEE ALSO

       QTLtools(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Ongen H, Brown AA, Delaneau O, et al. Estimating the causal tissues for complex traits and diseases.  Nat
       Genet. 2017;49(12):1676-1683. doi:10.1038/ng.3981 <https://doi.org/10.1038/ng.3981>

AUTHORS

       Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)