Provided by: qtltools_1.3.1+dfsg-2build2_amd64

NAME

       QTLtools rtc - Regulatory Trait Concordance score analysis

SYNOPSIS

       QTLtools   rtc   --vcf  [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]  --bed  quantifications.bed.gz
       --hotspots hotspots_b37_hg19.bed [--gwas-cis | --gwas-trans | --mergeQTL-cis | --mergeQTL-
       trans] variants_external.txt qtls_in_this_dataset.txt --out output.txt [OPTIONS]

DESCRIPTION

       The  RTC  algorithm  assesses  the likelihood of a shared functional effect between a GWAS
       variant and an molQTL by quantifying the change in the  statistical  significance  of  the
       molQTL  after  correcting  the molQTL phenotype for the genetic effect of the GWAS variant
       and comparing its correction impact to that of all other SNPs in the interval.  The method
       is   detailed   in   <https://www.nature.com/articles/ng.3981>.    When  assessing  tissue
       specificity of molQTLs we use the same method; in that case, however, the GWAS variant
       is replaced by an molQTL from a different tissue.  The RTC method is as follows, for a
       GWAS variant and an molQTL that fall into the same region flanked by recombination
       hotspots (a coldspot) containing N variants:

       1      Correct  the  phenotype for each of the variants in the region separately by linear
              regression, resulting in N pseudo-phenotypes (residuals).

       2      Redo the molQTL variant association with all of these pseudo-phenotypes.

       3      Sort (decreasing) the resulting p-values and find the rank of the molQTL  to  GWAS-
              pseudo-phenotype among all molQTL to pseudo-phenotype associations.

       4      RTC = (N - Rank of GWAS) / N
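
       As a worked example of step 4, the following sketch evaluates the formula for a given N
       and rank.  The helper name rtc_score is hypothetical and not part of QTLtools, and the
       sketch assumes that ranks are counted from 0, so that the top-ranked variant obtains an
       RTC of 1.

```shell
# Hypothetical helper (not part of QTLtools): RTC = (N - rank) / N.
# Assumption: ranks start at 0, so the top-ranked variant gets RTC = 1.
rtc_score() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.3f\n", (n - r) / n }'
}

# Coldspot with 200 variants, GWAS variant at rank 10.
rtc_score 200 10    # prints 0.950
```

       With these numbers the score is above the 0.9 level discussed in this section.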

       This results in the RTC score which ranges from 0 to 1 where higher values indicate a more
       likely shared functional effect between the GWAS and the molQTL variants.  An RTC score
       greater than or equal to 0.9 is taken to indicate a shared functional effect.  If there are
       multiple independent molQTLs for a given phenotype, RTC for  each  independent  molQTL  is
       assessed  after  correcting  the  phenotype  with  all  the other molQTL variants for that
       phenotype.  This correction is done using linear regression and taking the residuals after
       regressing the phenotype with the other molQTLs.
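
       A minimal sketch for applying the 0.9 threshold to a finished run, assuming the output
       layout described under OUTPUT FILE (space separated, RTC in column 20).  The helper name
       filter_rtc is hypothetical.

```shell
# Hypothetical helper: keep rows whose RTC (column 20) is >= a threshold.
# Usage: filter_rtc rtc_results.txt [threshold]   (threshold defaults to 0.9)
filter_rtc() {
    awk -v t="${2:-0.9}" '
        NR == 1 && $20 == "RTC" { print; next }   # keep the header line, if present
        $20 ~ /^[0-9]/ && $20 + 0 >= t + 0        # numeric RTC at or above the threshold
    ' "$1"
}

# Example (file name taken from the EXAMPLES section):
#   filter_rtc rtc_results.txt 0.9 > rtc_shared.txt
```

       Rows whose RTC field is not numeric are dropped by this filter.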

       In order to convert RTC score into a probability of sharing, we employ two simulations per
       coldspot region, H0 and H1.  The H0 scenario is  when  two  variants  in  a  coldspot  are
       tagging  different  functional effects.  For a coldspot that harbours colocalized GWAS and
       molQTL variants, we pick two random hidden causal variants.  We  then  find  two  variants
       (GWAS and molQTL) that are linked (default r-squared ≥ 0.5) to the hidden causal variants.
       We generate a pseudo phenotype for molQTL based on the slope and intercept of the observed
       molQTL  and  randomly distributed residuals of the observed molQTL.  Subsequently we rerun
       the RTC analysis with this new pseudo-phenotype and using the GWAS and molQTL variants.

       The H1 scenario is when the two variants are tagging the  same  functional  variant.   The
       scheme  here  is exactly the same as the H0 scheme, except there is only one hidden causal
       variant and both GWAS and molQTL variants are randomly selected  from  variants  that  are
       linked to the same hidden causal variant.

OPTIONS

       --vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz]
              Genotypes  in  VCF/BCF  format,  or  another molecular phenotype in BED format.  If
              there is a DS field in the genotype FORMAT of a variant  (dosage  of  the  genotype
              calculated  from  genotype probabilities, e.g. after imputation), then this is used
              as the genotype.  If there is only the GT field in the genotype FORMAT then this is
              used and it is converted to a dosage.  REQUIRED.
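
              The sketch below illustrates one common GT-to-dosage convention, counting the
              non-reference alleles of a diploid call.  It is an assumption made for
              illustration; the exact conversion used by QTLtools, including its handling of
              missing calls, is not specified here.  The helper name gt_to_dosage is
              hypothetical.

```shell
# Hypothetical helper: convert a VCF GT string to an ALT-allele dosage.
# Assumption: dosage = number of non-reference alleles; missing calls give NA.
gt_to_dosage() {
    printf '%s\n' "$1" | awk -F'[/|]' '{
        d = 0
        for (i = 1; i <= NF; i++) {
            if ($i == ".") { print "NA"; exit }   # missing allele
            if ($i != "0") d++                    # count non-reference alleles
        }
        print d
    }'
}

gt_to_dosage "0|1"    # prints 1
gt_to_dosage "1/1"    # prints 2
```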

       --bed quantifications.bed.gz
              Molecular phenotype quantifications in BED format.  REQUIRED.

       --out output.txt
              Output file.  REQUIRED.

       --hotspots recombination_hotspots.bed
              Recombination hotspots in BED format.  REQUIRED.

       --cov covariates.txt
              Covariates to correct the phenotype data with.

       --stats-vcf [in.vcf|in.bcf|in.vcf.gz]
              Calculate  D'  and r-squared from this file.  Defaults to the --vcf file.  Needs to
              have phased genotypes for D' calculations.

       --stats-vcf-include-samples samples.txt
              Samples to include from the --stats-vcf file.  One sample ID per line.

       --stats-vcf-exclude-samples samples.txt
              Samples to exclude from the --stats-vcf file.  One sample ID per line.

       --normal
              Rank normal transform the  phenotype  data  so  that  each  phenotype  is  normally
              distributed.  RECOMMENDED.

       --conditional
              molQTLs contain independent signals so execute the conditional analysis.

       --debug
              Print out debugging info to stderr.  DON'T USE.

       --warnings
              Print all encountered individual warnings to stdout.

       --header
              Add a header to the output file when --chunk or --region is provided.

       --individual-Dprime
              Calculate D' on an individual variant basis.  If not provided, D' will not be
              calculated after the first unphased genotype is encountered.

       --mem-est
              Estimate memory usage and exit.

       --mem [0|1|2|3]
              Keep results of calculations that may be used multiple times in memory. 0 = nothing
              in  mem,  1 = only basic, 2 = all in mem but clean after unlikely to be reused, 3 =
              all in mem no cleaning.  DEFAULT=0.  RECOMMENDED=2.

       --window integer
              Size of the cis window flanking each phenotype's start position.   DEFAULT=1000000.
              RECOMMENDED=1000000.

       --sample integer
              Number  of simulated RTC values to try to achieve for each coldspot, for converting
              RTC to a probability.  At each iteration we try to pick  a  unique  combination  of
              variants,  thus the actual number of sample iterations may be less than this value,
              due to the number of variants in a region.  If you want to run this analysis, please
              provide at least 100.  DEFAULT=0.

       --max-sample integer
              Max  number of sample iterations trying to reach --sample before quitting.  Provide
              the actual number not the multiplier.  DEFAULT=--sample * 50.

       --R2-threshold float
              The minimum r-squared required when picking a variant that is linked to the  hidden
              causal variant(s) when running simulations using --sample.  DEFAULT=0.5.

       --D-prime-threshold float
              If  the  pairs of variants fall into different coldspots and have a D' greater than
              this, the RTC calculation is extended to multiple coldspot regions  including  both
              variants.  Assumes D' can be calculated.  DEFAULT=OFF.  NOT RECOMMENDED.

       --grp-best
              Correct for multiple phenotypes within a phenotype group.

       --pheno-col integer
              1-based phenotype id column number.  DEFAULT=1 or 5 when --grp-best

       --geno-col integer
              1-based genotype id column number.  DEFAULT=8 or 10 when --grp-best

       --grp-col integer
              1-based  phenotype  group  id  column  number.   Only  relevant if --grp-best is in
              effect.  DEFAULT=1

       --rank-col integer
              1-based conditional analysis rank column number.  Only relevant if --conditional is
              in effect.  DEFAULT=12 or 14 when --grp-best

       --best-col integer
              1-based phenotype column number.  Only relevant if --conditional is in effect.
              DEFAULT=21 or 23 when --grp-best

       --gwas-cis variants_external.txt qtls_in_this_dataset.txt
              Run RTC for GWAS and cis-molQTL colocalization analysis.  Takes two file  names  as
              arguments.   The  first is the file with GWAS variants of interest with one variant
              ID per line.  These should match the variant IDs in the --vcf file.  The second is
              the QTLtools output for the cis run that was run using the same --vcf, --bed, and
              --cov  files.   REQUIRED  unless  (and  mutually  exclusive   with)   --gwas-trans,
              --mergeQTL-cis, --mergeQTL-trans.

       --gwas-trans variants_external.txt qtls_in_this_dataset.txt
              Run RTC for GWAS and trans-molQTL colocalization analysis.  Takes two file names as
              arguments.  The first is the file with GWAS variants of interest with  one  variant
              ID per line.  These should match the variant IDs in the --vcf file.  The second is
              the QTLtools output for the trans run that was run using the same --vcf, --bed, and
              --cov files.  You will need to adjust *-col options.  REQUIRED unless (and mutually
              exclusive with) --gwas-cis, --mergeQTL-cis, --mergeQTL-trans.

       --mergeQTL-cis variants_external.txt qtls_in_this_dataset.txt
              Run RTC for cis-molQTL and cis-molQTL  colocalization  analysis.   Takes  two  file
              names  as  arguments.   The  first is the file with cis-molQTL variants of interest
              discovered in a different dataset, e.g. different tissue, with one variant  ID  per
              line.  These should match the variant IDs in the --vcf file.  The second is the
              QTLtools output for the cis run that was run using the same --vcf, --bed, and --cov
              files.  REQUIRED unless (and mutually exclusive with) --gwas-cis, --gwas-trans,
              --mergeQTL-trans.

       --mergeQTL-trans variants_external.txt qtls_in_this_dataset.txt
              Run RTC for trans-molQTL and trans-molQTL colocalization analysis.  Takes two  file
              names  as  arguments.  The first is the file with trans-molQTL variants of interest
              discovered in a different dataset, e.g. different tissue, with one variant  ID  per
              line.  These should match the variant IDs in the --vcf file.  The second is the
              QTLtools output for the trans run that was run using the same --vcf, --bed, and
              --cov files.  You will need to adjust *-col options.  REQUIRED unless (and mutually
              exclusive with) --gwas-trans, --gwas-cis, --mergeQTL-cis.

       --chunk integer1 integer2
              For parallelization.  Divide the data into integer2 number of  chunks  and  process
              chunk  number  integer1.   Chunk  0  will  print a header.  Mutually exclusive with
              --region.  The number of chunks must be at least the number of chromosomes in the
              --bed file.
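
              To check this lower bound before choosing a chunk count, the distinct
              chromosomes in the phenotype file can be counted.  This is a sketch with a
              hypothetical helper name, count_chromosomes; it assumes a gzip-compatible BED
              file whose first column is the chromosome and whose header line starts with "#".

```shell
# Hypothetical helper: count distinct chromosomes in a gzip-compressed BED file.
# Assumptions: column 1 is the chromosome; header lines start with "#".
count_chromosomes() {
    gzip -dc < "$1" | awk '
        !/^#/ { seen[$1] = 1 }
        END   { n = 0; for (c in seen) n++; print n }
    '
}

# Example (file name taken from the EXAMPLES section):
#   count_chromosomes genes.50percent.chr22.bed.gz
```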

       --region chr:start-end
              Genomic  region  to  be processed.  E.g. chr4:12334456-16334456, or chr5.  Mutually
              exclusive with --chunk.

OUTPUT FILE

       --out output file
        Space separated output file with the following columns.  Columns after the 22nd are  only
        printed if --sample is provided.  We recommend including chunk 0 to print out a header in
        order to avoid confusion.

         1   other_variant                              The variant ID that is external  to  this
                                                        dataset,  could  be  the  GWAS variant or
                                                        another molQTL
         2   our_variant                                The molQTL variant ID that is internal to
                                                        this dataset
         3   phenotype                                  The phenotype ID
         4   phenotype_group                            The phenotype group ID
         5   other_variant_chr                          The external variant's chromosome
         6   other_variant_start                        The external variant's start position
         7   other_variant_rank                         Rank   of  the  external  variant.   Only
                                                        relevant if  the  external  variants  are
                                                        part of a conditional analysis
         8   our_variant_chr                            The internal variant's chromosome
         9   our_variant_start                          The internal variant's start position
        10   our_variant_rank                           Rank   of  the  internal  variant.   Only
                                                        relevant if  the  internal  variants  are
                                                        part of a conditional analysis
        11   phenotype_chr                              The phenotype's chromosome
        12   phenotype_start                            The start position of the phenotype
        13   distance_between_variants                  The distance between the two variants
        14   distance_between_other_variant_and_pheno   The distance between the external variant
                                                        and the phenotype
        15   other_variant_region_index                 The region index of the external variant
        16   our_variant_region_index                   The region index of the internal variant
        17   region_start                               The start position of the region
        18   region_end                                 The end position of the region
        19   variant_count_in_region                    The number of variants in the region
        20   RTC                                        The RTC score
        21   D'                                         The  D'  of  the  two   variants.    Only
                                                        calculated if there are phased genotypes
        22   r^2                                        The r squared of the two variants
        23   p_value                                    The p-value of the RTC score
        24   unique_picks_H0                            The number of unique combinations of
                                                        variants in the H0 simulations
        25   unique_picks_H1                            The number of unique combinations of
                                                        variants in the H1 simulations
        26   rtc_bin_start                              Lower bound of the RTC bin, based on the
                                                        observed RTC score
        27   rtc_bin_end                                Upper bound of the RTC bin, based on the
                                                        observed RTC score
        28   rtc_bin_H0_proportion                      The proportion of H0 simulated values
                                                        that are between rtc_bin_start and
                                                        rtc_bin_end
        29   rtc_bin_H1_proportion                      The proportion of H1 simulated values
                                                        that are between rtc_bin_start and
                                                        rtc_bin_end
        30   median_r^2                                 The median r-squared in the region
        31   median_H0                                  The median RTC score in the H0
                                                        simulations
        32   median_H1                                  The median RTC score in the H1
                                                        simulations
        33   H0                                         The RTC scores observed in the H0
                                                        simulations
        34   H1                                         The RTC scores observed in the H1
                                                        simulations
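
        Because the column count depends on whether --sample was given, it can be safer to
        select columns by header name rather than by position.  The following is a sketch with
        a hypothetical helper name, select_columns; it assumes the first line of the file is the
        header.

```shell
# Hypothetical helper: print the named columns of a space-separated file.
# Assumption: line 1 is the header (see --header and chunk 0).
# Usage: select_columns FILE NAME [NAME...]
select_columns() {
    file=$1; shift
    awk -v want="$*" '
        NR == 1 {
            for (i = 1; i <= NF; i++) idx[$i] = i      # map header name -> position
            n = split(want, w, " ")
            for (k = 1; k <= n; k++)
                if (!(w[k] in idx)) { print "unknown column: " w[k] > "/dev/stderr"; exit 1 }
        }
        {
            for (k = 1; k <= n; k++) printf "%s%s", (k > 1 ? " " : ""), $(idx[w[k]])
            print ""
        }
    ' "$file"
}

# Example (file name taken from the EXAMPLES section):
#   select_columns rtc_results.txt other_variant our_variant phenotype RTC
```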

EXAMPLES

       o Run  RTC  with  GWAS variants and cis-eQTLs correcting for technical covariates and rank
         normal transforming the phenotype:

         QTLtools rtc --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz \
             --cov genes.covariates.pc50.txt.gz --hotspots hotspots_b37_hg19.bed \
             --gwas-cis GWAS.b37.txt permutations_all.significant.txt --normal \
             --out rtc_results.txt

       o RTC  with  GWAS  variants  and  cis-eQTLs  and  simulations,  correcting  for  technical
         covariates,  rank  normal  transforming  the phenotype, and running conditional analysis
         while keeping data in memory.  To facilitate parallelization on a compute cluster, the
         analysis can be split into chunks of molecular phenotypes.  For instance, to run the
         analysis on chunk 12 when splitting the example data set into 20 chunks, run:

         QTLtools rtc --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz \
             --cov genes.covariates.pc50.txt.gz --hotspots hotspots_b37_hg19.bed \
             --gwas-cis GWAS.b37.txt conditional_all.significant.txt --normal \
             --conditional --mem 2 --chunk 12 20 --sample 200 --out rtc_results_12_20.txt

       o To submit the whole analysis as 20 chunks on a compute cluster, run the loop below
         (qsub needs to be changed to the job submission system used [bsub, psub, etc.]).
         Chunk 0 prints the header:

         for j in $(seq 0 20); do
             echo "QTLtools rtc --vcf genotypes.chr22.vcf.gz --bed genes.50percent.chr22.bed.gz \
             --cov genes.covariates.pc50.txt.gz --hotspots hotspots_b37_hg19.bed --gwas-cis \
             GWAS.b37.txt conditional_all.significant.txt --normal --conditional --mem 2 --chunk \
             $j 20 --sample 200 --out rtc_results_${j}_20.txt" | qsub
         done
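
       o Once all jobs have finished, the per-chunk files can be concatenated in chunk order so
         that the header from chunk 0 comes first.  This is a sketch, not a QTLtools feature;
         the helper name merge_rtc_chunks is hypothetical and the file naming
         <prefix>_<chunk>_<N>.txt is assumed:

```shell
# Hypothetical helper: concatenate chunk outputs 0..N, in order, into one file.
# Assumption: files are named <prefix>_<chunk>_<N>.txt; missing chunks are reported.
# Usage: merge_rtc_chunks PREFIX N OUTFILE
merge_rtc_chunks() {
    prefix=$1; nchunks=$2; out=$3
    : > "$out"                                   # start from an empty output file
    for j in $(seq 0 "$nchunks"); do
        f="${prefix}_${j}_${nchunks}.txt"
        if [ -f "$f" ]; then
            cat "$f" >> "$out"
        else
            echo "missing chunk: $f" >&2
        fi
    done
}

# Example:
#   merge_rtc_chunks rtc_results 20 rtc_results_full.txt
```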

SEE ALSO

       QTLtools(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Ongen H, Brown AA, Delaneau O, et al. Estimating the causal tissues for complex traits and
       diseases. Nat Genet. 2017;49(12):1676-1683. doi:10.1038/ng.3981
       <https://doi.org/10.1038/ng.3981>

AUTHORS

       Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)