Provided by: qtltools_1.3.1+dfsg-4_amd64

NAME

       QTLtools rtc-union - Find the union of QTLs from independent datasets

SYNOPSIS

       QTLtools     rtc-union     --vcf     [in.vcf|in.vcf.gz|in.bcf|in.bed.gz]     ...     --bed
       quantifications.bed.gz      ...       --hotspots      hotspots_b37_hg19.bed      --results
       qtl_results_files.txt ...  [OPTIONS]

DESCRIPTION

       This mode finds the best molQTL (which may or may not be genome-wide significant) in each
       coldspot, i.e. each region flanked by recombination hotspots, provided that at least one
       dataset has a significant molQTL in that coldspot. First, all the significant molQTLs in
       all of the datasets are mapped to coldspots. Then, for each dataset that lacks a
       significant molQTL in a given coldspot for a given phenotype, the most significant variant
       associated with that phenotype in that coldspot is taken.
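
       The second step described above can be pictured with a small sketch. It is not how
       QTLtools is implemented; it only lists, for toy input, which datasets would need a
       fill-in variant. The dataset, phenotype, and coldspot names are invented, and the
       three-column input layout is an assumption made for this illustration.

```shell
# Toy input (invented): dataset, phenotype, and coldspot of each significant molQTL.
hits='dataset1 geneA cold_12
dataset2 geneA cold_12
dataset1 geneB cold_40
dataset3 geneC cold_7'

# For every phenotype/coldspot pair with a hit in any dataset, list the datasets
# that have no significant hit there, i.e. the ones that would need a fill-in.
missing=$(printf '%s\n' "$hits" | awk -v all="dataset1 dataset2 dataset3" '
    BEGIN { n = split(all, ds, " ") }
    {
        key = $2 " " $3
        if (!(key in seen)) { order[++k] = key; seen[key] = 1 }
        has[key, $1] = 1
    }
    END {
        for (i = 1; i <= k; i++)
            for (j = 1; j <= n; j++)
                if (!((order[i], ds[j]) in has))
                    print order[i], ds[j]
    }')

printf '%s\n' "$missing"
```

       Each printed line names a phenotype, a coldspot, and a dataset for which the most
       significant (possibly non-significant) variant would be looked up.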

OPTIONS

       --vcf [in.vcf|in.bcf|in.vcf.gz|in.bed.gz] ...
              Genotypes in VCF/BCF format, or another molecular phenotype in BED format. If
              there is a DS field in the genotype FORMAT of a variant (dosage of the genotype
              calculated from genotype probabilities, e.g. after imputation), then this is used
              as the genotype. If there is only the GT field in the genotype FORMAT, then this is
              used and it is converted to a dosage. If a single file is provided, then all
              datasets are assumed to have the same genotypes, and the samples of all datasets
              are assumed to be included in this file. If multiple files are provided, one for
              each dataset, then all --vcf, --bed, --cov, and --results files MUST be in the
              same order. E.g., if the first vcf file is from dataset1, then the first bed, cov,
              and results files must also be from dataset1. REQUIRED.

       --bed quantifications.bed.gz ...
              Molecular  phenotype  quantifications  in BED format for each of the datasets.  All
              --vcf, --bed, --cov, and --results files MUST be in the same order. E.g., if the
              first  vcf  file  is from dataset1, then the first bed, cov, and results files must
              also be from dataset1.  REQUIRED.

       --results significant_qtls.txt ...
              Results file with the QTLs in each of the datasets.  All --vcf, --bed,  --cov,  and
              --results files MUST be in the same order. E.g., if the first vcf file is from
              dataset1, then the first bed, cov, and results files must also  be  from  dataset1.
              REQUIRED.

       --hotspots recombination_hotspots.bed
              Recombination hotspots in BED format.  REQUIRED.

       --out-suffix suffix
              If provided, the output files will be suffixed with this.

       --cov covariates.txt
              Covariates with which to correct the phenotype data, for each of the datasets. All
              --vcf, --bed, --cov, and --results files MUST be in the same order. E.g., if the
              first vcf file is from dataset1, then the first bed, cov, and results files must
              also be from dataset1.

       --force
              If the output file exists, overwrite it.

       --normal
              Rank normal transform the  phenotype  data  so  that  each  phenotype  is  normally
              distributed.  RECOMMENDED.

       --conditional
              The molQTLs contain multiple independent signals, so execute the conditional
              analysis.

       --window integer
              Size  of the cis window flanking each phenotype's start position.  DEFAULT=1000000.
              RECOMMENDED=1000000.

       --pheno-col integer
              1-based phenotype id column number.  DEFAULT=1

       --geno-col integer
              1-based genotype id column number.  DEFAULT=8

       --rank-col integer
              1-based conditional analysis rank column number.  Only relevant if --conditional is
              in effect.  DEFAULT=12

       --best-col integer
              1-based phenotype column number. Only relevant if --conditional is in effect.
              DEFAULT=21

       --chunk integer1 integer2
              For parallelization. Divide the data into integer2 chunks and process chunk number
              integer1. Chunk 0 will print a header. Mutually exclusive with --region. The number
              of chunks must be at least the number of chromosomes in the --bed file.

       --region chr:start-end
              Genomic  region  to  be processed.  E.g. chr4:12334456-16334456, or chr5.  Mutually
              exclusive with --chunk.
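
       Because --vcf, --bed, --cov, and --results must list the datasets in the same order, one
       way to avoid mismatches is to build all four lists from a single list of dataset names.
       The following is a minimal sketch of that idea; the file naming pattern is borrowed from
       the EXAMPLE section and is an assumption about how the files are named locally. The
       command is only printed, not executed.

```shell
# Dataset prefixes as used in the EXAMPLE section; adjust to the local file names.
datasets="dataset1 dataset2 dataset3"

vcf=""
bed=""
cov=""
res=""
for d in $datasets; do
    # Appending in a single pass keeps position i of every list tied to the same dataset.
    vcf="$vcf $d.bcf"
    bed="$bed $d.bed.gz"
    cov="$cov $d.covariates.txt"
    res="$res $d.txt"
done

cmd="QTLtools rtc-union --vcf$vcf --bed$bed --cov$cov --results$res --hotspots hotspots_b37_hg19.bed"
echo "$cmd"
```

       Reordering or extending the datasets variable then changes all four lists consistently.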

OUTPUT FILE

       output file
        Space-separated output file with the following columns.

        1   Column showing that this is an rtc-union result.  Always __UNION__
        2   The phenotype ID
        3   The genotype ID. This can say __UNION_FILLER_MAX_INDEP__,
            __UNION_FILLER_MISS_GENO__, or __UNION_FILLER_MISS_PHENO__, which are fillers for
            missing cases in one of the datasets.
        4   The rank of the best variant in this coldspot. This is -1 if the variant was
            discovered in the rtc-union run, and a different value if there was already a
            significant variant in this coldspot.
        5   Dummy field indicating that this is the best hit per rank
        6   The p-value of the association. Will be 0 if this was already significant in the
            dataset.
        7   The coldspot ID
        8   The coldspot region
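
       As an illustration of how these columns might be consumed, the sketch below classifies
       rows by column 3 (filler or real genotype ID) and column 4 (rank -1 or not). The three
       rows are invented; only the column order follows the table above, and the values shown
       in columns 5 to 8 are placeholders rather than real QTLtools output.

```shell
# Invented rows in the 8-column layout described above.
rows='__UNION__ geneA rs1 0 1 0 123 chr1:1000-2000
__UNION__ geneA rs2 -1 1 0.03 123 chr1:1000-2000
__UNION__ geneB __UNION_FILLER_MISS_GENO__ -1 1 1 456 chr2:500-900'

summary=$(printf '%s\n' "$rows" | awk '
    $1 != "__UNION__"       { next }             # not an rtc-union row
    $3 ~ /^__UNION_FILLER_/ { filler++; next }   # placeholder for a missing case
    $4 == -1                { filled++; next }   # variant found during the rtc-union run
                            { known++ }          # already significant in the dataset
    END { printf "known=%d filled=%d filler=%d\n", known, filled, filler }')

echo "$summary"
```

       Filler rows are tested first so that they are not counted as discovered variants.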

EXAMPLE

       o Find  the  union  of  3  datasets,  correcting for technical covariates, and rank normal
         transforming the phenotypes with 20 jobs on a compute cluster (qsub needs to be  changed
         to the job submission system used [bsub, psub, etc...]):

         for j in $(seq 1 20); do
             # Backslash-newline inside the double quotes keeps the command on one logical line
             echo "QTLtools rtc-union \
                 --bed dataset1.bed.gz dataset2.bed.gz dataset3.bed.gz \
                 --vcf dataset1.bcf dataset2.bcf dataset3.bcf \
                 --cov dataset1.covariates.txt dataset2.covariates.txt dataset3.covariates.txt \
                 --results dataset1.txt dataset2.txt dataset3.txt \
                 --hotspots hotspots_b37_hg19.bed --normal --conditional \
                 --chunk $j 20 --out-suffix .chunk.$j.20.txt" | qsub
         done
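
       Before submitting, it can help to generate the per-chunk commands as plain text and
       inspect them. The sketch below does this for the same file names as the loop above
       (covariates and transformation flags are left out for brevity). It also emits a command
       for chunk 0, on the assumption that chunk 0 can be run on its own to obtain the header,
       as the --chunk description suggests. Nothing is submitted here.

```shell
n_chunks=20   # same number of chunks as in the example above
common="QTLtools rtc-union --bed dataset1.bed.gz dataset2.bed.gz dataset3.bed.gz --vcf dataset1.bcf dataset2.bcf dataset3.bcf --results dataset1.txt dataset2.txt dataset3.txt --hotspots hotspots_b37_hg19.bed"

# Collect one command line per chunk, starting at chunk 0 (header only, see assumption above).
jobs=$(
    j=0
    while [ "$j" -le "$n_chunks" ]; do
        echo "$common --chunk $j $n_chunks --out-suffix .chunk.$j.$n_chunks.txt"
        j=$((j + 1))
    done
)

# Show the first few commands for review.
printf '%s\n' "$jobs" | head -n 3
```

       Once the list looks right, each line can be piped to the local job submission command.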

SEE ALSO

       QTLtools(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       Versions up to and including 1.2 suffer from a bug in reading missing genotypes in
       VCF/BCF files. This bug affects variants that have a DS field in their genotype's FORMAT
       and a missing genotype (DS field is .) in one of the samples, in which case genotypes for
       all the samples are set to missing, effectively removing this variant from the analyses.
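
       To see whether a dataset contains variants of the kind described above, the text form of
       a VCF can be scanned for records whose FORMAT includes DS and where at least one sample
       has a missing DS value. The following is a rough sketch on a three-record toy VCF; the
       records are invented, and a BCF or compressed file would first have to be decoded to
       text (for example with bcftools view).

```shell
# Toy VCF (invented): rs2 has a DS field that is missing in sample s1.
toy_vcf() {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n'
    printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DS\t0|1:1.0\t0|0:0.1\n'
    printf '1\t200\trs2\tC\tT\t.\tPASS\t.\tGT:DS\t0|1:.\t1|1:1.9\n'
    printf '1\t300\trs3\tG\tA\t.\tPASS\t.\tGT\t0|1\t./.\n'
}

affected=$(toy_vcf | awk -F '\t' '
    /^#/ { next }                                   # skip header lines
    {
        n = split($9, fmt, ":"); ds = 0
        for (i = 1; i <= n; i++) if (fmt[i] == "DS") ds = i
        if (ds == 0) next                           # no DS field in FORMAT
        for (s = 10; s <= NF; s++) {
            m = split($s, val, ":")
            # A dropped trailing field or "." both count as a missing DS value.
            if (m < ds || val[ds] == ".") { print $3; next }
        }
    }')

echo "$affected"
```

       The variant IDs printed are the ones that affected versions would drop from the analysis.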

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Ongen H, Brown AA, Delaneau O, et al. Estimating the causal tissues for complex traits and
       diseases. Nat Genet. 2017;49(12):1676-1683. doi:10.1038/ng.3981
       <https://doi.org/10.1038/ng.3981>

AUTHORS

       Halit Ongen (halitongen@gmail.com), Olivier Delaneau (olivier.delaneau@gmail.com)