Provided by: qtltools_1.3.1+dfsg-2build2_amd64

NAME

       QTLtools mbv - Match genotypes in a VCF to a BAM file

SYNOPSIS

       QTLtools  mbv  --bam  [sample.bam|sample.sam|sample.cram]  --vcf [in.vcf|in.bcf|in.vcf.gz]
       --out output_file [OPTIONS]

DESCRIPTION

       This mode checks whether the genotypes in the VCF are observed in the RNAseq reads in the
       BAM file, in order to quickly resolve sample mislabeling and to detect cross-sample
       contamination and PCR amplification bias.  The details of the method are described in
       <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6044394/>.  In brief, we measure, for each
       individual in the VCF, the proportions of heterozygous and homozygous genotypes for which
       both alleles are captured by the sequencing reads in the BAM file.  A 'match' has close
       to 100% concordance for both measures, whereas a 'mismatch' has significantly lower
       concordance for both.  Increased cross-sample contamination leads to decreased homozygous
       concordance with no change in heterozygous concordance, while increased amplification
       bias leads to decreased heterozygous concordance with no change in homozygous
       concordance.  We recommend using uniquely mapping reads only, by setting
       --filter-mapping-quality to a value appropriate for your aligner.
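
       The interpretation rules above can be sketched in a few lines of Python.  This is only
       an illustration and not part of QTLtools: the function name and the cutoff of 90 are
       hypothetical, and the concordance values are assumed to be on a 0 to 100 scale.

```python
def interpret_concordance(het_pct, hom_pct, high=90.0):
    """Label one sample from its two concordance values.

    het_pct, hom_pct: heterozygous / homozygous concordance, assumed 0-100.
    high: hypothetical cutoff for "close to 100%"; tune it to your data.
    """
    het_ok = het_pct >= high
    hom_ok = hom_pct >= high
    if het_ok and hom_ok:
        return "match"
    if het_ok:
        # Only homozygous concordance dropped.
        return "possible cross-sample contamination"
    if hom_ok:
        # Only heterozygous concordance dropped.
        return "possible amplification bias"
    # Both measures are low.
    return "mismatch"


if __name__ == "__main__":
    print(interpret_concordance(98.0, 99.0))
    print(interpret_concordance(40.0, 65.0))
```

       A fixed cutoff is a simplification.  In practice the values of all samples are usually
       inspected together, since the best cutoff depends on coverage and data quality.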

OPTIONS

       --vcf [in.vcf|in.bcf|in.vcf.gz]
              Genotypes  in  VCF/BCF  format.   Should  contain  all  the samples in the dataset.
              REQUIRED.

       --bam [in.bam|in.sam|in.cram]
              Sequence data in BAM/SAM/CRAM format.  REQUIRED.

       --out output
              Output file name.  REQUIRED.

       --reg chr:start-end
              Genomic region to be processed, e.g. chr4:12334456-16334456 or chr5.

       --filter-mapping-quality integer
              Minimum mapping quality for a read or read pair to be considered.  Set this to only
              include uniquely mapped reads.  DEFAULT=10

       --filter-base-quality integer
              Minimum phred quality for a base to be considered.  DEFAULT=5

       --filter-binomial-pvalue float
              Binomial  p-value  threshold  below  which a heterozygous genotype is considered as
              exhibiting allelic imbalance.  DEFAULT=0.05

       --filter-minimal-coverage integer
              Minimum number of reads overlapping a genotype for it to be considered.  DEFAULT=10

       --filter-imputation-qual float
              Minimum imputation information score for a variant to be considered.  DEFAULT=0.9

       --filter-imputation-prob float
              Minimum posterior probability for a genotype to be considered.  DEFAULT=0.99

       --filter-keep-duplicates
              Keep reads designated as duplicate by the aligner.

OUTPUT FILE COLUMNS

       --out filename
        This file has no header and contains the following columns:

         1   The sample ID in the VCF against which the sequence data has been matched
         2   The number of missing genotypes for this sample
         3   The total number of heterozygous genotypes examined
         4   The total number of homozygous genotypes examined
         5   The number of heterozygous genotypes considered for the matching, i.e. those that
             are covered by more than --filter-minimal-coverage reads
         6   The number of homozygous genotypes considered for the matching, i.e. those that are
             covered by more than --filter-minimal-coverage reads
         7   The number of heterozygous genotypes that match between this sample and the BAM file
         8   The number of homozygous genotypes that match between this sample and the BAM file
         9   The percentage of heterozygous genotypes that match between this sample and the  BAM
             file
        10   The  percentage  of  homozygous genotypes that match between this sample and the BAM
             file
        11   The number of heterozygous genotypes with significant allelic imbalance

EXAMPLES

       o Running mbv on an RNAseq sample mapped with GEM:

         QTLtools    mbv    --bam    HG00381.chr22.bam    --out    HG00381.chr22.mbv.txt    --vcf
         genotypes.chr22.vcf.gz --filter-mapping-quality 150

         You can then plot column 9 vs. column 10 to identify the genotyped sample in the VCF
         that best matches your sequence data.
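
         As a complement to plotting, the best-matching sample can be picked out with a short
         script.  The following Python sketch is not part of QTLtools.  It assumes the output
         file is whitespace-delimited with the 11 columns listed under OUTPUT FILE COLUMNS, and
         it ranks samples by the sum of columns 9 and 10, which is an arbitrary heuristic.  The
         function names, sample IDs and numbers are made up for illustration.

```python
import io


def parse_mbv(handle):
    """Return (sample_id, het_pct, hom_pct) tuples from an mbv output stream."""
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or truncated lines
        try:
            het_pct = float(fields[8])   # column 9
            hom_pct = float(fields[9])   # column 10
        except ValueError:
            continue  # skip lines whose concordance columns are not numeric
        rows.append((fields[0], het_pct, hom_pct))
    return rows


def best_match(rows):
    """Pick the sample with the highest combined concordance (a heuristic)."""
    return max(rows, key=lambda row: row[1] + row[2])


if __name__ == "__main__":
    # Made-up example content standing in for HG00381.chr22.mbv.txt.
    example = io.StringIO(
        "sampleA 0 1000 2000 400 800 120 560 30.0 70.0 15\n"
        "sampleB 0 1000 2000 400 800 392 792 98.0 99.0 20\n"
    )
    print(best_match(parse_mbv(example)))
```

         To use it on a real run, replace the in-memory example with
         open("HG00381.chr22.mbv.txt") and check that the top sample is clearly separated from
         the rest before trusting the assignment.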

SEE ALSO

       QTLtools(1)

       QTLtools website: <https://qtltools.github.io/qtltools>

BUGS

       Please submit bugs to <https://github.com/qtltools/qtltools>

CITATION

       Fort A., Panousis N. I., Garieri M. et al. MBV: a method to solve sample mislabeling and
       detect technical bias in large combined genotype and sequencing assay datasets,
       Bioinformatics 33(12), 1895, 2017.  <https://doi.org/10.1093/bioinformatics/btx074>

AUTHORS

       Olivier Delaneau (olivier.delaneau@gmail.com), Halit Ongen (halitongen@gmail.com)