Provided by: libvcflib-tools_1.0.9+dfsg1-2_amd64

NAME

       vcflib index

DESCRIPTION

       vcflib contains tools and libraries for dealing with the Variant Call Format (VCF), a
       flat-file, tab-delimited text format intended to describe reference-indexed
       variations between individuals.

       VCF provides a common interchange format for the description of variation in individuals
       and populations of samples, and has become the de facto standard reporting format for a
       wide array of genomic variant detectors.

       vcflib  provides  methods  to  manipulate  and  interpret  sequence variation as it can be
       described by VCF.  It is both:

       • an API for parsing and operating on records of genomic variation as it can be  described
         by the VCF format,

       • and  a  collection  of command-line utilities for executing complex manipulations on VCF
         files.

       The API itself provides a quick and extremely permissive method  to  read  and  write  VCF
       files.   Extensions  and  applications  of  the library provided in the included utilities
       (*.cpp) comprise the vast bulk of the library’s utility for most users.

   filter
       filter command                             description
       ──────────────────────────────────────────────────────────────────────────
       vcffilter                                   Filter the specified VCF file
                                                   using the set of filters.
       vcfuniq                                    List     unique     genotypes.
                                                  Similar to GNU uniq, but aimed
                                                  at   VCF   records.    vcfuniq
                                                  removes records which have the
                                                  same position, ref, and alt as
                                                  the  previous  record   on   a
                                                  sorted VCF file.  Note that it
                                                  does    not     adjust/combine
                                                  genotypes  in  the output, but
                                                  simply takes the first record.
                                                  See  also  vcfcreatemulti  for
                                                  combining records.
       vcfuniqalleles                              List unique alleles. For each
                                                  record,  remove  any duplicate
                                                  alternate  alleles  that   may
                                                  have   resulted  from  merging
                                                  separate VCF files.
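
       The vcfuniq rule above (drop a record whose position, REF, and ALT match those of the
       previous record) can be illustrated with a short Python sketch. It is not vcflib's
       implementation: the function name uniq_records is hypothetical, and the sketch assumes
       that "position" means the (CHROM, POS) pair and that the input is tab-delimited VCF text.

```python
def uniq_records(lines):
    """Drop data records whose (CHROM, POS, REF, ALT) equals the previous record's.

    Hypothetical illustration of the vcfuniq idea; assumes a sorted,
    tab-delimited VCF given as a list of lines.
    """
    out = []
    prev_key = None
    for line in lines:
        if line.startswith("#"):
            out.append(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT ...
        key = (fields[0], fields[1], fields[3], fields[4])
        if key != prev_key:
            out.append(line)  # keep only the first record of each run
        prev_key = key
    return out


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "1\t100\trs1\tA\tG",  # same CHROM/POS/REF/ALT as the line above: dropped
    "1\t100\t.\tA\tT",
    "1\t200\t.\tC\tT",
]
for record in uniq_records(vcf):
    print(record)
```

       Because only adjacent records are compared, duplicates that are not neighbours in the
       input survive, which is why the input needs to be sorted.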

   metrics
       metrics command                            description
       ──────────────────────────────────────────────────────────────────────────
       vcfcheck                                   Validate     integrity     and
                                                  identity   of   the   VCF   by
                                                  verifying   that    the    VCF
                                                  record’s  REF  matches a given
                                                  reference file.

       vcfdistance                                Adds a  tag  to  each  variant
                                                  record   which  indicates  the
                                                  distance   to   the    nearest
                                                   variant (defaults to
                                                   BasesToClosestVariant if no
                                                   custom tag name is given).
       vcfentropy                                 Annotate  VCF records with the
                                                  Shannon  entropy  of  flanking
                                                   sequence. Annotates the output
                                                   VCF file with, for each
                                                   record, EntropyLeft,
                                                   EntropyRight, and
                                                   EntropyCenter, which are the
                                                   entropies of the sequence of
                                                   the given window size to the
                                                   left, right, and center of the
                                                   record. Also adds EntropyRef
                                                   and EntropyAlt for each
                                                   alternate allele.
       vcfhetcount                                Calculate  the  heterozygosity
                                                  rate:   count  the  number  of
                                                  alternate      alleles      in
                                                   heterozygous genotypes in all
                                                   records in the VCF file.
       vcfhethomratio                             Generates  the  het/hom  ratio
                                                  for  each  individual  in  the
                                                   file.
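
       vcfentropy reports the Shannon entropy of sequence windows around each record. As an
       illustration only, the Python sketch below computes H = sum(p_b * log2(1 / p_b)) over
       the base frequencies p_b of a window. The base-2 logarithm, the function names, and the
       way the left, right, and center windows are cut around a 0-based position are
       assumptions, not vcflib's definitions.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy, in bits, of the symbol composition of seq."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq.upper())
    # sum of p * log2(1/p); every term is >= 0
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def flank_entropies(reference, pos, window):
    """Entropy of the windows left of, right of, and centered on 0-based pos.

    Hypothetical window definitions chosen for illustration.
    """
    left = reference[max(0, pos - window):pos]
    right = reference[pos + 1:pos + 1 + window]
    half = window // 2
    center = reference[max(0, pos - half):pos + half + 1]
    return shannon_entropy(left), shannon_entropy(right), shannon_entropy(center)


print(shannon_entropy("ACGT"))  # 2.0: four equally frequent bases
print(flank_entropies("AAAAACGTAC", 4, 4))
```

       Low-complexity flanks (for example a homopolymer run) give entropies near 0, which is
       what makes these annotations useful for spotting variants in repetitive sequence.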

   phenotype
       phenotype command                          description
       ──────────────────────────────────────────────────────────────────────────
       permuteGPAT++                              permuteGPAT++ is a method  for
                                                  adding empirical p-values to a
                                                  GPAT++ score.
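
       permuteGPAT++ attaches empirical p-values to GPAT++ scores. The general idea can be
       sketched in Python: compare the observed score with scores obtained under permutation
       and report the fraction that are at least as large. The (r + 1) / (n + 1) estimator used
       here is one common choice and is an assumption; permuteGPAT++ may count or correct
       differently, and the function name is hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """Empirical p-value of `observed` against permutation (null) scores.

    Counts null scores at least as large as the observed one and applies the
    (r + 1) / (n + 1) correction so the result is never exactly zero.
    """
    r = sum(1 for s in null_scores if s >= observed)
    return (r + 1) / (len(null_scores) + 1)


print(empirical_p_value(5.0, [1.0, 2.0, 3.0, 6.0, 7.0]))  # 0.5
```

       The smallest p-value this estimator can report is 1 / (n + 1), so resolving very small
       p-values requires many permutations.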

   genotype
       genotype command                           description
       ──────────────────────────────────────────────────────────────────────────
       abba-baba                                  abba-baba calculates the  tree
                                                   pattern for four individuals.
                                                   This tool assumes the
                                                   reference is ancestral and
                                                   ignores non-abba-baba sites.
                                                   The output is a boolean
                                                   value (1 = true, 0 = false)
                                                   for abba and baba. The tree
                                                   argument should be specified
                                                   from the most basal taxon to
                                                   the most derived.
       hapLrt                                     HapLRT  is  a likelihood ratio
                                                  test  for  haplotype  lengths.
                                                  The  lengths  are modeled with
                                                  an  exponential  distribution.
                                                   The sign denotes whether the
                                                   target (1) or the background
                                                   (-1) has longer haplotypes.
       normalize-iHS                               Normalizes iHS or XP-EHH
                                                  scores.
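
       Raw iHS scores are usually standardized within bins of allele frequency (Voight et al.
       2006). The Python sketch below shows that kind of normalization as an illustration under
       assumptions, not as normalize-iHS itself: the equal-width frequency bins, the default bin
       count, and the function name are hypothetical, and bins with fewer than two scores are
       left as NaN.

```python
import statistics


def normalize_by_frequency_bin(scores, freqs, n_bins=10):
    """Standardize scores to mean 0 and standard deviation 1 within frequency bins."""
    bins = {}
    for i, f in enumerate(freqs):
        b = min(int(f * n_bins), n_bins - 1)  # put f == 1.0 in the last bin
        bins.setdefault(b, []).append(i)
    out = [float("nan")] * len(scores)
    for members in bins.values():
        if len(members) < 2:
            continue  # a standard deviation needs at least two values
        vals = [scores[i] for i in members]
        mu = statistics.mean(vals)
        sd = statistics.stdev(vals)
        for i in members:
            out[i] = (scores[i] - mu) / sd if sd > 0 else 0.0
    return out


# Two scores in the lowest-frequency bin, two in the highest.
print(normalize_by_frequency_bin([1.0, 3.0, 10.0, 20.0], [0.05, 0.08, 0.95, 0.99]))
```

       Binning by frequency matters because unstandardized haplotype scores depend strongly on
       allele frequency, so a single genome-wide mean and standard deviation would mix sites
       that are not comparable.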

   transformation
       transformation command                     description
       ──────────────────────────────────────────────────────────────────────────
       dumpContigsFromHeader                      Dump contigs from header
       smoother                                    smoother is a method for
                                                   window-smoothing many of the
                                                  GPAT++ formats.

       vcf2dag                                    Modify VCF to be able to build
                                                   a directed acyclic graph (DAG).
       vcf2fasta                                  Generates  sample_seq:N.fa for
                                                  each     sample,     reference
                                                  sequence, and chromosomal copy
                                                  N in  [0,1...  ploidy].   Each
                                                  sequence  in the fasta file is
                                                  named using the  same  pattern
                                                  used   for   the   file  name,
                                                  allowing them to be combined.
       vcf2tsv                                     Converts VCF to per-allele or
                                                   per-genotype tab-delimited
                                                   format, using a null string to
                                                   replace empty values in the
                                                   table. Specifying -g will
                                                   output one line per sample
                                                   with genotype information.
                                                   When there is more than one
                                                   alt allele, there will be
                                                   multiple rows, one for each
                                                   allele, and the INFO values
                                                   will match the `A' index.
       vcfaddinfo                                 Adds   info  fields  from  the
                                                  second  file  which  are   not
                                                  present in the first vcf file.
       vcfafpath                                  Display genotype paths
       vcfallelicprimitives                       WARNING:    this    tool    is
                                                  considered legacy and is  only
                                                  retained  for older workflows.
                                                   It will emit a warning. Even
                                                   though it can use WFA, you
                                                   should use vcfwave instead.
       vcfannotate                                Intersect the records  in  the
                                                  VCF file with targets provided
                                                  in a BED file.   Intersections
                                                  are   done  on  the  reference
                                                  sequences in the VCF file.  If
                                                  no  VCF  filename is specified
                                                  on  the  command  line   (last
                                                   argument), the VCF is read
                                                   from stdin.
       vcfannotategenotypes                       Examine               genotype
                                                  correspondence.       Annotate
                                                  genotypes in  the  first  file
                                                  with  genotypes  in the second
                                                   file, adding the genotype as
                                                   another flag to each sample
                                                   field in the first file.
                                                   annotation-tag is the name of
                                                   the sample flag that is added
                                                   to store the annotation. Also
                                                   adds a `has_variant' flag for
                                                   sites where the second file
                                                   has a variant.
       vcfbreakmulti                              If   multiple   alleles    are
                                                  specified  in a single record,
                                                  break the record into multiple
                                                  lines,    preserving   allele-
                                                  specific INFO fields.
       vcfcat                                     Concatenates VCF files
       vcfclassify                                Creates a new VCF  where  each
                                                  variant  is  tagged  by allele
                                                  class: snp, ts/tv, indel, mnp

       vcfcleancomplex                            Removes     reference-matching
                                                  sequence  from complex alleles
                                                  and adjusts records to reflect
                                                  positional change.
       vcfcombine                                 Combine        VCF       files
                                                  positionally,        combining
                                                  samples when sites and alleles
                                                  are identical.  Any number  of
                                                  VCF  files  may  be  combined.
                                                  The  INFO  field   and   other
                                                  columns  are taken from one of
                                                  the files which  are  combined
                                                  when records in multiple files
                                                  match.   Alleles   must   have
                                                  identical   ordering   to   be
                                                  combined into one record.   If
                                                  they  do not, multiple records
                                                  will be emitted.
       vcfcommonsamples                            Outputs each record in the
                                                   first file, removing samples
                                                   not present in the second.
       vcfcreatemulti                             Go through sorted VCF and when
                                                  overlapping     alleles    are
                                                  represented  across   multiple
                                                  records,  merge  them  into  a
                                                  single multi-ALT record.   See
                                                  the   documentation  for  more
                                                  information.
       vcfecho                                    Echo  VCF  to  stdout  (simple
                                                  demo)
       vcfevenregions                             Generates  a  list of regions,
                                                  e.g. chr20:10..30  using   the
                                                  variant   density  information
                                                  provided in the  VCF  file  to
                                                  ensure  that  the regions have
                                                  even  numbers   of   variants.
                                                   This can be used to reduce the
                                                  variance   in   runtime   when
                                                  dividing  variant detection or
                                                  genotyping     by      genomic
                                                  coordinates.
       vcffixup                                   Generates  a  VCF stream where
                                                  AC and NS have been  generated
                                                  for  each  record using sample
                                                  genotypes
       vcfflatten                                 Removes multi-allelic sites by
                                                  picking    the   most   common
                                                  alternate.   Requires   allele
                                                  frequency  specification  `AF'
                                                  and use  of  `G'  and  `A'  to
                                                  specify  the fields which vary
                                                  according  to  the  Allele  or
                                                  Genotype.   VCF  file  may  be
                                                  specified on the command  line
                                                  or piped as stdin.
       vcfgeno2alleles                            modifies  the  genotypes field
                                                  to provide the literal alleles
                                                  rather than indexes
       vcfgeno2haplo                              Convert  genotype-based phased
                                                   alleles within --window-size
                                                  into  haplotype alleles.  Will
                                                  break  haplotype  construction
                                                  when  encountering  non-phased
                                                  genotypes on input.
       vcfgenosamplenames                         Get samplenames

       vcfglbound                                  Adjust GLs so that the maximum
                                                   GL is 0 by dividing each
                                                   sample's likelihoods by their
                                                   maximum (on the log scale of
                                                   GL values, this subtracts the
                                                   maximum GL).
       vcfglxgt                                   Set   genotypes   using    the
                                                  maximum   genotype  likelihood
                                                  for each sample.
       vcfindex                                   Adds an index  number  to  the
                                                  INFO field (id=position)
       vcfinfo2qual                               Sets  QUAL from info field tag
                                                  keyed by [key].  The VCF  file
                                                  may  be  omitted and read from
                                                  stdin.   The  average  of  the
                                                  field  is  used if it contains
                                                  multiple values.
       vcfinfosummarize                           Take annotations given in  the
                                                  per-sample  fields and add the
                                                  mean, median, min, or  max  to
                                                  the site-level INFO.
       vcfintersect                               VCF set analysis
       vcfkeepgeno                                Reduce  file  size by removing
                                                  FORMAT fields  not  listed  on
                                                  the  command  line from sample
                                                  specifications in the output
       vcfkeepinfo                                 To decrease file size, remove
                                                   INFO fields not listed on the
                                                   command line.
       vcfkeepsamples                             outputs each record in the vcf
                                                  file,   removing  samples  not
                                                  listed on the command line
       vcfld                                      Compute LD
       vcfleftalign                               Left-align indels and  complex
                                                  variants  in the input using a
                                                  pairwise   ref/alt   alignment
                                                  followed   by   a   heuristic,
                                                  iterative   left   realignment
                                                  process   that   shifts  indel
                                                  representations    to    their
                                                  absolute leftmost (5’) extent.
       vcflength                                  Add length info field
       vcfnullgenofields                          Makes   the  FORMAT  for  each
                                                  variant line  the  same  (uses
                                                  all    the    FORMAT    fields
                                                  described  in   the   header).
                                                  Fills out per-sample fields to
                                                  match  FORMAT.    Expands   GT
                                                   values of `.' with the number
                                                   of alleles based on ploidy
                                                   (e.g. `./.' for diploid).
       vcfnumalt                                  outputs  a  VCF  stream  where
                                                  NUMALT has been generated  for
                                                  each   record   using   sample
                                                  genotypes
       vcfoverlay                                 Overlay records in  the  input
                                                  vcf   files   with   order  as
                                                  precedence.
       vcfprimers                                 For each VCF  record,  extract
                                                  the  flanking  sequences,  and
                                                  write them to stdout as  FASTA
                                                  records      suitable      for
                                                  alignment.
       vcfqual2info                               Puts QUAL into an  info  field
                                                  tag keyed by [key].

       vcfremap                                   For   each  alternate  allele,
                                                  attempt to realign against the
                                                  reference   with  lowered  gap
                                                  open penalty.  If  realignment
                                                  is  possible, adjust the cigar
                                                  and        reference/alternate
                                                  alleles.       Observe     how
                                                  different            alignment
                                                  parameters,  including context
                                                  and  entropy-dependent   ones,
                                                  influence              variant
                                                  classification             and
                                                  interpretation.
       vcfremoveaberrantgenotypes                 strips   samples   which   are
                                                  homozygous      but       have
                                                  observations          implying
                                                  heterozygosity.         Remove
                                                  samples for which the reported
                                                  genotype (GT) and  observation
                                                  counts disagree (AO, RO).
       vcfremovesamples                           outputs each record in the vcf
                                                  file, removing samples  listed
                                                  on the command line
       vcfsample2info                             Take  annotations given in the
                                                  per-sample fields and add  the
                                                  mean,  median,  min, or max to
                                                  the site-level INFO.
       vcfsamplediff                              Establish   putative   somatic
                                                  variants     using    reported
                                                  differences  between  germline
                                                  and   somatic  samples.   Tags
                                                  each record where  the  listed
                                                  sample genotypes differ with .
                                                  The first sample is assumed to
                                                  be    germline,   the   second
                                                  somatic.    Each   record   is
                                                  tagged                    with
                                                  ={germline,somatic,loh}     to
                                                  specify  the  type  of variant
                                                  given the genotype  difference
                                                  between the two samples.
       vcfsamplenames                             List sample names
       vcfstreamsort                              Sorts  the input (either stdin
                                                  or  file)  using  a  streaming
                                                  sort   algorithm.   Guarantees
                                                  that the positional  order  is
                                                  correct  provided out-of-order
                                                  variants are no more than  100
                                                  positions   in  the  VCF  file
                                                  apart.
       vcfwave                                    Realign     reference      and
                                                  alternate  alleles  with  WFA,
                                                  parsing  out  the  `primitive'
                                                  alleles   into   multiple  VCF
                                                  records.  New records have IDs
                                                  that   reference   the  source
                                                  record ID.   Genotypes/samples
                                                  are     handled     correctly.
                                                  Deletions             generate
                                                  haploid/missing  genotypes  at
                                                  overlapping sites.
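
       The vcfglbound entry speaks of dividing likelihoods, but VCF stores GL values as
       log10-scaled likelihoods, so dividing each sample's likelihoods by their maximum amounts
       to subtracting the maximum GL from every GL. The Python sketch below shows only that
       arithmetic for one sample's GL list; the function name and the example values are
       hypothetical, and no other behaviour of the real tool is modelled.

```python
import math


def rescale_gls(gls):
    """Shift log10 genotype likelihoods so that the largest becomes 0.

    Subtracting max(GL) in log space equals dividing every likelihood by the
    largest likelihood in linear space.
    """
    top = max(gls)
    return [gl - top for gl in gls]


gls = [-10.0, -0.5, -3.0]  # hypothetical GL values for 0/0, 0/1, 1/1
scaled = rescale_gls(gls)
print(scaled)  # [-9.5, 0.0, -2.5]

# Same result computed in linear space: divide by the largest likelihood.
linear = [10 ** g for g in gls]
print([math.log10(x / max(linear)) for x in linear])
```

       Working in log space avoids underflow: a GL of -300 corresponds to a linear likelihood
       of 1e-300, close to the limit of double precision.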

   statistics
       statistics command                         description
       ──────────────────────────────────────────────────────────────────────────

       bFst                                       bFst is a Bayesian approach to
                                                  Fst.      Importantly     bFst
                                                  accounts     for      genotype
                                                  uncertainty in the model using
                                                  genotype likelihoods.   For  a
                                                  more detailed description see:
                                                  `A   Bayesian   approach    to
                                                  inferring population structure
                                                  from  dominant   markers’   by
                                                  Holsinger   et   al. Molecular
                                                  Ecology Vol 11, issue 7  2002.
                                                  The  likelihood  function  has
                                                  been modified to use  genotype
                                                  likelihoods     provided    by
                                                  variant  callers.   There  are
                                                  five free parameters estimated
                                                  in     the     model:     each
                                                  subpopulation’s         allele
                                                  frequency  and  Fis  (fixation
                                                  index,       within       each
                                                  subpopulation),     a     free
                                                  parameter    for   the   total
                                                  population’s allele frequency,
                                                  and Fst.
       genotypeSummary                            Generates  a table of genotype
                                                  counts.   Summarizes  genotype
                                                  counts for bi-allelic SNVs and
                                                   indels.
       iHS                                        iHS calculates the  integrated
                                                  haplotype score which measures
                                                  the relative decay of extended
                                                  haplotype  homozygosity  (EHH)
                                                  for    the    reference    and
                                                  alternative  alleles at a site
                                                   (see Voight et al. 2006,
                                                  Spiech & Hernandez 2014).
       meltEHH
       pFst                                       pFst    is   a   probabilistic
                                                  approach     for     detecting
                                                  differences      in     allele
                                                  frequencies    between     two
                                                  populations.
       pVst                                       pVst calculates vst, a measure
                                                  of CNV stratification.
       permuteSmooth                              permuteSmooth is a method  for
                                                   adding empirical p-values to
                                                   smoothed wcFst scores.
       plotHaps                                   plotHaps     provides      the
                                                  formatted  output  that can be
                                                  used                      with
                                                  `bin/plotHaplotypes.R'.
       popStats                                   General   population   genetic
                                                  statistics for each SNP
       segmentFst                                 segmentFst   creates   genomic
                                                  segments    (bed   file)   for
                                                  regions with high wcFst
       segmentIhs                                 Creates genomic segments  (bed
                                                  file)  for  regions  with high
                                                  iHS

       sequenceDiversity                          The sequenceDiversity  program
                                                  calculates two popular metrics
                                                  of haplotype diversity: pi and
                                                  extended             haplotype
                                                  homozygosity  (eHH).   Pi   is
                                                  calculated  using  the Nei and
                                                  Li 1979 formulation.  eHH is a
                                                  convenient  way to think about
                                                  haplotype diversity.  When eHH
                                                  =  0  all  haplotypes  in  the
                                                  window are unique and when eHH
                                                  =  1  all  haplotypes  in  the
                                                  window are identical.
       vcfaltcount                                Count the number of  alternate
                                                  alleles  in all records in the
                                                  VCF file.
       vcfcountalleles                            Count alleles
       vcfgenosummarize                           Adds  summary  statistics   to
                                                  each     record    summarizing
                                                  qualities reported  in  called
                                                  genotypes.       Uses:      RO
                                                  (reference observation count),
                                                  QR   (quality   sum  reference
                                                  observations),  AO  (alternate
                                                  observation     count),     QA
                                                  (quality     sum     alternate
                                                  observations)
       vcfgenotypecompare                         Adds  statistics  to  the INFO
                                                  field   of   the   VCF    file
                                                  describing   the   amount   of
                                                  discrepancy    between     the
                                                  genotypes (GT) in the VCF file
                                                  and the genotypes reported  in
                                                  the    .    Use   this   after
                                                  vcfannotategenotypes  to   get
                                                  correspondence  statistics for
                                                  two VCFs.
       vcfgenotypes                               Report the genotypes for  each
                                                  sample,  for  each  variant in
                                                  the    VCF.     Convert    the
                                                  numerical  representation   of
                                                  genotypes provided by  the  GT
                                                  field   to   a  human-readable
                                                  genotype format.
       vcfparsealts                               Alternate    allele    parsing
                                                  method.    This   method  uses
                                                  pairwise alignment of REF  and
                                                  ALTs  to  determine  component
                                                  allelic  primitives  for  each
                                                  alternate allele.
       vcfrandom                                  Generate a random VCF file
       vcfrandomsample                            Randomly  sample sites from an
                                                  input VCF file, which  may  be
                                                  provided  as stdin.  Scale the
                                                  sampling  probability  by  the
                                                  field  specified in KEY.  This
                                                  may be used to provide uniform
                                                  sampling     across     allele
                                                  frequencies, for instance.
       vcfroc                                     Generates a  pseudo-ROC  curve
                                                  using      sensitivity     and
                                                  specificity estimated  against
                                                  a    putative    truth    set.
                                                  Thresholding  is  provided  by
                                                  successive QUAL cutoffs.

       vcfsitesummarize                           Summarize by site
       vcfstats                                   Prints     statistics    about
                                                  variants  in  the  input   VCF
                                                  file.
       wcFst                                      wcFst  is  Weir  & Cockerham’s
                                                  Fst   for   two   populations.
                                                  Negative  values  are  VALID;
                                                  such sites can be treated  as
                                                  having zero Fst.   For  more
                                                  information  see  Evolution,
                                                  Vol. 38, No. 6, Nov 1984.
                                                  Specifically,  wcFst  uses
                                                  equations 1, 2, 3, and 4.
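
       The two sequenceDiversity metrics can be illustrated with a short C++ sketch. It is not the vcflib implementation, and several points are assumptions: the function names (pairwiseDifferences, windowPi, windowEHH) are hypothetical, a window is modeled as equal-length strings of allele codes, pi is taken as the mean number of pairwise differences over all haplotype pairs (without any per-site normalization the real tool may apply), and eHH is taken as the fraction of haplotype pairs that are identical, which gives 0 when every haplotype is unique and 1 when all are identical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helpers (not vcflib's implementation). A window is a list of
// haplotypes, each an equal-length string of allele codes such as "0101".

// Number of sites at which two haplotypes carry different alleles.
std::size_t pairwiseDifferences(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    std::size_t diff = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        if (a[k] != b[k]) ++diff;
    }
    return diff;
}

// Pi as the mean number of pairwise differences over all unordered pairs
// (Nei & Li style); no per-site normalization is applied here.
double windowPi(const std::vector<std::string>& haps) {
    const std::size_t n = haps.size();
    if (n < 2) return 0.0;  // assumption: report 0 when no pair exists
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            total += static_cast<double>(pairwiseDifferences(haps[i], haps[j]));
        }
    }
    return total / (n * (n - 1) / 2.0);
}

// eHH as the fraction of haplotype pairs that are identical:
// 0 when every haplotype is unique, 1 when all haplotypes are identical.
double windowEHH(const std::vector<std::string>& haps) {
    const std::size_t n = haps.size();
    if (n < 2) return 0.0;  // assumption: report 0 when no pair exists
    std::map<std::string, std::size_t> counts;
    for (const auto& h : haps) ++counts[h];
    double identicalPairs = 0.0;
    for (const auto& entry : counts) {
        const double c = static_cast<double>(entry.second);
        identicalPairs += c * (c - 1.0) / 2.0;
    }
    return identicalPairs / (n * (n - 1) / 2.0);
}
```

       For the window {"0011", "0011", "0101", "1100"}, the six pairs differ at 0, 2, 4, 2, 4 and 2 sites, so windowPi returns 14/6 (about 2.33), and only one pair is identical, so windowEHH returns 1/6.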

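       The GT conversion that vcfgenotypes performs can be sketched in C++ as follows. The helper humanReadableGenotype is hypothetical, not part of the vcflib API, and the real tool's output layout may differ. The sketch assumes standard VCF allele indexing: index 0 maps to REF, index i maps to the i-th ALT, a missing allele "." is kept, and the phased (|) or unphased (/) separator is preserved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the vcflib API): turn a GT value such as
// "0/1" or "1|2" into allele sequences, e.g. "A/G", given REF and the ALT list.
// Index 0 is REF, index i > 0 is alts[i - 1]; a missing allele "." stays ".".
std::string humanReadableGenotype(const std::string& gt,
                                  const std::string& ref,
                                  const std::vector<std::string>& alts) {
    std::string out;
    std::string index;  // characters of the allele index currently being read

    // Append the allele named by the accumulated index, then reset it.
    auto flush = [&]() {
        assert(!index.empty() && "malformed GT: empty allele field");
        if (index == ".") {
            out += ".";
        } else {
            const std::size_t i = std::stoul(index);
            out += (i == 0) ? ref : alts.at(i - 1);  // at() throws if i is out of range
        }
        index.clear();
    };

    for (char c : gt) {
        if (c == '/' || c == '|') {
            flush();
            out += c;  // keep the phasing separator as-is
        } else {
            index += c;
        }
    }
    flush();  // the last allele has no trailing separator
    return out;
}
```

       For example, humanReadableGenotype("0/1", "A", {"G"}) yields "A/G", and humanReadableGenotype("1|2", "A", {"G", "T"}) yields "G|T".
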
SOURCE CODE

       See the source code repository at https://github.com/vcflib/vcflib

CREDIT

       Citations  are  the  bread  and butter of Science.  If you are using this software in your
       research and want to support our future work, please cite the following publication:

       A spectrum of free software tools for processing the VCF variant call format: vcflib,
       bio-vcf, cyvcf2, hts-nim and slivar
       (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009123).
       Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P (2022), PLoS Comput Biol 18(5):
       e1009123.  https://doi.org/10.1371/journal.pcbi.1009123

LICENSE

       Copyright 2011-2023 (C) Erik Garrison and vcflib contributors.  MIT licensed.

AUTHORS

       Erik Garrison and vcflib contributors.