Provided by: pbgenomicconsensus_2.0.0+20151210-1_all

NAME

       pbgff - Pacific Biosciences extended GFFv3 file format

DESCRIPTION

       As of this version, variants.gff is our primary variant call file format.  The variants.gff file is based
       on  the  GFFv3  standard.   The  GFFv3 standard describes a tab-delimited plain-text file meta-format for
       describing genomic "features."  Each gff file consists of some initial "header" lines supplying metadata,
       and then a number of "feature" lines providing information about each identified variant.

   The GFF Coordinate System
       All coordinates in GFF files are 1-based, and every interval [start, end] is understood to include both
       endpoints.
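
       Because most programming languages index sequences from 0 and use half-open ranges, converting a GFF
       interval is a common source of off-by-one errors.  The following is a minimal sketch of that conversion;
       the function names and the sample reference string are illustrative assumptions and are not part of
       GenomicConsensus.

```python
def gff_interval_to_slice(start, end):
    """Convert a 1-based, inclusive GFF interval to a 0-based, half-open slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end, got %r, %r" % (start, end))
    # Shift only the start: the inclusive 1-based end equals the exclusive 0-based end.
    return slice(start - 1, end)


def gff_interval_length(start, end):
    """Number of reference bases covered by the inclusive interval [start, end]."""
    return end - start + 1


# Hypothetical reference sequence, used only for illustration.
reference = "ACGTACGTAC"
bases_3_to_5 = reference[gff_interval_to_slice(3, 5)]
```

       With this convention, an interval whose start and end are equal covers exactly one base.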

   Headers
       The variants.gff file begins with a block of metadata headers, which looks like the following:

          ##gff-version 3
          ##pacbio-variant-version 2.1
          ##date Tue Feb 28 17:44:18 2012
          ##feature-ontology http://song.cvs.sourceforge.net/*checkout*/song/ontology/sofa.obo?revision=1.12
          ##source GenomicConsensus v0.1.0
          ##source-commandline callVariants.py --algorithm=plurality aligned_reads.cmp.h5 -r spinach.fasta -o variants.gff
          ##source-alignment-file /home/popeye/data/aligned_reads.cmp.h5
          ##source-reference-file /home/popeye/data/spinach.fasta
          ##sequence-region EGFR_Exon_23 1 189
          ##sequence-region EGFR_Exon_24 1 200

       The source header records the name and version of the software that generated the file, and
       source-commandline records the command line used to invoke it.  pacbio-variant-version gives the version
       of the specification that the file contents should adhere to.

       The sequence-region headers describe the names and extents of the reference groups (i.e. reference
       contigs) that will be referred to in the file.  The names are the same as the full FASTA header.

       source-alignment-file and source-reference-file record absolute paths to the primary input files.
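
       A reader of this format might collect the header block as sketched below.  This is an illustration under
       stated assumptions, not the parser shipped with GenomicConsensus: the function names are hypothetical,
       every header is assumed to have the form "##key value", and a sequence-region value is split from the
       right so that a FASTA header containing spaces stays intact.

```python
def parse_gff_headers(lines):
    """Collect leading '##key value' lines into a dict mapping key -> list of values."""
    headers = {}
    for line in lines:
        if not line.startswith("##"):
            break  # the header block ends at the first non-header line
        key, _, value = line[2:].rstrip("\n").partition(" ")
        headers.setdefault(key, []).append(value)
    return headers


def parse_sequence_regions(headers):
    """Map each reference name to its (start, end) extent, both 1-based and inclusive."""
    regions = {}
    for entry in headers.get("sequence-region", []):
        # Split from the right: the name may itself contain whitespace.
        name, start, end = entry.rsplit(None, 2)
        regions[name] = (int(start), int(end))
    return regions


example_lines = [
    "##gff-version 3\n",
    "##pacbio-variant-version 2.1\n",
    "##sequence-region EGFR_Exon_23 1 189\n",
]
example_headers = parse_gff_headers(example_lines)
example_regions = parse_sequence_regions(example_headers)
```

       Values are stored as lists because a header such as sequence-region can appear once per reference contig.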

   Feature lines
       After  the headers, each line in the file describes a genomic feature; in this file, all the features are
       potential variants flagged by the variant caller.  The general format of a variant  line  is  a  9-column
       (tab-delimited)  record,  where  the  first 8 columns correspond to fixed, predefined entities in the GFF
       standard, while the 9th column is a flexible semicolon-delimited list of mappings key=value.
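
       The record layout just described can be split apart as in the following sketch.  The names
       VariantRecord and parse_variant_line are hypothetical, and the attribute keys in the sample line are
       placeholders chosen for illustration; they are not keys defined by this specification.

```python
from collections import namedtuple

# Field names follow the eight fixed GFF columns, plus the parsed 9th column.
VariantRecord = namedtuple(
    "VariantRecord",
    "seqid source type start end score strand phase attributes",
)


def parse_variant_line(line):
    """Parse one tab-delimited, 9-column feature line into a VariantRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-delimited columns, got %d" % len(fields))
    attributes = {}
    for pair in fields[8].split(";"):
        if pair:  # tolerate a trailing semicolon
            key, _, value = pair.partition("=")
            attributes[key] = value
    return VariantRecord(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),  # start and end are 1-based, inclusive
        fields[5], fields[6], fields[7],
        attributes,
    )


# Placeholder attribute keys; real files use the keys defined by the specification.
sample_line = "lambda_NEB3011\t.\tsubstitution\t200\t215\t.\t.\t.\texampleKey=exampleValue;otherKey=42\n"
sample_record = parse_variant_line(sample_line)
```

       Attribute values are kept as strings here; a caller would convert them once the meaning of each key is
       known.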

       The 8 predefined columns are as follows:
                         ────────────────────────────────────────────────────────────────────
                           Column Number   Name     Description              Example
                         ────────────────────────────────────────────────────────────────────
                           1               seqId    The full FASTA  header   lambda_NEB3011
                                                    for    the   reference
                                                    contig.
                         ────────────────────────────────────────────────────────────────────
                           2               source   (unused;        always   .
                                                    populated with .)
                         ────────────────────────────────────────────────────────────────────
                           3               type     The  type  of variant.   substitution
                                                    One   of    insertion,
                                                    deletion,           or
                                                    substitution.
                         ────────────────────────────────────────────────────────────────────
                           4               start    1-based          start   200
                                                    coordinate   for   the
                                                    variant.
                         ────────────────────────────────────────────────────────────────────
                           5               end      1-based end coordinate   215
                                                    for    the    variant.
                                                    start <= end    always
                                                    holds,  regardless  of
                                                    strand.
                         ────────────────────────────────────────────────────────────────────
                           6               score    unused; populated with   .
                                                    .
                         ────────────────────────────────────────────────────────────────────
                           7               strand   unused; populated with   .
                                                    .
                         ────────────────────────────────────────────────────────────────────
                           8               phase    unused; populated with   .
                                                    .
                         ────────────────────────────────────────────────────────────────────
--

SEE ALSO

       The VCF and BED standards describe variant-call-specific file formats.  We can currently translate
       variants.gff files to these formats, but they are not the primary output of the variant callers.

AUTHOR

       Pacific Biosciences <devnet@pacificbiosciences.com>

2.1                                                August 2014                                          PBGFF(5)