Provided by: pbgenomicconsensus_2.0.0+20151210-1_all 

NAME
pbgff - Pacific Biosciences extended GFFv3 file format
DESCRIPTION
As of this version, variants.gff is our primary variant call file format. The variants.gff file is based
on the GFFv3 standard. The GFFv3 standard describes a tab-delimited plain-text file meta-format for
describing genomic "features." Each GFF file consists of some initial "header" lines supplying metadata,
and then a number of "feature" lines providing information about each identified variant.
The GFF Coordinate System
All coordinates in GFF files are 1-based, and every interval [start, end] is closed: it includes both
endpoints.
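The arithmetic that follows from this convention can be sketched in Python (the helper names below are illustrative, not part of any PacBio tool). A closed, 1-based interval [start, end] covers end - start + 1 positions, and it maps onto the 0-based, half-open Python slice seq[start - 1:end].

```python
def interval_length(start: int, end: int) -> int:
    """Number of positions in the closed, 1-based interval [start, end]."""
    if start > end:
        raise ValueError("expected start <= end")
    # Both endpoints are included, hence the + 1.
    return end - start + 1


def reference_bases(seq: str, start: int, end: int) -> str:
    """Bases of seq covered by the closed, 1-based interval [start, end]."""
    if start < 1 or start > end or end > len(seq):
        raise ValueError("interval outside the sequence")
    # Python indexes from 0 and slices exclude their upper bound, so the
    # 1-based interval [start, end] becomes the slice [start - 1, end).
    return seq[start - 1:end]


ref = "ACGTACGTAC"
print(interval_length(3, 5))       # 3
print(reference_bases(ref, 3, 5))  # GTA
print(reference_bases(ref, 1, 1))  # A
```

For an interval with start 200 and end 215, for example, this gives a length of 16 positions, not 15.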
Headers
The variants.gff file begins with a block of metadata headers, which looks like the following:
##gff-version 3
##pacbio-variant-version 2.1
##date Tue Feb 28 17:44:18 2012
##feature-ontology http://song.cvs.sourceforge.net/*checkout*/song/ontology/sofa.obo?revision=1.12
##source GenomicConsensus v0.1.0
##source-commandline callVariants.py --algorithm=plurality aligned_reads.cmp.h5 -r spinach.fasta -o variants.gff
##source-alignment-file /home/popeye/data/aligned_reads.cmp.h5
##source-reference-file /home/popeye/data/spinach.fasta
##sequence-region EGFR_Exon_23 1 189
##sequence-region EGFR_Exon_24 1 200
The source header gives the name and version of the software that generated the file, and
source-commandline records the command line it was run with.
pacbio-variant-version gives the version of this specification that the file contents adhere to.
The sequence-region headers give the name and extent (start and end coordinates) of each reference group
(i.e. reference contig) referred to in the file. Each name is the same as the contig's full FASTA header.
source-alignment-file and source-reference-file record absolute paths to the primary input files.
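A minimal Python sketch of reading this header block. The function name parse_headers is hypothetical, and the sketch assumes each header is a single line of the form ##key value, with one space after the key. Because contig names are full FASTA headers and may contain spaces, the name in a sequence-region line is taken to be everything before the last two fields.

```python
def parse_headers(lines):
    """Collect '##key value' metadata lines from the top of a variants.gff file.

    Returns (metadata, regions): metadata maps each header key to its value
    string (a repeated key keeps its last value), and regions maps each
    contig name to its (start, end) extent from ##sequence-region headers.
    Parsing stops at the first line that is not a '##' header.
    """
    metadata = {}
    regions = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("##"):
            break
        key, _, value = line[2:].partition(" ")
        if key == "sequence-region":
            # The name may contain spaces, so split off only the last two fields.
            name, start, end = value.rsplit(" ", 2)
            regions[name] = (int(start), int(end))
        else:
            metadata[key] = value
    return metadata, regions


header = """##gff-version 3
##pacbio-variant-version 2.1
##source GenomicConsensus v0.1.0
##sequence-region EGFR_Exon_23 1 189
##sequence-region EGFR_Exon_24 1 200
"""
meta, regions = parse_headers(header.splitlines())
print(meta["pacbio-variant-version"])  # 2.1
print(regions["EGFR_Exon_24"])         # (1, 200)
```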
Feature lines
After the headers, each line in the file describes a genomic feature; in this file, every feature is a
potential variant flagged by the variant caller. Each variant line is a 9-column, tab-delimited record:
the first 8 columns correspond to fixed, predefined fields of the GFF standard, while the 9th column is a
flexible, semicolon-delimited list of key=value pairs.
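A minimal Python sketch of splitting one feature line into its nine columns. The Feature type and the function names are hypothetical. The sketch assumes the attribute list uses ';' between pairs and '=' within each pair, as described above, and it does not decode the percent-escapes that GFF3 allows in attribute values. The attribute keys in the usage example are placeholders, not the keys a variant caller actually writes.

```python
from typing import Dict, NamedTuple


class Feature(NamedTuple):
    seq_id: str
    source: str
    feature_type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split 'key=value;key=value' into a dict (values are kept as strings)."""
    attrs = {}
    for item in column9.split(";"):
        if not item:
            continue  # tolerate an empty field, e.g. from a trailing ';'
        key, _, value = item.partition("=")
        attrs[key] = value
    return attrs


def parse_feature_line(line: str) -> Feature:
    """Parse one tab-delimited, 9-column feature line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited columns, got {len(fields)}")
    seq_id, source, ftype, start, end, score, strand, phase, col9 = fields
    return Feature(seq_id, source, ftype, int(start), int(end),
                   score, strand, phase, parse_attributes(col9))


# keyA and keyB are placeholder attribute names.
line = "lambda_NEB3011\t.\tsubstitution\t200\t215\t.\t.\t.\tkeyA=1;keyB=xyz"
f = parse_feature_line(line)
print(f.feature_type, f.start, f.end)  # substitution 200 215
print(f.attributes)                    # {'keyA': '1', 'keyB': 'xyz'}
```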
The 8 predefined columns are as follows:
────────────────────────────────────────────────────────────────────────────
Column  Name    Description                                   Example
────────────────────────────────────────────────────────────────────────────
1       seqId   Full FASTA header of the reference contig.    lambda_NEB3011
2       source  Unused; always populated with ".".            .
3       type    Type of variant: one of insertion, deletion,  substitution
                or substitution.
4       start   1-based start coordinate of the variant.      200
5       end     1-based end coordinate of the variant;        215
                start <= end always holds, regardless of strand.
6       score   Unused; populated with ".".                   .
7       strand  Unused; populated with ".".                   .
8       phase   Unused; populated with ".".                   .
────────────────────────────────────────────────────────────────────────────
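The constraints in this column table are simple enough to check mechanically. Below is a minimal Python sketch (check_fixed_columns is a hypothetical name) that reports violations in columns 1 through 8 of a single feature line. It assumes the line has already been separated from the header block, and it does not inspect the 9th column.

```python
from typing import List

VARIANT_TYPES = {"insertion", "deletion", "substitution"}
# 0-based field index -> column name, for the columns that must always be "."
UNUSED_COLUMNS = {1: "source", 5: "score", 6: "strand", 7: "phase"}


def check_fixed_columns(line: str) -> List[str]:
    """List the ways a feature line violates the 8-column constraints.

    An empty list means no problems were found in columns 1-8; the 9th
    (attribute) column is not inspected.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-delimited columns, got {len(fields)}"]
    problems = []
    if not fields[0]:
        problems.append("seqId is empty")
    for idx, name in UNUSED_COLUMNS.items():
        if fields[idx] != ".":
            problems.append(f"{name} should be '.', got {fields[idx]!r}")
    if fields[2] not in VARIANT_TYPES:
        problems.append(f"unknown variant type {fields[2]!r}")
    try:
        start, end = int(fields[3]), int(fields[4])
    except ValueError:
        problems.append("start and end must be integers")
    else:
        if start < 1:
            problems.append("start must be >= 1 (coordinates are 1-based)")
        if start > end:
            problems.append("start must be <= end")
    return problems


ok = "lambda_NEB3011\t.\tsubstitution\t200\t215\t.\t.\t.\tkeyA=1"
bad = "lambda_NEB3011\t.\tsnp\t215\t200\t.\t+\t.\tkeyA=1"
print(check_fixed_columns(ok))        # []
print(len(check_fixed_columns(bad)))  # 3
```

Returning a list of problems rather than raising on the first one lets a caller report every defect in a line at once.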
SEE ALSO
The VCF and BED standards describe variant-call-specific file formats. We can currently translate
variants.gff files to these formats, but they are not the primary output of the variant callers.
AUTHOR
Pacific Biosciences <devnet@pacificbiosciences.com>
2.1 August 2014 PBGFF(5)