Provided by: bioperl_1.5.2.102-3_all

NAME

       bp_genbank2gff3.pl -- Genbank->gbrowse-friendly GFF3

SYNOPSIS

         bp_genbank2gff3.pl [options] filename(s)

         # process a directory containing GenBank flatfiles
         perl bp_genbank2gff3.pl --dir path_to_files --zip

         # process a single file, ignore explicit exons and introns
         perl bp_genbank2gff3.pl --filter exon --filter intron file.gbk.gz

         # process a list of files
         perl bp_genbank2gff3.pl *gbk.gz

           Options:
               --dir     -d  path to a directory of GenBank flatfiles
               --outdir  -o  location to write GFF files
               --zip     -z  compress GFF3 output files with gzip
               --summary -s  print a summary of the features in each contig
               --filter  -x  GenBank feature type(s) to ignore
               --split   -y  split output into separate GFF and FASTA files
                             for each GenBank record
               --nolump  -n  write a separate file for each reference sequence
                             (default is to lump all records together into one
                              output file for each input file)
               --ethresh -e  error threshold for the unflattener;
                             set this high (>2) to ignore all unflattener errors
               --help    -h  display this message

DESCRIPTION

       This script uses Bio::SeqFeature::Tools::Unflattener and
       Bio::Tools::GFF to convert GenBank flatfiles to GFF3 with gene
       containment hierarchies mapped for optimal display in GBrowse.

       The input files are assumed to be gzipped GenBank flatfiles for RefSeq
       contigs.  Each file may contain multiple GenBank records.  Either a
       single file or an entire directory can be processed.  By default, the
       DNA sequence is embedded in the GFF, but it can be saved into separate
       FASTA files with the --split (-y) option.
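       Because a single input file may hold several records, it can help to
       count them before choosing an output mode.  The following is a minimal
       sketch, not part of BioPerl: the function name count_gb_records is
       hypothetical, and it assumes the standard GenBank layout in which every
       record ends with a line containing only "//".

```shell
# Hypothetical helper (not shipped with BioPerl): print "<file><TAB><records>"
# for each GenBank flatfile given, gzipped or plain.
# Assumes each record is terminated by a line that is exactly "//".
count_gb_records() {
    for f in "$@"; do
        case "$f" in
            # gzip -dc decompresses to stdout without touching the file
            *.gz) n=$(gzip -dc < "$f" | grep -c '^//$' || true) ;;
            *)    n=$(grep -c '^//$' < "$f" || true) ;;
        esac
        # grep -c prints 0 (and exits 1) when nothing matches;
        # "|| true" keeps that from aborting callers that use set -e
        printf '%s\t%s\n' "$f" "$n"
    done
}
```

       For example, "count_gb_records *gbk.gz" prints one tab-separated line
       per input file.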

       If an input file contains multiple records, the default behaviour is to
       dump all GFF and sequence to a single file of the same name with .gff
       appended.  The --nolump (-n) option instead creates a separate file for
       each GenBank record, and the --split (-y) option creates separate GFF
       and FASTA files for each GenBank record.
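       Read literally, the paragraph above implies how many files one input
       file produces in each mode.  The hypothetical helper expected_outputs
       below (not a BioPerl tool) encodes that reading as a rough estimate;
       it does not model what happens when --split and --nolump are combined,
       which this page does not specify.

```shell
# Hypothetical helper: estimate output files for ONE input file.
# Usage: expected_outputs MODE NRECORDS
#   lump   (default) -> 1 file for the whole input file
#   nolump           -> 1 file per GenBank record
#   split            -> 1 GFF + 1 FASTA file per GenBank record
expected_outputs() {
    mode=$1
    records=$2
    case "$mode" in
        lump)   echo 1 ;;
        nolump) echo "$records" ;;
        split)  echo $((records * 2)) ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

       For example, "expected_outputs split 3500" prints 7000, which is
       already past the 6000-file threshold mentioned in Note1.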

       Notes

       Note1:

       In cases where the input files contain many GenBank records (for
       example, the chromosome files for the mouse genome build), a very large
       number of output files will be produced if the --split or --nolump
       options are selected.  If the resulting list exceeds 6000 files, use
       the --long_list option of bp_bulk_load_gff.pl or bp_fast_load_gff.pl
       to load the GFF and/or FASTA files.
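       Before loading, one way to apply the 6000-file rule of thumb is to
       count the files in the output directory.  This is a minimal sketch
       under stated assumptions: needs_long_list is a hypothetical name, it
       treats every regular file directly inside the directory (for example
       the --outdir used for conversion) as one file to load, and it only
       reports whether --long_list seems advisable rather than running the
       loader itself.

```shell
# Hypothetical helper: print "yes" if DIR holds more than 6000 regular
# files (the threshold from Note1), otherwise "no".
# Only files directly inside DIR are counted; subdirectories are ignored.
needs_long_list() {
    dir=$1
    # tr strips the leading spaces some wc implementations emit
    n=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$n" -gt 6000 ]; then
        echo yes
    else
        echo no
    fi
}
```

       For example, [ "$(needs_long_list gff_out)" = yes ] could gate whether
       --long_list is passed to the loader (gff_out is an illustrative
       directory name).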

       Note2:

       This script is designed for RefSeq genomic sequence entries.  It may
       work for third-party annotations, but this has not been tested.
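       To spot records that fall outside this tested scope, an input file
       can be screened by accession prefix.  The sketch below is hypothetical
       (non_refseq_accessions is not a BioPerl tool); it assumes NCBI's
       RefSeq genomic accession prefixes AC_, NC_, NG_, NT_, NW_ and NZ_,
       and it inspects only the first accession on each ACCESSION line.

```shell
# Hypothetical helper: print ACCESSION values in a GenBank file (plain or
# gzipped) whose first accession lacks a RefSeq genomic prefix.
# Prefix list assumed from NCBI RefSeq conventions: AC_ NC_ NG_ NT_ NW_ NZ_.
non_refseq_accessions() {
    case "$1" in
        *.gz) gzip -dc < "$1" ;;
        *)    cat < "$1" ;;
    esac |
        awk '$1 == "ACCESSION" && $2 !~ /^(AC|NC|NG|NT|NW|NZ)_/ { print $2 }'
}
```

       An empty result suggests every record in the file is a RefSeq genomic
       entry.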

AUTHOR

       Sheldon McKay (mckays@cshl.edu)

       Copyright (c) 2004 Cold Spring Harbor Laboratory.