Provided by: gbrowse_2.56+dfsg-3build1_all
NAME
gtf2gff3 - Converts GTF formatted files to valid GFF3 files
VERSION
This document describes version 0.1
SYNOPSIS
gtf2gff3 --cfg gtf2gff3_MY_CONFIG.cfg gtf_file > gff3_file
DESCRIPTION
This script converts GTF formatted files to valid GFF3 formatted files. It maps the value in column 3 (the "type" column) to a valid SO (Sequence Ontology) term, but because many non-standard terms may appear in that column in GTF files, you may edit the config file to provide your own GTF feature to SO mapping. The script also builds gene models from exons, CDSs and other features given in the GTF file. It is currently tested on Ensembl and Twinscan GTF, and it should work on any other files that follow the same specification. It does not work on GTF from the UCSC table browser because those files use the same ID for gene and transcript, so it is impossible to group multiple transcripts under a gene. See the README that came with the script for more info.
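The type mapping described above can be illustrated with a minimal Python sketch. It is not the script's own code (gtf2gff3 is written in Perl): the GTF_TO_SO table, the function names and the regular expression are assumptions made for illustration, and the real mapping is whatever the config file defines.

```python
import re

# Hypothetical GTF-type to SO-term table. The real mapping is defined in the
# gtf2gff3 config file and may differ from this one.
GTF_TO_SO = {
    "exon": "exon",
    "CDS": "CDS",
    "start_codon": "start_codon",
    "stop_codon": "stop_codon",
    "5UTR": "five_prime_UTR",
    "3UTR": "three_prime_UTR",
}

# Matches one 'key "value";' or 'key value;' pair from GTF column 9.
ATTR_RE = re.compile(r'\s*(\S+)\s+"?([^";]*)"?\s*;')


def parse_attributes(column9):
    """Parse a GTF attribute string into a dict (assumes ';' terminators)."""
    return {key: value for key, value in ATTR_RE.findall(column9)}


def convert_feature(gtf_line):
    """Return (SO type or None, attribute dict) for one GTF line."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    # Unmapped types come back as None so the caller can decide what to do.
    so_type = GTF_TO_SO.get(fields[2])
    return so_type, parse_attributes(fields[8])
```

A type missing from the table is returned as None here rather than guessed, which mirrors the advice above to extend the mapping in the config file when a GTF file uses non-standard terms.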
OPTIONS
--cfg
    Provide the filename for a config file. See the configuration file provided with this script for format details. Use this configuration file to modify the behavior of the script. If no config file is given it looks for ./gtf2gff3.cfg, ~/gtf2gff3.cfg or /etc/gtf2gff3.cfg in that order.

--help
    Provide a detailed man page style help message and then exit.
DIAGNOSTICS
"ERROR: Missing or non-standard attributes: parse_attributes"
    A line in the GTF file did not have any attributes, or its attributes column was unparsable.

"ERROR: Non-transcript gene feature not supported. Please contact the author for support: build_gene"
    This warning indicates that a line was skipped because it contained a non-transcript gene feature, and the code is not currently equipped to handle this type of feature. This probably isn't too hard to add, so contact me if you get this error and would like to have these features supported.

"ERROR: Must have at least exons or CDSs to build a transcript: build_trnsc"
    Some feature had a transcript_id and yet there were no exons or CDSs associated with that transcript_id, so the script failed to build a transcript.

"ERROR: seq_id conflict: validate_and_finish_trnsc"
    Found two features within the same transcript that didn't share the same seq_id.

"ERROR: source conflict: validate_and_finish_trnsc"
    Found two features within the same transcript that didn't share the same source.

"ERROR: type conflict: validate_and_finish_trnsc"
    Found two features within the same transcript that were expected to share the same type and yet they didn't.

"ERROR: strand conflict: validate_and_finish_trnsc"
    Found two features within the same transcript that didn't share the same strand.

"ERROR: seq_id conflict: validate_and_build_gene"
    Found two features within the same gene that didn't share the same seq_id.

"ERROR: source conflict: validate_and_build_gene"
    Found two features within the same gene that didn't share the same source.

"ERROR: strand conflict: validate_and_build_gene"
    Found two features within the same gene that didn't share the same strand.

"ERROR: gene_id conflict: validate_and_build_gene"
    Found two features within the same gene that didn't share the same gene_id.

"FATAL: Can't open GTF file: file_name for reading."
    Unable to open the GTF file for reading.

"FATAL: Need exons or CDSs to build transcripts: process_start"
    A start_codon feature was annotated and yet there were no exons or CDSs associated with that transcript_id, so the script failed.

"FATAL: Untested code in process_start. Contact the aurthor for support."
    The script is written to infer a start codon based on the presence of a 5' UTR, but we had no example GTF of this type when we wrote the code, so we killed the process rather than run untested code. Contact the author for support.

"FATAL: Invalid feature set: process_start"
    We tried to consider all possible ways of inferring a start codon or inferring a non-coding gene, and yet we've failed. Your combination of gene features doesn't make sense to us. You should never get this error, and if you do, we'd really like to see the GTF file that generated it. Please contact the author for support.

"FATAL: Need exons or CDSs to build transcripts: process_stop"
    A stop_codon feature was annotated and yet there were no exons or CDSs associated with that transcript_id, so the script failed.

"FATAL: Untested code in process_stop. Contact the aurthor for support."
    The script is written to infer a stop codon based on the presence of a 3' UTR, but we had no example GTF of this type when we wrote the code, so we killed the process rather than run untested code. Contact the author for support.

"FATAL: Invalid feature set: process_stop"
    We tried to consider all possible ways of inferring a stop codon or inferring a non-coding gene, and yet we've failed. Your combination of gene features doesn't make sense to us. You should never get this error, and if you do, we'd really like to see the GTF file that generated it. Please contact the author for support.

"FATAL: Invalid feature set: process_exon_CDS_UTR"
    We tried to consider all possible ways of inferring exons, CDSs and UTRs, and yet we've failed. Your combination of gene features doesn't make sense to us. You should never get this error, and if you do, we'd really like to see the GTF file that generated it. Please contact the author for support.

"FATAL: Array reference required: sort_features."
    A user shouldn't be able to trigger this error. It almost certainly indicates a software bug. Please contact the author.

"FATAL: Can't determine strand in: sort_feature_types."
    This may indicate that your GTF file does not indicate the strand for features that require it. It may also indicate a software bug. Please contact the author.

"FATAL: Hash reference required: sort_feature_types."
    A user shouldn't be able to trigger this error. It almost certainly indicates a software bug. Please contact the author.

"FATAL: Invalid value passed to strand: strand."
    This may indicate that your GTF file does not indicate the strand for features that require it. Consider using the DEFAULT_STRAND parameter in the config file. It may also indicate a software bug. Please contact the author.
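The seq_id, source and strand conflict errors all describe the same kind of check: features grouped under one transcript or gene are expected to agree on certain fields. The Python sketch below illustrates that idea only. The find_conflicts function and the dict-based feature records are hypothetical and are not taken from the script's Perl source.

```python
def find_conflicts(features, keys=("seq_id", "source", "strand")):
    """Return the keys whose values differ across a group of features.

    `features` is a list of dicts, one per GTF feature that was grouped
    under the same transcript (or gene). This record layout is an
    assumption made for the example.
    """
    conflicts = []
    for key in keys:
        # Collect the distinct values seen for this field in the group.
        distinct = {feature[key] for feature in features}
        if len(distinct) > 1:
            conflicts.append(key)
    return conflicts


# Two exons of one transcript that disagree on strand.
exons = [
    {"seq_id": "chr1", "source": "ensembl", "strand": "+"},
    {"seq_id": "chr1", "source": "ensembl", "strand": "-"},
]
print(find_conflicts(exons))
```

When one of these errors appears, the same kind of grouping by transcript_id or gene_id over the input GTF is a quick way to locate the offending lines.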
CONFIGURATION AND ENVIRONMENT
A configuration file is provided with this script. The script will look for that configuration file in ./gtf2gff3.cfg, ~/gtf2gff3.cfg or /etc/gtf2gff3.cfg in that order. If the configuration file is not found in one of those locations and one is not provided via the --cfg flag it will try to choose some sane defaults, but you really should provide the configuration file. See the supplied configuration file itself as well as the README that came with this package for format and details about the configuration file.
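The lookup order described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's Perl code: find_config and its exists parameter are hypothetical names, and returning None stands in for "fall back to built-in defaults".

```python
import os


def find_config(cli_path=None, exists=os.path.exists):
    """Return the config file path to use, or None to fall back to defaults.

    `cli_path` stands for the value given with --cfg. `exists` is
    injectable so the lookup can be exercised without touching the disk.
    """
    if cli_path:
        return cli_path
    candidates = [
        "./gtf2gff3.cfg",
        os.path.expanduser("~/gtf2gff3.cfg"),
        "/etc/gtf2gff3.cfg",
    ]
    # The first existing candidate wins, in the documented order.
    for path in candidates:
        if exists(path):
            return path
    return None
```

Because the current directory is searched first, a per-project gtf2gff3.cfg takes precedence over the one in the home directory or in /etc when --cfg is not given.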
DEPENDENCIES
This script requires the following Perl packages, which are available from CPAN (www.cpan.org):

Getopt::Long
Config::Std
INCOMPATIBILITIES
None reported.
BUGS AND LIMITATIONS
No bugs have been reported. Please report any bugs or feature requests to: <barry.moore@genetics.utah.edu>
AUTHOR
Barry Moore <barry.moore@genetics.utah.edu>
LICENCE AND COPYRIGHT
Copyright (c) 2007, University of Utah

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.