Provided by: chado-utils_1.31-6_all
NAME
gmod_gff_preprocessor - Prepares a GFF3 file for bulk loading into a chado database.
SYNOPSIS
% gmod_gff_preprocessor [options] --gfffile <filename>
COMMAND-LINE OPTIONS
    --gfffile            The file containing GFF3 (optional, can read from stdin)
    --outfile            The name kernel that will be used for naming result files
    --splitfile          Split the files into more manageable chunks, providing
                         an argument to control splitting
    --onlysplit          Split the files and then quit (i.e., don't sort)
    --nosplit            Don't split the files (i.e., only sort)
    --hasrefseq          Set this if the file contains a reference sequence line
                         (only needed if not splitting files)
    --dbprofile          Specify a gmod.conf profile name (otherwise use default)
    --inheritance_tiers  How many levels of inheritance you expect this file
                         to have (default: 3)
DESCRIPTION
splitfile -- Just setting this flag to 1 will cause the file to be split by reference sequence. If you provide an optional argument, it will be further split according to these rules:

    source=1      Splits files according to the value in the source column
    source=a,b,c  Puts lines with sources that match (via regular expression)
                  'a', 'b', or 'c' in a separate file
    type=a,b,c    Puts lines with types that match 'a', 'b', or 'c' in a
                  separate file

For example, if you wanted all of your analysis results to go in a separate file, you could indicate '--splitfile type=match', and all cDNA_match, EST_match and cross_genome_match features would go into separate files (separated by reference sequence).

inheritance_tiers -- The number of levels of inheritance this file has. For example, if the file has "central dogma" genes in it (gene/mRNA/exon,polypeptide), then it has 3. Up to 4 is supported, but the higher the number, the more slowly it performs. If you don't know, 3 is a reasonable guess.

FASTA sequence

If the GFF3 file contains FASTA sequence at the end, the sequence will be placed in a separate file with the extension '.fasta'. This fasta file can be loaded separately after the split and/or sorted GFF3 files are loaded, using the command:

    gmod_bulk_load_gff3.pl -g <fasta file name>
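The 'type=a,b,c' rule of splitfile described above can be illustrated with a small Python sketch. This is not the preprocessor's own code (the tool itself is written in Perl); it only shows one plausible reading of the rule. The function names bucket_for and split_lines are hypothetical, and the sketch assumes that patterns are applied as unanchored regular expressions to the type column (column 3 of a tab-separated GFF3 line) and that each reference sequence gets its own bucket.

```python
import re
from collections import defaultdict


def bucket_for(line, type_patterns):
    """Return (reference sequence, matched pattern or None) for one GFF3 line.

    Assumption: patterns are tried in order as unanchored regular
    expressions against the type column; the real tool may differ.
    """
    cols = line.rstrip("\n").split("\t")
    seqid, feature_type = cols[0], cols[2]
    for pattern in type_patterns:
        if re.search(pattern, feature_type):
            return (seqid, pattern)
    return (seqid, None)


def split_lines(lines, type_patterns):
    """Group feature lines into buckets, skipping blank and '#' lines."""
    buckets = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        buckets[bucket_for(line, type_patterns)].append(line)
    return dict(buckets)


# Made-up feature lines, for illustration only.
example = [
    "chr1\tcurated\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "chr1\tblast\tEST_match\t120\t300\t.\t+\t.\tID=m1\n",
    "chr2\tblast\tcDNA_match\t50\t400\t.\t-\t.\tID=m2\n",
]

result = split_lines(example, ["match"])
for key in sorted(result, key=str):
    print(key, len(result[key]))
```

With the pattern 'match', the EST_match and cDNA_match lines land in buckets separate from the gene line, and the two reference sequences stay apart, which mirrors the '--splitfile type=match' example.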
AUTHOR
Scott Cain <cain@cshl.org>

Copyright (c) 2006-2007

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.