lunar (1) gmod_gff3_preprocessor.pl.1p.gz

Provided by: chado-utils_1.31-6_all

NAME

       gmod_gff3_preprocessor.pl - Prepares a GFF3 file for bulk loading into a Chado database.

SYNOPSIS

         % gmod_gff3_preprocessor.pl [options] --gfffile <filename>

COMMAND-LINE OPTIONS

        --gfffile        The file containing GFF3 (optional; if omitted, input
                            is read from stdin)
        --outfile        The base name (stem) used to name the result files
        --splitfile      Split the file into more manageable chunks; an optional
                            argument controls how it is split (see DESCRIPTION)
        --onlysplit      Split the file and then quit (i.e., don't sort)
        --nosplit        Don't split the file (i.e., only sort)
        --hasrefseq      Set this if the file contains a reference sequence line
                            (only needed if not splitting files)
        --dbprofile      Specify a gmod.conf profile name (otherwise the default
                            profile is used)
        --inheritance_tiers The number of levels of inheritance expected in this
                            file (default: 3)

DESCRIPTION

       splitfile -- Setting this flag to 1 splits the file by reference sequence.  If you pass
       one of the arguments below instead, the output is further split according to these
       rules:

        source=1     Splits files according to the value in the source column
        source=a,b,c Puts lines with sources that match (via regular expression)
                            'a', 'b', or 'c' in a separate file
        type=a,b,c   Puts lines with types that match (via regular expression)
                            'a', 'b', or 'c' in a separate file

       For example, to put all of your analysis results in separate files, you could specify
       '--splitfile type=match'; all cDNA_match, EST_match and cross_genome_match features would
       then go into separate files (one per reference sequence).
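       The split rules above can be sketched in Python as follows.  This is a minimal
       illustration under stated assumptions, not the preprocessor's own (Perl) code.  The
       function names parse_split_rule and split_gff3, the 'main'/'matched' bucket labels, and
       the use of re.search for matching are all hypothetical.  The sketch groups feature lines
       by reference sequence (column 1).  When a rule is given, it further groups them by source
       (column 2) or type (column 3).

```python
import re
from collections import defaultdict

# Hypothetical sketch of the --splitfile rules; not the preprocessor's own code.
GFF3_COLUMNS = {"source": 1, "type": 2}  # 0-based indexes of GFF3 columns 2 and 3


def parse_split_rule(rule):
    """Turn 'source=1', 'source=a,b,c' or 'type=a,b,c' into
    (column_index, patterns); patterns is None for 'source=1'."""
    key, _, value = rule.partition("=")
    column = GFF3_COLUMNS[key]
    if key == "source" and value == "1":
        return column, None  # one bucket per distinct source value
    return column, [re.compile(p) for p in value.split(",")]


def split_gff3(lines, rule=None):
    """Group GFF3 feature lines into buckets keyed by (seqid, subgroup)."""
    column, patterns = parse_split_rule(rule) if rule else (None, None)
    buckets = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # this sketch skips blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        seqid, subgroup = fields[0], "main"
        if column is not None:
            value = fields[column]
            if patterns is None:
                subgroup = value  # source=1: split on the source value itself
            elif any(p.search(value) for p in patterns):
                subgroup = "matched"  # regex hit: goes to the separate file
        buckets[(seqid, subgroup)].append(line)
    return dict(buckets)
```

       With the rule 'type=match', a cDNA_match line on chr1 lands in the ('chr1', 'matched')
       bucket, while a gene line on chr1 stays in ('chr1', 'main').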

       inheritance_tiers -- The number of levels of inheritance in this file.  For example, a
       file containing "central dogma" genes (gene/mRNA/exon,polypeptide) has 3.  Values up to
       4 are supported, but higher values make processing slower.  If you don't know, 3 is a
       reasonable guess.
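       If you are unsure how many tiers a file has, you can estimate the number by following
       Parent attributes.  The Python helper below is hypothetical (inheritance_tiers is not
       part of chado-utils) and returns the length of the longest Parent chain.  It assumes that
       only the Parent attribute defines the hierarchy and ignores other relationships such as
       Derives_from.

```python
def parse_attributes(column9):
    """Parse a GFF3 column-9 string such as 'ID=t1;Parent=g1' into a dict."""
    attrs = {}
    for item in column9.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def inheritance_tiers(lines):
    """Return the length of the longest Parent chain (gene/mRNA/exon -> 3).
    Assumption: only Parent links count; Derives_from etc. are ignored."""
    parents_of = {}       # feature ID -> list of parent IDs
    feature_parents = []  # parent-ID list for every feature line
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a feature line (e.g. FASTA sequence)
        attrs = parse_attributes(fields[8])
        parent_ids = attrs["Parent"].split(",") if "Parent" in attrs else []
        feature_parents.append(parent_ids)
        if "ID" in attrs:
            parents_of[attrs["ID"]] = parent_ids

    memo = {}

    def depth(feature_id, visiting=frozenset()):
        """Tier of a feature ID; IDs never defined in the file count as roots."""
        if feature_id in memo:
            return memo[feature_id]
        if feature_id in visiting:
            raise ValueError(f"Parent cycle involving {feature_id!r}")
        inner = visiting | {feature_id}
        parents = parents_of.get(feature_id, [])
        result = 1 + max((depth(p, inner) for p in parents), default=0)
        memo[feature_id] = result
        return result

    return max(
        (1 + max((depth(p) for p in parents), default=0) for parents in feature_parents),
        default=0,
    )
```

       A file containing only gene, mRNA (Parent=gene) and exon (Parent=mRNA) lines yields 3,
       which matches the "central dogma" example above.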

   FASTA sequence
       If the GFF3 file contains FASTA sequence at the end, the sequence is placed in a separate
       file with the extension '.fasta'.  This FASTA file can be loaded separately, after the
       split and/or sorted GFF3 files have been loaded, with the command:

         gmod_bulk_load_gff3.pl -g <fasta file name>
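       The GFF3 specification marks the start of trailing sequence with a ##FASTA directive.
       The Python sketch below shows one way to separate that section, assuming the directive is
       present.  The name split_fasta_section is hypothetical, and the sketch returns the two
       parts as strings instead of writing the '.fasta' file itself.

```python
def split_fasta_section(text):
    """Split GFF3 text into (gff_part, fasta_part) at the ##FASTA directive.
    Assumes the sequence section is introduced by '##FASTA', per the GFF3 spec."""
    gff_lines, fasta_lines = [], []
    in_fasta = False
    for line in text.splitlines(keepends=True):
        if not in_fasta and line.rstrip("\r\n") == "##FASTA":
            in_fasta = True
            continue  # the directive itself goes into neither part
        (fasta_lines if in_fasta else gff_lines).append(line)
    return "".join(gff_lines), "".join(fasta_lines)
```

       The FASTA part could then be written to a file with the '.fasta' extension and passed to
       gmod_bulk_load_gff3.pl -g as shown above.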

AUTHOR

       Scott Cain <cain@cshl.org>

       Copyright (c) 2006-2007

       This library is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.