Ubuntu Manpage: gt-extractfeat - Extract features given in GFF3 file from sequence file.

Provided by: genometools_1.6.2+ds-3_amd64

NAME

       gt-extractfeat - Extract features given in GFF3 file from sequence file.

SYNOPSIS

       gt extractfeat [option ...] [GFF3_file]

DESCRIPTION

       -type [string]
           set type of features to extract (default: undefined)

       -join [yes|no]
           join feature sequences in the same subgraph into a single one (default: no)

       -translate [yes|no]
           translate the features (of a DNA sequence) into protein (default: no)

       -seqid [yes|no]
           add sequence ID of extracted features to FASTA descriptions (default: no)

       -target [yes|no]
           add target ID(s) of extracted features to FASTA descriptions (default: no)

       -coords [yes|no]
           add location of extracted features to FASTA descriptions (default: no)

       -retainids [yes|no]
           use ID attributes of extracted features as FASTA descriptions (default: no)

       -gcode [value]
           specify genetic code to use (default: 1)

       -seqfile [filename]
           set the sequence file from which to take the sequences (default: undefined)

       -encseq [filename]
           set the encoded sequence index name from which to take the sequences (default:
           undefined)

       -seqfiles
           set the sequence files from which to extract the features; use -- to terminate the
           list of sequence files

       -matchdesc [yes|no]
           search the sequence descriptions from the input files for the desired sequence IDs (in
           GFF3), reporting the first match (default: no)

       -matchdescstart [yes|no]
           match the desired sequence IDs (in GFF3) exactly against the sequence descriptions
           from the input files, comparing only the part of each description from its beginning
           to the first whitespace (default: no)

       -usedesc [yes|no]
           use sequence descriptions to map the sequence IDs (in GFF3) to actual sequence
           entries. If a description contains a sequence range (e.g., III:1000001..2000000), the
           first part is used as the sequence ID (III) and the first range position as the
           offset (1000001) (default: no)

       -regionmapping [string]
           set the file containing the mapping from sequence regions to sequence files (default:
           undefined)

       -v [yes|no]
           be verbose (default: no)

       -width [value]
           set output width for FASTA sequence printing (0 disables formatting) (default: 0)

       -o [filename]
           redirect output to specified file (default: undefined)

       -gzip [yes|no]
           write gzip compressed output file (default: no)

       -bzip2 [yes|no]
           write bzip2 compressed output file (default: no)

       -force [yes|no]
           force writing to output file (default: no)

       -help
           display help and exit

       -version
           display version information and exit
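
       The options -matchdesc, -matchdescstart and -usedesc differ in how a GFF3 sequence ID is
       compared with the FASTA descriptions. The Python sketch below illustrates one reading of
       the documented rules; it is not the genometools implementation, and the helper names
       (match_desc, match_desc_start, parse_usedesc, to_local) are hypothetical. It assumes that
       -matchdesc is a plain substring search and that the -usedesc offset turns a 1-based GFF3
       position p into position p - offset + 1 within the described sequence.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical helpers that illustrate the documented matching rules;
# they are not part of genometools and may differ from its behaviour.


def match_desc(seqid: str, descriptions: Iterable[str]) -> Optional[int]:
    """-matchdesc: index of the first description containing seqid (assumed substring search)."""
    for i, desc in enumerate(descriptions):
        if seqid in desc:
            return i
    return None


def match_desc_start(seqid: str, descriptions: Iterable[str]) -> Optional[int]:
    """-matchdescstart: seqid must equal the description up to its first whitespace."""
    for i, desc in enumerate(descriptions):
        prefix = re.split(r"\s", desc, maxsplit=1)[0]
        if prefix == seqid:
            return i
    return None


# "III:1000001..2000000" -> sequence ID "III", range start 1000001
_RANGE = re.compile(r"^(\S+?):(\d+)\.\.(\d+)")


def parse_usedesc(desc: str) -> Tuple[str, Optional[int]]:
    """-usedesc: split a description with a range into (sequence ID, offset)."""
    m = _RANGE.match(desc)
    if m is None:
        return desc, None  # assumption: a description without a range has no offset
    return m.group(1), int(m.group(2))


def to_local(gff3_pos: int, offset: int) -> int:
    """Assumed mapping of a 1-based GFF3 position into the described subsequence."""
    return gff3_pos - offset + 1
```

       In this sketch, with the descriptions "chr10 ..." followed by "chr1 ...", the substring
       rule matches the ID chr1 against the first entry, while the whitespace-delimited rule
       picks the second one. This is why a prefix-exact comparison is the safer choice when
       some sequence IDs are prefixes of others.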

       Genetic code numbers for option -gcode:

            1: Standard
            2: Vertebrate Mitochondrial
            3: Yeast Mitochondrial
            4: Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial;
               Mycoplasma; Spiroplasma
            5: Invertebrate Mitochondrial
            6: Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
            9: Echinoderm Mitochondrial; Flatworm Mitochondrial
           10: Euplotid Nuclear
           11: Bacterial, Archaeal and Plant Plastid
           12: Alternative Yeast Nuclear
           13: Ascidian Mitochondrial
           14: Alternative Flatworm Mitochondrial
           15: Blepharisma Macronuclear
           16: Chlorophycean Mitochondrial
           21: Trematode Mitochondrial
           22: Scenedesmus obliquus Mitochondrial
           23: Thraustochytrium Mitochondrial
           24: Pterobranchia Mitochondrial
           25: Candidate Division SR1 and Gracilibacteria

       File format for option -regionmapping:

       The file supplied to option -regionmapping defines a “mapping”. A mapping maps the
       sequence-region entries given in the GFF3_file to a sequence file containing the
       corresponding sequence. Mappings can be defined in one of the following two forms:

           mapping = {
             chr1  = "hs_ref_chr1.fa.gz",
             chr2  = "hs_ref_chr2.fa.gz"
           }

       or

           function mapping(sequence_region)
             return "hs_ref_"..sequence_region..".fa.gz"
           end

       The first form defines a Lua (http://www.lua.org) table named “mapping” which maps each
       sequence region to the corresponding sequence file. The second one defines a Lua function
       “mapping”, which has to return the sequence file name when it is called with the
       sequence_region as argument.
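
       To make the two forms concrete, the Python sketch below resolves a sequence region to a
       file name from either a table or a function, mirroring the Lua examples above. It only
       illustrates the documented lookup semantics; resolve_sequence_file is a hypothetical
       name, and treating an unmapped region as an error is an assumption rather than
       documented genometools behaviour.

```python
from typing import Callable, Dict, Union

Mapping = Union[Dict[str, str], Callable[[str], str]]


def resolve_sequence_file(mapping: Mapping, sequence_region: str) -> str:
    """Return the sequence file for a GFF3 sequence region (hypothetical helper)."""
    if callable(mapping):
        # Function form: the file name is computed from the region name.
        return mapping(sequence_region)
    try:
        # Table form: the region name is looked up directly.
        return mapping[sequence_region]
    except KeyError:
        # Assumption: an unmapped region is an error rather than silently skipped.
        raise KeyError(f"no sequence file mapped for region {sequence_region!r}") from None


# Python counterparts of the two Lua forms shown above.
table_mapping = {"chr1": "hs_ref_chr1.fa.gz", "chr2": "hs_ref_chr2.fa.gz"}


def function_mapping(sequence_region: str) -> str:
    return "hs_ref_" + sequence_region + ".fa.gz"
```

       The table form is explicit but must list every region; the function form covers any
       naming scheme that can be computed from the region name, such as one file per
       chromosome.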

REPORTING BUGS

       Report bugs to https://github.com/genometools/genometools/issues.