Provided by: genomethreader_1.7.3+dfsg-7_amd64
NAME
gth - predict gene structures
SYNOPSIS
gth [option ...] -genomic file [...] -cdna file [...] -protein file [...]
DESCRIPTION
Computes similarity-based gene structure predictions (spliced alignments) using cDNA/EST and/or protein sequences and assembles the resulting spliced alignments into consensus spliced alignments.
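As a minimal sketch of how the synopsis maps onto an actual invocation, the Python helper below assembles a gth argument list. The function name build_gth_command and all file names are hypothetical; only the options -genomic, -cdna, -protein, -species, -gff3out and -o are taken from this page. The check that at least one cDNA/EST or protein file is given is an assumption based on the "and/or" in the description. The command is only built and printed, not executed.

```python
import shlex


def build_gth_command(genomic, cdna=(), protein=(), species=None,
                      gff3=False, output=None):
    """Assemble a gth argument list (hypothetical helper, not part of gth)."""
    if not genomic:
        # -genomic is documented as the mandatory option.
        raise ValueError("-genomic is mandatory")
    if not cdna and not protein:
        # Assumption: gth needs at least one kind of reference sequence.
        raise ValueError("give at least one cDNA/EST or protein file")
    argv = ["gth", "-genomic", *genomic]
    if cdna:
        argv += ["-cdna", *cdna]
    if protein:
        argv += ["-protein", *protein]
    if species is not None:
        argv += ["-species", species]
    if gff3:
        argv.append("-gff3out")
    if output is not None:
        argv += ["-o", output]
    return argv


if __name__ == "__main__":
    # Hypothetical input and output file names.
    cmd = build_gth_command(["genome.fasta"], cdna=["ests.fasta"],
                            species="arabidopsis", gff3=True,
                            output="predictions.gff3")
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run once gth is installed; building a list rather than a single string avoids shell quoting problems with file names.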
OPTIONS
-genomic <file>
    specify input files containing genomic sequences (mandatory option)
-cdna <file>
    specify input files containing cDNA/EST sequences
-protein <file>
    specify input files containing protein sequences
-species <species>
    specify species to select the most appropriate splice site model
    possible species: "human", "mouse", "rat", "chicken", "drosophila", "nematode", "fission_yeast", "aspergillus", "arabidopsis", "maize", "rice", "medicago"
    default: undefined
-bssm
    read BSSM parameter from file in the path given by the environment variable BSSMDIR
    default: undefined
-scorematrix
    read amino acid substitution scoring matrix from file in the path given by the environment variable GTHDATADIR
    default: BLOSUM62
-translationtable
    set the codon translation table used for codon translation in matching, DP, and output
    default: 1
-f
    analyze only forward strand of genomic sequences
    default: no
-r
    analyze only reverse strand of genomic sequences
    default: no
-cdnaforward
    align only forward strand of cDNAs
    default: no
-frompos
    analyze genomic sequence from this position; requires -topos or -width; counting from 1 on
    default: 0
-topos
    analyze genomic sequence to this position; requires -frompos; counting from 1 on
    default: 0
-width
    analyze only this width of genomic sequence; requires -frompos
    default: 0
-v
    be verbose
    default: no
-xmlout
    show output in XML format
    default: no
-gff3out
    show output in GFF3 format
    default: no
-md5ids
    show MD5 fingerprints as sequence IDs
    default: no
-o
    redirect output to specified file
    default: undefined
-gzip
    write gzip compressed output file
    default: no
-bzip2
    write bzip2 compressed output file
    default: no
-force
    force writing to output file
    default: no
-skipalignmentout
    skip output of spliced alignments
    default: no
-mincutoffs
    show full spliced alignments, i.e., cutoffs mode for leading and terminal bases is MINIMAL
    default: no
-showintronmaxlen
    set the maximum length of a fully shown intron; if set to 0, all introns are shown completely
    default: 120
-minorflen
    set the minimum length of an ORF to be shown
    default: 64
-startcodon
    require that an ORF must begin with a start codon
    default: no
-finalstopcodon
    require that the final ORF must end with a stop codon
    default: no
-showseqnums
    show sequence numbers in output
    default: no
-pglgentemplate
    show genomic template in PGL lines (switch off for backward compatibility)
    default: yes
-gs2out
    output in old GeneSeqer2 format
    default: no
-maskpolyatails
    mask poly(A) tails in cDNA/EST files
    default: no
-proteinsmap
    specify smap file used for protein files
    default: protein
-noautoindex
    do not create indices automatically, except for the .dna.* files used for the DP. Existence is not tested before an index is actually used!
    default: no
-createindicesonly
    stop program flow after the indices have been created
    default: no
-skipindexcheck
    skip index check (in preprocessing phase)
    default: no
-minmatchlen
    specify minimum match length (cDNA matching)
    default: 20
-seedlength
    specify the seed length (cDNA matching)
    default: 18
-exdrop
    specify the Xdrop value for edit distance extension (cDNA matching)
    default: 2
-prminmatchlen
    specify minimum match length (protein matching)
    default: 24
-prseedlength
    specify seed length (protein matching)
    default: 10
-prhdist
    specify Hamming distance (protein matching)
    default: 4
-online
    run the similarity filter online without using the complete index (increases runtime)
    default: no
-inverse
    invert query and index in vmatch call
    default: no
-exact
    use exact matches in the similarity filter
    default: no
-gcmaxgapwidth
    set the maximum gap width for global chains; defines approximately the maximum intron length; set to 0 to allow for unlimited length. In order to avoid false-positive exons (lonely exons) at the sequence ends, it is very important to set this parameter appropriately!
    default: 1000000
-gcmincoverage
    set the minimum coverage of global chains with respect to the reference sequence
    default: 50
-paralogs
    compute paralogous genes (different chaining procedure)
    default: no
-enrichchains
    enrich genomic sequence part of global chains with additional matches
    default: no
-introncutout
    enable the intron cutout technique
    default: no
-fastdp
    use jump table to increase speed of DP calculation
    default: no
-autointroncutout
    set the automatic intron cutout matrix size in megabytes and enable the automatic intron cutout technique
    default: 0
-icinitialdelta
    set the initial delta used for intron cutouts
    default: 50
-iciterations
    set the number of intron cutout iterations
    default: 2
-icdeltaincrease
    set the delta increase during every iteration
    default: 50
-icminremintronlen
    set the minimum remaining intron length for an intron to be cut out
    default: 10
-nou12intronmodel
    disable the U12-type intron model
    default: no
-u12donorprob
    set the probability for perfect U12-type donor sites
    default: 0.99
-u12donorprob1mism
    set the probability for U12-type donor sites with 1 mismatch
    default: 0.90
-probies
    set the initial exon state probability
    default: 0.50
-probdelgen
    set the genomic sequence deletion probability
    default: 0.03
-identityweight
    set the weight for pairs of identical characters
    default: 2.00
-mismatchweight
    set the weight for mismatching characters
    default: -2.00
-undetcharweight
    set the weight for undetermined characters
    default: 0.00
-deletionweight
    set the weight for deletions
    default: -5.00
-dpminexonlen
    set the minimum exon length for the DP
    default: 5
-dpminintronlen
    set the minimum intron length for the DP
    default: 50
-shortexonpenal
    set the short exon penalty
    default: 100.00
-shortintronpenal
    set the short intron penalty
    default: 100.00
-wzerotransition
    set the zero transition weights window size
    default: 80
-wdecreasedoutput
    set the decreased output weights window size
    default: 80
-leadcutoffsmode
    set the cutoffs mode for leading bases; can be either RELAXED, STRICT, or MINIMAL
    default: RELAXED
-termcutoffsmode
    set the cutoffs mode for terminal bases; can be either RELAXED, STRICT, or MINIMAL
    default: STRICT
-cutoffsminexonlen
    set the cutoffs minimum exon length
    default: 5
-scoreminexonlen
    set the score minimum exon length
    default: 50
-minaveragessp
    set the minimum average splice site probability
    default: 0.50
-duplicatecheck
    criterion used to check for spliced alignment duplicates; choose from none|id|desc|seq|both
    default: both
-minalignmentscore
    set the minimum alignment score for spliced alignments to be included into the set of spliced alignments
    default: 0.00
-maxalignmentscore
    set the maximum alignment score for spliced alignments to be included into the set of spliced alignments
    default: 1.00
-mincoverage
    set the minimum coverage for spliced alignments to be included into the set of spliced alignments
    default: 0.00
-maxcoverage
    set the maximum coverage for spliced alignments to be included into the set of spliced alignments
    default: 9999.99
-intermediate
    stop after calculation of spliced alignments and output results in reusable XML format. Do not process this output yourself, use the "normal" XML output instead!
    default: no
-sortags
    sort alternative gene structures according to the weighted mean of the average exon score and the average splice site probability
    default: no
-sortagswf
    set the weight factor for the sorting of AGSs
    default: 1.00
-exondistri
    show the exon length distribution
    default: no
-introndistri
    show the intron length distribution
    default: no
-refseqcovdistri
    show the reference sequence coverage distribution
    default: no
-first
    set the maximum number of spliced alignments per genomic DNA input; set to 0 for unlimited number
    default: 0
-help
    display help for basic options and exit
-help+
    display help for all options and exit
-version
    display version information and exit
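The options -frompos, -topos and -width depend on each other: -frompos requires -topos or -width, and both -topos and -width require -frompos. The Python sketch below checks these documented dependencies before a command line is assembled. The helper name region_args is hypothetical, treating 0 as "not set" mirrors the documented defaults, and the rejection of negative values is an assumption rather than documented gth behaviour.

```python
def region_args(frompos=0, topos=0, width=0):
    """Return gth arguments restricting the analyzed genomic region.

    Hypothetical helper; only the option names and their dependencies
    come from the man page. A value of 0 means "not set".
    """
    if min(frompos, topos, width) < 0:
        # Assumption: negative positions or widths make no sense here.
        raise ValueError("positions and widths must not be negative")
    if frompos and not (topos or width):
        raise ValueError("-frompos requires -topos or -width")
    if topos and not frompos:
        raise ValueError("-topos requires -frompos")
    if width and not frompos:
        raise ValueError("-width requires -frompos")
    args = []
    if frompos:
        args += ["-frompos", str(frompos)]
    if topos:
        args += ["-topos", str(topos)]
    if width:
        args += ["-width", str(width)]
    return args


if __name__ == "__main__":
    # Analyze 5000 bases starting at position 1000 (counting from 1).
    print(region_args(frompos=1000, width=5000))
```

The resulting list can be appended to a gth argument list; validating the combination up front gives a clearer error than a failed run.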