Provided by: sim4db_0~20150903+r2013-8build3_amd64
NAME
sim4db - batch spliced alignment of cDNA sequences to a target genome
SYNOPSIS
A simple command line invocation:

    sim4db -genomic g.fasta -cdna c.fasta -scr script -output o.sim4db

where:
    - 'c.fasta' and 'g.fasta' are the multi-fasta cDNA and genome sequence files
    - 'script' is a script file indicating individual alignments to be computed
    - output in sim4db format will be sent to the file 'o.sim4db' ('-' for standard output)

A more complex invocation:

    sim4db -genomic g.fasta -cdna c.fasta -output o.sim4db [options]
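The simple invocation above can also be assembled from a pipeline script. The following is a minimal Python sketch, not part of sim4db itself: the helper name basic_sim4db_command is hypothetical, and it assumes the 'sim4db' binary is on the PATH and that the script file is passed with the '-script' option listed under OPTIONS. It only builds and prints the argument list; it does not run the program.

```python
def basic_sim4db_command(genomic, cdna, output="-", script=None):
    """Build the argument list for a simple sim4db run (hypothetical helper)."""
    cmd = ["sim4db", "-genomic", genomic, "-cdna", cdna]
    if script is not None:
        # Restrict the run to the alignments listed in the script file.
        cmd += ["-script", script]
    # '-' sends the sim4db-format output to standard output.
    cmd += ["-output", output]
    return cmd


# File names are the placeholders used in the synopsis.
cmd = basic_sim4db_command("g.fasta", "c.fasta", output="o.sim4db", script="script")
print(" ".join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) once the input files exist. Building a list rather than a single string avoids shell quoting problems with unusual file names.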
DESCRIPTION
sim4db performs fast batch alignment of large cDNA (EST, mRNA) sequence sets to a set of eukaryotic genomic regions. It uses the sim4 and sim4cc algorithms to determine the alignments, but incorporates a fast sequence indexing and retrieval mechanism, implemented in the sister package leaff(1), to process large volumes of sequences quickly.

While sim4db produces alignments in the same way as sim4 or sim4cc, it has additional features that make it better suited to whole-genome annotation pipelines. A script file can be used to group pairings between cDNAs and their corresponding genomic regions, so that they are aligned in one run with the same set of parameters. sim4db can also optionally report more than one alignment for the same cDNA within a genomic region, as long as each alignment meets user-defined criteria such as minimum length, percentage sequence identity or coverage. This feature is instrumental in finding all alignments of a gene family at one locus. Lastly, the output is presented either as custom sim4db alignments or as GFF3 gene features.
OPTIONS
Salient options:
    -cdna          use these cDNA sequences (multi-fasta file)
    -genomic       use these genomic sequences (multi-fasta file)
    -script        use this script file
    -pairwise      sequentially align pairs of sequences

    If neither '-script' nor '-pairwise' is specified, sim4db performs
    all-against-all alignments between pairs of cDNA and genomic sequences.

    -output        write output to this file
    -gff3          report output in GFF3 format
    -interspecies  use sim4cc for inter-species alignments (default sim4)

Filter options:
    -mincoverage   iteratively find all exon models with the specified
                   minimum PERCENT COVERAGE
    -minidentity   iteratively find all exon models with the specified
                   minimum PERCENT EXON IDENTITY
    -minlength     iteratively find all exon models with the specified
                   minimum ABSOLUTE COVERAGE (number of bp matched)
                   (default 0)
    -alwaysreport  always report <number> exon models, even if they are
                   below the quality thresholds

    If no mincoverage or minidentity or minlength is given, only the best
    exon model is returned. This is the DEFAULT operation.

    You will probably want to specify ALL THREE of mincoverage, minidentity
    and minlength! Don't assume the default values are what you want!

    You will DEFINITELY want to specify at least one of mincoverage,
    minidentity and minlength with alwaysreport! If you don't, mincoverage
    will be set to 90 and minidentity to 95, to reduce the number of
    spurious matches when a good match is found.

Auxiliary options:
    -nodeflines    don't include the defline in the sim4db output
    -alignments    print alignments
    -polytails     DON'T mask poly-A and poly-T tails
    -cut           trim marginal exons if A/T % > x (poly-AT tails)
    -noncanonical  don't force canonical splice sites
    -splicemodel   use the following splice model:
                     0 - original sim4
                     1 - GeneSplicer
                     2 - Glimmer
                   Options 1 and 2 are only available with '-interspecies'.
                   Default for sim4 is 0, and for sim4cc is 1.
    -forcestrand   force the strand prediction to always be one of
                   'forward' or 'reverse'

Execution options:
    -threads       use n threads
    -touch         create this file when the program finishes execution

Debugging options:
    -v             print status to stderr while running
    -V             print script lines (stderr) as they are being processed

Developer options:
    -Z             set the spaced seed pattern
    -H             set the relink weight factor (H=1000 recommended for mRNAs)
    -K             set the first MSP threshold
    -C             set the second MSP threshold
    -Ma            set the limit of the number of MSPs allowed
    -Mp            same, as percentage of bases in cDNA

    NOTE: If used, both -Ma and -Mp must be specified!
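The interaction between the filter options, and the requirement that -Ma and -Mp be given together, are easy to get wrong in a wrapper script. The following Python sketch encodes the rules stated above as pre-flight checks. The helper names filter_options and msp_limit_options are hypothetical, the threshold values in the usage lines are illustrative only, and the code builds arguments without invoking sim4db.

```python
def filter_options(mincoverage=None, minidentity=None, minlength=None,
                   alwaysreport=None):
    """Return (args, notes) for the sim4db filter options (hypothetical helper)."""
    args, notes = [], []
    thresholds = {
        "-mincoverage": mincoverage,   # percent coverage
        "-minidentity": minidentity,   # percent exon identity
        "-minlength": minlength,       # number of bp matched
    }
    for flag, value in thresholds.items():
        if value is not None:
            args += [flag, str(value)]
    any_threshold = any(v is not None for v in thresholds.values())

    if alwaysreport is not None:
        args += ["-alwaysreport", str(alwaysreport)]
        if not any_threshold:
            # Documented fallback when -alwaysreport is used alone.
            notes.append("-alwaysreport without thresholds: sim4db sets "
                         "mincoverage to 90 and minidentity to 95")
    elif not any_threshold:
        # Documented default operation.
        notes.append("no thresholds given: only the best exon model "
                     "is returned")
    return args, notes


def msp_limit_options(ma=None, mp=None):
    """Return the -Ma/-Mp arguments, refusing a lone -Ma or -Mp."""
    if (ma is None) != (mp is None):
        raise ValueError("-Ma and -Mp must be specified together")
    if ma is None:
        return []
    return ["-Ma", str(ma), "-Mp", str(mp)]


# Illustrative values only; choose thresholds that suit your data.
args, notes = filter_options(mincoverage=50, minidentity=95, minlength=100)
print(args, notes)
print(filter_options(alwaysreport=3))
```

Returning notes instead of raising lets a pipeline log the documented fallback behaviour while still allowing the run to proceed. The -Ma/-Mp check raises because the manual states that both must be specified.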
SEE ALSO
README.sim4db

http://kmer.sourceforge.net/wiki/index.php/Getting_Started_with_Sim4db

The sim4dbutils(1) package contains a range of utilities to work with sim4db-generated alignment files, of particular note being convertPolishes(1), filterPolishes(1), mergePolishes(1), and sortPolishes(1).

January 2016                                                        SIM4DB(1)