Provided by: segemehl_0.3.4-5build2_amd64
NAME
segemehl - Heuristic mapping of short sequences
SYNOPSIS
segemehl [-besVOc] -d <file> [<file>] [-q <file>] [-p <file>] [-i <file>] [-j <file>] [-x <file>] [-y <file>] [-G <file>] [-g <string>] [-t <n>] [-o <string>] [-u <file>] [-B <string>] [-F <n>] [-S [<basename>]] [-A <n>] [-D <n>] [-E <double>] [-H] [-m <n>] [-Z <n>] [-W <n>] [-U <n>] [-l <f>] [-w <double>] [-X <n>] [-J <n>] [-I <n>] [-M <n>] [-n <n>] [-r <n>] [--skipidcheck] [--showalign] [--nohead]
DESCRIPTION
Segemehl is a program that maps short sequencer reads to reference genomes. It implements a matching strategy based on enhanced suffix arrays (ESA). Segemehl accepts fasta and fastq queries, optionally compressed with gzip or bgzip. In addition to aligning reads from standard DNA- and RNA-seq protocols, it can map bisulfite-converted reads (Lister and Cokus protocols) and implements a split-read mapping strategy.

The output of segemehl is a SAM- or BAM-formatted alignment file. For split-read mapping, additional BED files are written to disk. These BED files may be summarized with the postprocessing tool haarz. For alignments of bisulfite-converted reads, raw methylation rates may also be called with haarz.

In brief, for each suffix of a read, segemehl aims to find the best-scoring seed. Seeds may contain insertions, deletions, and mismatches (differences). The number of differences allowed within a single seed is user-controlled (-D) and is crucial for the runtime of the program. Subsequently, seeds whose E-value falls below the user-defined maximum (-E) are passed on to an exact semi-global alignment procedure. Finally, reads that reach the minimum accuracy set with -A (in percent, default 90) are reported to the user.
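The workflow described above usually comes down to two invocations: generate the index once, then map reads against it. The following is a minimal sketch that uses only options listed under OPTIONS. The file names (genome.fa, genome.idx, reads.fq.gz, mapped.sam) and the thread count are placeholders, and run is a hypothetical dry-run helper (not part of segemehl) that prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper, not part of segemehl:
# prints the command by default; set DRY_RUN=0 to execute it instead.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Placeholder file names; substitute your own data.
REF=genome.fa
IDX=genome.idx
READS=reads.fq.gz

# Step 1: generate the index from the reference and store it on disk (-x).
run segemehl -x "$IDX" -d "$REF"

# Step 2: map the reads against the stored index (-i) with 4 threads,
# keeping reads with at least 90 percent matches (-A, the documented default).
run segemehl -i "$IDX" -d "$REF" -q "$READS" -t 4 -A 90 -o mapped.sam
```

Setting DRY_RUN=0 in the environment makes the same script run segemehl for real, assuming the binary is on the PATH.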
OPTIONS
INPUT

-d, --database <file> [<file>]
    list of path/filename(s) of fasta database sequence(s)
-q, --query <file>
    path/filename of query sequences (default:none)
-p, --mate <file>
    path/filename of mate pair sequences (default:none)
-i, --index <file>
    path/filename of db index (default:none)
-j, --index2 <file>
    path/filename of second db index (default:none)
-x, --generate <file>
    generate db index and store to disk (default:none)
-y, --generate2 <file>
    generate second db index and store to disk (default:none)
-G, --readgroupfile <file>
    filename to read @RG header (default:none)
-g, --readgroupid <string>
    read group id (default:none)
-t, --threads <n>
    start <n> threads (default:1)

OUTPUT

-o, --outfile <string>
    output file (default:none)
-b, --bamabafixoida
    generate a bam output (-o <filename> required)
-u, --nomatchfilename <file>
    filename for unmatched reads (default:none)
-e, --briefcigar
    brief cigar string (M vs X and =)
-s, --progressbar
    show a progress bar
-B, --filebins <string>
    file bins with basename <string> for easier data handling (default:none)
-V, --MEOP
    output MEOP field for easier variant calling in SAM (XE:Z:)

ALIGNMENT

-F, --bisulfite <n>
    bisulfite alignment with methylC-seq/Lister et al. (=1) or bs-seq/Cokus et al. protocol (=2) (default:0)
-S, --splits [<basename>]
    detect split/spliced reads (default:none)
-A, --accuracy <n>
    min percentage of matches per read in semi-global alignment (default:90)
-D, --differences <n>
    search seeds initially with <n> differences (default:1)
-E, --evalue <double>
    max evalue (default:5.000000)
-H, --hitstrategy
    report only best scoring hits (=1) or all (=0) (default:1)
-m, --minsize <n>
    minimum length of queries (default:12)
-Z, --minfraglen <n>
    min length of a spliced fragment (default:20)
-W, --minsplicecover <n>
    min coverage for spliced transcripts (default:80)
-U, --minfragscore <n>
    min score of a spliced fragment (default:18)
-l, --splicescorescale <f>
    report spliced alignment with score s only if <f>*s is larger than next best spliced alignment (default:0.900000)
-w, --maxsplitevalue <double>
    max evalue for splits (default:50.000000)

SPECIAL

-X, --dropoff <n>
    dropoff parameter for extension (default:8)
-J, --jump <n>
    search seeds with jump size <n> (0=automatic) (default:0)
-O, --order
    sort the output by chromosome and position (might take a while!)
-I, --maxpairinsertsize <n>
    maximum size of the inserts (paired end) in case of multiple hits (default:200000)
-M, --maxinterval <n>
    maximum width of a suffix array interval, i.e. a query seed will be omitted if it matches more than <n> times (default:100)
-c, --checkidx
    check index
-n, --extensionpenalty <n>
    penalty for a mismatch during extension (default:4)
-r, --maxout <n>
    maximum number of alignments that will be reported. If set to zero, all alignments will be reported (default:0)
--skipidcheck
    do not check whether the fastq ids of mates / paired ends match. Instead, the first mate (-q) will be used for output only.
--showalign
    show alignments
--nohead
    do not output header
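
The options above can be combined for the paired-end, split-read, and bisulfite use cases. The commands below are a hedged sketch rather than a reference: all file names are placeholders, run is a hypothetical dry-run helper that only prints each command unless DRY_RUN=0 is set, and the pairing of the two indices (-x/-y, -i/-j) with bisulfite mode is an assumption based on the option descriptions.

```shell
#!/bin/sh
# Hypothetical dry-run helper, not part of segemehl:
# prints the command by default; set DRY_RUN=0 to execute it instead.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Paired-end mapping: first mates via -q, second mates via -p,
# written as BAM (-b requires -o).
run segemehl -i genome.idx -d genome.fa \
    -q reads_1.fq.gz -p reads_2.fq.gz -t 4 -b -o paired.bam

# Split-read mapping (-S); additional BED files are written alongside the SAM.
run segemehl -i genome.idx -d genome.fa -q rnaseq.fq.gz -S -o split.sam

# Bisulfite mapping with the methylC-seq/Lister protocol (-F 1).
# Assumption: this mode uses two indices, generated with -x and -y
# and loaded again with -i and -j.
run segemehl -x genome.ctidx -y genome.gaidx -d genome.fa -F 1
run segemehl -i genome.ctidx -j genome.gaidx -d genome.fa \
    -q bs_reads.fq.gz -F 1 -o bs.sam
```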
BUGS
Please report bugs to steve@bioinf.uni-leipzig.de
SEE ALSO
http://www.bioinf.uni-leipzig.de/Software/segemehl/
REFERENCES
2008 Bioinformatik Leipzig 2018 Leibniz Institute on Aging (FLI)
AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and may be used for any other purpose related to the program.