Ubuntu Manpage: segemehl - Heuristic mapping of short sequences

Provided by: segemehl_0.3.4-5build2_amd64

NAME

       segemehl - Heuristic mapping of short sequences

SYNOPSIS

       segemehl  [-besVOc]  -d  <file>  [<file>] [-q <file>] [-p <file>] [-i <file>] [-j <file>] [-x <file>] [-y
       <file>] [-G <file>] [-g <string>] [-t  <n>]  [-o  <string>]  [-u  <file>]  [-B  <string>]  [-F  <n>]  [-S
       [<basename>]]  [-A  <n>]  [-D  <n>]  [-E  <double>] [-H] [-m <n>] [-Z <n>] [-W <n>] [-U <n>] [-l <f>] [-w
       <double>] [-X <n>] [-J <n>] [-I <n>] [-M <n>] [-n <n>] [-r <n>] [--skipidcheck] [--showalign] [--nohead]

DESCRIPTION

       Segemehl maps short sequencer reads to reference genomes using a matching strategy based on enhanced
       suffix arrays (ESA). It accepts fasta and fastq queries, optionally compressed with gzip or bgzip. In
       addition to the alignment of reads from standard DNA- and RNA-seq protocols, it also allows the mapping
       of bisulfite converted reads (Lister and Cokus protocols) and implements a split read mapping strategy.
       The output of segemehl is a SAM or BAM formatted alignment file. In the case of split-read mapping,
       additional BED files are written to disk. These BED files may be summarized with the postprocessing
       tool haarz. In the case of the alignment of bisulfite converted reads, raw methylation rates may also be
       called with haarz.

       In brief, for each suffix of a read, segemehl aims to find the best-scoring seed. Seeds may contain
       insertions, deletions, and mismatches (differences). The number of differences allowed within a single
       seed is user-controlled (-D) and is crucial for the runtime of the program. Subsequently, seeds that
       undercut the user-defined E-value (-E) are passed on to an exact semi-global alignment procedure.
       Finally, reads whose alignment reaches the user-defined minimum accuracy (-A, in percent) are reported
       to the user.
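
       The two filtering stages described above can be pictured with a toy sketch in Python. It is not
       segemehl's implementation: the Candidate record, its field names, the accuracy formula and the sample
       numbers are hypothetical, and only the two thresholds correspond to documented options (-E, default 5,
       and -A, default 90). Treating the E-value comparison as strict is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one seed and the alignment derived from it."""
    read_id: str
    seed_evalue: float    # E-value of the best-scoring seed
    matches: int          # matching positions in the semi-global alignment
    aligned_length: int   # total number of alignment columns


def accuracy(c: Candidate) -> float:
    """Percentage of matches; an assumed definition used for illustration."""
    return 100.0 * c.matches / c.aligned_length


def report(cands, max_evalue=5.0, min_accuracy=90.0):
    # Stage 1: keep only seeds whose E-value undercuts the threshold (-E).
    passed = [c for c in cands if c.seed_evalue < max_evalue]
    # Stage 2: keep alignments that reach the minimum accuracy (-A).
    # In this toy model the alignment statistics are already stored in the
    # record; a real mapper would compute them only for seeds that passed.
    return [c for c in passed if accuracy(c) >= min_accuracy]


candidates = [
    Candidate("read1", 0.01, 98, 100),   # passes both stages
    Candidate("read2", 7.5, 100, 100),   # seed E-value above the threshold
    Candidate("read3", 0.2, 85, 100),    # accuracy below 90 percent
]
kept = [c.read_id for c in report(candidates)]
print(kept)
```

       Lowering min_accuracy in this sketch (the analogue of a smaller -A) lets read3 through as well, which
       shows how the two thresholds act independently.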

OPTIONS

   INPUT
       -d, --database <file> [<file>]
              list of path/filename(s) of fasta database sequence(s)

       -q, --query <file>
              path/filename of query sequences (default:none)

       -p, --mate <file>
              path/filename of mate pair sequences (default:none)

       -i, --index <file>
              path/filename of db index (default:none)

       -j, --index2 <file>
              path/filename of second db index (default:none)

       -x, --generate <file>
              generate db index and store to disk (default:none)

       -y, --generate2 <file>
              generate second db index and store to disk (default:none)

       -G, --readgroupfile <file>
              filename to read @RG header (default:none)

       -g, --readgroupid <string>
              read group id (default:none)

       -t, --threads <n>
              start <n> threads (default:1)
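
       As a hedged illustration of how the input options combine, the Python sketch below assembles two
       command lines: one that generates an index (-x) and one that maps paired reads against it (-i, -q, -p,
       -t). The helper functions and all file names (genome.fa, genome.idx, reads_1.fq.gz, reads_2.fq.gz) are
       hypothetical; only the option letters come from this page. The commands are printed, not executed.

```python
import shlex


def index_command(reference, index):
    """Build the argv for generating an index (-x) from a fasta file (-d)."""
    return ["segemehl", "-x", index, "-d", reference]


def map_command(reference, index, reads, mates=None, threads=1):
    """Build the argv for mapping reads (-q), optionally with mates (-p)."""
    argv = ["segemehl", "-i", index, "-d", reference, "-q", reads]
    if mates is not None:
        argv += ["-p", mates]
    argv += ["-t", str(threads)]
    return argv


# Hypothetical file names, used only to show the shape of the commands.
build = index_command("genome.fa", "genome.idx")
run = map_command("genome.fa", "genome.idx", "reads_1.fq.gz",
                  mates="reads_2.fq.gz", threads=4)
print(shlex.join(build))
print(shlex.join(run))
```

       Keeping the arguments as a list and joining them only for display avoids quoting problems if a path
       contains spaces.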

   OUTPUT
       -o, --outfile <string>
              outputfile (default:none)

       -b, --bamabafixoida
              generate a bam output (-o <filename> required)

       -u, --nomatchfilename <file>
              filename for unmatched reads (default:none)

       -e, --briefcigar
              brief cigar string (M vs X and =)

       -s, --progressbar
              show a progress bar

       -B, --filebins <string>
              file bins with basename <string> for easier data handling (default:none)

       -V, --MEOP
              output MEOP field for easier variant calling in SAM (XE:Z:)
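
       The dependency between -b and -o noted above can be checked before a run is started. The following
       Python sketch is one possible way to do that; the function output_args and the file names are
       hypothetical, and the check reflects only what this page states (BAM output requires -o).

```python
def output_args(outfile=None, bam=False, unmatched=None, brief_cigar=False):
    """Translate output choices into the options documented on this page."""
    if bam and outfile is None:
        # The -b entry states that -o <filename> is required.
        raise ValueError("BAM output (-b) requires an output file (-o)")
    args = []
    if outfile is not None:
        args += ["-o", outfile]
    if bam:
        args.append("-b")
    if unmatched is not None:
        args += ["-u", unmatched]
    if brief_cigar:
        args.append("-e")
    return args


# Hypothetical file names for illustration.
bam_args = output_args(outfile="mapped.bam", bam=True, unmatched="unmapped.fq")
print(bam_args)
```

       Calling output_args(bam=True) without an output file raises ValueError in this sketch instead of
       producing an incomplete command line.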

   ALIGNMENT
       -F, --bisulfite <n>
              bisulfite alignment with methylC-seq/Lister et al. (=1) or bs-seq/Cokus et al. protocol (=2)
              (default:0)

       -S, --splits [<basename>]
              detect split/spliced reads. (default:none)

       -A, --accuracy <n>
              min percentage of matches per read in semi-global alignment (default:90)

       -D, --differences <n>
              search seeds initially with <n> differences (default:1)

       -E, --evalue <double>
              max evalue (default:5.000000)

       -H, --hitstrategy
              report only best scoring hits (=1) or all (=0) (default:1)

       -m, --minsize <n>
              minimum length of queries (default:12)

       -Z, --minfraglen <n>
              min length of a spliced fragment (default:20)

       -W, --minsplicecover <n>
              min coverage for spliced transcripts (default:80)

       -U, --minfragscore <n>
              min score of a spliced fragment (default:18)

       -l, --splicescorescale <f>
              report a spliced alignment with score s only if <f>*s is larger than the score of the next best
              spliced alignment (default:0.900000)

       -w, --maxsplitevalue <double>
              max evalue for splits (default:50.000000)
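
       The split-read thresholds above can be read as simple predicates. The Python sketch below is an
       interpretation of the option descriptions, not segemehl's code: the function names are hypothetical,
       and treating the minimum values (-Z, -U) as inclusive bounds is an assumption. The defaults are the
       ones listed on this page.

```python
def fragment_ok(length, score, min_len=20, min_score=18):
    """Check a spliced fragment against -Z (min length) and -U (min score).

    Inclusive comparison is an assumption made for this illustration.
    """
    return length >= min_len and score >= min_score


def report_spliced(best_score, next_best_score, scale=0.9):
    """Apply the -l rule: report only if scale * s exceeds the runner-up."""
    return scale * best_score > next_best_score


clear_winner = report_spliced(100, 80)   # scaled score of about 90 beats 80
too_close = report_spliced(100, 95)      # scaled score of about 90 does not beat 95
short_fragment = fragment_ok(length=15, score=30)
print(clear_winner, too_close, short_fragment)
```

       With the default scale of 0.9, a spliced alignment must beat the next best candidate by a margin of
       roughly ten percent of its own score before it is reported.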

   SPECIAL
       -X, --dropoff <n>
              dropoff parameter for extension (default:8)

       -J, --jump <n>
              search seeds with jump size <n> (0=automatic) (default:0)

       -O, --order
              sorts the output by chromosome and position (might take a while!)

       -I, --maxpairinsertsize <n>
              maximum size of the inserts (paired end) in case of multiple hits (default:200000)

       -M, --maxinterval <n>
              maximum width of a suffix array interval, i.e. a query seed will be omitted  if  it  matches  more
              than <n> times (default:100)

       -c, --checkidx
              check index

       -n, --extensionpenalty <n>
              penalty for a mismatch during extension (default:4)

       -r, --maxout <n>
              maximum  number  of  alignments  that  will  be  reported.  If set to zero, all alignments will be
              reported (default:0)

       --skipidcheck
              do not check whether the fastq ids of mates / paired ends match. Instead, only the id of the
              first mate (-q) will be used for output.

       --showalign
              show alignments

       --nohead
              do not output header
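
       Two of the limits above, -M and -r, can be summarized in a few lines of Python. This is a sketch of
       the documented behaviour only: the function names are hypothetical, and which alignments survive when
       -r truncates the list is not stated on this page, so keeping the first ones in the given order is an
       assumption.

```python
def keep_seed(occurrences, max_interval=100):
    """-M: a query seed is omitted if it matches more than max_interval times."""
    return occurrences <= max_interval


def limit_alignments(alignments, max_out=0):
    """-r: zero reports all alignments, otherwise at most max_out of them."""
    alignments = list(alignments)
    if max_out == 0:
        return alignments
    # Assumption: keep the first max_out entries in the order given.
    return alignments[:max_out]


hits = ["aln1", "aln2", "aln3"]           # placeholder alignment labels
everything = limit_alignments(hits)       # default of 0 means no limit
first_two = limit_alignments(hits, max_out=2)
print(keep_seed(100), keep_seed(101), everything, first_two)
```

       A seed that occurs exactly <n> times is still kept in this reading, because the option text omits
       only seeds that match more than <n> times.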

BUGS

       Please report bugs to steve@bioinf.uni-leipzig.de

REFERENCES

              2008 Bioinformatik Leipzig

              2018 Leibniz Institute on Aging (FLI)

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
       of the program.

segemehl 0.3                                      October 2018                                       SEGEMEHL(1)
