Ubuntu Manpage: segemehl - Heuristic mapping of short sequences

NAME

       segemehl - Heuristic mapping of short sequences

SYNOPSIS

       segemehl  [-besVOc] -d <file> [<file>] [-q <file>] [-p <file>] [-i <file>] [-j <file>] [-x
       <file>] [-y <file>] [-G <file>] [-g <string>] [-t  <n>]  [-o  <string>]  [-u  <file>]  [-B
       <string>]  [-F  <n>]  [-S  [<basename>]] [-A <n>] [-D <n>] [-E <double>] [-H] [-m <n>] [-Z
       <n>] [-W <n>] [-U <n>] [-l <f>] [-w <double>] [-X <n>] [-J <n>] [-I <n>] [-M <n>] [-n <n>]
       [-r <n>] [--skipidcheck] [--showalign] [--nohead]

DESCRIPTION

       Segemehl is software for mapping short sequencer reads to reference genomes. Segemehl
       implements a matching strategy based on enhanced suffix arrays (ESA). Segemehl accepts
       fasta and fastq queries (gzip'ed and bgzip'ed). In addition to the alignment of reads
       from standard DNA- and RNA-seq protocols, it also allows the mapping of bisulfite
       converted reads (Lister and Cokus protocols) and implements a split-read mapping
       strategy. The output of segemehl is a SAM or BAM formatted alignment file. In the case of
       split-read mapping, additional BED files are written to disk. These BED files may be
       summarized with the postprocessing tool haarz. In the case of the alignment of bisulfite
       converted reads, raw methylation rates may also be called with haarz.
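
       The Python sketch below illustrates the general idea behind comparing bisulfite
       converted reads to a reference. It is not segemehl's implementation: it only handles the
       forward-strand case in which an unmethylated C is read as T, and the function names
       bisulfite_mismatches and methylation_calls are hypothetical.

```python
from typing import List, Tuple


def bisulfite_mismatches(read: str, ref: str) -> int:
    """Count mismatches between a read and an equally long reference window.

    A T in the read opposite a C in the reference is not counted, because
    bisulfite treatment turns unmethylated C into T (forward strand only).
    """
    if len(read) != len(ref):
        raise ValueError("read and reference window must have the same length")
    mismatches = 0
    for r, g in zip(read.upper(), ref.upper()):
        if r == g or (g == "C" and r == "T"):
            continue  # identical base, or a possibly converted cytosine
        mismatches += 1
    return mismatches


def methylation_calls(read: str, ref: str) -> List[Tuple[int, str]]:
    """For every reference C covered by a C or T in the read, report
    'M' (C retained: methylated) or 'U' (C read as T: unmethylated)."""
    calls = []
    for i, (r, g) in enumerate(zip(read.upper(), ref.upper())):
        if g == "C" and r in ("C", "T"):
            calls.append((i, "M" if r == "C" else "U"))
    return calls


if __name__ == "__main__":
    print(bisulfite_mismatches("ATGTCA", "ACGTCA"))  # T over C is tolerated
    print(methylation_calls("ATGTCA", "ACGTCA"))
```

       In this simplified model, a raw methylation rate for a reference C would be the fraction
       of covering reads that call M at that position.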

       In brief, for each suffix of a read, segemehl aims to find the best-scoring seed. Seeds
       may contain insertions, deletions, and mismatches (differences). The number of
       differences allowed within a single seed is user-controlled (-D) and is crucial for the
       runtime of the program. Subsequently, seeds whose E-value falls below the user-defined
       threshold (-E) are passed on to an exact semi-global alignment procedure. Finally, reads
       that reach the user-defined minimum accuracy (-A, default 90 percent) are reported to the
       user.
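
       The two filtering stages described above can be sketched in Python as follows. This is
       a hypothetical illustration, not segemehl's code: the Candidate record is invented here,
       and accuracy is assumed to be the percentage of read positions that match in the
       semi-global alignment. The default thresholds mirror -E (5) and -A (90).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    """Hypothetical summary of one read after seed search and alignment."""
    read_id: str
    seed_evalue: float  # E-value of the best seed found for the read
    matches: int        # matching columns in the semi-global alignment
    read_length: int


def accuracy(c: Candidate) -> float:
    """Percentage of read positions that match (assumed definition)."""
    return 100.0 * c.matches / c.read_length


def report(cands: Iterable[Candidate], max_evalue: float = 5.0,
           min_accuracy: float = 90.0) -> List[str]:
    kept = []
    for c in cands:
        # Stage 1: seeds with an E-value above the threshold (-E) are never aligned.
        if c.seed_evalue > max_evalue:
            continue
        # Stage 2: the alignment must reach the minimum accuracy (-A).
        if accuracy(c) >= min_accuracy:
            kept.append(c.read_id)
    return kept


if __name__ == "__main__":
    reads = [Candidate("r1", 0.01, 95, 100), Candidate("r2", 12.0, 100, 100)]
    print(report(reads))
```

       Raising -E lets more seeds reach the (comparatively expensive) alignment stage, while -A
       decides which of the resulting alignments are kept.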

OPTIONS

   INPUT
       -d, --database <file> [<file>]
              list of path/filename(s) of fasta database sequence(s)

       -q, --query <file>
              path/filename of query sequences (default:none)

       -p, --mate <file>
              path/filename of mate pair sequences (default:none)

       -i, --index <file>
              path/filename of db index (default:none)

       -j, --index2 <file>
              path/filename of second db index (default:none)

       -x, --generate <file>
              generate db index and store to disk (default:none)

       -y, --generate2 <file>
              generate second db index and store to disk (default:none)

       -G, --readgroupfile <file>
              filename to read @RG header (default:none)

       -g, --readgroupid <string>
              read group id (default:none)

       -t, --threads <n>
              start <n> threads (default:1)
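
       A typical workflow first stores an index with -x and later reuses it with -i. The Python
       sketch below only assembles the argument lists for these two steps from the options
       documented above; the helper names and file names are hypothetical, and placing -d last
       (so that its one or more file arguments are unambiguous) is an assumption of this sketch.

```python
import shlex
from typing import List, Optional, Sequence


def index_command(databases: Sequence[str], index_file: str) -> List[str]:
    """argv that builds an index (-x) for the FASTA database file(s) given to -d."""
    # -d takes one or more files, so it is placed last on the command line.
    return ["segemehl", "-x", index_file, "-d", *databases]


def map_command(
    databases: Sequence[str],
    index_file: str,
    query: str,
    mate: Optional[str] = None,
    threads: int = 1,
    outfile: Optional[str] = None,
) -> List[str]:
    """argv that maps reads (-q, optionally mates via -p) against a stored index (-i)."""
    argv = ["segemehl", "-i", index_file, "-q", query]
    if mate is not None:
        argv += ["-p", mate]
    argv += ["-t", str(threads)]
    if outfile is not None:
        argv += ["-o", outfile]
    argv += ["-d", *databases]
    return argv


if __name__ == "__main__":
    # Hypothetical file names; the commands are printed, not executed.
    print(shlex.join(index_command(["genome.fa"], "genome.idx")))
    print(shlex.join(map_command(["genome.fa"], "genome.idx", "reads.fq",
                                 threads=4, outfile="reads.sam")))
```

       The returned lists could be passed to subprocess.run on a system where segemehl is
       installed.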

   OUTPUT
       -o, --outfile <string>
              output file (default:none)

       -b, --bamabafixoida
              generate BAM output (-o <filename> required)

       -u, --nomatchfilename <file>
              filename for unmatched reads (default:none)

       -e, --briefcigar
              brief cigar string (M vs X and =)

       -s, --progressbar
              show a progress bar

       -B, --filebins <string>
              file bins with basename <string> for easier data handling (default:none)

       -V, --MEOP
              output MEOP field for easier variant calling in SAM (XE:Z:)
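
       In SAM CIGAR strings, = marks a sequence match, X a mismatch, and M an alignment match
       that may be either. Assuming that -e simply collapses = and X into M, the Python sketch
       below shows such a conversion, merging adjacent runs of the same operation. The function
       name to_brief_cigar is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operator character.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def to_brief_cigar(cigar: str) -> str:
    """Rewrite an extended CIGAR (= / X) as a brief one (M only)."""
    if cigar == "*":  # CIGAR unavailable
        return cigar
    ops = []
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "=X":
            op = "M"  # match and mismatch both become an alignment match
        if ops and ops[-1][1] == op:
            ops[-1] = (ops[-1][0] + n, op)  # merge with the previous run
        else:
            ops.append((n, op))
    return "".join(f"{n}{op}" for n, op in ops)


if __name__ == "__main__":
    print(to_brief_cigar("10=1X5=2I3="))
```

       The brief form is shorter, but the positions of mismatches can no longer be read from
       the CIGAR string itself.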

   ALIGNMENT
       -F, --bisulfite <n>
              bisulfite aln with methylC-seq/Lister et al. (=1) or bs-seq/Cokus et  al.  protocol
              (=2) (default:0)

       -S, --splits [<basename>]
              detect split/spliced reads. (default:none)

       -A, --accuracy <n>
              min percentage of matches per read in semi-global alignment (default:90)

       -D, --differences <n>
              search seeds initially with <n> differences (default:1)

       -E, --evalue <double>
              max evalue (default:5.000000)

       -H, --hitstrategy
              report only best scoring hits (=1) or all (=0) (default:1)

       -m, --minsize <n>
              minimum length of queries (default:12)

       -Z, --minfraglen <n>
              min length of a spliced fragment (default:20)

       -W, --minsplicecover <n>
              min coverage for spliced transcripts (default:80)

       -U, --minfragscore <n>
              min score of a spliced fragment (default:18)

       -l, --splicescorescale <f>
              report spliced alignment with score s only if <f>*s is larger than the score of
              the next-best spliced alignment (default:0.900000)

       -w, --maxsplitevalue <double>
              max evalue for splits (default:50.000000)
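
       The Python sketch below illustrates how the thresholds -Z, -U, and -l could combine when
       choosing which split alignment to report. It is a hypothetical model, not segemehl's
       code: the Fragment and SplitAlignment types are invented here, the score of a split
       alignment is assumed to be the sum of its fragment scores, and the -l comparison is
       assumed to be made among candidates that already passed -Z and -U.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Fragment:
    length: int  # aligned length of this piece of the read
    score: int   # alignment score of this piece


@dataclass
class SplitAlignment:
    fragments: List[Fragment]

    @property
    def score(self) -> int:
        # Assumption: a split alignment scores the sum of its fragments.
        return sum(f.score for f in self.fragments)


def select_split(
    candidates: Sequence[SplitAlignment],
    min_frag_len: int = 20,    # -Z
    min_frag_score: int = 18,  # -U
    scale: float = 0.9,        # -l
) -> Optional[SplitAlignment]:
    # Discard candidates containing a fragment that is too short or scores too low.
    valid = [
        c for c in candidates
        if all(f.length >= min_frag_len and f.score >= min_frag_score
               for f in c.fragments)
    ]
    if not valid:
        return None
    ranked = sorted(valid, key=lambda c: c.score, reverse=True)
    best = ranked[0]
    # Report the best alignment only if scale * s beats the next-best score.
    if len(ranked) > 1 and scale * best.score <= ranked[1].score:
        return None
    return best
```

       With the default scale of 0.9, a best score of 95 is reported against a runner-up of
       78 (85.5 > 78) but not against a runner-up of 90.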

   SPECIAL
       -X, --dropoff <n>
              dropoff parameter for extension (default:8)

       -J, --jump <n>
              search seeds with jump size <n> (0=automatic) (default:0)

       -O, --order
              sorts the output by chromosome and position (might take a while!)

       -I, --maxpairinsertsize <n>
              maximum size of the inserts (paired end) in case of multiple hits (default:200000)

       -M, --maxinterval <n>
              maximum width of a suffix array interval, i.e. a query seed will be omitted  if  it
              matches more than <n> times (default:100)

       -c, --checkidx
              check index

       -n, --extensionpenalty <n>
              penalty for a mismatch during extension (default:4)

       -r, --maxout <n>
              maximum  number of alignments that will be reported. If set to zero, all alignments
              will be reported (default:0)

       --skipidcheck
              do not check whether the fastq ids of mates / paired ends match. Instead, the first
              mate (-q) will be used for output only.

       --showalign
              show alignments

       --nohead
              do not output header
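
       The -X and -n options are documented only briefly. Assuming an X-drop style extension, a
       common technique in seed-and-extend aligners, the Python sketch below extends a seed to
       the right without gaps and stops once the running score falls more than the dropoff
       below the best score seen so far. The match score of +1 and the function name
       xdrop_extend are hypothetical; segemehl's actual extension may differ, for example by
       allowing gaps.

```python
from typing import Tuple


def xdrop_extend(read: str, ref: str, dropoff: int = 8,
                 mismatch_penalty: int = 4, match_score: int = 1) -> Tuple[int, int]:
    """Ungapped rightward X-drop extension.

    Returns (best_score, best_length): the highest running score reached and
    the number of aligned positions at which it was first reached.
    """
    score = best = 0
    best_len = 0
    for i, (a, b) in enumerate(zip(read, ref)):
        score += match_score if a == b else -mismatch_penalty  # -n on mismatch
        if score > best:
            best, best_len = score, i + 1
        elif best - score > dropoff:  # -X: give up once the score drops too far
            break
    return best, best_len


if __name__ == "__main__":
    read = "A" * 5 + "C" + "A" * 10
    ref = "A" * 5 + "G" + "A" * 10
    print(xdrop_extend(read, ref))             # extends across the mismatch
    print(xdrop_extend(read, ref, dropoff=3))  # stops at the mismatch
```

       A larger dropoff lets the extension bridge more mismatches at the cost of longer
       extensions, while a larger mismatch penalty makes each mismatch count for more of that
       budget.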

BUGS

       Please report bugs to steve@bioinf.uni-leipzig.de

REFERENCES

              2008 Bioinformatik Leipzig

              2018 Leibniz Institute on Aging (FLI)

AUTHOR

       This  manpage was written by Andreas Tille for the Debian distribution and can be used for
       any other usage of the program.
