Ubuntu Manpage: spaced - alignment-free sequence comparison

NAME

       spaced - alignment-free sequence comparison

SYNOPSIS

       spaced [-r] [-k INT] [-l INT] [-n INT] [-t INT] [-d TYPE] [-f FILE] [-o FILE] FILES...

DESCRIPTION

       Spaced Words is a new approach to alignment-free sequence comparison. While most alignment-free
       algorithms compare the word composition of sequences, Spaced Words uses a pattern of match and
       don't-care positions. The occurrence of a spaced word in a sequence is then defined by the characters
       at the match positions only, while the characters at the don't-care positions are ignored (this idea
       was originally inspired by the PatternHunter algorithm for homology search in databases). Instead of
       comparing the frequencies of contiguous words in the input sequences, our approach compares the
       frequencies of spaced words according to the pre-defined pattern. An information-theoretic distance
       measure is then used to define pairwise distances on the set of input sequences based on their
       spaced-word frequencies. The original version of our spaced-words approach was published in Boden et
       al. (2013).
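       The following Python sketch illustrates the idea described above; it is not taken from the spaced
       source code. It assumes a hypothetical pattern notation in which '1' marks a match position and '0'
       a don't-care position, counts the spaced words of a sequence, and compares two count profiles with
       the Euclidean and Jensen-Shannon measures offered by -d (EU and JS). Working on relative frequencies
       and using base-2 logarithms are assumptions of this sketch; the tool may normalize differently.

```python
import math
from collections import Counter


def spaced_words(seq: str, pattern: str) -> Counter:
    """Count spaced words: only characters at '1' (match) positions are kept."""
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    counts: Counter = Counter()
    # Slide the pattern along the sequence; each full placement yields one word.
    for start in range(len(seq) - len(pattern) + 1):
        counts["".join(seq[start + i] for i in match_positions)] += 1
    return counts


def _relative(counts: Counter) -> dict:
    """Convert counts to relative frequencies (assumes a non-empty profile)."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def euclidean(a: Counter, b: Counter) -> float:
    """Euclidean distance between the relative spaced-word frequency vectors."""
    pa, pb = _relative(a), _relative(b)
    words = set(pa) | set(pb)
    return math.sqrt(sum((pa.get(w, 0.0) - pb.get(w, 0.0)) ** 2 for w in words))


def jensen_shannon(a: Counter, b: Counter) -> float:
    """Jensen-Shannon divergence in bits (0 = same profile, 1 = disjoint)."""
    pa, pb = _relative(a), _relative(b)
    js = 0.0
    for w in set(pa) | set(pb):
        p, q = pa.get(w, 0.0), pb.get(w, 0.0)
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js


# Pattern 1101: weight 3 (match positions) plus one don't-care position.
x = spaced_words("ACGTACGT", "1101")  # ACT twice; CGA, GTC and TAG once each
y = spaced_words("ACGAACGT", "1101")
print(euclidean(x, y), jensen_shannon(x, y))
```

       A contiguous pattern such as 1111 reduces spaced_words to ordinary k-mer counting, which is the
       word-composition approach that spaced words generalize.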

OUTPUT

       The output is a symmetric distance matrix in a format similar to PHYLIP, with each entry giving the
       divergence between two sequences as a non-negative real number. A distance of zero means that the
       two sequences are identical, whereas other values are estimates of the nucleotide substitution rate
       (Jukes-Cantor corrected).
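       For reference, the Jukes-Cantor correction maps an observed proportion p of mismatching nucleotides
       to an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The Python sketch below
       implements only this textbook formula; how spaced estimates p from spaced-word matches (presumably
       along the lines of Morgenstern et al., 2015) is not shown.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for a mismatch proportion p (substitutions per site)."""
    if not 0.0 <= p < 0.75:
        # At p = 3/4 the sequences look random relative to each other; d diverges.
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    # -(3/4) ln(1 - 4p/3), written as (3/4) ln(3 / (3 - 4p)) so that p = 0 gives +0.0.
    return 0.75 * math.log(3.0 / (3.0 - 4.0 * p))


for p in (0.0, 0.1, 0.3):
    print(p, jukes_cantor(p))
```

       For p > 0 the corrected distance always exceeds the raw mismatch proportion, because repeated
       substitutions at the same site are invisible in p.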

OPTIONS

       -o <file>
              Print the distance matrix to the given file. Default is DMat.

       -k <int>
              Set the pattern weight, i.e. the number of match positions in each pattern. Default: 14.

       -l <int>
              Set the number of don't-care positions in each pattern. Default: 15.

       -n <int>
              Set the number of patterns. Default: 5.

       -f <file>
              Instead of generating new patterns, read them from the given file.

       -t <int>
              The number of threads to be used; by default, 25 threads are used.
              Multithreading is only available if spaced was compiled with OpenMP support.

       -r     Skip comparison with the reverse complement.

       -d <type>
              The distances can be computed with different measures. Available options are Euclidean (EU),
              Jensen-Shannon (JS), and evolutionary distance (EV). Default: EV.

       -h     Prints the synopsis and an explanation of available options.
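       The options -k, -l and -n together describe a set of patterns. The following Python sketch generates
       n random, distinct patterns of weight k with l don't-care positions, written in a hypothetical
       notation where '1' is a match and '0' a don't-care position. Forcing the first and last positions to
       be match positions is an assumption borrowed from spaced-seed conventions; the pattern generator
       built into spaced, and the file format expected by -f, may differ.

```python
import math
import random


def random_patterns(k: int, l: int, n: int, seed: int = 0) -> list:
    """Return n distinct patterns with k match ('1') and l don't-care ('0') positions.

    Assumption: every pattern starts and ends with a match position, so only
    the k - 2 inner match positions and the l don't-care positions are shuffled.
    """
    if k < 2:
        raise ValueError("a pattern needs at least two match positions")
    # Number of distinct inner arrangements; guards against an endless loop below.
    if math.comb(k - 2 + l, l) < n:
        raise ValueError("fewer than n distinct patterns exist for this k and l")
    rng = random.Random(seed)  # fixed seed for reproducible output
    patterns = set()
    while len(patterns) < n:
        inner = ["1"] * (k - 2) + ["0"] * l
        rng.shuffle(inner)
        patterns.add("1" + "".join(inner) + "1")
    return sorted(patterns)


# Defaults from this page: weight 14, 15 don't-care positions, 5 patterns.
for pattern in random_patterns(14, 15, 5):
    print(pattern)  # each pattern is 14 + 15 = 29 characters long
```

       Collecting the patterns in a set guarantees that they are distinct, and the combinatorial check
       rejects parameter combinations for which fewer than n distinct patterns exist.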

COPYRIGHT

       Copyright © 2016 Chris Leimeister <chris.leimeister@stud.uni-goettingen.de>
       License GPLv3+: GNU GPL version 3 or later.
       This  is  free software: you are free to change and redistribute it.  There is NO WARRANTY, to the extent
       permitted by law.  The full license text is available at <http://gnu.org/licenses/gpl.html>.

REFERENCES

       1) C.-A. Leimeister, M. Boden, S. Horwege, S. Lindner, B. Morgenstern (2014). Fast alignment-free
       sequence comparison using spaced-word frequencies, Bioinformatics
       <http://bioinformatics.oxfordjournals.org/content/early/2014/04/03/bioinformatics.btu177>
       2) S. Horwege, S. Lindner, M. Boden, K. Hatje, M. Kollmar, C.-A. Leimeister, B. Morgenstern (2014).
       Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches,
       Nucleic Acids Research 42, W7-W11 <http://nar.oxfordjournals.org/content/42/W1/W7.abstract>
       3) B. Morgenstern, B. Zhu, S. Horwege, C.-A. Leimeister (2015). Estimating evolutionary distances
       between genomic sequences from spaced-word matches, Algorithms for Molecular Biology 10:5

BUGS

   Reporting Bugs
       Please report bugs to <kloetzl@evolbio.mpg.de> or <chris.leimeister@stud.uni-goettingen.de>.