NAME

       sigma - Simple greedy multiple alignment of non-coding DNA sequences

SYNOPSIS


       sigma [options] [inputfile.fasta] [inputfile2.fasta ...]

       Each fasta file may contain a single sequence or multiple sequences; all sequences will be aligned
       together.

DESCRIPTION

       Sigma ("Simple greedy multiple alignment") is an alignment program with a new algorithm and scoring
       scheme designed specifically for non-coding DNA sequence. It seeks the best possible gapless local
       alignments, at each step making the best possible alignment consistent with existing alignments. It
       scores the significance of each alignment based on the lengths of the aligned fragments and a
       background model, which may be supplied directly or estimated from an auxiliary file of intergenic
       DNA. With real data, the "correctness" of an alignment can't be quantified directly, but running the
       PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior (see
       REFERENCE for the comparison).

OPTIONS

       -A --aligned_output
          Aligned, pretty-printed output (compare with the -F option). This is the default, and the only
          output unless -F is also given. See also -C.

       -b --bgprobfile filename
          Auxiliary file (in fasta format) from which to read background sequences (overridden by -B). Typically
          this is a file containing large quantities of similar non-coding sequence, from which background
          probabilities of single- and di-nucleotides may be estimated.

       -B --bgseqfile filename
          File containing background probabilities. The format is described further below.

       -C --caps_only
          Use only upper-case letters in output sequence, for compatibility with output of some other programs
          like ClustalW and MLagan. By default, output is mixed-case (as in Dialign), and lower-case bases are
          treated as not aligned.

       -F --fasta_output
          Multi-fasta output (can use both -A and -F in either order). See also -C.

       -n --ncorrel number
          Background correlation: 2 = dinucleotide (default), 1 = single-site base counts,
          0 = 0.25 per base.

       -x --significance number
          Upper limit on the probability that a match arises by chance (default 0.002; smaller values are
          more stringent).

       -h --help
          Displays this list of options.
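
       The options above can be combined programmatically. The following Python sketch is an illustration
       only and is not part of sigma: the helper build_sigma_command and its parameter names are
       hypothetical, it uses only the short options documented above, and the check that -n is 0, 1 or 2 is
       an assumption based on the values listed for that option.

```python
import shlex


def build_sigma_command(inputs=(), bg_fasta=None, bg_freqs=None, ncorrel=None,
                        significance=None, aligned=False, fasta=False,
                        caps_only=False, executable="sigma"):
    """Return an argv list for sigma (hypothetical helper, not part of sigma)."""
    argv = [executable]
    if aligned:
        argv.append("-A")
    if fasta:
        argv.append("-F")
    if caps_only:
        argv.append("-C")
    if bg_freqs is not None:
        # The manual says -B overrides -b, so pass only -B when both are set.
        argv += ["-B", str(bg_freqs)]
    elif bg_fasta is not None:
        argv += ["-b", str(bg_fasta)]
    if ncorrel is not None:
        # Assumption: only the three values listed under -n are meaningful.
        if ncorrel not in (0, 1, 2):
            raise ValueError("ncorrel must be 0, 1 or 2")
        argv += ["-n", str(ncorrel)]
    if significance is not None:
        argv += ["-x", str(significance)]
    # Input FASTA files come last, as in the SYNOPSIS.
    argv += [str(path) for path in inputs]
    return argv


if __name__ == "__main__":
    # File names here are placeholders.
    cmd = build_sigma_command(["genes.fasta"], bg_fasta="intergenic.fasta",
                              significance=0.001, fasta=True)
    print(shlex.join(cmd))
    # To run it (requires sigma on PATH): subprocess.run(cmd, check=True)
```

       Returning a list rather than a single string lets the command be passed to subprocess.run without
       shell quoting problems.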

MORE HELP

       The "significance" parameter (-x) determines whether local alignments are accepted or rejected. The
       default at present is 0.002. Experiments on synthetic data (described in the paper) suggest that 0.002 is
       about the threshold where sigma fails to align phylogenetically-unrelated data that has moderate
       (yeast-like) dinucleotide correlation.

       Using a "background model" appropriate to the sequences being aligned greatly reduces spurious
       alignments on synthetic data (and, one hopes, on real data too). The simplest way to ensure this is
       to supply, via the -b option, a FASTA-format file containing large quantities of similar sequence
       data (e.g., when aligning yeast sequences, supply a file containing all intergenic yeast sequence).

       Alternatively, if the single-site and dinucleotide frequencies are already known, they may be
       supplied in a file via the -B option. The file should contain one entry per line, each consisting of
       a mononucleotide or dinucleotide (case-insensitive) followed by its frequency (e.g., "A 0.3",
       "AT 0.16", and so on, on successive lines). A sample file is in the "Background" subdirectory of the
       source distribution (on Debian systems, this file can be found in the
       /usr/share/doc/sigma-align/Background directory). A file like "yeast.nc.3.freq" in the "tests"
       subdirectory of the MEME source distribution works fine (trinucleotide counts are ignored).
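
       As an illustration of the -B file format described above, the following Python sketch estimates
       single-site and dinucleotide frequencies from FASTA-format sequence and formats them one entry per
       line. It is not part of sigma and the function names are hypothetical. It assumes that the word and
       its frequency are separated by a single space, that words containing symbols other than A, C, G and
       T should be skipped, and that dinucleotide frequencies are normalised to sum to 1 over all pairs.
       Compare the result with the sample file in the Background directory before relying on it.

```python
from collections import Counter

BASES = set("ACGT")


def read_fasta(lines):
    """Yield upper-cased sequences from an iterable of FASTA-format lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def background_frequencies(sequences):
    """Return (mono, di) dicts of relative frequencies over A, C, G, T.

    Assumption: words containing other symbols (such as N) are skipped, and
    each dict is normalised so that its values sum to 1.
    """
    mono, di = Counter(), Counter()
    for seq in sequences:
        mono.update(base for base in seq if base in BASES)
        di.update(seq[i:i + 2] for i in range(len(seq) - 1)
                  if set(seq[i:i + 2]) <= BASES)
    mono_total, di_total = sum(mono.values()), sum(di.values())
    if mono_total == 0:
        raise ValueError("no A, C, G or T found in the input")
    mono_freq = {word: n / mono_total for word, n in mono.items()}
    di_freq = {word: n / di_total for word, n in di.items()} if di_total else {}
    return mono_freq, di_freq


def format_background(mono, di):
    """Format one 'WORD frequency' entry per line (assumed -B layout)."""
    entries = sorted(mono.items()) + sorted(di.items())
    return "".join(f"{word} {freq:.6f}\n" for word, freq in entries)


if __name__ == "__main__":
    # Tiny made-up input; real use would read a large intergenic FASTA file.
    demo = [">seq1", "ACGTACGTTA", ">seq2", "ttgacaNNat"]
    mono, di = background_frequencies(read_fasta(demo))
    print(format_background(mono, di), end="")
```

       Passing the same FASTA file directly with -b lets sigma estimate the background itself; a
       precomputed file is mainly useful when the same background is reused across many runs.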

REFERENCE

       Please cite Sigma: Rahul Siddharthan (2006), "Multiple alignment of weakly-conserved non-coding DNA
       sequence", BMC Bioinformatics 7:143, doi:10.1186/1471-2105-7-143. Published 16 March 2006; available
       online at http://www.biomedcentral.com/1471-2105/7/143/

AUTHORS

       Rahul Siddharthan <rsidd@imsc.res.in>
          Wrote sigma. If you're using Sigma for actual research, please let the author know so that he can
          alert you to bug fixes or new releases.

       Charles Plessy <charles-debian-nospam@plessy.org>
          Wrote the manpage in DocBook XML for the Debian distribution.

COPYRIGHT

       Copyright © 2006-2007 Rahul Siddharthan
       Copyright © 2006-2007 Charles Plessy

       Sigma is free software. You can redistribute it and/or modify it under the terms of the GNU General
       Public License as published by the Free Software Foundation.

       On Debian systems, the complete text of the GNU General Public License can be found in
       /usr/share/common-licenses/GPL.

sigma 1.1                                          2007-04-07                                           SIGMA(1)