
NAME

       mummer - package for sequence alignment of multiple genomes

SYNOPSIS

       mummer-annotate <gapfile> <datafile>
       combineMUMs <RefSequence> <MatchSequences> <GapsFile>
       delta-filter [options] <deltafile>
       dnadiff [options] <reference> <query> or [options] -d <deltafile>
       exact-tandems <file> <min-match-len>
       gaps
       mapview [options] <coordsfile> [UTRcoords] [CDScoords]
       mgaps [-d <DiagDiff>] [-f <DiagFactor>] [-l <MatchLen>] [-s <MaxSeparation>]
       mummer [options] <reference-file> <query-files>
       mummerplot [options] <matchfile>
       nucmer [options] <Reference> <Query>
       nucmer2xfig
       promer [options] <Reference> <Query>
       repeat-match [options] <genome-file>
       run-mummer1 <fastareference> <fastaquery> <prefix> [-r]
       run-mummer3 <fastareference> <multi-fastaquery> <prefix>
       show-aligns [options] <deltafile> <refID> <qryID>

           Input is the .delta output of either the "nucmer" or the "promer" program passed on the command
           line.

           Output is to stdout, and consists of all the alignments between the query and reference
           sequences identified on the command line.

           NOTE: No sorting is done by default, therefore the alignments will be ordered as found in the
           <deltafile> input.

       show-coords [options] <deltafile>
       show-snps [options] <deltafile>
       show-tiling [options] <deltafile>

DESCRIPTION

OPTIONS

       All tools (except gaps) accept the -h, --help, -V and --version options, as one would expect. The
       built-in help is thorough and makes these man pages largely redundant.
       combineMUMs Combines MUMs in <GapsFile> by extending matches off ends and between MUMs.  <RefSequence> is
       a  fasta file of the reference sequence.  <MatchSequences> is a multi-fasta file of the sequences matched
       against the reference.

         -D      Only output to stdout the difference positions
                 and characters
         -n      Allow matches only between nucleotides, i.e., ACGTs
         -N num  Break matches at <num> or more consecutive non-ACGTs
         -q tag  Used to label query match
         -r tag  Used to label reference match
         -S      Output all differences in strings
         -t      Label query matches with query fasta header
         -v num  Set verbose level for extra output
         -W file Write output to <file> instead of the default witherrors.gaps
         -x      Don't output .cover files
         -e      Set error-rate cutoff to e (e.g. 0.02 is two percent)
       dnadiff Run comparative analysis of two sequence sets using nucmer  and  its  associated  utilities  with
       recommended  parameters. See MUMmer documentation for a more detailed description of the output. Produces
       the following output files:

           .report  - Summary of alignments, differences and SNPs
           .delta   - Standard nucmer alignment output
           .1delta  - 1-to-1 alignment from delta-filter -1
           .mdelta  - M-to-M alignment from delta-filter -m
           .1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
           .mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
           .snps    - SNPs from show-snps -rlTHC .1delta
           .rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
           .qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
           .unref   - Unaligned reference IDs and lengths (if applicable)
           .unqry   - Unaligned query IDs and lengths (if applicable)

       MANDATORY:
           reference       Set the input reference multi-FASTA filename
           query           Set the input query multi-FASTA filename
             or
           delta file      Unfiltered .delta alignment file from nucmer

       OPTIONS:
           -d|delta        Provide precomputed delta file for analysis
           -h
           --help          Display help information and exit
           -p|prefix       Set the prefix of the output files (default "out")
           -V
           --version       Display the version information and exit
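
       The output-file list above can be turned into a small checklist for scripts that wrap dnadiff. The
       following is a minimal sketch, not part of MUMmer: it assumes each file is named <prefix>.<suffix>,
       and the helper name dnadiff_outputs is hypothetical.

```python
# Suffixes copied from the dnadiff output-file list above.
DNADIFF_SUFFIXES = [
    "report", "delta", "1delta", "mdelta", "1coords", "mcoords",
    "snps", "rdiff", "qdiff", "unref", "unqry",
]


def dnadiff_outputs(prefix="out"):
    """Return the file names dnadiff is expected to write for a prefix.

    Assumption: every file is named <prefix>.<suffix>. The .unref and
    .unqry files are written only "if applicable", so callers should not
    treat their absence as an error.
    """
    return [f"{prefix}.{suffix}" for suffix in DNADIFF_SUFFIXES]


if __name__ == "__main__":
    # "run1" is a placeholder prefix, as would be passed with -p.
    for name in dnadiff_outputs("run1"):
        print(name)
```

       With no argument the helper uses "out", the documented default prefix.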

       delta-filter
         -e float    For switches -g -r -q, keep repeats within e percent
                     of the best LIS score [0, 100], no repeats by default
         -g          Global alignment using length*identity weighted LIS.
                     For every reference-query pair, leave only the aligns
                     which form the longest mutually consistent set
         -h          Display help information
         -i float    Set the minimum alignment identity [0, 100], default 0
         -l int      Set the minimum alignment length, default 0
         -q          Query alignment using length*identity weighted LIS.
                     For each query, leave only the aligns which form the
                     longest consistent set for the query
         -r          Reference alignment using length*identity weighted LIS.
                     For each reference, leave only the aligns which form
                     the longest consistent set for the reference
         -u float    Set the minimum alignment uniqueness, i.e. percent of
                     the alignment matching to unique reference AND query
                     sequence [0, 100], default 0
         -o float    Set the maximum alignment overlap for -r and -q options
                     as a percent of the alignment length [0, 100], default 100

         Reads a delta alignment file from either nucmer or promer and  filters  the  alignments  based  on  the
       command-line  switches,  leaving only the desired alignments which are output to stdout in the same delta
       format as the input. For multiple switches, order of operations is as follows: -i -l -u -q -r -g.  If  an
       alignment is excluded by a preceding operation, it will be ignored by the succeeding operations.

         An important distinction between the -g option and the -r -q options is that -g requires the alignments
       to  be  mutually  consistent  in  their  order,  while  the -r -q options are not required to be mutually
       consistent and therefore tolerate translocations, inversions, etc. Thus, -r provides a one-to-many, -q  a
       many-to-one,  -r -q a one-to-one local mapping, and -g a one-to-one global mapping of reference and query
       bases respectively.
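
       The idea behind the -g switch, keeping the heaviest set of alignments that appear in the same order in
       both sequences, can be illustrated with a weighted longest-increasing-subsequence computation. This is
       a simplified sketch and not delta-filter's actual implementation: it assumes forward-strand alignments
       only, forbids overlaps, and the Aln record and function names are hypothetical.

```python
from typing import List, NamedTuple


class Aln(NamedTuple):
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    identity: float  # percent, 0-100


def weight(a: Aln) -> float:
    """length*identity weighting, as named in the -g/-r/-q descriptions."""
    return (a.ref_end - a.ref_start + 1) * a.identity / 100.0


def best_consistent_chain(alns: List[Aln]) -> List[Aln]:
    """Return the heaviest chain ordered identically in reference and query."""
    alns = sorted(alns, key=lambda a: a.ref_start)
    n = len(alns)
    if n == 0:
        return []
    score = [weight(a) for a in alns]  # best chain weight ending at i
    prev = [-1] * n                    # back-pointer for chain recovery
    for i in range(n):
        for j in range(i):
            # j may precede i only if it ends before i starts in BOTH sequences.
            consistent = (alns[j].ref_end < alns[i].ref_start
                          and alns[j].qry_end < alns[i].qry_start)
            if consistent and score[j] + weight(alns[i]) > score[i]:
                score[i] = score[j] + weight(alns[i])
                prev[i] = j
    i = max(range(n), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(alns[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    # Made-up coordinates; the second alignment is a translocation in the query.
    demo = [
        Aln(1, 100, 1, 100, 99.0),
        Aln(150, 250, 500, 600, 98.0),
        Aln(300, 400, 150, 250, 97.0),
        Aln(450, 600, 300, 450, 99.0),
    ]
    for a in best_consistent_chain(demo):
        print(a)
```

       In the demo data the translocated alignment is dropped because it breaks the query order, which is the
       behaviour that distinguishes -g from -r and -q.
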
       mapview
         -h
         --help   Display help information and exit
         -m|mag   Set the magnification at which the figure is rendered,
                  this is an option for fig2dev which is used to generate
                  the PDF and PS files (default 1.0)
         -n|num   Set the number of output files used to partition the
                  output, this is to avoid generating files that are too
                  large to display (default 10)
         -p|prefix  Set the output file prefix
                  (default "PROMER_graph" or "NUCMER_graph")
         -v
         --verbose  Verbose logging of the processed files
         -V
         --version  Display the version information and exit
         -x1 coord  Set the lower coordinate bound of the display
         -x2 coord  Set the upper coordinate bound of the display
         -g|ref     If the input file is provided by 'mgaps', set the
                    reference sequence ID (as it appears in the first column
                    of the UTR/CDS coords file)
         -I         Display the name of query sequences
         -Ir        Display the name of reference genes
       mummer Find and output (to stdout) the positions and lengths of all sufficiently long maximal matches of
       a substring in <query-file> and <reference-file>.

         -mum           compute maximal matches that are unique in both sequences
         -mumcand       same as -mumreference
         -mumreference  compute maximal matches that are unique in
                        the reference-sequence but not necessarily
                        in the query-sequence (default)
         -maxmatch      compute all maximal matches regardless of their uniqueness
         -n             match only the characters a, c, g, or t
                        they can be in upper or in lower case
         -l             set the minimum length of a match
                        if not set, the default value is 20
         -b             compute forward and reverse complement matches
         -r             only compute reverse complement matches
         -s             show the matching substrings
         -c             report the query-position of a reverse complement match
                        relative to the original query sequence
         -F             force 4 column output format regardless of the number of
                        reference sequence inputs
         -L             show the length of the query sequences on the header line
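
       To make the notion of a maximal match concrete, here is a brute-force sketch that reports every exact
       match that cannot be extended in either direction, which corresponds to the -maxmatch behaviour. It is
       for illustration only and is far slower than mummer; positions are 0-based in this sketch, and options
       such as -n, -b and -r are not modelled.

```python
def maximal_matches(ref: str, qry: str, min_len: int = 20):
    """Return (ref_pos, qry_pos, length) for every maximal exact match.

    A match is maximal when it cannot be extended to the left or to the
    right without hitting a mismatch or the end of a sequence.
    """
    out = []
    for i in range(len(ref)):
        for j in range(len(qry)):
            if ref[i] != qry[j]:
                continue
            # If the preceding characters also match, this position is the
            # interior of a longer match that started earlier: skip it.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while (i + k < len(ref) and j + k < len(qry)
                   and ref[i + k] == qry[j + k]):
                k += 1
            if k >= min_len:
                out.append((i, j, k))
    return out


if __name__ == "__main__":
    # Toy sequences; min_len is lowered from the default of 20 so that
    # something is reported.
    for hit in maximal_matches("ACGTACGTTT", "GGACGTAC", min_len=4):
        print(hit)
```

       Restricting the result to matches whose reference substring occurs only once would approximate
       -mumreference, and requiring uniqueness in both sequences would approximate -mum.
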
       nucmer
           nucmer generates nucleotide alignments between two multi-FASTA input
           files. Two output files are generated. The .cluster output file lists
           clusters of matches between each sequence. The .delta file lists the
           distance between insertions and deletions that produce maximal scoring
           alignments between each sequence.

       MANDATORY:
           Reference     Set the input reference multi-FASTA filename
           Query         Set the input query multi-FASTA filename

         --mum           Use anchor matches that are unique in both the reference
                         and query
         --mumcand       Same as --mumreference
         --mumreference  Use anchor matches that are unique in the reference
                         but not necessarily unique in the query (default behavior)
         --maxmatch      Use all anchor matches regardless of their uniqueness

         -b|breaklen     Set the distance an alignment extension will attempt to
                         extend poor scoring regions before giving up (default 200)
         -c|mincluster   Sets the minimum length of a cluster of matches (default 65)
         --[no]delta     Toggle the creation of the delta file (default --delta)
         --depend        Print the dependency information and exit
         -d|diagfactor   Set the clustering diagonal difference separation factor
                         (default 0.12)
         --[no]extend    Toggle the cluster extension step (default --extend)
         -f
         --forward       Use only the forward strand of the Query sequences
         -g|maxgap       Set the maximum gap between two adjacent matches in a
                         cluster (default 90)
         -h
         --help          Display help information and exit
         -l|minmatch     Set the minimum length of a single match (default 20)
         -o
         --coords        Automatically generate the original NUCmer1.1 coords
                         output file using the 'show-coords' program
         --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                         extension reaches the end of a sequence, it will backtrack
                         to optimize the alignment score instead of terminating the
                         alignment at the end of the sequence (default --optimize)
         -p|prefix       Set the prefix of the output files (default "out")
         -r
         --reverse       Use only the reverse complement of the Query sequences
         --[no]simplify  Simplify alignments by removing shadowed clusters. Turn
                         this option off if aligning a sequence to itself to look
                         for repeats (default --simplify)
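
       A typical use of these options is to chain nucmer with delta-filter and show-coords. The sketch below
       only assembles the command lines, using switches documented on this page. The helper names and the
       file names (ref.fasta, qry.fasta, demo.filter) are placeholders, and it assumes that nucmer writes
       its delta file as <prefix>.delta.

```python
import shlex

ANCHOR_MODES = {"mum", "mumcand", "mumreference", "maxmatch"}


def nucmer_cmd(reference, query, prefix="out", anchors="mumreference",
               min_match=None):
    """Build an argv list for nucmer from options listed on this page."""
    if anchors not in ANCHOR_MODES:
        raise ValueError(f"unknown anchor mode: {anchors}")
    cmd = ["nucmer", f"--{anchors}", "-p", prefix]
    if min_match is not None:
        cmd += ["-l", str(min_match)]  # -l|minmatch, default 20
    return cmd + [reference, query]


def delta_filter_cmd(delta_file):
    """One-to-one local mapping (-r -q); the filtered delta goes to stdout."""
    return ["delta-filter", "-r", "-q", delta_file]


def show_coords_cmd(delta_file):
    """Sort by reference (-r), add coverage (-c) and sequence lengths (-l)."""
    return ["show-coords", "-r", "-c", "-l", delta_file]


if __name__ == "__main__":
    steps = [
        nucmer_cmd("ref.fasta", "qry.fasta", prefix="demo", anchors="maxmatch"),
        delta_filter_cmd("demo.delta"),   # redirect stdout to demo.filter
        show_coords_cmd("demo.filter"),
    ]
    for argv in steps:
        print(shlex.join(argv))
```

       Each list can be passed to subprocess.run; because delta-filter and show-coords write to stdout, their
       output has to be captured or redirected by the caller.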

       promer
           promer generates amino acid alignments between two multi-FASTA DNA input
           files. Two output files are generated. The .cluster output file lists
           clusters of matches between each sequence. The .delta file lists the
           distance between insertions and deletions that produce maximal scoring
           alignments between each sequence. The DNA input is translated into all 6
           reading frames in order to generate the output, but the output coordinates
           reference the original DNA input.

       MANDATORY:
           Reference     Set the input reference multi-FASTA DNA file
           Query         Set the input query multi-FASTA DNA file

         --mum           Use anchor matches that are unique in both the reference
                         and query
         --mumcand       Same as --mumreference
         --mumreference  Use anchor matches that are unique in the reference
                         but not necessarily unique in the query (default behavior)
         --maxmatch      Use all anchor matches regardless of their uniqueness

         -b|breaklen     Set the distance an alignment extension will attempt to
                         extend poor scoring regions before giving up, measured in
                         amino acids (default 60)
         -c|mincluster   Sets the minimum length of a cluster of matches, measured in
                         amino acids (default 20)
         --[no]delta     Toggle the creation of the delta file (default --delta)
         --depend        Print the dependency information and exit
         -d|diagfactor   Set the clustering diagonal difference separation factor
                         (default .11)
         --[no]extend    Toggle the cluster extension step (default --extend)
         -g|maxgap       Set the maximum gap between two adjacent matches in a
                         cluster, measured in amino acids (default 30)
         -l|minmatch     Set the minimum length of a single match, measured in amino
                         acids (default 6)
         -m|masklen      Set the maximum bookend masking length, measured in amino
                         acids (default 8)
         -o
         --coords        Automatically generate the original PROmer1.1 ".coords"
                         output file using the "show-coords" program
         --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                         extension reaches the end of a sequence, it will backtrack
                         to optimize the alignment score instead of terminating the
                         alignment at the end of the sequence (default --optimize)

         -p|prefix       Set the prefix of the output files (default "out")
         -x|matrix       Set the alignment matrix number to 1 [BLOSUM 45],
                         2 [BLOSUM 62] or 3 [BLOSUM 80] (default 2)
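
       The six-frame translation mentioned in the promer description can be sketched as follows. This is an
       illustration of the general technique, not promer's code: it assumes the standard genetic code, marks
       codons containing non-ACGT characters as X, and the frame numbering (+1 to +3, -1 to -3) is a
       convention chosen for this example.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse-complement a DNA string (non-ACGT characters are kept as-is)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Map frame number (+1..+3 forward, -1..-3 reverse) to its translation."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCC").items()):
        print(frame, protein)
```

       An amino acid at index i of forward frame f starts at DNA offset (f - 1) + 3 * i, which is how amino
       acid matches can be reported in the coordinates of the original DNA input.
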
       repeat-match Find all maximal exact matches in <genome-file>
         -E    Use exhaustive (slow) search to find matches
         -f    Forward strand only, don't use reverse complement
         -n #  Set minimum exact match length to #
         -t    Only output tandem repeats
         -V #  Set level of verbose (debugging) printing to #
       show-aligns
         -h      Display help information
         -q      Sort alignments by the query start coordinate
         -r      Sort alignments by the reference start coordinate
         -w int  Set the screen width - default is 60
         -x int  Set the matrix type - default is 2 (BLOSUM 62),
                 other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
                 note: only has effect on amino acid alignments
       show-coords
         -b          Merges overlapping alignments regardless of match dir
                     or frame and does not display any identity information.
         -B          Switch output to btab format
         -c          Include percent coverage information in the output
         -d          Display the alignment direction in the additional
                     FRM columns (default for promer)
         -g          Deprecated option. Please use 'delta-filter' instead
         -h          Display help information
         -H          Do not print the output header
         -I float    Set minimum percent identity to display
         -k          Knockout (do not display) alignments that overlap
                     another alignment in a different frame by more than 50%
                     of their length, AND have a smaller percent similarity
                     or are less than 75% of the size of the other alignment
                     (promer only)
         -l          Include the sequence length information in the output
         -L long     Set minimum alignment length to display
         -o          Annotate maximal alignments between two sequences, i.e.
                     overlaps between reference and query sequences
         -q          Sort output lines by query IDs and coordinates
         -r          Sort output lines by reference IDs and coordinates
         -T          Switch output to tab-delimited format

         Input is the .delta output of either the "nucmer" or the "promer" program passed on the command line.

         Output is to stdout, and consists of  a  list  of  coordinates,  percent  identity,  and  other  useful
       information regarding the alignment data contained in the .delta file used as input.

         NOTE:  No  sorting  is  done  by  default,  therefore  the  alignments  will be ordered as found in the
       <deltafile> input.
       show-snps
         -C            Do not report SNPs from alignments with an ambiguous
                       mapping, i.e. only report SNPs where the [R] and [Q]
                       columns equal 0 and do not output these columns
         -h            Display help information
         -H            Do not print the output header
         -I            Do not report indels
         -l            Include sequence length information in the output
         -q            Sort output lines by query IDs and SNP positions
         -r            Sort output lines by reference IDs and SNP positions
         -S            Specify which alignments to report by passing
                       'show-coords' lines to stdin
         -T            Switch to tab-delimited format
         -x int        Include x characters of surrounding SNP context in the
                       output, default 0

         Input is the .delta output of either the nucmer or promer program passed on the command line.

         Output is to stdout, and consists of a list of SNPs (or  amino  acid  substitutions  for  promer)  with
       positions  and  other  useful  info.  Output will be sorted with -r by default and the [BUFF] column will
       always refer to the sequence whose positions have been sorted. This value  specifies  the  distance  from
       this  SNP  to  the  nearest mismatch (end of alignment, indel, SNP, etc) in the same alignment, while the
       [DIST] column specifies the distance from this SNP to the nearest sequence end. SNPs for  which  the  [R]
       and  [Q] columns are greater than 0 should be evaluated with caution, as these columns specify the number
       of other alignments which overlap this position. Use -C to assure SNPs  are  only  reported  from  unique
       alignment regions.

       show-tiling
         -a          Describe the tiling path by printing the tab-delimited
                     alignment region coordinates to stdout
         -c          Assume the reference sequences are circular, and allow
                     tiled contigs to span the origin
         -g int      Set maximum gap between clustered alignments [-1, INT_MAX]
                     A value of -1 will represent infinity
                     (nucmer default = 1000)
                     (promer default = -1)
         -i float    Set minimum percent identity to tile [0.0, 100.0]
                     (nucmer default = 90.0)
                     (promer default = 55.0)
         -l int      Set minimum length contig to report [-1, INT_MAX]
                     A value of -1 will represent infinity
                     (common default = 1)
         -p file     Output a pseudo molecule of the query contigs to 'file'
         -R          Deal with repetitive contigs by randomly placing them
                     in one of their copy locations (implies -V 0)
         -t file     Output a TIGR style contig list of each query sequence
                     that sufficiently matches the reference (non-circular)
         -u file     Output the tab-delimited alignment region coordinates
                     of the unusable contigs to 'file'
         -v float    Set minimum contig coverage to tile [0.0, 100.0]
                     (nucmer default = 95.0) sum of individual alignments
                     (promer default = 50.0) extent of syntenic region
         -V float    Set minimum contig coverage difference [0.0, 100.0]
                     i.e. the difference needed to determine one alignment
                     is 'better' than another alignment
                     (nucmer default = 10.0) sum of individual alignments
                     (promer default = 30.0) extent of syntenic region
         -x          Describe the tiling path by printing the XML contig
                     linking information to stdout

         Input  is  the  .delta  output  of the nucmer program, run on very similar sequence data, or the .delta
       output of the promer program, run on divergent sequence data.

         Output is to stdout, and consists of the predicted location of each aligning query contig as mapped  to
       the  reference  sequences.   These coordinates reference the extent of the entire query contig, even when
       only a certain percentage of the contig was actually aligned (unless the -a option is used). Columns are,
       in order: start in ref, end in ref, distance to next contig, length of this contig, alignment coverage,
       identity, orientation, and ID.
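
       The eight columns listed above map naturally onto a small record type. The parser below is a sketch
       under stated assumptions: fields are taken to be whitespace-delimited, the numeric types are guesses
       based on the column descriptions, any header lines are expected to be skipped by the caller, and the
       sample line is invented for the example.

```python
from typing import NamedTuple


class TilingRow(NamedTuple):
    ref_start: int      # start in ref
    ref_end: int        # end in ref
    gap_to_next: int    # distance to next contig
    contig_len: int     # length of this contig
    coverage: float     # alignment coverage
    identity: float     # identity
    orientation: str    # orientation
    contig_id: str      # ID


def parse_tiling_line(line: str) -> TilingRow:
    """Parse one data line of show-tiling output into a TilingRow."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return TilingRow(
        int(fields[0]), int(fields[1]), int(fields[2]), int(fields[3]),
        float(fields[4]), float(fields[5]), fields[6], fields[7],
    )


if __name__ == "__main__":
    # Invented sample line, not real show-tiling output.
    row = parse_tiling_line("1\t5000\t120\t5000\t100.00\t99.50\t+\tcontig_7")
    print(row.contig_id, row.ref_start, row.ref_end, row.identity)
```

       Because the coordinates cover the whole query contig unless -a is used, ref_end - ref_start + 1 should
       be compared with contig_len rather than with the aligned length.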

AUTHOR

       mummer  was written by S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L.
       Salzberg.

                                                  May 21, 2005                                         MUMMER(1)
