Provided by: mash_2.3+dfsg-3build1_amd64

NAME

       mash-triangle - estimate a lower-triangular distance matrix

SYNOPSIS

       mash triangle [options] <seq1> [<seq2>] ...

DESCRIPTION

       Estimate the distance of each input sequence to every other input sequence. Outputs a
       lower-triangular distance matrix in relaxed Phylip format. The input sequences can be
       fasta or fastq, gzipped or not, or Mash sketch files (.msh) with matching k-mer sizes.
       Input files can also be files of file names (see -l). If more than one input file is
       provided, whole files are compared by default (see -i).
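
       As an illustration of consuming the matrix described above, the following is a minimal
       Python sketch that expands a lower-triangular matrix into a full symmetric one. It
       assumes the first line holds the number of sequences and that each following line is a
       name followed by tab-separated distances to the preceding sequences. The function name
       parse_lower_triangle and the sample names and values are hypothetical.

```python
def parse_lower_triangle(text):
    """Expand a lower-triangular Phylip-style matrix into names and a full matrix.

    Assumed layout (not taken from the man page): line 1 is the sequence
    count; each later line is "<name>\t<d1>\t<d2>..." with one distance per
    preceding sequence, so the first sequence has no distances at all.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].strip())
    names = []
    dist = [[0.0] * n for _ in range(n)]
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split("\t")
        names.append(fields[0])
        # Ignore empty fields left by a trailing tab.
        values = [float(v) for v in fields[1:] if v]
        if len(values) != i:
            raise ValueError(f"row {i} has {len(values)} distances, expected {i}")
        for j, d in enumerate(values):
            dist[i][j] = d
            dist[j][i] = d  # mirror into the upper triangle
    return names, dist


# Hypothetical sample data, for illustration only.
example = "3\ngenomeA\ngenomeB\t0.1\ngenomeC\t0.2\t0.3\n"
names, dist = parse_lower_triangle(example)
```

       Splitting on tabs rather than on arbitrary whitespace keeps names intact when they
       contain spaces, which may happen when comment fields are used as names (see -C).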

OPTIONS

       -h
           Help

       -p <int>
           Parallelism. This many threads will be spawned for processing. [1]

   Input
       -l
           List input. Each input file contains a list of sequence files, one per line.

   Output
       -C
           Use comment fields for sequence names instead of IDs.

       -E
           Output edge list instead of Phylip matrix, with fields [seq1, seq2, dist, p-val,
           shared-hashes].

       -v <num>
           Maximum p-value to report in edge list. Implies -E. (0-1) [1.0]

       -d <num>
           Maximum distance to report in edge list. Implies -E. (0-1) [1.0]
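
       The edge list fields named under -E can be read back and thresholded in the same way as
       -v and -d. The following Python sketch assumes tab-separated fields and treats both
       thresholds as inclusive; neither detail is stated above. The function name filter_edges
       and the sample rows are hypothetical.

```python
def filter_edges(text, max_dist=1.0, max_pval=1.0):
    """Return edges whose distance and p-value do not exceed the thresholds.

    Each line is assumed to hold five tab-separated fields:
    seq1, seq2, dist, p-val, shared-hashes. The shared-hashes field is
    kept as a string because its exact form is not specified here.
    """
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        seq1, seq2, dist, pval, shared = line.split("\t")
        dist, pval = float(dist), float(pval)
        # Inclusive comparison is an assumption about how the limits apply.
        if dist <= max_dist and pval <= max_pval:
            edges.append((seq1, seq2, dist, pval, shared))
    return edges


# Hypothetical sample rows, for illustration only.
sample = (
    "genomeA\tgenomeB\t0.05\t1e-10\t400/1000\n"
    "genomeA\tgenomeC\t0.25\t0.002\t12/1000\n"
    "genomeB\tgenomeC\t0.30\t0.2\t3/1000\n"
)
close_pairs = filter_edges(sample, max_dist=0.1)
significant = filter_edges(sample, max_pval=0.01)
```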

   Sketching
       -k <int>
           K-mer size. Hashes will be based on strings of this many nucleotides. Canonical
           nucleotides are used by default (see Sketching (alphabet) below). (1-32) [21]

       -s <int>
           Sketch size. Each sketch will have at most this many non-redundant min-hashes. [1000]

       -i
           Sketch individual sequences, rather than whole files, e.g. for multi-fastas of
           single-chromosome genomes or pair-wise gene comparisons.

       -w <num>
           Probability threshold for warning about low k-mer size. (0-1) [0.01]

       -r
           Input is a read set. See Sketching (reads) below. Incompatible with -i.
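
       To illustrate how -k and -s interact, the following Python sketch builds a bottom-k
       sketch: every k-mer is hashed, duplicates are dropped, and at most the requested number
       of smallest hashes is kept. It is a simplified illustration, not the hashing used by
       mash itself; SHA-1 from the Python standard library stands in as the hash function,
       strand handling is omitted, and the names kmer_hash and sketch are hypothetical.

```python
import hashlib


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, chosen for illustration)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k, size):
    """Return at most `size` distinct smallest k-mer hashes, in ascending order."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:size]


small = sketch("ACGTACGTAC", k=3, size=2)     # truncated to 2 hashes
full = sketch("ACGTACGTAC", k=3, size=1000)   # fewer distinct k-mers than size
```

       When the input has fewer distinct k-mers than the sketch size, the sketch is simply
       shorter, which matches the "at most" wording of -s.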

   Sketching (reads)
       -b <size>
           Use a Bloom filter of this size (raw bytes or with K/M/G/T) to filter out unique
           k-mers. This is useful if exact filtering with -m uses too much memory. However, some
           unique k-mers may pass erroneously, and copies cannot be counted beyond 2. Implies -r.

       -m <int>
           Minimum copies of each k-mer required to pass noise filter for reads. Implies -r. [1]

       -c <num>
           Target coverage. Sketching will conclude if this coverage is reached before the end of
           the input file (estimated by average k-mer multiplicity). Implies -r.

       -g <size>
           Genome size. If specified, will be used for p-value calculation instead of an
           estimated size from k-mer content. Implies -r.
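
       The exact noise filter selected with -m can be pictured as a counting pass over all
       k-mers in the reads, keeping only those seen often enough. The following Python sketch
       shows that idea under simplifying assumptions: all counts are held in memory, strand is
       not considered, and the function name filter_kmers is hypothetical.

```python
from collections import Counter


def filter_kmers(reads, k, min_copies):
    """Return the set of k-mers that occur at least `min_copies` times."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, count in counts.items() if count >= min_copies}


# Hypothetical reads: "ACGT" occurs in both, the other k-mers only once.
reads = ["ACGTA", "ACGTT"]
kept = filter_kmers(reads, k=4, min_copies=2)
unfiltered = filter_kmers(reads, k=4, min_copies=1)
```

       Holding an exact count for every k-mer is what makes this approach memory-hungry on
       large read sets, which is the situation the Bloom filter option -b is meant for.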

   Sketching (alphabet)
       -n
           Preserve strand (by default, strand is ignored by using canonical DNA k-mers, which
           are alphabetical minima of forward-reverse pairs). Implied if an alphabet is specified
           with -a or -z.

       -a
           Use amino acid alphabet (A-Z, except BJOUXZ). Implies -n, -k 9.

       -z <text>
           Alphabet to base hashes on (case ignored by default; see -Z). K-mers with other
           characters will be ignored. Implies -n.

       -Z
           Preserve case in k-mers and alphabet (case is ignored by default). Sequence letters
           whose case is not in the current alphabet will be skipped when sketching.
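
       The alphabet options above can be illustrated with two small Python helpers. The first
       picks the canonical form of a DNA k-mer as the alphabetical minimum of the k-mer and
       its reverse complement, as described under -n. The second keeps only k-mers whose
       characters all belong to a given alphabet, with optional case preservation in the
       spirit of -z and -Z. Normalizing to upper case when case is ignored is an assumption,
       and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the alphabetical minimum of a DNA k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers_in_alphabet(seq, k, alphabet, preserve_case=False):
    """Return the k-mers of `seq` made up only of characters in `alphabet`.

    With preserve_case=False both sequence and alphabet are upper-cased first
    (an assumption about how case-insensitive matching is normalized).
    """
    if not preserve_case:
        seq = seq.upper()
        alphabet = alphabet.upper()
    allowed = set(alphabet)
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed
    ]


same_strand_pair = canonical("TTG") == canonical("CAA")
skip_n = kmers_in_alphabet("ACNGT", 2, "ACGT")
ignore_case = kmers_in_alphabet("acGT", 2, "ACGT")
keep_case = kmers_in_alphabet("acGT", 2, "ACGT", preserve_case=True)
```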

SEE ALSO

       mash(1)

                                            2019-12-13                           MASH-TRIANGLE(1)