Provided by: mash_2.3+dfsg-1build2_amd64
NAME
mash-triangle - estimate a lower-triangular distance matrix
SYNOPSIS
mash triangle [options] <seq1> [<seq2>] ...
DESCRIPTION
Estimate the distance of each input sequence to every other input sequence. Outputs a lower-triangular distance matrix in relaxed Phylip format. The input sequences can be fasta or fastq, gzipped or not, or Mash sketch files (.msh) with matching k-mer sizes. Input files can also be files of file names (see -l). If more than one input file is provided, whole files are compared by default (see -i).
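As an illustration of consuming the matrix output, the following Python sketch reads a lower-triangular matrix into a full symmetric one. It is not part of Mash. It assumes the first line holds the number of sequences and that each following row is a name followed by tab-separated distances to the preceding sequences; the function name parse_lower_triangle and the sample names are hypothetical.

```python
from typing import List, Tuple


def parse_lower_triangle(text: str) -> Tuple[List[str], List[List[float]]]:
    """Parse a lower-triangular Phylip-style matrix into a symmetric matrix.

    Assumed layout (not verified against every Mash version):
      line 1:   number of sequences
      row i:    name, then i tab-separated distances to rows 0..i-1
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].strip())
    rows = lines[1:]
    if len(rows) != n:
        raise ValueError(f"expected {n} rows, found {len(rows)}")

    names: List[str] = []
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        fields = row.split("\t")
        name = fields[0]
        values = [v for v in fields[1:] if v != ""]
        if len(values) != i:
            raise ValueError(f"row {i} ({name!r}) has {len(values)} values, expected {i}")
        for j, value in enumerate(values):
            # Mirror each entry so the result is a full symmetric matrix.
            matrix[i][j] = matrix[j][i] = float(value)
        names.append(name)
    return names, matrix


# Hypothetical three-sequence matrix, for illustration only.
example = "\t3\nA.fna\nB.fna\t0.1\nC.fna\t0.2\t0.3\n"
names, matrix = parse_lower_triangle(example)
```

Splitting on tabs rather than on arbitrary whitespace keeps names that contain spaces intact, which may matter when -C is used.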
OPTIONS
-h
    Help

-p <int>
    Parallelism. This many threads will be spawned for processing. [1]

Input

-l
    List input. Each query file contains a list of sequence files, one per line. The reference file is not affected.

Output

-C
    Use comment fields for sequence names instead of IDs.

-E
    Output edge list instead of Phylip matrix, with fields [seq1, seq2, dist, p-val, shared-hashes].

-v <num>
    Maximum p-value to report in edge list. Implies -E. (0-1) [1.0]

-d <num>
    Maximum distance to report in edge list. Implies -E. (0-1) [1.0]

Sketching

-k <int>
    K-mer size. Hashes will be based on strings of this many nucleotides. Canonical nucleotides are used by default (see Alphabet options below). (1-32) [21]

-s <int>
    Sketch size. Each sketch will have at most this many non-redundant min-hashes. [1000]

-i
    Sketch individual sequences, rather than whole files, e.g. for multi-fastas of single-chromosome genomes or pair-wise gene comparisons.

-w <num>
    Probability threshold for warning about low k-mer size. (0-1) [0.01]

-r
    Input is a read set. See Reads options below. Incompatible with -i.

Sketching (reads)

-b <size>
    Use a Bloom filter of this size (raw bytes or with K/M/G/T) to filter out unique k-mers. This is useful if exact filtering with -m uses too much memory. However, some unique k-mers may pass erroneously, and copies cannot be counted beyond 2. Implies -r.

-m <int>
    Minimum copies of each k-mer required to pass noise filter for reads. Implies -r. [1]

-c <num>
    Target coverage. Sketching will conclude if this coverage is reached before the end of the input file (estimated by average k-mer multiplicity). Implies -r.

-g <size>
    Genome size. If specified, will be used for p-value calculation instead of an estimated size from k-mer content. Implies -r.

Sketching (alphabet)

-n
    Preserve strand (by default, strand is ignored by using canonical DNA k-mers, which are alphabetical minima of forward-reverse pairs). Implied if an alphabet is specified with -a or -z.

-a
    Use amino acid alphabet (A-Z, except BJOUXZ). Implies -n, -k 9.

-z <text>
    Alphabet to base hashes on (case ignored by default; see -Z). K-mers with other characters will be ignored. Implies -n.

-Z
    Preserve case in k-mers and alphabet (case is ignored by default). Sequence letters whose case is not in the current alphabet will be skipped when sketching.
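The canonical k-mer rule described under -n (the alphabetical minimum of a k-mer and its reverse complement) can be illustrated with a short Python sketch. This is not Mash's implementation: Mash hashes k-mers, while the sketch only shows which string form would be chosen. The helper names and the preserve_strand parameter are hypothetical, and skipping k-mers that contain characters outside ACGT is an assumption made for the example.

```python
from typing import Iterator

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of an uppercase DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the alphabetical minimum of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def kmers(seq: str, k: int, preserve_strand: bool = False) -> Iterator[str]:
    """Yield the k-mers of seq, canonical unless preserve_strand is set.

    Case is ignored here, mirroring the documented default. Skipping k-mers
    with characters outside ACGT is an assumption made for this sketch.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue
        yield kmer if preserve_strand else canonical(kmer)


# "ACG" and "CGT" are reverse complements, so both map to "ACG".
default_kmers = list(kmers("ACGTT", 3))
stranded_kmers = list(kmers("ACGTT", 3, preserve_strand=True))
```

With the default behaviour a sequence and its reverse complement yield the same set of k-mers, which is why strand can be ignored when comparing genomes.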
SEE ALSO
mash(1)

2019-12-13 MASH-TRIANGLE(1)