Provided by: phybin_0.3-5build1_amd64
NAME
phybin - binning/clustering Newick trees by topology
SYNOPSIS
phybin [OPTION...] files or directories...
DESCRIPTION
PhyBin takes Newick tree files as input. Paths of Newick files can be passed directly on the command line. If directories are provided instead, all files in those directories are read. Taxa are named based on the files containing them. If a file contains multiple trees, phybin reads all of them, and the taxa name then includes a suffix indicating the position in the file, e.g. FILENAME_0, FILENAME_1, etc.

When clustering trees, PhyBin computes a complete all-to-all Robinson-Foulds distance matrix. If a threshold distance (tree edit distance) is given, a flat set of clusters is written to files named clusterXX_YY.tr. Otherwise PhyBin produces a full dendrogram.

Binning mode provides an especially quick-and-dirty form of clustering. When running with the --bin option, only exactly equal trees are put in the same cluster. Tree pre-processing still applies, however, for example collapsing short branches.

USAGE NOTES:

* Currently phybin ignores input trees with the wrong number of taxa.

* If given a directory as input, phybin assumes that all files it contains are Newick trees.
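The two ideas above, Robinson-Foulds distance and binning of exactly equal trees, can be illustrated with a small Python sketch. This is not PhyBin's implementation (PhyBin uses a HashRF variant and reads Newick files). It assumes trees are already parsed into nested tuples of leaf names, and the names leaves, bipartitions, rf_distance and bin_trees are hypothetical helpers introduced only for this illustration.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits of the taxa induced by internal edges."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # fixed taxon used to pick one canonical side per split
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return
        below = leaves(node)
        # Always record the side that does NOT contain the anchor taxon.
        side = below if anchor not in below else all_taxa - below
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(all_taxa) - 2:
            splits.add(side)
        for child in node:
            walk(child)

    walk(tree)
    return frozenset(splits)


def rf_distance(a, b):
    """Symmetric-difference (Robinson-Foulds) distance between two trees."""
    return len(bipartitions(a) ^ bipartitions(b))


def bin_trees(named_trees):
    """Group tree names so that only topologically equal trees share a bin."""
    bins = {}
    for name, tree in named_trees.items():
        key = (leaves(tree), bipartitions(tree))
        bins.setdefault(key, []).append(name)
    return sorted(bins.values(), key=len, reverse=True)


# Assumed toy input: three four-taxon trees, two of which share a topology.
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = (("B", "A"), ("D", "C"))  # same topology as t1, children reordered
trees = {"gene1": t1, "gene2": t2, "gene3": t3}

print(rf_distance(t1, t2))
print(rf_distance(t1, t3))
print(bin_trees(trees))
```

In this sketch, a distance of zero corresponds to what --bin treats as "exactly equal", while a threshold such as --editdist would merge bins whose distance is at or below the given value.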
OPTIONS
-v --verbose
    print WARNINGS and other information (recommended at first)

-V --version
    show version number

-o DIR --output=DIR
    set directory to contain all output files (default "./phybin_out/")

--selftest
    run internal unit tests

Clustering Options

--bin
    Use simple binning, the cheapest form of 'clustering'

--single
    Use single-linkage clustering (nearest neighbor)

--complete
    Use complete-linkage clustering (furthest neighbor)

--UPGMA
    Use Unweighted Pair Group Method (average linkage); this is the DEFAULT mode

--editdist=DIST
    Combine all clusters separated by DIST or less. Report a flat list of clusters. Irrespective of whether this is activated, a hierarchical clustering (dendogram.pdf) is produced.

Select Robinson-Foulds (symmetric difference) distance algorithm:

--hashrf
    (default) use a variant of the HashRF algorithm for the distance matrix

--tolerant
    use a slower, modified RF metric that tolerates missing taxa

Visualization

-g --graphbins
    use graphviz to produce .dot and .pdf output files

-d --drawbins
    like -g, but open GUI windows to show each bin's tree

-w --view
    for convenience, "view mode" simply displays input Newick files without binning

--showtrees
    Print (textual) tree topology inside the nodes of the dendrogram

--highlight=FILE
    Highlight nodes in the tree-of-trees (dendrogram) consistent with the given tree file. Multiple highlights are permitted and use different colors.

--interior
    Show the consensus trees for interior nodes in the dendrogram, rather than just points.

Tree pre-processing

--prune=TAXA
    Prune trees to only TAXA before doing anything else. Space- and comma-separated lists of taxa are allowed. Use quotes.

-b LEN --minbranchlen=LEN
    collapse branches less than LEN

--minbootstrap=INT
    collapse branches with bootstrap values less than INT

Extracting taxa names

-p NUM --nameprefix=NUM
    Leaf names in the input Newick trees can be gene names, not taxa. In that case it is typical to extract taxa names from the gene names. This option extracts a prefix of NUM characters to serve as the taxa name.

-s STR --namesep=STR
    An alternative to --nameprefix. STR provides a set of delimiter characters, for example '-' or '0123456789'. The taxa name is then a variable-length prefix of each gene name, up to but not including any character in STR.

-m FILE --namemap=FILE
    Even once prefixes are extracted, it may be necessary to use a lookup table to compute taxa names, e.g. if multiple genes/plasmids map onto one taxon. This option specifies a text file with find/replace entries of the form "<string> <taxaname>", which are applied AFTER -s and -p.

Utility Modes

--rfdist
    print a Robinson-Foulds distance matrix for the input trees

--setdiff
    for convenience, print the set difference between cluster*.txt files

--print
    simply print out a concise form of each input tree

--printnorms
    simply print out a concise and NORMALIZED form of each input tree

--consensus
    print a strict consensus tree for the inputs, then exit

--matching
    print a list of tree names that match any --highlight argument
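The name-extraction options can be easier to follow with a concrete model. The Python sketch below mimics what --nameprefix, --namesep and --namemap are described as doing. It is an illustration only, not PhyBin's code. The function names and sample gene names are hypothetical, and treating each map entry as an exact match on the extracted prefix is an assumption about the "find/replace" behaviour.

```python
def prefix_by_length(gene_name, num):
    """Model of --nameprefix=NUM: keep the first NUM characters."""
    return gene_name[:num]


def prefix_by_separator(gene_name, separators):
    """Model of --namesep=STR: cut before the first character found in STR."""
    for index, char in enumerate(gene_name):
        if char in separators:
            return gene_name[:index]
    return gene_name  # no delimiter present, keep the whole name


def parse_name_map(text):
    """Model of a --namemap file: one "<string> <taxaname>" pair per line."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def taxon_name(gene_name, num=None, separators=None, name_map=None):
    """Apply prefix extraction first, then the lookup table (assumed exact match)."""
    name = gene_name
    if separators is not None:
        name = prefix_by_separator(name, separators)
    elif num is not None:
        name = prefix_by_length(name, num)
    if name_map is not None:
        name = name_map.get(name, name)
    return name


# Assumed sample data: two plasmid prefixes that map onto one taxon.
sample_map = parse_name_map("pBbA Bbur\npBbB Bbur\n")

print(taxon_name("ECOLI_gene42", num=5))
print(taxon_name("Ecoli123", separators="0123456789"))
print(taxon_name("pBbA-0012", separators="-", name_map=sample_map))
```

The lookup is applied last, which matches the statement that map entries are applied AFTER -s and -p.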
AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.