NAME

       fasttreeMP  -  create  phylogenetic  trees  from  alignments  of  nucleotide or protein sequences (openMP
       version)

DESCRIPTION

       fasttreeMP infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or
       protein sequences. It handles alignments with up to a million sequences in a reasonable amount of time
       and memory.

       fasttreeMP is more accurate than PhyML 3 with default settings, and much more accurate than the distance-
       matrix  methods  that  are  traditionally  used for large alignments. fasttreeMP uses the Jukes-Cantor or
       generalized time-reversible (GTR) models of nucleotide evolution and the JTT (Jones-Taylor-Thornton 1992)
       model of amino acid evolution. To account for the varying rates of  evolution  across  sites,  fasttreeMP
       uses  a  single rate for each site (the "CAT" approximation). To quickly estimate the reliability of each
       split in the tree, fasttreeMP computes local support values with the Shimodaira-Hasegawa test (these  are
       the same as PhyML 3's "SH-like local supports").

SYNOPSIS

       fasttreeMP protein_alignment > tree

       fasttreeMP -nt nucleotide_alignment > tree

       fasttreeMP -nt -gtr < nucleotide_alignment > tree

       fasttreeMP accepts alignments in fasta or phylip interleaved formats

   Common options (must be before the alignment file):

       -quiet to suppress reporting information

       -nopr to suppress progress indicator

       -log logfile -- save intermediate trees, settings, and model details

       -fastest -- speed up the neighbor joining phase & reduce memory usage

              (recommended for >50,000 sequences)

       -n <number> to analyze multiple alignments (phylip format only)

              (use for global bootstrap, with seqboot and CompareToBootstrap.pl)

       -nosupport to not compute support values

       -intree newick_file to set the starting tree(s)

       -intree1 newick_file to use this starting tree for all the alignments

              (for faster global bootstrap on huge alignments)

       -pseudo to use pseudocounts (recommended for highly gapped sequences)

       -gtr -- generalized time-reversible model (nucleotide alignments only)

       -wag -- Whelan-And-Goldman 2001 model (amino acid alignments only)

       -quote -- allow spaces and other restricted characters (but not ' characters) in

              sequence names and quote names in the output tree (fasta input only; fasttreeMP will not be able
              to read these trees back in)

       -noml to turn off maximum-likelihood

       -nome to turn off minimum-evolution NNIs and SPRs

              (recommended if running additional ML NNIs with -intree)

       -nome -mllen with -intree to optimize branch lengths for a fixed topology

       -cat # to specify the number of rate categories of sites (default 20)

              or -nocat to use constant rates

       -gamma -- after optimizing the tree under the CAT approximation,

              rescale the lengths to optimize the Gamma20 likelihood

       -constraints constraintAlignment to constrain the topology search

              constraintAlignment should have 1s or 0s to indicate splits

       -expert -- see more options
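
       As a rough sketch of the >50,000-sequence recommendation for -fastest above, the shell helpers below
       count the records in a fasta alignment and print -fastest only when the count exceeds that threshold.
       The function names and the file name aln.fasta are hypothetical and not part of fasttreeMP.

```shell
# Hypothetical helpers (not part of fasttreeMP).

# count_seqs FILE: number of fasta records, i.e. lines starting with '>'.
count_seqs() {
    grep -c '^>' "$1"
}

# speed_flag N: print -fastest when N exceeds 50,000 sequences, else nothing.
speed_flag() {
    if [ "$1" -gt 50000 ]; then
        printf '%s\n' '-fastest'
    fi
}

# Possible use (aln.fasta is a placeholder protein alignment):
#   fasttreeMP $(speed_flag "$(count_seqs aln.fasta)") aln.fasta > tree
```

       The flag is left unquoted in the usage line on purpose, so that an empty result adds no argument.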

   Detailed usage for fasttreeMP 2.1.4 SSE3:
       fasttreeMP [-nt] [-n 100] [-quote] [-pseudo | -pseudo 1.0]

              [-boot 1000 | -nosupport] [-intree starting_trees_file | -intree1  starting_tree_file]  [-quiet  |
              -nopr]  [-nni  10]  [-spr  2]  [-noml | -mllen | -mlnni 10] [-mlacc 2] [-cat 20 | -nocat] [-gamma]
              [-slow | -fastest] [-2nd | -no2nd] [-slownni] [-seed 1253] [-top  |  -notop]  [-topm  1.0  [-close
              0.75]  [-refresh  0.8]] [-matrix Matrix | -nomatrix] [-nj | -bionj] [-wag] [-nt] [-gtr] [-gtrrates
              ac ag at cg ct gt] [-gtrfreq A C G T] [ -constraints constraintAlignment [ -constraintWeight 100.0
              ] ] [-log logfile]

              [ alignment_file ]

              > newick_tree

       or

       fasttreeMP [-nt] [-matrix Matrix | -nomatrix] [-rawdist] -makematrix [alignment]

              [-n 100] > phylip_distance_matrix

              fasttreeMP supports fasta or phylip interleaved alignments. By default fasttreeMP expects protein
              alignments; use -nt for nucleotides. fasttreeMP reads standard input if no alignment file is given.

   Input/output options:

       -n -- read in multiple alignments. This only

              works  with  phylip  interleaved format. For example, you can use it with the output from phylip's
              seqboot. If you use -n, fasttreeMP will write 1 tree per line to standard output.

       -intree newickfile -- read the starting tree in from newickfile.

              Any branch lengths in the starting trees are ignored.

       -intree with -n will read a separate starting tree for each alignment.

       -intree1 newickfile -- read the same starting tree for each alignment

       -quiet -- do not write to standard error during normal operation (no progress

              indicator, no options summary, no likelihood values, etc.)

       -nopr -- do not write the progress indicator to stderr

       -log logfile -- save intermediate trees so you can extract

              the trees and restart long-running jobs if they crash. -log also reports the per-site rates (1
              means slowest category).

       -quote -- quote sequence names in the output and allow spaces, commas,

              parentheses, and colons in them but not ' characters (fasta files only)
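
       Since -n takes the number of alignments in the phylip file, a small helper can count them before the
       run. This sketch assumes each alignment starts with a header line holding exactly two integers (the
       sequence count and the alignment length) and that no sequence line consists of two all-digit fields;
       count_phylip_datasets and boot.phy are hypothetical names.

```shell
# count_phylip_datasets FILE: count header lines of the form "<nseq> <nsites>".
# Hypothetical helper, not part of fasttreeMP.
count_phylip_datasets() {
    awk 'NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { n++ }
         END { print n + 0 }' "$1"
}

# Possible use (boot.phy is a placeholder, e.g. output from phylip's seqboot):
#   fasttreeMP -n "$(count_phylip_datasets boot.phy)" boot.phy > trees
```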

   Distances:
              Default: for protein sequences, log-corrected distances and an amino acid dissimilarity matrix
              derived from BLOSUM45; for nucleotide sequences, Jukes-Cantor distances.

              To specify a different matrix, use -matrix FilePrefix or -nomatrix.

              Use -rawdist to turn the log-correction off or to use %different instead of Jukes-Cantor.

       -pseudo [weight] -- Use pseudocounts to estimate distances between

              sequences with little or no overlap. (Off by default.) Recommended if the alignment has
              sequences with little or no overlap. If the weight is not specified, it is 1.0.
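
       To make the difference between %different (-rawdist) and the default Jukes-Cantor distance concrete,
       the sketch below applies the textbook Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) to a raw
       proportion p of differing sites. This is the standard formula, not code taken from fasttreeMP, whose
       own handling of gaps and saturated distances may differ; jc_distance is a hypothetical name.

```shell
# jc_distance P: textbook Jukes-Cantor distance for a fraction P of
# differing sites (0 <= P < 0.75). Hypothetical helper, not fasttreeMP code.
jc_distance() {
    awk -v p="$1" 'BEGIN {
        if (p < 0 || p >= 0.75) { print "undefined"; exit 1 }
        # d = -(3/4) * ln(1 - 4p/3), written as the log of the reciprocal
        printf "%.4f\n", 0.75 * log(1 / (1 - 4 * p / 3))
    }'
}

jc_distance 0.10   # prints 0.1073, slightly above the raw 0.10
jc_distance 0.30   # prints 0.3831, the correction grows with divergence
```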

   Topology refinement:
              By default, fasttreeMP tries to improve the tree with up to 4*log2(N) rounds of minimum-evolution
              nearest-neighbor interchanges (NNI), where N is the number of unique sequences, 2 rounds of
              subtree-prune-regraft (SPR) moves (also min. evo.), and up to 2*log(N) rounds of
              maximum-likelihood NNIs.

              Use -nni to set the number of rounds of min. evo. NNIs, and -spr to set the rounds of SPRs.

              Use -nome to turn off both min-evo NNIs and SPRs (useful if refining an approximately
              maximum-likelihood tree with further NNIs).

              Use -sprlength to set the maximum length of a SPR move (default 10).

              Use -mlnni to set the number of rounds of maximum-likelihood NNIs.

              Use -mlacc 2 or -mlacc 3 to always optimize all 5 branches at each NNI, and to optimize all 5
              branches in 2 or 3 rounds.

              Use -mllen to optimize branch lengths without ML NNIs.

              Use -mllen -nome with -intree to optimize branch lengths on a fixed topology.

              Use -slownni to turn off heuristics to avoid constant subtrees (affects both ML and ME NNIs).
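
              As a worked example of the default limits described in this section, the sketch below evaluates
              the 4*log2(N) bound on minimum-evolution NNI rounds for a given number of unique sequences.
              Rounding to the nearest integer is an assumption, and me_nni_rounds is a hypothetical name. The
              bound for maximum-likelihood NNIs is left out because the text does not state the base of its
              logarithm.

```shell
# me_nni_rounds N: default upper bound on min-evo NNI rounds, 4*log2(N),
# rounded to the nearest integer (the rounding is an assumption).
me_nni_rounds() {
    awk -v n="$1" 'BEGIN { printf "%d\n", 4 * log(n) / log(2) + 0.5 }'
}

me_nni_rounds 1024    # 4 * 10 = 40
me_nni_rounds 50000   # 4 * 15.61 = 62.4, so 62
```

              The bound grows slowly: going from about a thousand to fifty thousand sequences adds only some
              twenty rounds.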

   Maximum likelihood model options:

       -wag -- Whelan-And-Goldman 2001 model instead of (default) Jones-Taylor-Thornton 1992 model (a.a. only)

       -gtr -- generalized time-reversible instead of (default) Jukes-Cantor (nt only)

       -cat # -- specify the number of rate categories of sites (default 20)

       -nocat -- no CAT model (just 1 category)

       -gamma -- after the final round of optimizing branch lengths with the CAT model,

              report the likelihood under  the  discrete  gamma  model  with  the  same  number  of  categories.
              fasttreeMP  uses  the same branch lengths but optimizes the gamma shape parameter and the scale of
              the lengths.  The final tree will have rescaled lengths.  Used  with  -log,  this  also  generates
              per-site  likelihoods  for  use  with  CONSEL,  see  GammaLogToPaup.pl  and  documentation  on the
              fasttreeMP web site.

   Support value options:
              By default, fasttreeMP computes local support values by resampling the site likelihoods 1,000
              times and applying the Shimodaira-Hasegawa test. If you specify -nome, it will compute
              minimum-evolution bootstrap supports instead. In either case, the support values are proportions
              ranging from 0 to 1.

              Use -nosupport to turn off support values or -boot 100 to use just 100 resamples. Use -seed to
              initialize the random number generator.
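
              Because the supports are proportions between 0 and 1, they are easy to screen after a run. The
              sketch below assumes the supports appear in the Newick output as the label directly after each
              closing parenthesis, and that sequence names contain no parentheses (so no -quote);
              list_supports, count_supported, and the 0.9 cutoff are hypothetical choices for illustration.

```shell
# list_supports TREE: print one support value per labelled internal node.
# Assumes supports are written right after each ')' in the Newick file.
list_supports() {
    # Break the text after every ')' and keep a leading number, if any.
    tr ')' '\n' < "$1" | sed -n 's/^\([0-9][0-9.]*\).*/\1/p'
}

# count_supported TREE CUTOFF: number of splits with support >= CUTOFF.
count_supported() {
    list_supports "$1" | awk -v c="$2" '$1 >= c { n++ } END { print n + 0 }'
}

# Possible use (tree is a placeholder for fasttreeMP output):
#   count_supported tree 0.9
```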

   Searching for the best join:
              By default, fasttreeMP combines the 'visible set' of fast neighbor-joining with

              local hill-climbing as in relaxed neighbor-joining

       -slow -- exhaustive search (like NJ or BIONJ, but different gap handling)

       -slow takes half an hour instead of 8 seconds for 1,250 proteins

       -fastest -- search the visible set (the top hit for each node) only

              Unlike the original fast neighbor-joining, -fastest updates visible(C) after joining A and B if
              join(AB,C) is better than join(C,visible(C)). -fastest also updates out-distances in a very lazy
              way. -fastest sets -2nd on as well; use -fastest -no2nd to avoid this.

   Top-hit heuristics:
              By default, fasttreeMP uses a top-hit list to speed up search. Use -notop (or -slow) to turn this
              feature off and compare all leaves to each other, and all new joined nodes to each other.

       -topm 1.0 -- set the top-hit list size to parameter*sqrt(N)

              fasttreeMP estimates the top m hits of a leaf from the top 2*m hits of a 'close'  neighbor,  where
              close  is  defined as d(seed,close) < 0.75 * d(seed, hit of rank 2*m), and updates the top-hits as
              joins proceed

       -close 0.75 -- modify the close heuristic, lower is more conservative

       -refresh 0.8 -- compare a joined node to all other nodes if its

              top-hit list is less than 80% of the desired length, or if the age of the top-hit list is  log2(m)
              or greater

       -2nd or -no2nd to turn 2nd-level top hits heuristic on or off

              This  reduces  memory  usage and running time but may lead to marginal reductions in tree quality.
              (By default, -fastest turns on -2nd.)
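
              To get a feel for the sizes involved, the sketch below evaluates the top-hit list size
              m = topm*sqrt(N) and the list length below which -refresh triggers a comparison against all
              other nodes. Truncating m to an integer is an assumption, and both function names are
              hypothetical.

```shell
# top_hit_size N [TOPM]: top-hit list size TOPM*sqrt(N); TOPM defaults to 1.0.
# Truncation to an integer is an assumption of this sketch.
top_hit_size() {
    awk -v n="$1" -v t="${2:-1.0}" 'BEGIN { printf "%d\n", t * sqrt(n) }'
}

# refresh_threshold M [REFRESH]: list length REFRESH*M below which a joined
# node is compared to all other nodes; REFRESH defaults to 0.8.
refresh_threshold() {
    awk -v m="$1" -v r="${2:-0.8}" 'BEGIN { printf "%.1f\n", r * m }'
}

top_hit_size 10000       # prints 100
top_hit_size 10000 2.0   # prints 200
refresh_threshold 100    # prints 80.0
```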

   Join options:

       -nj: regular (unweighted) neighbor-joining (default)

       -bionj: weighted joins as in BIONJ

              fasttreeMP will also weight joins during NNIs

   Constrained topology search options:

       -constraints alignmentfile -- an alignment with values of 0, 1, and -

              Not all sequences need be present. A column of  0s  and  1s  defines  a  constrained  split.  Some
              constraints may be violated (see 'violating constraints:' in standard error).

       -constraintWeight -- how strongly to weight the constraints. A value of 1

              means a penalty of 1 in tree length for violating a constraint. Default: 100.0
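
       Before running with -constraints, it can help to check that the constraint alignment uses only the
       allowed values 0, 1, and -. This sketch assumes a fasta-format constraint file; check_constraints,
       constraints.fasta, and aln.fasta are hypothetical names.

```shell
# check_constraints FILE: succeed if every sequence line consists only of
# 0, 1, and - characters; otherwise report the first offending line.
# Hypothetical helper, not part of fasttreeMP.
check_constraints() {
    awk '/^>/ { next }                      # skip fasta header lines
         /[^01-]/ { printf "line %d: unexpected character\n", NR
                    bad = 1; exit }
         END { exit bad + 0 }' "$1"
}

# Possible use:
#   check_constraints constraints.fasta &&
#       fasttreeMP -constraints constraints.fasta aln.fasta > tree
```

       Trailing whitespace or carriage returns on a sequence line would also be reported by this check.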

       For more information, see http://www.microbesonline.org/fasttree/

              or the comments in the source code

Lawrence Berkeley National Lab                      June 2012                                      fasttreeMP(1)