Provided by: mptp_0.2.4-3build1_amd64

NAME

       mptp — single-locus species delimitation

SYNOPSIS

       Maximum-likelihood species delimitation:
              mptp --ml (--single | --multi) --tree_file newickfile --output_file outputfile
              [options]

       Species delimitation with support values:
              mptp --mcmc positive integer (--single | --multi) (--mcmc_startnull |
              --mcmc_startrandom | --mcmc_startml) --mcmc_log positive integer --tree_file
              newickfile --output_file outputfile [options]

DESCRIPTION

       Species is one of the fundamental units  of  comparison  in  virtually  all  subfields  of
       biology,  from  systematics  to  anatomy,  development,  ecology,  evolution, genetics and
       molecular biology. The aim of mptp is to offer  an  open  source  tool  to  infer  species
       boundaries on a given phylogenetic tree based on the Poisson Tree Process (PTP) and the
       Multiple Poisson Tree Process (mPTP) models.

       mptp offers two methods for inferring species delimitation. First, a maximum-likelihood
       based method that uses a dynamic programming approach to infer an ML estimate. Second, an
       MCMC approach for sampling the space of possible delimitations, providing the user with
       support values on the tree clades. Both approaches are available in two flavours: the PTP
       and the mPTP model. The PTP model is specified by using the --single switch and the mPTP
       model by using --multi.

   Input
       The input for mptp is a newick file that contains one phylogenetic tree whose branch
       lengths express the expected number of substitutions per alignment site.

   Options
       mptp parses a large number of command-line options. For  easier  navigation,  options  are
       grouped below by theme.

       General options:

              --help   Display help text and exit.

              --version
                       Output version information and exit.

              --quiet  Suppress  all  output  to  stdout  except  for  warnings  and  fatal error
                       messages.

              --tree_file filename
                       Input newick file that contains a phylogenetic  tree.  Can  be  rooted  or
                       unrooted.

              --output_file filename
                       Specifies the prefix used for generating output files. For
                       maximum-likelihood species delimitation two files will be created:
                       filename.txt, which contains the actual delimitation, and filename.svg,
                       which contains an SVG figure of the computed delimitation. For MCMC
                       analyses, a file filename.txt is created that contains the newick tree
                       with support values.

              --outgroup comma-separated list of taxa
                       All computations for species delimitation are carried out on rooted trees.
                       This option is used only (and is required) in case an unrooted tree was
                       specified with the --tree_file option. mptp roots  the  unrooted  tree  by
                       splitting  the branch leading to the most recent common ancestor (MRCA) of
                       the comma-separated list of taxa into  two  branches  of  equal  size  and
                       introducing  a  new  node  (the root of the new rooted tree) that connects
                       these two branches.

              --outgroup_crop
                       Crops taxa specified with the --outgroup option from the tree.

              --min_br real
                       Any branch lengths in the input tree smaller than or equal to real are
                       excluded (ignored) from the computations. In addition, for MCMC analyses,
                       subtrees that exclusively consist of branch lengths smaller than or equal
                       to real are completely excluded from the proposals (support values for
                       those clades are set to 0). (default: 0.0001)

              --precision positive integer
                       Specifies the precision of the decimal part of floating point  numbers  on
                       output (default: 7)

              --minbr_auto filename
                       Automatically  detects  the  minimum branch length from the p-distances of
                       the FASTA file filename.

              --tree_show
                       Show an ASCII version of the processed input tree (i.e. after it has been
                       rooted at the outgroup and the outgroup has potentially been cropped).
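
              The rooting step described under --outgroup can be sketched in Python as
              follows. This is only an illustration of splitting one branch into two equal
              halves and hanging the tree from a new root; the adjacency-dictionary
              representation, the function name and the toy taxa are assumptions made for
              this example and are not taken from the mptp source code.

```python
def root_on_edge(adj, a, b):
    """Root an unrooted tree on the branch between nodes a and b.

    adj maps each node to {neighbour: branch_length} (hypothetical layout).
    Returns nested tuples (name, length_to_parent, children).
    """
    half = adj[a][b] / 2.0  # the split branch yields two branches of equal size

    def build(node, parent, length):
        # Every neighbour except the one we came from becomes a child.
        children = [build(n, node, l) for n, l in adj[node].items() if n != parent]
        return (node, length, children)

    # The new root connects the two halves of the original branch.
    return ("root", 0.0, [build(a, b, half), build(b, a, half)])


# Toy unrooted tree: inner node "x" joined to taxa A, B and C.
adj = {
    "x": {"A": 0.1, "B": 0.2, "C": 0.4},
    "A": {"x": 0.1},
    "B": {"x": 0.2},
    "C": {"x": 0.4},
}
# Use C as the outgroup: the branch leading to it (x-C) is split in half.
rooted = root_on_edge(adj, "C", "x")
```

              In this toy case the outgroup is a single taxon, so the branch leading to its
              MRCA is simply the branch leading to C.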

       Maximum-likelihood estimations:

              Estimating the maximum-likelihood delimitation is triggered by the switch --ml
              combined with either --single (the PTP model) or --multi (the mPTP model). Both
              methods write their results to the files named by --output_file and can be
              controlled using the --min_br switch. Both methods require a rooted phylogenetic
              tree; however, an unrooted tree may be specified in conjunction with the option
              --outgroup. In this case, mptp roots it at that outgroup (see General options,
              --outgroup for more info). Both methods also output an SVG depiction of the ML
              delimitation. See SVG Output for more information on adjusting and fine-tuning
              the SVG output.

              Both methods ignore branch lengths smaller than or equal to the value specified
              using the --min_br option. The PTP model then attempts to find a connected
              subgraph of the rooted tree that (a) contains the root, and (b) maximizes the sum
              of the log-likelihoods of fitting the edges of that subgraph to one exponential
              distribution and the remaining edges to another exponential distribution. Here,
              the log-likelihood of a set of edges is the sum of the logarithms of the
              exponential probability density evaluated at each edge length, with the rate of
              the distribution defined as the reciprocal of the average edge length in that
              distribution.
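
              As a rough illustration of the score described above (not mptp's actual
              implementation), the following Python sketch evaluates one candidate split of
              the edges into a speciation class and a coalescent class. The function names,
              the min_br handling and the toy edge lengths are assumptions made for this
              example; the search over connected subgraphs is not shown.

```python
import math


def exp_loglik(lengths):
    """Log-likelihood of edge lengths under a single exponential distribution
    whose rate is the reciprocal of the average edge length."""
    if not lengths:
        return 0.0
    total = sum(lengths)
    rate = len(lengths) / total  # 1 / mean edge length
    # Sum over edges of log(rate * exp(-rate * x)).
    return len(lengths) * math.log(rate) - rate * total


def ptp_score(speciation_edges, coalescent_edges, min_br=0.0001):
    """Score of one candidate delimitation: two exponentials, one per class."""
    # Edges not longer than min_br are ignored, as described for --min_br.
    spec = [x for x in speciation_edges if x > min_br]
    coal = [x for x in coalescent_edges if x > min_br]
    return exp_loglik(spec) + exp_loglik(coal)


# Toy data: long edges between species, short edges within species.
speciation = [0.1, 0.1]
coalescent = [0.01, 0.01, 0.00001]  # the last edge is below min_br and dropped
split_score = ptp_score(speciation, coalescent)
null_score = exp_loglik([0.1, 0.1, 0.01, 0.01])  # all edges in one distribution
```

              With these toy lengths the two-class fit scores higher than the single
              distribution, which is the kind of contrast the delimitation search exploits.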

              --ml --single
                       Triggers  the  algorithm  for computing an ML estimate of the delimitation
                       using the PTP model.

              --ml --multi
                       Triggers the algorithm for computing an ML estimate  of  the  delimitation
                       using the mPTP model.

              --pvalue  real
                       Only used with the PTP model (specified with --single). Sets the p-value
                       for performing a likelihood ratio test. Note that there is no likelihood
                       ratio test for the mPTP model, so this test is not done when --multi is
                       used. (default: 0.001)
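
              A likelihood ratio test of this kind can be sketched as below. The sketch
              assumes that the null hypothesis is a single exponential distribution fitted
              to all edges, that the test statistic follows a chi-square distribution with
              one degree of freedom, and that a delimitation is accepted when the computed
              p-value falls below the --pvalue threshold. These are assumptions of the
              example, not statements about how mptp implements the test, and the function
              names are hypothetical.

```python
import math


def lrt_pvalue(loglik_null, loglik_alt):
    """P-value of a likelihood ratio test (assumed: chi-square, 1 d.o.f.)."""
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # Survival function of chi-square with one degree of freedom:
    # P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def delimitation_is_significant(loglik_null, loglik_alt, pvalue=0.001):
    """True if the alternative (delimited) model passes the assumed test."""
    return lrt_pvalue(loglik_null, loglik_alt) < pvalue
```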

       MCMC method:

              The  MCMC  method is triggered with the --mcmc switch combined with either --single
              (the PTP model) or --multi (the mPTP model).

              --mcmc  positive integer --single
                       Triggers  the  algorithm  for  computing  support  values  by  taking  the
                       specified number of MCMC samples (delimitations) using the PTP model.

              --mcmc  positive integer --multi
                       Triggers  the  algorithm  for  computing  support  values  by  taking  the
                       specified number of MCMC samples (delimitations) using the mPTP model.

              --mcmc_sample  positive integer
                       Sample only every n-th MCMC step.

              --mcmc_log
                       Log the scores (log-likelihood) for each MCMC sample in a file and  create
                       an SVG plot.

              --mcmc_burnin  positive integer
                       Ignore all MCMC samples generated before the specified step. (default: 1)

              --mcmc_runs  positive integer
                       Perform multiple MCMC runs. If more than 1 run is specified, mptp will
                       generate one seed for each run based on the seed provided with the --seed
                       switch. Output files will be generated for each run. (default: 1)

              --mcmc_credible  real
                       Specify the probability (0.0 to 1.0) for which to generate the credible
                       interval, i.e., the probability that the true number of species falls
                       within the credible interval given the observed data. (default: 0.95)

              --mcmc_startnull
                       Start MCMC sampling from the null-model.

              --mcmc_startrandom
                       Start MCMC sampling from a random delimitation.

              --mcmc_startml
                       Start MCMC sampling from the ML delimitation.

              --seed  positive integer
                       Specifies  the  seed  for  the  pseudo-random  number generator. (default:
                       randomly generated based on system time)
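
              The interplay of --mcmc, --mcmc_burnin and --mcmc_sample can be pictured with
              the sketch below. It assumes that steps are numbered from 1, that steps before
              the burn-in step are dropped, and that every n-th step means steps divisible
              by n; mptp's exact bookkeeping may differ, and the function name is
              hypothetical.

```python
def kept_steps(total_steps, burnin=1, sample=1):
    """Return the MCMC steps whose delimitation would be kept as a sample."""
    return [
        step
        for step in range(1, total_steps + 1)
        if step >= burnin and step % sample == 0  # past burn-in, every n-th step
    ]


# Ten steps, ignore everything before step 4, keep every 3rd step.
steps = kept_steps(10, burnin=4, sample=3)
```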

       SVG Output:

              The ML method generates one SVG file that visualizes the processed input tree
              (i.e. after it has been rooted at the outgroup and the outgroup has potentially
              been cropped) and marks the subtrees corresponding to coalescent processes (the
              detected species groups) in red, while the speciation process is colored green.

              The  MCMC method generates one SVG file per run visualizing the processed tree, and
              indicates the support value for each node, i.e., the  percentage  of  MCMC  samples
              (delimitations) in which the particular node was part of the speciation process.  A
              value of 1 means it was always in the speciation process while a value of  0  means
              it  was  always in a coalescent process. The tree branches are colored according to
              the support values of descendant nodes; a support value of 0 is colored red, 1
              black, and values in between are gradients of the two colors. Only support
              values above 0.5 are shown to avoid crowded numbers at dense branching events.
              In addition, if --mcmc_log is specified, an additional SVG plot of the
              log-likelihood of each sampled delimitation is created.
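
              The support values and branch colors described above can be approximated with
              the following sketch. The representation of a sample as a set of node names,
              the linear interpolation between red and black, and all function names are
              assumptions of this example and are not taken from mptp.

```python
def support_values(samples, nodes):
    """Fraction of sampled delimitations in which each node was part of the
    speciation process. Each sample is a set of node names (hypothetical)."""
    return {n: sum(n in s for s in samples) / len(samples) for n in nodes}


def branch_color(support):
    """SVG hex color: red for support 0.0, black for 1.0, linear in between."""
    red = round(255 * (1.0 - support))
    return "#{:02x}0000".format(red)


def show_label(support):
    """Only support values above 0.5 are printed on the tree."""
    return support > 0.5


# Four sampled delimitations over three inner nodes.
samples = [{"root", "n1"}, {"root"}, {"root", "n1"}, {"root"}]
support = support_values(samples, ["root", "n1", "n2"])
```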

              --svg_width  positive integer
                       Sets the total width (including margins) of the SVG in  pixels.  (default:
                       1920)

              --svg_fontsize  positive integer
                       Size of font in SVG image. (default: 12)

              --svg_tipspacing  positive integer
                       Vertical space in pixels between taxa in SVG tree. (default: 20)

              --svg_legend_ratio  real
                       Ratio  (value between 0.0 and 1.0) of total tree length to be displayed as
                       legend line.  (default: 0.1)

              --svg_nolegend
                       Hide legend.

              --svg_marginleft  positive integer
                       Left margin in pixels. (default: 20)

              --svg_marginright  positive integer
                       Right margin in pixels. (default: 20)

              --svg_margintop  positive integer
                       Top margin in pixels. (default: 20)

              --svg_marginbottom  positive integer
                       Bottom margin in pixels. (default: 20)

              --svg_inner_radius  positive integer
                       Radius of inner nodes in pixels. (default: 0)

EXAMPLES

       Compute the maximum-likelihood estimate using the mPTP model, discarding all branches
       with length smaller than or equal to 0.0001:

              mptp --ml --multi --min_br 0.0001 --tree_file newick.txt --output_file out

       Run an MCMC analysis of 100 million steps with the mPTP model that logs every millionth
       step, ignores the first 2 million steps and discards all branches with lengths smaller
       than or equal to 0.0001. Use 777 as seed. The chain will start from the ML delimitation
       (default).

              mptp --mcmc 100000000 --multi --min_br 0.0001 --tree_file newick.txt --output_file
              out --mcmc_log 1000000 --mcmc_burnin 2000000 --seed 777

       Perform an MCMC analysis of 5 runs, each of 100 million steps with the mPTP model, log
       every millionth step, ignore the first 2 million steps, and detect the minimum branch
       length by specifying the FASTA file alignment.fa that contains the alignment. Use 777 as
       seed. Start each run from a random delimitation.

              mptp --mcmc 100000000 --multi --mcmc_runs 5 --mcmc_log 1000000 --minbr_auto
              alignment.fa --tree_file newick.txt --output_file out --mcmc_burnin 2000000 --seed
              777 --mcmc_startrandom

AUTHORS

       Implementation by Tomas Flouri, Sarah Lutteropp and Paschalia Kapli.  Additional  PTP  and
       mPTP  model  authors include Kassian Kobert, Jiajie Zhang, Pavlos Pavlidis, and Alexandros
       Stamatakis.

REPORTING BUGS

       Submit suggestions and bug reports at <https://github.com/Pas-Kapli/mptp/issues>, or
       e-mail Tomas Flouri <Tomas.Flouri@h-its.org>.

AVAILABILITY

       Source code and binaries are available at <https://github.com/Pas-Kapli/mptp>.

       Copyright (C) 2015-2017, Tomas Flouri, Sarah Lutteropp, Paschalia Kapli

       All rights reserved.

       Contact: Tomas Flouri <Tomas.Flouri@h-its.org>, Scientific Computing, Heidelberg
       Institute for Theoretical Studies, 69118 Heidelberg, Germany

       This software is licensed under the terms of the GNU Affero General Public License version
       3.

       GNU Affero General Public License version 3

       This program is free software: you can redistribute it and/or modify it under the terms of
       the GNU Affero General Public License as published by the Free Software Foundation, either
       version 3 of the License, or (at your option) any later version.

       This  program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
       without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR  PURPOSE.
       See the GNU Affero General Public License for more details.

       You  should  have received a copy of the GNU Affero General Public License along with this
       program.  If not, see <http://www.gnu.org/licenses/>.

VERSION HISTORY

       New features and important modifications of mptp (short lived or minor  bug  releases  may
       not be mentioned):

              v0.1.0 released June 27th, 2016
                     First public release.

              v0.1.1 released July 15th, 2016
                     Bug fix (the LRT is no longer printed in the output file when using --multi).

              v0.2.0 released September 27th, 2016
                     Fixed floating point exception error when constructing random trees, caused
                     by division by zero. Changed allocation from malloc to calloc, as it
                     caused uninitialized variables when converting unrooted trees to rooted when
                     using the MCMC method. Fixed sample size for the AIC with a correction for
                     finite sample sizes.

              v0.2.1 released October 18th, 2016
                     Updated  ASV  to  consider only coalescent roots of ML delimitation. Removed
                     assertion stopping mptp when using random  starting  delimitations  for  the
                     MCMC method.

              v0.2.2 released January 31st, 2017
                     Fixed  regular  expressions  to allow scientific notation for branch lengths
                     when parsing trees.  Improved the accuracy of ASV score by also taking  into
                     account  tips  forming coalescent roots.  Fixed memory leaks that occur when
                     parsing incorrectly formatted trees.

              v0.2.3 released July 25th, 2017
                     Replaced hsearch() with custom hashtable. Fixed minor output error messages.

              v0.2.4 released May 14th, 2018
                     If we do not manage to generate a  random  starting  delimitation  with  the
                     wanted  number  of species (randomly chosen), we use the currently generated
                     delimitation instead.