Ubuntu Manpage: phytime - Bayesian estimation of divergence times from large sequence alignments

Provided by: phyml_20120412-2_amd64

NAME

       phytime - Bayesian estimation of divergence times from large sequence alignments

DESCRIPTION

       Bayesian estimation of divergence times from molecular sequences relies on sophisticated
       Markov chain Monte Carlo techniques, and Metropolis-Hastings (MH) samplers have been
       successfully used in that context. This approach imposes a heavy computational burden that
       can hinder the analysis of large phylogenomic data sets. Reliable estimation of divergence
       times can also be extremely time consuming, if not impossible, for sequence alignments
       that convey weak or conflicting phylogenetic signals, which emphasizes the need for more
       efficient sampling methods. phytime implements an approach that estimates the posterior
       density of substitution rates and node times. The prior distribution of rates accounts for
       their potential autocorrelation along lineages, whereas priors on node ages are modeled
       with uniform densities. In addition, the likelihood function is approximated by a
       multivariate normal density. Together, these components lead to convenient mathematical
       simplifications that allow the posterior distribution of rates and times to be estimated
       with a Gibbs sampling algorithm. In the article that describes the method, the analysis of
       four real-world data sets showed that this sampler outperforms the standard MH approach,
       demonstrating the suitability of the method for analyzing large and/or difficult data sets.
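       The multivariate normal approximation mentioned above can be pictured as a second-order
       expansion of the log-likelihood around its maximum. The sketch below assumes (this is not
       taken from this page) that the normal density is centred on the maximum likelihood branch
       lengths b_hat and uses a precision matrix P, for example minus the Hessian of log L at
       b_hat. The function name approx_log_likelihood is hypothetical; it is not part of phytime.

```python
def approx_log_likelihood(b, b_hat, precision, loglik_at_mle):
    """Quadratic (multivariate normal) approximation of the log-likelihood:

        log L(b) ~= log L(b_hat) - 0.5 * (b - b_hat)^T P (b - b_hat)

    b             -- branch lengths at which to evaluate the approximation
    b_hat         -- maximum likelihood branch lengths (assumed normal mean)
    precision     -- P, assumed inverse covariance (e.g. minus the Hessian at b_hat)
    loglik_at_mle -- exact log L(b_hat), computed once
    """
    if len(b) != len(b_hat):
        raise ValueError("b and b_hat must have the same length")
    d = [x - m for x, m in zip(b, b_hat)]
    k = len(d)
    # Quadratic form d^T P d; cheap compared with a full pruning pass.
    quad = sum(d[i] * precision[i][j] * d[j] for i in range(k) for j in range(k))
    return loglik_at_mle - 0.5 * quad


# One-dimensional illustration: d = 0.5, P = 4, so the penalty is 0.5 * 1.0.
print(approx_log_likelihood([1.0], [0.5], [[4.0]], -10.0))
```

       Once b_hat, P and log L(b_hat) are known, each evaluation costs only a quadratic form.
       This is the kind of saving the --fastlk option refers to, although the exact quantities
       phytime uses internally are not documented here.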

SYNOPSIS

       phytime [command args]

OPTIONS

       All options below are optional except '-i', '-u' and '--calibration', which are required.

       Command options:

       -i (or --input) seq_file_name

              seq_file_name  is  the name of the nucleotide or amino-acid sequence file in PHYLIP
              format.

       -d (or --datatype) data_type

              data_type is 'nt' for nucleotide (default), 'aa' for amino-acid sequences, or
              'generic' (in which case use the NEXUS file format and set its 'symbols' parameter).

       -q (or --sequential)

              Changes the expected input format from interleaved (default) to sequential.

       -m (or --model) model

              model : substitution model name.

              - Nucleotide-based models: HKY85 (default) | JC69 | K80 | F81 | F84 | TN93 | GTR |
                custom (*)

              (*) For the custom option, a string of six digits identifies the model. For
              instance, 000000 corresponds to F81 (or JC69 provided the distribution of nucleotide
              frequencies is uniform), and 012345 corresponds to GTR. This option can be used to
              encode any model that is nested within GTR.

              - Amino-acid based models: LG (default) | WAG | JTT | MtREV | Dayhoff | DCMut |
                RtREV | CpREV | VT | Blosum62 | MtMam | MtArt | HIVw | HIVb | custom
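       The six-digit custom model string can be read as a grouping of nucleotide pairs: pairs
       that share a digit share a relative substitution rate. The sketch below assumes the digit
       positions map to the pairs A-C, A-G, A-T, C-G, C-T, G-T in that order. That ordering is an
       assumption and is not stated on this page. Under it, 010010 groups the two transitions
       (A-G, C-T) apart from the transversions, as in HKY85. Both helper functions are
       hypothetical.

```python
# Assumed order of the six nucleotide pairs encoded by the digits.
PAIRS = ("AC", "AG", "AT", "CG", "CT", "GT")


def decode_custom_model(code):
    """Group nucleotide pairs that share a relative rate parameter."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("custom model must be a string of six digits")
    groups = {}
    for digit, pair in zip(code, PAIRS):
        groups.setdefault(int(digit), []).append(pair)
    return groups


def n_free_rate_params(code):
    """One group serves as the reference rate, so free parameters = groups - 1."""
    return len(decode_custom_model(code)) - 1


print(decode_custom_model("010010"))
print(n_free_rate_params("012345"))  # GTR: five free exchangeabilities
```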

       --aa_rate_file filename

              filename is the name of the file that provides the  amino  acid  substitution  rate
              matrix  in  PAML  format.  It is compulsory to use this option when analysing amino
              acid sequences with the `custom' model.

       --calibration filename

              filename is the name of the calibration file that provides prior bounds on node
              ages. Please read the manual for more information about the format of this file.

       -t (or --ts/tv) ts/tv_ratio

              ts/tv_ratio : transition/transversion ratio (DNA sequences only). Can be a fixed
              positive value (e.g., 4.0) or 'e' to get the maximum likelihood estimate.

       -v (or --pinv) prop_invar

              prop_invar : proportion of invariable sites. Can be a fixed value in the [0,1]
              range or 'e' to get the maximum likelihood estimate.

       -c (or --nclasses) nb_subst_cat

              nb_subst_cat  :  number  of  relative  substitution  rate  categories.  Default   :
              nb_subst_cat=4.  Must be a positive integer.

       -a (or --alpha) gamma

              gamma : value of the gamma distribution shape parameter. Can be a fixed positive
              value or 'e' to get the maximum likelihood estimate.
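       The -t, -v and -a options share one convention: either a fixed number or the letter 'e'
       to request a maximum likelihood estimate. The sketch below shows how a wrapper script
       might check such values before calling phytime, using only the ranges stated above. The
       function parse_fixed_or_estimate is hypothetical. phytime performs its own checks.

```python
def parse_fixed_or_estimate(option, value):
    """Return None when the parameter is to be estimated ('e'),
    otherwise the fixed value after the range checks documented above."""
    if value == "e":
        return None  # let phytime compute the maximum likelihood estimate
    x = float(value)
    if option in ("-t", "-a") and not x > 0:
        raise ValueError(f"{option} expects a positive value or 'e'")
    if option == "-v" and not 0.0 <= x <= 1.0:
        raise ValueError("-v expects a value in [0,1] or 'e'")
    return x


print(parse_fixed_or_estimate("-t", "4.0"))
print(parse_fixed_or_estimate("-a", "e"))
```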

       -u (or --inputtree) user_tree_file

              user_tree_file : starting tree filename. The tree must be in Newick format.

       --r_seed num

              num is the seed used to initialize the random number generator. Must be an integer.

       --run_id ID_string

              Appends the string ID_string to the name of each PhyML output file. This option may
              be useful when running simulations involving PhyML.

       --quiet

              Disables interactive questions (for running in batch mode) and reduces output.

       --no_memory_check

              Skips the interactive question about memory usage (for running in batch mode).
              Output is otherwise unchanged.

       --chain_len num

              num is the number of generations (iterations) of the Markov chain Monte Carlo
              sampler. Set to 1E+6 by default. Must be an integer.

       --sample_freq num

              The  chain  is  sampled  every num generations. Set to 1E+3 by default.  Must be an
              integer.
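       Together, --chain_len and --sample_freq determine roughly how many states end up in the
       sample. With the defaults (1E+6 generations, sampled every 1E+3), that is about 1,000
       states. The sketch assumes one state is recorded per sample_freq generations. How
       phytime handles the first and last generations is not documented here. The helper name is
       hypothetical.

```python
def expected_sample_count(chain_len=10**6, sample_freq=10**3):
    """Approximate number of recorded states, assuming one state is
    written every sample_freq generations of the chain."""
    if chain_len <= 0 or sample_freq <= 0:
        raise ValueError("chain_len and sample_freq must be positive integers")
    return chain_len // sample_freq


print(expected_sample_count())             # defaults
print(expected_sample_count(2 * 10**6, 500))
```

       Lowering sample_freq keeps more (and more strongly autocorrelated) states for the same
       chain length, at the cost of larger output files.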

       --no_data

              Use this option to sample from the priors only (rather than from the joint
              posterior density of the model parameters).
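       Dropping the data means the likelihood term is constant, so the chain targets the prior
       alone. This is a common way to check what the calibrations imply before the sequences are
       taken into account. The toy sketch below samples a single node age with a uniform prior
       on [lower, upper], matching the uniform node-age priors described above. Passing
       log_lik=None plays the role of --no_data. This is a plain random-walk Metropolis sampler,
       not phytime's Gibbs sampler, and metropolis_node_age is a hypothetical name.

```python
import math
import random


def metropolis_node_age(lower, upper, n_iter, step, log_lik=None, seed=1):
    """Random-walk Metropolis sampler for one node age t with a uniform
    prior on [lower, upper]. With log_lik=None the likelihood is ignored,
    so the chain samples from the prior only."""
    rng = random.Random(seed)
    t = 0.5 * (lower + upper)
    samples = []
    for _ in range(n_iter):
        proposal = t + rng.uniform(-step, step)  # symmetric proposal
        if lower <= proposal <= upper:  # prior density is zero outside the bounds
            # Uniform prior and symmetric proposal: the acceptance ratio
            # reduces to the likelihood ratio (or 1 without data).
            log_ratio = 0.0 if log_lik is None else log_lik(proposal) - log_lik(t)
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                t = proposal
        samples.append(t)
    return samples


prior_only = metropolis_node_age(0.0, 10.0, 20000, 2.0)
print(sum(prior_only) / len(prior_only))  # close to the prior mean, 5
```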

       --fastlk

              Uses the multivariate normal approximation to the likelihood to speed up
              calculations.
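       A typical batch invocation combines the three required options with a few of the
       optional ones listed above. The sketch below builds such a command line as an argument
       list, which avoids shell quoting issues. Only flags documented on this page are used. The
       file names aln.phy, tree.nwk and calib.txt and the helper build_phytime_argv are
       hypothetical.

```python
def build_phytime_argv(seq_file, tree_file, calibration_file, *,
                       chain_len=None, sample_freq=None, r_seed=None,
                       fastlk=False, quiet=True):
    """Assemble a phytime command line from documented options.
    -i, -u and --calibration are required; the rest are optional."""
    argv = ["phytime",
            "-i", seq_file,                      # PHYLIP alignment
            "-u", tree_file,                     # Newick tree
            "--calibration", calibration_file]   # node-age bounds
    if chain_len is not None:
        argv += ["--chain_len", str(int(chain_len))]      # must be an integer
    if sample_freq is not None:
        argv += ["--sample_freq", str(int(sample_freq))]  # must be an integer
    if r_seed is not None:
        argv += ["--r_seed", str(int(r_seed))]            # reproducible runs
    if fastlk:
        argv.append("--fastlk")  # multivariate normal likelihood approximation
    if quiet:
        argv.append("--quiet")   # no interactive questions in batch mode
    return argv


print(" ".join(build_phytime_argv("aln.phy", "tree.nwk", "calib.txt",
                                  chain_len=2e6, r_seed=42, fastlk=True)))
```

       The resulting list can be passed to subprocess.run(argv, check=True) from Python's
       standard library to launch the run.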
