Ubuntu Manpage: phytime - Bayesian estimation of divergence times from large sequence alignments

NAME

       phytime - Bayesian estimation of divergence times from large sequence alignments

DESCRIPTION

       Bayesian  estimation  of  divergence  times from molecular sequences relies on sophisticated Markov chain
       Monte Carlo techniques, and Metropolis-Hastings  (MH)  samplers  have  been  successfully  used  in  that
       context.  This  approach  involves  heavy  computational  burdens  that  can hinder the analysis of large
       phylogenomic data sets. Reliable estimation of divergence times can also be extremely time consuming,  if
       not impossible, for sequence alignments that convey weak or conflicting phylogenetic signals, emphasizing
       the  need  for  more efficient sampling methods. This article describes a new approach that estimates the
       posterior density of substitution rates and node times. The prior  distribution  of  rates  accounts  for
       their  potential  autocorrelation  along  lineages,  whereas priors on node ages are modeled with uniform
       densities. Also,  the  likelihood  function  is  approximated  by  a  multivariate  normal  density.  The
       combination  of these components leads to convenient mathematical simplifications, allowing the posterior
       distribution of rates and times to be estimated using a Gibbs sampling algorithm. The  analysis  of  four
       real-world  data  sets  shows that this sampler outperforms the standard MH approach and demonstrates the
       suitability of this new method for analyzing large and/or difficult data sets.
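
       The idea behind a Gibbs sampler can be illustrated on a toy target. The sketch below is not phytime's
       sampler: it draws from a standard bivariate normal with correlation rho by alternately sampling each
       coordinate from its exact conditional distribution. The function names, the target density and the
       burn-in length are assumptions chosen for illustration only.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=1):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho.

    Each coordinate is redrawn from its exact conditional given the other,
    so there is no accept/reject step as in a Metropolis-Hastings move.
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5  # std. dev. of x | y and of y | x
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from p(y | x)
        if step >= burn_in:
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Pearson correlation of the sampled (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    syy = sum((y - my) ** 2 for _, y in samples)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
    print(len(draws), round(sample_correlation(draws), 2))
```

       With enough draws, the sample correlation should be close to the rho passed in, which is a simple
       check that the alternating conditional draws target the intended joint distribution.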

SYNOPSIS

       phytime [command args]

OPTIONS

       All the options below are optional except '-i', '-u' and '--calibration'.

       Command options:

       -i (or --input) seq_file_name

              seq_file_name is the name of the nucleotide or amino-acid sequence file in PHYLIP format.

       -d (or --datatype) data_type

              data_type is 'nt' for nucleotide (default), 'aa' for amino-acid sequences, or 'generic' (use
              NEXUS file format and the 'symbols' parameter here).

       -q (or --sequential)

              Changes interleaved format (default) to sequential format.

       -m (or --model) model

              model : substitution model name.

              - Nucleotide-based models : HKY85 (default) | JC69 | K80 | F81 | F84 | TN93 | GTR | custom (*)

              (*) : for the custom option, a string of six digits identifies the model. For instance,
              000000 corresponds to F81 (or JC69 provided the distribution of nucleotide frequencies is
              uniform). 012345 corresponds to GTR. This option can be used for encoding any model that is
              nested within GTR.

              - Amino-acid based models : LG (default) | WAG | JTT | MtREV | Dayhoff | DCMut | RtREV | CpREV |
              VT | Blosum62 | MtMam | MtArt | HIVw | HIVb | custom
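
              The six-digit code used by the custom nucleotide model can be read as a grouping of the six
              substitution types into shared rate classes: positions carrying the same digit share one
              rate. The sketch below is a hypothetical helper, not part of phytime. The ordering of
              substitution pairs in PAIRS is an assumption, since this page does not state which digit maps
              to which pair.

```python
# Assumed ordering of the six reversible substitution types. This is an
# assumption for illustration; the man page does not specify the ordering.
PAIRS = ("A<->C", "A<->G", "A<->T", "C<->G", "C<->T", "G<->T")


def rate_classes(code):
    """Group substitution pairs by the digit they share in a six-digit code."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError("custom model code must be a string of six digits")
    classes = {}
    for digit, pair in zip(code, PAIRS):
        classes.setdefault(digit, []).append(pair)
    return classes


def n_rate_classes(code):
    """Number of distinct rate classes encoded by the six-digit string."""
    return len(rate_classes(code))


if __name__ == "__main__":
    for code in ("000000", "012345"):
        print(code, n_rate_classes(code), rate_classes(code))
```

              Under this reading, 000000 puts all six substitution types in a single class (F81 or JC69),
              while 012345 gives each type its own class (GTR).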

       --aa_rate_file filename

              filename is the name of the file that provides the amino acid substitution rate matrix in PAML
              format. It is compulsory to use this option when analysing amino acid sequences with the 'custom'
              model.

       --calibration filename

              filename is the name of the calibration file that provides a priori defined  boundaries  for  node
              ages.  Please read the manual for more information about the format of this file.

       -t (or --ts/tv) ts/tv_ratio

              ts/tv_ratio : transition/transversion ratio. DNA sequences only. Can be a fixed positive value
              (e.g., 4.0) or e to get the maximum likelihood estimate.

       -v (or --pinv) prop_invar

              prop_invar : proportion of invariable sites.  Can be a fixed value in the [0,1] range or e to  get
              the maximum likelihood estimate.

       -c (or --nclasses) nb_subst_cat

              nb_subst_cat : number of relative substitution rate categories. Default : nb_subst_cat=4.  Must be
              a positive integer.

       -a (or --alpha) gamma

              gamma : value of the gamma distribution shape parameter. Can be a fixed positive value or
              e to get the maximum likelihood estimate.

       -u (or --inputtree) user_tree_file

              user_tree_file : starting tree filename. The tree must be in Newick format.

       --r_seed num

              num is the seed used to initiate the random number generator.  Must be an integer.

       --run_id ID_string

              Append the string ID_string to the name of each PhyML output file. This option may be useful
              when running simulations involving PhyML.

       --quiet

              No interactive questions (for running in batch mode) and quiet output.

       --no_memory_check

              No interactive questions about memory usage (for running in batch mode). Normal output otherwise.

       --chain_len num

              num  is the number of generations or runs of the Markov Chain Monte Carlo. Set to 1E+6 by default.
              Must be an integer.

       --sample_freq num

              The chain is sampled every num generations. Set to 1E+3 by default.  Must be an integer.

       --no_data

              Use this option to sample from the priors only (rather than from the joint posterior density of
              the model parameters).

       --fastlk

              Use the multivariate normal approximation to the likelihood to speed up calculations.
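
       As a rough illustration of how the options above fit together, the following sketch assembles a
       phytime command line for a batch run. It uses only options documented on this page. The helper names
       and the file names (seqs.phy, start.nwk, calib.txt) are hypothetical placeholders, and the sample
       count is an approximation that assumes one recorded sample every sample_freq generations.

```python
import shlex


def phytime_command(seq_file, tree_file, calibration_file,
                    chain_len=1_000_000, sample_freq=1_000, extra=()):
    """Assemble a phytime argument list from options documented on this page."""
    if chain_len <= 0 or sample_freq <= 0:
        raise ValueError("chain_len and sample_freq must be positive integers")
    argv = [
        "phytime",
        "-i", seq_file,                      # required: sequence alignment
        "-u", tree_file,                     # required: starting tree (Newick)
        "--calibration", calibration_file,   # required: node age boundaries
        "--chain_len", str(chain_len),
        "--sample_freq", str(sample_freq),
    ]
    argv.extend(extra)
    return argv


def approx_samples(chain_len, sample_freq):
    """Rough number of recorded samples: one every sample_freq generations."""
    return chain_len // sample_freq


if __name__ == "__main__":
    # File names are placeholders, not files shipped with phytime.
    argv = phytime_command("seqs.phy", "start.nwk", "calib.txt",
                           extra=["--quiet", "--fastlk"])
    print(shlex.join(argv))
    print(approx_samples(1_000_000, 1_000))
```

       The returned list can be passed to subprocess.run to launch the analysis on a system where phytime
       is installed. Adding --quiet avoids interactive questions in batch mode.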
