Provided by: phast_1.4+dfsg-1_amd64
NAME
phastBias - Identify regions of the alignment which are affected by gBGC
SYNOPSIS
The alignment file can be in any of several file formats (see --msa-format). The neutral model must be in the .mod format produced by the phyloFit program. The foreground_branch should identify a branch of the tree (internal branches can be named with tree_doctor --name-ancestors).
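As a rough illustration of how the three arguments described above fit together, the following Python sketch assembles a phastBias command line. The ordering (options first, then alignment, neutral model, and foreground branch) is an assumption not stated in this section, and the file names and the branch name "human" are hypothetical placeholders. The helper build_phastbias_argv is not part of PHAST.

```python
from typing import Dict, List, Optional


def build_phastbias_argv(alignment: str,
                         neutral_mod: str,
                         foreground_branch: str,
                         options: Optional[Dict[str, object]] = None) -> List[str]:
    """Assemble an argument vector for phastBias (hypothetical helper).

    Assumption: options precede the three positional arguments, which are
    given in the order alignment, neutral model, foreground branch.
    """
    argv = ["phastBias"]
    for flag, value in (options or {}).items():
        argv.append(flag)
        if value is not None:  # flags such as --help take no value
            argv.append(str(value))
    argv += [alignment, neutral_mod, foreground_branch]
    return argv


# Hypothetical inputs; only the option names come from this manual page.
argv = build_phastbias_argv(
    "alignment.maf", "neutral.mod", "human",
    {"--bgc": 3, "--posteriors": "wig"},
)
print(" ".join(argv))
```

The resulting list could be handed to subprocess.run, with stdout redirected to a file to capture the wig scores.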
DESCRIPTION
Identify regions of the alignment which are affected by gBGC, indicated by a cluster of weak-to-strong (A/T -> G/C) substitutions amidst a deficit of strong-to-weak substitutions on a particular branch of the tree. The regions are identified by a phylo-HMM with four states: neutral, conserved, neutral with gBGC, and conserved with gBGC.

OUTPUT: phastBias produces a wig file with scores for every position in the alignment indicating the probability of being in one of the gBGC states. It can also produce gBGC tracts by thresholding this probability at 0.5, or a matrix of probabilities for all four states. See OUTPUT OPTIONS below.
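The thresholding step can be pictured with a small Python sketch. It is only an illustration of turning per-position gBGC probabilities into tracts with a cutoff of 0.5, not the code phastBias itself uses. The function name call_tracts and the example probabilities are made up for this sketch.

```python
from typing import List, Tuple


def call_tracts(probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where probs > threshold."""
    tracts = []
    start = None  # start coordinate of the run currently being extended
    for pos, p in enumerate(probs, start=1):
        if p > threshold:
            if start is None:
                start = pos
        elif start is not None:
            # The run ended at the previous position.
            tracts.append((start, pos - 1))
            start = None
    if start is not None:
        # The final run extends to the end of the alignment.
        tracts.append((start, len(probs)))
    return tracts


# Made-up posterior probabilities for six alignment columns.
example = [0.1, 0.6, 0.9, 0.5, 0.7, 0.2]
print(call_tracts(example))  # [(2, 3), (5, 5)]
```

Position 4 is excluded because the comparison is strict (> 0.5), matching the wording of the --output-tracts option.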
OPTIONS
GENERAL OPTIONS:

--help,-h
    Print this help message.

TUNING PARAMETER OPTIONS:

gBGC PARAMETERS:

--bgc <B>
    The B parameter describes the strength of gBGC. It must be > 0. Too low of a value may yield false positives, as the gBGC model becomes indistinguishable from the non-gBGC model. Default: 3

--estimate-bgc <0|1>
    Use "--estimate-bgc 1" to estimate B by maximum likelihood. Default: 0

--bgc-exp-length <length>
    Set the prior expected length of gBGC tracts. This is equivalent to 1/alpha in the parametrization defined by Capra et al, where alpha is the rate out of gBGC states. Default: 1000

--estimate-bgc-exp-length <0|1>
    Use "--estimate-bgc-exp-length 1" to estimate this parameter by an expectation-maximization algorithm. Default: 0

--bgc-target-coverage <coverage>
    Set the prior for gBGC tract coverage (as a fraction between 0 and 1). This is represented in the model as beta/(alpha+beta), where beta is the rate into the gBGC state, and alpha is the rate out of the gBGC state. Default: 0.01

--estimate-bgc-target-coverage <0|1>
    Use "--estimate-bgc-target-coverage 0" to hold this parameter constant. Default: 1 (This is the only parameter estimated by default.)

CONSERVATION PARAMETERS:

Note: it is not recommended to tune these parameters with phastBias. Rather, phastCons may be used to determine the best values for rho and the transition rates into/out of conserved elements. See phastCons --help and the phastCons HOWTO (available online) to learn about tuning these parameters.

--rho <rho>
    Set the scaling factor for branch lengths in conserved states. Rho should be between 0 and 1. Default: 0.31

--cons-exp-length <length>
    Set the prior expected length of conserved elements. This parameter is held constant; if you want to tune it, it is recommended to do this with the phastCons program under a non-gBGC model (see the --expected-length option in phastCons). Default: 45

--cons-target-coverage <cov>
    Set the prior for coverage of conserved elements (as a fraction between 0 and 1). Like the --cons-exp-length above, this parameter is also held constant, but can be tuned with phastCons (see phastCons --transitions). Default: 0.3

OTHER PARAMETERS:

--scale <scale>
    Set an overall scaling factor for the branch lengths in all states. Default: 1

--estimate-scale <0|1>
    Rescale the branches in all states by a scaling factor determined by maximum likelihood (initialized by --scale above). Default: 0

--eqfreqs-from-msa <0|1>
    Reset equilibrium frequencies of A,C,G,T based on frequencies observed in the alignment. Otherwise will not be altered from input model. Default: 1

OUTPUT OPTIONS:

--output-tracts <file.gff>
    Print a GFF file identifying all regions with posterior probability of being in a gBGC state > 0.5.

--posteriors <none|wig|full>
    Use this option to control posterior probability output, which is written to stdout. "none" implies do not output anything; "wig" outputs a standard fixed-step wiggle file giving the probability that each base is assigned to a gBGC state; "full" outputs a table with five columns. The first column is the coordinate (1-based relative to the first sequence in the alignment), followed by the probabilities of each of the four states: neutral, conserved, gBGC neutral, gBGC conserved. Default: wig

--output-mods <output_root>
    Print out the tree models for all four states to <output_root>.cons.mod, <output_root>.neutral.mod, <output_root>.gBGC_cons.mod, and <output_root>.gBGC_neutral.mod.

--informative-fn,-i <file.gff>
    Print a GFF containing regions of the alignment which are informative for gBGC. Note: only works properly if foreground branch is a single branch (not a group of branches).

--informative-only,-o
    (To be used with --informative-fn). Print the informative regions, then quit.
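The relationship between --bgc-exp-length, --bgc-target-coverage, and the rates alpha and beta can be worked through numerically. The Python sketch below takes expected length = 1/alpha and coverage = beta/(alpha+beta) as stated above, and solves for beta. The function name transition_rates is hypothetical, and this shows only the arithmetic, not how phastBias stores these parameters internally.

```python
from typing import Tuple


def transition_rates(exp_length: float, target_coverage: float) -> Tuple[float, float]:
    """Convert (expected tract length, target coverage) to (alpha, beta).

    alpha: rate out of the gBGC state, alpha = 1 / exp_length
    beta:  rate into the gBGC state, chosen so that
           beta / (alpha + beta) == target_coverage
    """
    if exp_length <= 0:
        raise ValueError("exp_length must be positive")
    if not 0 < target_coverage < 1:
        raise ValueError("target_coverage must be strictly between 0 and 1")
    alpha = 1.0 / exp_length
    # Solving c = beta / (alpha + beta) for beta gives beta = alpha * c / (1 - c).
    beta = alpha * target_coverage / (1.0 - target_coverage)
    return alpha, beta


# Defaults listed above: --bgc-exp-length 1000, --bgc-target-coverage 0.01
alpha, beta = transition_rates(1000, 0.01)
print(alpha, beta, beta / (alpha + beta))
```

With the defaults, alpha is 0.001 and beta is roughly 1.01e-05, so entering a gBGC tract is about a hundred times less likely per position than leaving one.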
SEE ALSO
Capra JA, Hubisz MJ, Kostka D, Pollard KS, Siepel A: A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes. (Manuscript in submission).