NAME

       phastMotif - Predicts motifs from a set of multiple alignments.

DESCRIPTION

       Predicts  motifs  from a set of multiple alignments.  Uses an EM algorithm similar to that
       of  MEME,  but  a  motif  is  defined  by  phylogenetic  models  rather  than  multinomial
       distributions.   The  specified  multiple alignments may actually be single sequences (see
       -m).  Various parameters control the strategy for initialization (see below).   Currently,
       the F81 substitution model is assumed.
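
       As a rough illustration of the F81 model mentioned above, the sketch below computes
       its substitution probabilities for a single branch. It uses the standard closed form
       P_ij(t) = exp(-beta*t) * [i == j] + (1 - exp(-beta*t)) * pi_j. The function name, the
       example frequencies, and the choice to scale branch lengths to expected substitutions
       per site are assumptions made for illustration; they are not taken from phastMotif.

```python
import math


def f81_transition_matrix(pi, t):
    """Return the F81 substitution probability matrix for branch length t.

    pi: equilibrium base frequencies (A, C, G, T), summing to 1.
    t:  branch length, assumed here to be in expected substitutions per site.
    """
    # Scale the rate so that one unit of t is one expected substitution per site.
    beta = 1.0 / (1.0 - sum(p * p for p in pi))
    decay = math.exp(-beta * t)
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * pi[j]
         for j in range(len(pi))]
        for i in range(len(pi))
    ]


# Illustrative frequencies only, not phastMotif defaults.
BACKGROUND = [0.3, 0.2, 0.2, 0.3]
P = f81_transition_matrix(BACKGROUND, 0.1)
```

       Each row of the result sums to 1; at t = 0 the matrix is the identity, and for long
       branches every row approaches the equilibrium frequencies.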

USAGE

       phastMotif [-t <treefile>] [OPTIONS] <msa_list>

OPTIONS

       -t <file>
              (Required unless -m or -p) Use specified tree topology for all phylogenetic
              models (Newick format).

       -i <fmt>
              Input format for alignment.  May be FASTA, PHYLIP, MPM, SS, or MAF (default FASTA).

       -b <file>
              Read background model from specified file (.mod format).  By default, the
              background model is estimated in a preprocessing step, by pooling all data.

       -s     Estimate  a  separate  background  model  for  each  multiple  alignment.  (Not yet
              implemented.)

       -k <size>
              Learn motifs of the specified size (default 10).

       -B <n>
              Report best <n> motifs (default 3).

       -m     MEME mode.  Use multinomial  rather  than  phylogenetic  models.   Causes  multiple
              alignments  to  be  ignored -- any gaps are discarded and all sequences are assumed
              independent.
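
       The effect of -m on the input can be pictured with the following minimal sketch, which
       strips gaps from each alignment row and returns the rows as unrelated sequences. The
       function name and the choice of "-" as the only gap character are assumptions; the
       tool's actual handling may differ.

```python
def alignment_to_sequences(alignment_rows, gap_chars="-"):
    """Flatten one alignment into independent, ungapped sequences."""
    sequences = []
    for row in alignment_rows:
        # Discard gap characters; the column structure is lost on purpose.
        ungapped = "".join(ch for ch in row if ch not in gap_chars)
        if ungapped:  # skip rows that consisted only of gaps
            sequences.append(ungapped)
    return sequences


rows = ["AC-GT", "A--GT", "-----"]
flattened = alignment_to_sequences(rows)
```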

       -d <+lst>
              Use the discriminative training method of Segal et al. (RECOMB'02), rather than
              EM.  The specified list should contain the filenames from msa_list that are to be
              considered *positive* examples (containing the desired motif); all others will be
              considered negative examples.  Can be used with or without -m.

       -p     Use "profile" models rather than phylogenetic models (characters in each
              alignment column assumed independent).  The resulting model is a hybrid of the
              full model and MEME's model.  Essentially, it uses the multiple alignments but
              not the phylogeny.  NOT YET IMPLEMENTED.

       -n <n>
              Perform <n> random restarts and report the motif with highest likelihood.
              Default number is 10.  Ignored with -I, -P, and -R unless -S is specified (see
              below).

       -I <mlst>
              Run the algorithm after a "soft" initialization with each of the consensus
              sequences in the specified list.  At each position, <pc> pseudocounts (see -c)
              are given to the consensus base and 1 pseudocount to all other bases.  Each
              string must have length at most equal to the size of the motif.  If shorter, it
              is used as a "seed" for a motif, with flanking positions treated as wildcards.

       -P <x,y>
              Initialize with the x most prevalent y-tuples.  A soft initialization is
              performed, as above.  If y is less than the motif size, y-tuples are used as a
              "seed" for a motif, as above.

       -R <x,y>
              Initialize with a random sample of x y-tuples.  A soft initialization is
              performed, as above.  If y is less than the motif size, y-tuples are used as a
              "seed" for a motif, as above.

       -w <n>
              (for use with -I, -P, -R) Winnow initialization sequences to the top <n> based on
              the unmaximized likelihood.
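
       The "soft" initialization used by -I, -P, and -R can be sketched as follows. The first
       function picks the x most frequent y-tuples, as -P describes; the second turns a
       consensus string into per-position pseudocounts, giving pc to the consensus base and 1
       to every other base. Both function names are hypothetical. Centering a short seed
       within the motif and giving wildcard positions 1 pseudocount per base are assumptions,
       since the text above does not say where the seed is placed or how wildcards are
       weighted.

```python
from collections import Counter

BASES = "ACGT"


def most_prevalent_tuples(sequences, x, y):
    """Return the x most frequent length-y substrings across all sequences."""
    counts = Counter(
        seq[i:i + y]
        for seq in sequences
        for i in range(len(seq) - y + 1)
    )
    return [tup for tup, _ in counts.most_common(x)]


def soft_init(consensus, motif_size, pc=5.0):
    """Build one pseudocount table per motif position from a consensus string."""
    if len(consensus) > motif_size:
        raise ValueError("consensus is longer than the motif")
    # Assumption: a short seed is centered; any odd leftover goes to the right flank.
    left = (motif_size - len(consensus)) // 2
    right = motif_size - len(consensus) - left
    padded = "*" * left + consensus + "*" * right
    rows = []
    for ch in padded:
        # The consensus base gets pc pseudocounts and the others get 1. A wildcard
        # matches no base, so every base gets 1 there (assumption).
        rows.append({b: (pc if b == ch else 1.0) for b in BASES})
    return rows


seeds = most_prevalent_tuples(["ACGTACGT", "TTACGA"], x=1, y=3)
table = soft_init(seeds[0], motif_size=5, pc=5.0)
```

       Here the 3-tuple seed occupies the middle three positions of a size-5 motif, and the
       two flanking positions start out uniform.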

       -c <pc>
              (for use with -I, -P, -R) Number of pseudocounts for consensus bases (default 5).

       -S     (for use with -I, -P, -R) Instead of doing a deterministic initialization based
              on a consensus sequence, sample parameters from a Dirichlet distribution defined
              by the pseudocounts (see -c).  In this case, random restarts are performed, as
              specified by -n.
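
       The sampling that -S describes can be illustrated with a small sketch that draws one
       base distribution per motif position from a Dirichlet whose parameters are the
       pseudocounts. It uses the usual construction from normalized gamma draws. The function
       names and the list-of-lists layout (one [A, C, G, T] row per position) are assumptions
       made for illustration.

```python
import random


def sample_dirichlet(pseudocounts, rng=random):
    """Draw one probability vector from a Dirichlet with the given parameters."""
    # Independent Gamma(alpha, 1) draws, normalized, are Dirichlet distributed.
    draws = [rng.gammavariate(alpha, 1.0) for alpha in pseudocounts]
    total = sum(draws)
    return [d / total for d in draws]


def sample_initial_motif(pseudocount_rows, rng=random):
    """Sample a starting base distribution for every motif position."""
    return [sample_dirichlet(row, rng) for row in pseudocount_rows]


# Consensus "AC" with 5 pseudocounts on the consensus base, 1 elsewhere.
rows = [[5.0, 1.0, 1.0, 1.0], [1.0, 5.0, 1.0, 1.0]]
start = sample_initial_motif(rows, random.Random(0))
```

       Each restart would draw a fresh sample, which is why random restarts become meaningful
       again when -S is combined with -I, -P, or -R.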

       -o <pref>
              Use the specified prefix for all output files (default "phastm").

       -H     Produce HTML-formatted output, in addition to ordinary output.  One file is
              produced per predicted motif, as well as a single HTML-formatted summary file.

       -D     Produce a BED file with predicted motifs, for use in the UCSC browser.  Currently,
              sequence names must be formatted like "chr10:102553847-102554897+", with the
              final '+' or '-' indicating strand.
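
       A sequence name in the format required by -D can be split into its parts with a
       sketch like the one below. The function name and the regular expression are
       hypothetical, and the sketch only parses the name; whether the coordinates are 0-based
       or 1-based is not stated above, so no conversion to BED coordinates is attempted.

```python
import re

# chromosome, then ":", start, "-", end, and a trailing strand character.
NAME_PATTERN = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?P<strand>[+-])$"
)


def parse_sequence_name(name):
    """Split a name such as "chr10:102553847-102554897+" into its components."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError("unexpected sequence name format: %r" % name)
    return (
        match.group("chrom"),
        int(match.group("start")),
        int(match.group("end")),
        match.group("strand"),
    )


parsed = parse_sequence_name("chr10:102553847-102554897+")
```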

       -x     (For use with -H or -D) Suppress ordinary output to stdout.

       -h     Print this help message.