NAME

       phastMotif - Predicts motifs from a set of multiple alignments.

DESCRIPTION

       Predicts  motifs  from a set of multiple alignments.  Uses an EM algorithm similar to that of MEME, but a
       motif is defined by phylogenetic models rather than multinomial distributions.   The  specified  multiple
       alignments  may  actually  be  single  sequences  (see  -m).  Various parameters control the strategy for
       initialization (see below).  Currently, the F81 substitution model is assumed.
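
       As a point of reference for the F81 model named above, the following is a minimal Python sketch of
       its standard closed-form transition probabilities.  It is not taken from the phastMotif source; the
       function name and the scaling to one expected substitution per unit branch length are assumptions
       made for illustration, and phastMotif's own parameterization may differ.

```python
import math


def f81_transition_matrix(pi, t):
    """Textbook F81 transition probabilities for branch length t.

    pi is the list of equilibrium base frequencies (must sum to 1).
    Assumption: rates are scaled so that t is measured in expected
    substitutions per site; this is a common convention, not something
    the man page states.
    """
    # Scaling constant so that the expected substitution rate is 1.
    beta = 1.0 / (1.0 - sum(p * p for p in pi))
    decay = math.exp(-beta * t)
    n = len(pi)
    # P[i][j] = decay * [i == j] + (1 - decay) * pi[j]
    return [
        [(decay if i == j else 0.0) + (1.0 - decay) * pi[j] for j in range(n)]
        for i in range(n)
    ]
```

       With uniform frequencies this reduces to the Jukes-Cantor model; with t = 0 it gives the identity
       matrix, and for large t every row approaches pi.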

USAGE

       phastMotif [-t <treefile>] [OPTIONS] <msa_list>

OPTIONS


       -t <file>
              (Required unless -m or -p) Use the specified tree topology for all phylogenetic models (Newick
              format).

       -i <fmt>
              Input format for alignment.  May be FASTA, PHYLIP, MPM, SS, or MAF (default FASTA).

       -b <file>
              Read background model from the specified file (.mod format).  By default, the background model
              is estimated in a preprocessing step, by pooling all data.

       -s     Estimate a separate background model for each multiple alignment.  (Not yet implemented.)

       -k <size>
              Learn motifs of the specified size (default is 10).

       -B <n>
              Report best <n> motifs (default 3).

       -m     MEME  mode.   Use  multinomial  rather than phylogenetic models.  Causes multiple alignments to be
              ignored -- any gaps are discarded and all sequences are assumed independent.

       -d <+lst>
              Use the discriminative training method of Segal et al. (RECOMB'02), rather than EM.  The
              specified list should contain the filenames from msa_list that are to be considered *positive*
              examples (containing the desired motif); all others will be considered negative examples.  Can
              be used with or without -m.

       -p     Use "profile" models rather than phylogenetic models (characters in each alignment column
              assumed independent).  The resulting model is a hybrid of the full model and MEME's model.
              Essentially, it uses the multiple alignments but not the phylogeny.  NOT YET IMPLEMENTED.

       -n <n>
              Perform <n> random restarts and report the motif with highest likelihood.  Default number is
              10.  Ignored with -I, -P, and -R unless -S is specified (see below).

       -I <mlst>
              Run the algorithm after a "soft" initialization with each of the consensus sequences in the
              specified list.  At each position, <pc> pseudocounts (see -c) are given to the consensus base
              and 1 pseudocount to all other bases.  Each string must have length at most equal to the size
              of the motif.  If shorter, it is used as a "seed" for a motif, with flanking positions treated
              as wildcards.

       -P <x,y>
              Initialize with the x most prevalent y-tuples.  A soft initialization is performed, as above.
              If y is less than the motif size, y-tuples are used as a "seed" for a motif, as above.

       -R <x,y>
              Initialize with a random sample of x y-tuples.  A soft initialization is performed, as above.
              If y is less than the motif size, y-tuples are used as a "seed" for a motif, as above.

       -w <n>
              (for use with -I, -P, -R) Winnow initialization sequences to the top <n> based on the
              unmaximized likelihood.
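
       The "soft" initialization described for -I and the tuple counting described for -P can be
       sketched in Python as follows.  This is an illustration under stated assumptions, not phastMotif's
       code: the function names are hypothetical, a short seed is centered within the motif (the text
       above only says that flanking positions are wildcards), a wildcard position is given a uniform
       distribution, and input sequences are taken to be ungapped DNA strings.

```python
from collections import Counter

BASES = "ACGT"


def soft_init(consensus, motif_size, pc=5.0):
    """Build one base distribution per motif position from a consensus string.

    The consensus base at a position receives pc pseudocounts and every
    other base receives 1, as described for -I and -c.
    """
    if len(consensus) > motif_size:
        raise ValueError("consensus must not be longer than the motif")
    # Assumption: center the seed; remaining flanks become wildcards ("N").
    left = (motif_size - len(consensus)) // 2
    right = motif_size - len(consensus) - left
    padded = "N" * left + consensus + "N" * right
    columns = []
    for ch in padded:
        # A wildcard matches no base, so all four bases get 1 pseudocount.
        counts = {b: (pc if b == ch else 1.0) for b in BASES}
        total = sum(counts.values())
        columns.append({b: counts[b] / total for b in BASES})
    return columns


def most_prevalent_tuples(sequences, x, y):
    """Return the x most frequent length-y substrings across all sequences."""
    counts = Counter(
        seq[i:i + y] for seq in sequences for i in range(len(seq) - y + 1)
    )
    return [tup for tup, _ in counts.most_common(x)]
```

       For example, soft_init("ACG", 5) with the default of 5 pseudocounts gives the consensus base a
       probability of 5/8 at each seeded position and leaves the two flanking positions uniform.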

       -c <pc>
              (for use with -I, -P, -R) Number of pseudocounts for consensus bases (default 5).

       -S     (for use with -I, -P, -R) Instead of doing a deterministic initialization based on a consensus
              sequence, sample parameters from a Dirichlet distribution defined by the pseudocounts (see
              -c).  In this case, random restarts are performed, as specified by -n.
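
       The sampling that -S describes can be pictured with the short Python sketch below.  It assumes
       that the Dirichlet parameters for a position are exactly the pseudocounts (<pc> for the consensus
       base, 1 for each other base); the function name is hypothetical and the sketch is not derived from
       the phastMotif source.  A Dirichlet draw is obtained by normalizing independent gamma variates.

```python
import random

BASES = "ACGT"


def sample_column(consensus_base, pc=5.0, rng=random):
    """Draw one base distribution from Dirichlet(pseudocounts).

    Assumption: the Dirichlet parameter is pc for the consensus base
    and 1 for every other base.
    """
    alphas = [pc if b == consensus_base else 1.0 for b in BASES]
    # Normalized Gamma(alpha, 1) draws are Dirichlet(alphas) distributed.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return {b: d / total for b, d in zip(BASES, draws)}
```

       Each call yields a different starting point, which is why random restarts become meaningful
       again in this mode.  Passing a seeded random.Random instance as rng makes the draws reproducible.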

       -o <pref>
              Use the specified prefix for all output files (default "phastm").

       -H     Produce HTML-formatted output, in addition to ordinary output.  One file is produced per
              predicted motif, as well as a single HTML-formatted summary file.

       -D     Produce a BED file with predicted motifs, for use in the UCSC browser.  Currently, sequence names
              must have the form "chr10:102553847-102554897+", with the final '+' or '-' indicating
              strand.
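
       To check sequence names before running with -D, a pattern along the following lines may help.
       This Python sketch only mirrors the example name shown above; the regular expression and the
       function name are assumptions, and phastMotif's own parser may be stricter or more lenient.  How
       the coordinates are converted into BED positions is not specified here and is not modeled.

```python
import re

# Assumed layout, inferred from the single example: <chrom>:<start>-<end><strand>
NAME_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?P<strand>[+-])$"
)


def parse_seq_name(name):
    """Split a sequence name into (chrom, start, end, strand)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError("unexpected sequence name: %r" % name)
    return m["chrom"], int(m["start"]), int(m["end"]), m["strand"]
```

       A name without the trailing strand character, such as "chr10:102553847-102554897", is rejected by
       this sketch with a ValueError.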

       -x     (For use with -H or -D) Suppress ordinary output to stdout.

       -h     Print this help message.

phastMotif 1.4                                      May 2016                                       PHASTMOTIF(1)