Provided by: patman_1.2.2+dfsg-4_amd64

NAME

       PatMaN - search for approximate patterns in DNA libraries

SYNOPSIS

       patman [ option | file ... ]

DESCRIPTION

       PatMaN searches for (small) patterns in (huge) DNA databases, allowing for some mismatches and optionally
       gaps.  Patterns and databases are read from one or more fasta(5) files listed  as  non-option  arguments,
       depending  on whether the -D or -P option last preceded them, and matched against each other.  The output
       of PatMaN is a table containing one line for each match, consisting of tab-separated fields:

       •   name of database sequence,

       •   name of pattern,

       •   position of first matched base in database sequence (the sequence's first base has position 1),

       •   position of last matched base in database sequence,

       •   strand (+ for literal match, - for reverse complement),

       •   edit distance (number of mismatches plus number of gaps).
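
       A downstream script might read this table as sketched below, assuming the six fields appear in the
       order listed above.  The names Match and parse_line are illustrative and not part of PatMaN, and the
       sample line is made up.

```python
from typing import NamedTuple


class Match(NamedTuple):
    database: str  # name of database sequence
    pattern: str   # name of pattern
    start: int     # position of first matched base (1-based)
    end: int       # position of last matched base
    strand: str    # "+" for literal match, "-" for reverse complement
    edits: int     # number of mismatches plus number of gaps


def parse_line(line: str) -> Match:
    """Split one tab-separated output line into typed fields."""
    db, pat, start, end, strand, edits = line.rstrip("\n").split("\t")
    return Match(db, pat, int(start), int(end), strand, int(edits))


# Made-up sample line, not real PatMaN output.
example = parse_line("chr1\tprobe7\t101\t122\t-\t1\n")
print(example.end - example.start + 1)  # length of the matched region
```

       Keeping the positions as integers makes it easy to sort or compare match regions later on.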

OPTIONS

       -V, --version
              Print version number and exit.

       -e num, --edits num
              Allow up to num mismatches and/or gaps per match.

       -g num, --gaps num
              Allow up to num gaps per match.  Note that gaps count as mismatches, too, so the -e option  should
              always  be  set  at  least  as high as the -g option.  Allowing many gaps can incur a considerable
              computational cost.

       -D, --databases
              Treat the following files as database.  Databases must be in fasta(5) format.   Multiple  database
              files, including "-" for standard input, are allowed and are read in turn.

       -P, --patterns
              Treat  the  following  files  as  patterns.   Pattern  files must be in fasta(5) format.  Multiple
              pattern files, including "-" for standard input, are allowed and are all read before touching  the
              databases.

       -o file, --output file
              Redirect output to file.  The file name "-" causes output to be written to stdout, which is also
              the default.

       -a, --ambicodes
              Activate the interpretation of ambiguity codes in patterns.  This results in the expansion of  any
              pattern  with  ambiguity  codes  into  multiple  patterns  which can match independently.  Compare
              Unknown Nucleotides below.

       -s, --singlestrand
              Deactivate matching of reverse-complements.  Normally, PatMaN will try to match patterns both
              literally and after reverse-complementing them.  With this option set, only literal
              (forward-strand) matches are considered.

       -p num, --prefetch num
              Causes num pointers to be prefetched in advance.  This feature can improve performance, if  PatMaN
              has  been  compiled for a processor architecture that supports prefetching.  The optimum value for
              your particular setup has to be determined empirically, but the default should be reasonably good.

       -l len, --min-length len
              Only consider patterns with a length of at least len.  Use this if your pattern collection
              contains short sequences that would otherwise produce large numbers of reported matches.

       -x num, --chop3 num
              Cut  off  num  bases from the 3' end of each pattern.  Use this for patterns with damaged, edited,
              etc. 3' ends that should be ignored.  The chopped bases are neither matched nor  included  in  the
              reported match regions.

       -X num, --chop5 num
              Cut  off  num  bases from the 5' end of each pattern.  Use this for patterns with damaged, edited,
              etc. 5' ends that should be ignored.  The chopped bases are neither matched nor  included  in  the
              reported match regions.

       -A, --adenine-hack
              Allow  adenine  to be ignored in patterns.  This is essentially equivalent to not counting gaps in
              the database, as long as it was an A that was gapped.  Using -A can be  computationally  extremely
              expensive, both in terms of memory and time consumed.

       -q, --quiet
              Suppress warnings (about unrecognized characters in input sequences or missing input files).  Even
              without -q, at most one such warning is given per run.

       -v, --verbose
              Print additional progress information to stderr.

       -d flags, --debug flags
              Set debugging flags to flags.  Flags may be the bitwise OR of any of the following values, each of
              which causes some output to appear on stderr.  Some of the values may only work if PatMaN has been
              compiled in debug mode.  The default value is 1.

       1      Print warnings.  Equivalent to not setting -q.

       2      Print progress information.  Equivalent to setting -v.

       4      Dump the suffix trie of the patterns.  Only available in debug build.

       8      Count number of visited nodes and print that number in each iteration.  Only  available  in  debug
              build.

       16     Print total number of nodes fetched from memory after completing all databases.

       32     Output database sequence while it is being matched.
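
       Because the flag values are distinct powers of two, combining them amounts to adding them up.  The
       sketch below computes such a combined value.  The class and member names are hypothetical labels for
       the values listed above, not identifiers from PatMaN.

```python
from enum import IntFlag


class Debug(IntFlag):
    WARNINGS = 1        # equivalent to not setting -q
    PROGRESS = 2        # equivalent to setting -v
    DUMP_TRIE = 4       # only available in debug build
    COUNT_NODES = 8     # only available in debug build
    TOTAL_NODES = 16    # total nodes fetched after all databases
    ECHO_DATABASE = 32  # output database sequence while matching


# Warnings, progress information and the final node total together.
flags = Debug.WARNINGS | Debug.PROGRESS | Debug.TOTAL_NODES
print(int(flags))
```

       The printed number is presumably what would be given as the argument of -d to request these three
       kinds of output at once.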

NOTES

   Non-Option Arguments
       Non-option arguments (bare filenames) are treated as either database or pattern files, depending on
       whether the -D or -P option was the last that occurred before the filename.  If neither -D nor -P was
       given, file names are treated as pattern files.  If no database was given, it is instead read from
       standard input.  Standard input can be explicitly given as either a database or a pattern file by using
       the filename "-".  A warning is given if standard input is selected implicitly as database; an error
       message is given if no pattern files have been named at all.
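
       The rule described here can be modelled as follows.  This is a simplified sketch of the documented
       behaviour, not PatMaN's actual argument parser: it assumes that only -D and -P (and their long forms)
       appear as options, and the function name classify and the file names are made up.

```python
def classify(args):
    """Assign bare file names to 'patterns' or 'databases'.

    Simplified: every argument other than -D/-P and their long forms
    is treated as a file name.
    """
    mode = "patterns"  # default when neither -D nor -P was given
    files = {"patterns": [], "databases": []}
    for arg in args:
        if arg in ("-D", "--databases"):
            mode = "databases"
        elif arg in ("-P", "--patterns"):
            mode = "patterns"
        else:
            files[mode].append(arg)  # "-" stands for standard input
    if not files["patterns"]:
        raise ValueError("no pattern files named")
    if not files["databases"]:
        # Implicit standard input; PatMaN gives a warning in this case.
        files["databases"].append("-")
    return files


print(classify(["probes.fa", "-D", "genome.fa", "-P", "more.fa"]))
```

       In the sample call, probes.fa and more.fa end up as pattern files and genome.fa as the only database.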

   Gapped Matching
       Allowing gaps often causes overlapping matches of single patterns at almost the  same  position.   PatMaN
       makes no attempt to filter these redundant matches.  Also note that allowing many gaps, and especially
       allowing an arbitrary number of gaps through the -A hack, can slow down PatMaN considerably and cause it
       to produce enormous amounts of output.  The use of some sort of post-processor to filter the redundant
       matches is highly recommended.
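
       One possible post-processing policy is sketched below: among matches of the same pattern on the same
       database sequence and strand whose regions overlap, keep only the one with the lowest edit distance.
       This is an assumption about what "redundant" should mean for a given analysis, not something PatMaN
       prescribes, and the function names and sample data are made up.

```python
from itertools import groupby


def group_key(m):
    """Matches are only compared within one database/pattern/strand."""
    return (m[0], m[1], m[4])


def collapse_overlaps(matches):
    """Keep the lowest-edit match from each run of overlapping matches.

    matches: iterable of (database, pattern, start, end, strand, edits).
    """
    ordered = sorted(matches, key=lambda m: (group_key(m), m[2], m[3]))
    kept = []
    for _, group in groupby(ordered, key=group_key):
        best, run_end = None, 0
        for m in group:
            if best is not None and m[2] <= run_end:
                # Overlaps the current run: extend it, maybe replace best.
                run_end = max(run_end, m[3])
                if m[5] < best[5]:
                    best = m
            else:
                # Starts a new run: flush the previous winner, if any.
                if best is not None:
                    kept.append(best)
                best, run_end = m, m[3]
        kept.append(best)
    return kept


# Made-up sample data: three near-identical hits and one distant hit.
hits = [
    ("chr1", "p1", 100, 121, "+", 1),
    ("chr1", "p1", 101, 122, "+", 0),
    ("chr1", "p1", 102, 122, "+", 1),
    ("chr1", "p1", 500, 521, "+", 2),
]
print(collapse_overlaps(hits))
```

       Note that overlaps are chained, so a long series of staggered matches collapses into a single
       representative; a stricter policy may be preferable for repetitive regions.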

   Unknown Nucleotides
       Unknown nucleotides are most often encoded by the letter N.  If the --ambicodes option is not  given,  Ns
       in  patterns  are interpreted as unknown nucleotides and can never match without penalty.  If --ambicodes
       is given, Ns in patterns are expanded just like the other ambiguity codes, and effectively work as
       wildcards.  Unknown nucleotides can still be encoded by an X and will never match anything.  The database
       is treated differently in that anything other than A, C, G,  T  and  U,  including  ambiguity  codes,  is
       treated as unknown and can never match without penalty.
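
       The expansion of ambiguity codes into multiple patterns can be illustrated as follows.  The sketch
       uses the standard IUPAC nucleotide codes and mirrors the behaviour described above; it is not taken
       from PatMaN's source, and the name expand is made up.  Characters without an entry in the table,
       such as X, are left unchanged.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(pattern):
    """Return every unambiguous pattern covered by an ambiguous one."""
    choices = [IUPAC.get(base, base) for base in pattern.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand("ACRN"))
```

       The number of expanded patterns is the product of the number of alternatives at each position, so a
       pattern with several Ns grows quickly (two Ns alone give 16 patterns).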

FILES

       /etc/popt
              The system wide configuration file for popt(3).  PatMaN identifies itself as "patman" to popt.

       ~/.popt
              Per user configuration file for popt(3).

BUGS

       None known.

AUTHOR

       Kay Pruefer <pruefer@eva.mpg.de>
       Udo Stenzel <udo_stenzel@eva.mpg.de>

SEE ALSO

       popt(3), fasta(5)