NAME

       hmmpgmd - daemon for searching a protein query against a protein database

SYNOPSIS

       hmmpgmd [options]

DESCRIPTION

The hmmpgmd program is the daemon that we use internally for the hmmer.org web server, and
essentially stands in front of the protein search programs phmmer, hmmsearch, and hmmscan.

To use hmmpgmd, first start an instance as a master server and provide it with a sequence
database (using the --seqdb flag), an HMM database (using the --hmmdb flag), or both. A
sequence database must be in the hmmpgmd format, which may be
produced using esl-reformat. An HMM database is of the form produced by hmmbuild. The
input database(s) will be loaded into memory by the master. When the master has finished
loading the database(s), it prints the line: "Data loaded into memory. Master is ready."

Only after the master is ready may one or more instances of hmmpgmd be started as workers.
These workers may be (and typically are) on different machines from the master, but must
have access to the same database file(s) provided to the master, with the same path. As
with the master, each worker loads the database(s) into memory, and indicates completion
by printing: "Data loaded into memory. Worker is ready."
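
The start-up sequence above can be sketched in Python. This is a minimal illustration, not part of HMMER: the helper names master_command, worker_command, and is_ready are hypothetical, and the worker command passes only the options that the OPTIONS section ties to workers. The readiness strings are the ones quoted above.

```python
# Hypothetical helpers; only the flags and messages come from this page.
READY_MESSAGES = {
    "master": "Data loaded into memory. Master is ready.",
    "worker": "Data loaded into memory. Worker is ready.",
}


def master_command(seqdb=None, hmmdb=None):
    """Build the argument list for a master; at least one database is required."""
    if seqdb is None and hmmdb is None:
        raise ValueError("the master needs --seqdb, --hmmdb, or both")
    cmd = ["hmmpgmd", "--master"]
    if seqdb is not None:
        cmd += ["--seqdb", seqdb]
    if hmmdb is not None:
        cmd += ["--hmmdb", hmmdb]
    return cmd


def worker_command(master_ip, cpu=None):
    """Build the argument list for a worker that connects to master_ip."""
    cmd = ["hmmpgmd", "--worker", master_ip]
    if cpu is not None:
        cmd += ["--cpu", str(cpu)]
    return cmd


def is_ready(output_lines, role):
    """Return True once the readiness line for 'master' or 'worker' has appeared."""
    message = READY_MESSAGES[role]
    return any(message in line for line in output_lines)
```

A caller could pass these argument lists to subprocess.Popen and feed the captured output lines to is_ready before starting any workers.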

The master server and workers are expected to remain running. One or more clients then
connect to the master and submit possibly many queries. The master distributes the work of
a query among the workers, collects results, and merges them before responding to the
client. Two example client programs are included in the HMMER3.1 src directory: the C
program hmmc2 and the Perl script hmmpgmd_client_example.pl. These are intended as
examples only, and should be extended as necessary to meet your needs.

A query is submitted to the master from the client as a character string. Queries may be
the sort that would normally be handled by phmmer (protein sequence vs protein database),
hmmsearch (protein HMM query vs protein database), or hmmscan (protein query vs protein
HMM database).

A client query begins with a single line of the form @[options], followed by one or more
lines of text giving either the query HMM or a FASTA-formatted query sequence. The final
line of each query is the separator //.

For example, to perform a phmmer type search of a sequence against a sequence database
file, the first line is of the form @--seqdb 1, then the FASTA-formatted query sequence
starting with the header line >sequence-name, followed by one or more lines of sequence,
and finally the closing //.

To perform an hmmsearch type search, the query sequence is replaced by the full text of a
HMMER-format query HMM.

To perform an hmmscan type search, the text matches that of the phmmer type search, except
that the first line changes to @--hmmdb 1.
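
The three query forms described above can be assembled as plain strings. The following Python sketch is illustrative only: the function names are hypothetical, the 60-column sequence wrapping and the trailing newline after // are assumptions, and a // line already ending the supplied HMM text is assumed to serve as the query separator rather than being sent twice.

```python
def build_query(options, body):
    """Join the @[options] line, the body lines, and the closing // separator."""
    body_lines = body.strip().splitlines()
    # Assumption: a body that already ends with // is not given a second one.
    if body_lines and body_lines[-1].strip() == "//":
        body_lines = body_lines[:-1]
    return "\n".join(["@" + options, *body_lines, "//"]) + "\n"


def fasta_record(name, sequence, width=60):
    """Format one sequence as a FASTA header line followed by wrapped sequence lines."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + name, *chunks])


def phmmer_query(name, sequence, seqdb=1, width=60):
    """Protein sequence query against sub-database seqdb of the sequence database."""
    return build_query(f"--seqdb {seqdb}", fasta_record(name, sequence, width))


def hmmsearch_query(hmm_text, seqdb=1):
    """HMM query against the sequence database; hmm_text is the full HMM text."""
    return build_query(f"--seqdb {seqdb}", hmm_text)


def hmmscan_query(name, sequence, width=60):
    """Protein sequence query against the HMM database."""
    return build_query("--hmmdb 1", fasta_record(name, sequence, width))
```

For instance, phmmer_query("seq1", "ACDEFGHIKL", width=4) yields the lines @--seqdb 1, >seq1, ACDE, FGHI, KL, and //.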

In the hmmpgmd-formatted sequence database file, each sequence can be associated with one
or more sub-databases. The --seqdb flag indicates which of these sub-databases will be
queried. The HMM database format does not support sub-databases.

The result of each query is an undocumented data structure in binary format. In the future
the data will be returned in a proper serialized structure, but for now, it requires
meticulous unpacking within the client. The example clients show how this is done.
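
As a rough sketch of the client side, the Python function below sends one query string to the master over TCP and returns the first chunk of the reply as raw bytes. It makes no attempt to decode the binary result, since that layout is not documented here; the bundled example clients remain the reference for unpacking. The function name send_query, the ASCII encoding, and the single recv call are assumptions made for illustration.

```python
import socket

DEFAULT_CLIENT_PORT = 51371  # default for --cport, per the OPTIONS section


def send_query(query, host="127.0.0.1", port=DEFAULT_CLIENT_PORT,
               max_bytes=4096, timeout=10.0):
    """Send one query string and return up to max_bytes of the raw reply.

    The reply is left undecoded; a real client would keep reading and then
    unpack the binary structure as the example clients do.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii"))
        return sock.recv(max_bytes)
```

A call such as send_query("@--seqdb 1\n>q\nACDE\n//\n", host="master-host") would submit a phmmer type search, where master-host is a placeholder for the address of the running master.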

OPTIONS

       -h     Help; print a brief reminder of command line usage and all available options.

EXPERT OPTIONS

       --master
              Run as the master server.

       --worker <s>
              Run as a worker, connecting to the master server that is running on IP address <s>.

       --daemon
              Run as a daemon using config file: /etc/hmmpgmd.conf

       --cport <n>
              Port to use for communication between clients and the master server. The default
              is 51371.

       --wport <n>
              Port to use for communication between workers and the master server. The default
              is 51372.

       --ccncts <n>
              Maximum number of client connections to accept. The default is 16.

       --wcncts <n>
              Maximum number of worker connections to accept. The default is 32.

       --pid <f>
              Name of file into which the process id will be written.

       --seqdb <f>
              Name of the file (in hmmpgmd format) containing protein sequences. The contents of
              this file will be cached for searches.

       --hmmdb <f>
              Name of the file containing protein HMMs. The contents of this file will be cached
              for searches.

       --cpu <n>
              Number of parallel threads to use (for --worker).

COPYRIGHT

       Copyright (C) 2013 Howard Hughes Medical Institute.
       Freely distributed under the GNU General Public License (GPLv3).

       For additional information on copyright and licensing, see the file called COPYRIGHT in
       your HMMER source distribution, or see the HMMER web page.

AUTHOR

       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org