Provided by: pcp_5.0.3-1_amd64

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark  [-5CGHIjLnrRvV?]   [-4  action]  [-8|-9 limit] [-a archive] [-A align] [--archive-folio folio]
       [-b|-B space-scale] [-c config] [--container container] [--daemonize] [-e derived] [-g server] [-h  host]
       [-i  instances]  [-J rank] [-K spec] [-N predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-
       scale] [-s samples] [-S starttime] [-t interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark is a customizable performance metrics exporter tool from PCP to Apache  Spark.   Any  available
       performance  metric,  live  or  archived,  system and/or application, can be selected for exporting using
       either command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a given address/port, to which an
       Apache Spark worker task can connect in order to pull the configured PCP metrics.  The metrics are
       consumed on the Spark side using the streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Please refer to pmrep(1) for a description of the metricspec
       accepted on the pcp2spark command line, and to pmrep.conf(5) for the overall syntax of the
       pcp2spark.conf configuration file.  This page describes the pcp2spark specific options and the
       configuration file differences with pmrep.conf(5).  pmrep(1) also lists usage examples, most of which
       apply to pcp2spark as well.

       Only the command line options listed on this page are supported; other options recognized by pmrep(1)
       are not supported.

       Options  via  environment values (see pmGetOptions(3)) override the corresponding built-in default values
       (if any).  Configuration file options override the corresponding environment variables (if any).  Command
       line options override the corresponding configuration file options (if any).
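
       The precedence chain described above can be sketched as a layered lookup.  This is only an
       illustration of the ordering, not the actual pcp2spark implementation; the function name
       resolve_options and the sample values are hypothetical.

```python
from collections import ChainMap


def resolve_options(defaults, environment, config_file, command_line):
    """Merge option sources so that later stages override earlier ones.

    ChainMap looks keys up left to right, so the highest-precedence
    source (the command line) is listed first.
    """
    return dict(ChainMap(command_line, config_file, environment, defaults))


# Hypothetical values, for illustration only.
defaults = {"interval": "1s", "spark_server": "127.0.0.1", "spark_port": 44325}
environment = {"interval": "5s"}
config_file = {"interval": "10s", "spark_port": 50000}
command_line = {"spark_port": 44400}

resolved = resolve_options(defaults, environment, config_file, command_line)
print(resolved)
```

       Here the interval comes from the configuration file (which overrides the environment), the port from
       the command line, and the server address from the built-in default.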

GENERAL USAGE

       A  general  setup  for  making  use of pcp2spark would involve the user configuring pcp2spark for the PCP
       metrics to export followed by starting the pcp2spark application. The  pcp2spark  application  will  then
       wait  and  listen  on  the  given  address/port for a connection from an Apache Spark worker thread to be
       started.  The worker thread will then connect to pcp2spark.

       When an Apache Spark worker thread has connected, pcp2spark will begin streaming PCP metric data to
       Apache Spark until the worker thread completes or the connection is interrupted.  If the connection is
       interrupted or the socket is closed from the Apache Spark worker thread, pcp2spark will exit.

       For an example Apache Spark worker job which will connect to a pcp2spark instance on a given
       address/port  and  pull  in PCP metric data please see the example provided in the PCP examples directory
       for  pcp2spark  (often  provided  by  the  PCP  development   package)   or   the   online   version   at
       https://github.com/performancecopilot/pcp/blob/master/src/pcp2spark/.
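
       Independently of that example, the connection pattern can be tried without Spark by using a plain
       socket client.  The sketch below is not taken from the PCP sources: the helper name read_metric_lines is
       hypothetical, and it assumes that the stream is newline-delimited UTF-8 text, which this page does not
       specify.

```python
import socket


def read_metric_lines(host="127.0.0.1", port=44325, max_lines=5, timeout=10.0):
    """Connect to a listening pcp2spark socket and return up to max_lines lines.

    Assumption: the stream is newline-delimited text.  The default
    host and port are the pcp2spark defaults documented on this page.
    """
    lines = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # makefile() gives a buffered, line-oriented view of the byte stream.
        with conn.makefile("r", encoding="utf-8", newline="\n") as stream:
            for line in stream:
                lines.append(line.rstrip("\n"))
                if len(lines) >= max_lines:
                    break
    # Leaving the with-block closes the socket; as described above,
    # pcp2spark exits once the peer closes the connection.
    return lines
```

       With pcp2spark already listening on the default address and port, calling read_metric_lines() would
       return the first few lines of the stream and then close the connection.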

CONFIGURATION FILE

       pcp2spark  uses  a  configuration  file  with  overall  syntax described in pmrep.conf(5).  The following
       options are common with pmrep.conf:  version,  source,  speclocal,  derived,  header,  globals,  samples,
       interval,  type,  type_prefer, ignore_incompat, names_change, instances, live_filter, rank, limit_filter,
       limit_filter_force,  invert_filter,  predicate,  omit_flat,  precision,   precision_force,   count_scale,
       count_scale_force,  space_scale,  space_scale_force,  time_scale, time_scale_force.  The output option is
       recognized but ignored for pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify the address on which pcp2spark will listen  for  connections  from  an  Apache  Spark  worker
           thread.  Corresponding command line option is -g.  Default is 127.0.0.1.

       spark_port (integer)
           Specify the port on which pcp2spark will listen for connections.  Corresponding command line option
           is -p.  Default is 44325.
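
       As a rough illustration of these two options, the sketch below parses a small configuration fragment
       with Python's configparser and falls back to the documented defaults.  It assumes that the options live
       in an [options] section as in pmrep.conf(5); the helper name load_spark_endpoint and the sample text
       are hypothetical, and this is not the parser pcp2spark itself uses.

```python
import configparser

# Hypothetical configuration fragment; the [options] section name is an
# assumption based on the pmrep.conf(5) layout.
SAMPLE_CONF = """
[options]
spark_server = 127.0.0.1
spark_port = 44325
interval = 5s
"""


def load_spark_endpoint(text):
    """Return (server, port), using the documented defaults when unset."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    opts = parser["options"] if parser.has_section("options") else {}
    server = opts.get("spark_server", "127.0.0.1")
    port = int(opts.get("spark_port", 44325))
    return server, port


print(load_spark_endpoint(SAMPLE_CONF))
```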

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names change event during sampling.  These events
            occur when a PMDA discovers new metrics sometime after starting up, and informs running client tools
            like  pcp2spark.   Valid  values  for  action are update (refresh metrics being sampled), ignore (do
            nothing - the default behaviour) and abort (exit the program if such an event happens).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At least one metric must be found for  the
            tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A positive integer will include instances
            with values at or above the limit in reporting.  A negative  integer  will  include  instances  with
            values  at  or  below  the  limit  in reporting.  A value of zero performs no limit filtering.  This
            option will not override possible per-metric specifications.  See also -J and -N.
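
            The sign convention can be illustrated with a small sketch.  It is not the PCP implementation; in
            particular, it assumes that a negative limit is compared by its magnitude, which the text above
            does not state explicitly.  The function names and sample values are hypothetical.

```python
def passes_limit(value, limit):
    """Return True if value should be reported under the given limit."""
    if limit > 0:
        return value >= limit        # keep values at or above the limit
    if limit < 0:
        return value <= abs(limit)   # assumption: compare against the magnitude
    return True                      # zero disables limit filtering


def limit_filter(instances, limit):
    """Filter a {instance_name: value} mapping."""
    return {name: v for name, v in instances.items() if passes_limit(v, limit)}


values = {"sda": 120, "sdb": 30, "sdc": 75}
print(limit_filter(values, 75))
print(limit_filter(values, -75))
```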

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of Performance Co-Pilot (PCP) archive log files
            identified  by  the archive argument, which is a comma-separated list of names, each of which may be
            the base name of an archive or the name of a directory containing one or more archives.

       -A align, --align=align
            Force the initial sample to be aligned on the boundary of a  natural  time  unit  align.   Refer  to
            PCPIntro(1) for a complete description of the syntax for align.

       --archive-folio=folio
            Read  metric  source  archives  from the PCP archive folio created by tools like pmchart(1) or, less
            often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale for space (byte) metrics, possible values include bytes, Kbytes, KB, Mbytes, MB,  and  so
            forth.    This   option   will   not   override   possible   per-metric  specifications.   See  also
            pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify the config file to use.  The default is the first found of: ./pcp2spark.conf,
            $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and $PCP_SYSCONF_DIR/pcp2spark.conf.  For details,
            see the CONFIGURATION FILE section above and pmrep.conf(5).
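
            The search order can be sketched as follows.  This is an illustration only: the helper name
            find_config is hypothetical, and a location is simply skipped when the environment variable it
            depends on is unset, which is an assumption rather than documented behaviour.

```python
import os


def find_config(name="pcp2spark.conf", environ=None):
    """Return the first existing default config file, or None."""
    env = os.environ if environ is None else environ
    candidates = [os.path.join(".", name)]
    home = env.get("HOME")
    if home:
        candidates.append(os.path.join(home, "." + name))
        candidates.append(os.path.join(home, "pcp", name))
    sysconf = env.get("PCP_SYSCONF_DIR")
    if sysconf:
        candidates.append(os.path.join(sysconf, name))
    # Candidates are checked in the documented order; the first hit wins.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```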

       --container=container
            Fetch performance metrics from the specified container, either local or remote (see -h).

       -C, --check
            Exit before reporting any values, but after parsing  the  configuration  and  metrics  and  printing
            possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify  derived  performance metrics.  If derived starts with a slash (``/'') or with a dot (``.'')
            it will be interpreted as a derived metrics configuration file, otherwise it will be interpreted  as
            comma-  or  semicolon-separated  derived metric expressions.  For details see pmLoadDerivedConfig(3)
            and pmRegisterDerived(3).
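
            The distinction between a file and inline expressions can be sketched as below.  The function
            name parse_derived and the metric names are hypothetical, and preferring the semicolon as
            separator whenever one is present is an assumption made so that expressions containing commas
            can still be passed.

```python
def parse_derived(spec):
    """Classify a -e argument as a config file path or inline expressions."""
    if spec.startswith(("/", ".")):
        return ("file", spec)
    # Assumption: when a semicolon is present it is the separator,
    # otherwise the expressions are split on commas.
    sep = ";" if ";" in spec else ","
    expressions = [part.strip() for part in spec.split(sep) if part.strip()]
    return ("expressions", expressions)


print(parse_derived("./derived.conf"))
print(parse_derived("my.sum = a.b + c.d, my.diff = a.b - c.d"))
```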

       -g server, --spark-server=server
            Address on which pcp2spark listens for connections from an Apache Spark worker thread.  Default
            is 127.0.0.1.

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than from the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Report only the listed instances from current instances (if present, see also -j).  By  default  all
            instances,  present  and future, are reported.  This is a global option that is used for all metrics
            unless a metric-specific instance definition is provided as part of a metricspec.  By default
            single-valued ``flat'' metrics without multiple instances are still reported as usual; use -v to
            change this.  Please refer to pmrep(1) for more details on this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics (that is, their type is unsupported or
            they  cannot  be scaled as requested) will cause pcp2spark to terminate with an error message.  With
            this option all incompatible metrics are silently omitted from reporting.  This  may  be  especially
            useful when requesting non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all filtered instances even if processes are
            restarted at some point (unlike without live filtering).  Performing live filtering over a large
            number of instances adds some internal overhead, so some caution is advised.  See also -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-valued metrics.   A  positive  integer  will
            include  highest  valued  instances  in  reporting.   A  negative integer will include lowest valued
            instances in reporting.  A value of zero performs no ranking.  Ranking does not imply  sorting,  see
            -6.  See also -8.
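
            A minimal sketch of this selection is shown below.  It is an illustration and not the PCP
            implementation; the function name rank_instances and the sample values are hypothetical.  The
            selected instances keep their original order, reflecting that ranking does not imply sorting.

```python
def rank_instances(instances, rank):
    """Keep the abs(rank) highest (rank > 0) or lowest (rank < 0) instances."""
    if rank == 0:
        return dict(instances)  # zero performs no ranking
    ordered = sorted(instances, key=instances.get, reverse=(rank > 0))
    keep = set(ordered[:abs(rank)])
    # Rebuild from the original mapping so the original order is preserved.
    return {name: value for name, value in instances.items() if name in keep}


values = {"a": 5, "b": 9, "c": 1, "d": 7}
print(rank_instances(values, 2))
print(rank_instances(values, -1))
```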

       -K spec, --spec-local=spec
            When  fetching  metrics  from a local context (see -L), the -K option may be used to control the DSO
            PMDAs that should be made accessible.  The  spec  argument  conforms  to  the  syntax  described  in
            pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local host without PMCD.  See also -K.

       -n, --invert-filter
            Perform  ranking before live filtering.  By default instance live filtering (when requested, see -j)
            happens before instance ranking (when requested, see -J).  With this option the  logic  is  inverted
            and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify  a  comma-separated list of predicate filter reference metrics.  By default ranking (see -J)
            happens for each metric individually.  With predicates, ranking  is  done  only  for  the  specified
            predicate metrics.  When reporting, the rest of the metrics sharing the same instance domain (see
            PCPIntro(1)) as the predicate  will  include  only  the  highest/lowest  ranking  instances  of  the
            corresponding predicate.  Ranking does not imply sorting, see -6.

            So  for  example,  using  proc.memory.rss  (resident memory size of process) as the predicate metric
            together with proc.io.total_bytes and mem.util.used as metrics to be reported,  only  the  processes
            using  most/least  (as  per  -J)  memory  will  be  included  when  reporting total bytes written by
            processes.  Since mem.util.used is a single-valued metric (thus not sharing the same instance domain
            as the process-related metrics), it will be reported as usual.
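
            The example above can be sketched in code as follows.  This is only an illustration of the idea:
            the function name apply_predicate and the instance names and values are hypothetical, and only
            metrics sharing the predicate's instance domain are passed in (a single-valued metric such as
            mem.util.used would be reported separately and unchanged).

```python
def apply_predicate(metrics, predicate, rank):
    """Restrict all metrics to the instances selected by ranking the predicate.

    metrics maps metric name -> {instance: value}; all metrics are
    assumed to share the predicate's instance domain.
    """
    pred_values = metrics[predicate]
    if rank == 0:
        keep = set(pred_values)
    else:
        ordered = sorted(pred_values, key=pred_values.get, reverse=(rank > 0))
        keep = set(ordered[:abs(rank)])
    return {
        name: {inst: val for inst, val in values.items() if inst in keep}
        for name, values in metrics.items()
    }


proc_metrics = {
    "proc.memory.rss": {"1 init": 9000, "42 httpd": 52000, "77 sshd": 4000},
    "proc.io.total_bytes": {"1 init": 10, "42 httpd": 300, "77 sshd": 999},
}
top = apply_predicate(proc_metrics, "proc.memory.rss", rank=1)
print(top)
```

            Note that the process with the most I/O in the sample data is not reported, because selection
            is driven by the predicate metric only.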

       -O origin, --origin=origin
            When  reporting  archived metrics, start reporting at origin within the time window (see -S and -T).
            Refer to PCPIntro(1) for a complete description of the syntax for origin.

       -p port, --spark-port=port
            Port on which pcp2spark listens for connections from an Apache Spark worker thread.  Default is
            44325.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The default is to use 3 decimal  places  (when
            applicable).  This option will not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale  for  count  metrics,  possible  values include count x 10^-1, count, count x 10, count x
            10^2, and so forth from 10^-8 to 10^7.  (These values are currently space-sensitive.)   This  option
            will not override possible per-metric specifications.  See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output  raw  metric  values, do not convert cumulative counters to rates.  This option will override
            possible per-metric specifications.
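
            For reference, the rate conversion that this option disables is the usual difference quotient
            between two consecutive samples of a cumulative counter.  The sketch below illustrates the
            arithmetic only; the function name counter_to_rate and the sample values are hypothetical.

```python
def counter_to_rate(prev, curr):
    """Convert two (timestamp_seconds, counter_value) samples to a per-second rate."""
    (t0, v0), (t1, v1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / elapsed


# A counter that grew by 500 over 2 seconds.
rate = counter_to_rate((10.0, 1000), (12.0, 1500))
print(rate)
```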

       -R, --raw-prefer
            Like -r but this option will not override per-metric specifications.

       -s samples, --samples=samples
            The samples argument defines the number of samples to be retrieved and reported.  If samples is 0 or
            -s  is not specified, pcp2spark will sample and report continuously (in real time mode) or until the
            end of the set of PCP archives (in archive mode).  See also -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be restricted to those records logged at  or  after
            starttime.  Refer to PCPIntro(1) for a complete description of the syntax for starttime.

       -t interval, --interval=interval
            Set  the  reporting  interval  to  something other than the default 1 second.  The interval argument
            follows the syntax described in PCPIntro(1), and in the simplest form may  be  an  unsigned  integer
            (the implied units in this case are seconds).  See also the -T option.

       -T endtime, --finish=endtime
            When  reporting archived metrics, the report will be restricted to those records logged before or at
            endtime.  Refer to PCPIntro(1) for a complete description of the syntax for endtime.

            When used to define the runtime before pcp2spark will exit, if no samples is given (see -s) then the
            number  of reported samples depends on interval (see -t).  If samples is given then interval will be
            adjusted to allow reporting of samples during runtime.  In case all of -T, -s,  and  -t  are  given,
            endtime determines the actual time pcp2spark will run.
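
            The interplay of the three options can be approximated with the arithmetic below.  This is a
            simplified model, not the PCP implementation: it assumes that the first sample is taken at the
            start of the runtime and the last one at its end, and the function name plan_sampling is
            hypothetical.

```python
def plan_sampling(runtime, samples=None, interval=1.0):
    """Return (samples, interval) for a run bounded by runtime seconds.

    Assumption: samples are spaced evenly with one at each end of the
    runtime, so N samples span N - 1 intervals.
    """
    if samples:
        samples = max(samples, 2)
        interval = runtime / (samples - 1)     # interval is adjusted to fit
    else:
        samples = int(runtime / interval + 1)  # sample count follows interval
    return samples, interval


print(plan_sampling(60, interval=10))
print(plan_sampling(60, samples=4, interval=1.0))
```

            In the second call the requested interval is ignored, mirroring the statement above that the
            end time determines how long the tool runs when all three options are given.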

       -v, --omit-flat
            Omit single-valued ``flat'' metrics from reporting; only consider set-valued metrics (i.e., metrics
            with multiple values) for reporting.  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include nanosec, ns, microsec, us, millisec, ms, and so
            forth  up  to hour, hr.  This option will not override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory names used  by
       PCP.   On  each  installation, the file /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       mkaf(1),  PCPIntro(1),  pcp(1),  pcp2elasticsearch(1),  pcp2graphite(1),  pcp2influxdb(1),   pcp2json(1),
       pcp2xlsx(1),     pcp2xml(1),    pcp2zabbix(1),    pmcd(1),    pminfo(1),    pmrep(1),    pmGetOptions(3),
       pmSpecLocalPMDA(3),  pmLoadDerivedConfig(3),  pmParseUnitsStr(3),  pmRegisterDerived(3),   LOGARCHIVE(5),
       pcp.conf(5), PMNS(5) and pmrep.conf(5).