Provided by: pcp_5.0.3-1_amd64

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark  [-5CGHIjLnrRvV?]   [-4  action]  [-8|-9 limit] [-a archive] [-A align] [--archive-folio folio]
       [-b|-B space-scale] [-c config] [--container container] [--daemonize] [-e derived] [-g server] [-h  host]
       [-i  instances]  [-J rank] [-K spec] [-N predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-
       scale] [-s samples] [-S starttime] [-t interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark is a customizable performance metrics exporter tool from PCP to Apache  Spark.   Any  available
       performance  metric,  live  or  archived,  system and/or application, can be selected for exporting using
       either command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a given address/port to which an
       Apache Spark worker task can connect.  The worker then pulls the configured PCP metrics from pcp2spark
       using the streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Refer to pmrep(1) for a description of the metricspec
       accepted on the pcp2spark command line, and to pmrep.conf(5) for the overall syntax of the
       pcp2spark.conf configuration file.  This page describes the pcp2spark specific options and the
       configuration file differences from pmrep.conf(5).  pmrep(1) also lists usage examples, most of which
       apply to pcp2spark as well.

       Only the command line options listed on this page are supported; other options recognized by pmrep(1)
       are not supported.

       Options set via environment variables (see pmGetOptions(3)) override the corresponding built-in default
       values (if any).  Configuration file options override the corresponding environment variables (if any).
       Command line options override the corresponding configuration file options (if any).
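
       The precedence order above can be illustrated with a minimal Python sketch.  It is not taken from the
       pcp2spark sources; the function name resolve_options and the dictionary keys are hypothetical and
       chosen only for illustration.  The two default values shown are the ones documented on this page.

```python
from collections import ChainMap

# Defaults documented on this page; the key names are illustrative only.
DEFAULTS = {"spark_server": "127.0.0.1", "spark_port": 44325}


def resolve_options(environment=None, config_file=None, command_line=None):
    """Merge option sources so that the highest-precedence source wins.

    ChainMap looks keys up from left to right, so the layers are listed
    from highest precedence (command line) to lowest (built-in defaults).
    """
    layers = [command_line or {}, config_file or {}, environment or {}, DEFAULTS]
    return dict(ChainMap(*layers))
```

       For example, resolve_options(config_file={"spark_port": 50000}) yields port 50000 while keeping the
       default listen address, and a command line value for the same key would replace both.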

GENERAL USAGE

       A typical setup involves configuring pcp2spark with the PCP metrics to export and then starting
       pcp2spark.  pcp2spark then listens on the given address/port and waits for an Apache Spark worker
       thread to be started and to connect.

       Once an Apache Spark worker thread has connected, pcp2spark streams PCP metric data to Apache Spark
       until the worker thread completes or the connection is interrupted.  If the connection is interrupted
       or the socket is closed by the Apache Spark worker thread, pcp2spark will exit.
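
       The connect-and-stream behaviour described above can be exercised without Spark by a plain socket
       client, which may be handy when checking that pcp2spark is listening.  The following Python sketch is
       an illustration only: the function name read_samples is hypothetical, and it assumes that each sample
       arrives as one newline-terminated line of text, which this page does not specify.

```python
import socket


def read_samples(host="127.0.0.1", port=44325, max_lines=None):
    """Yield text lines received from a listening pcp2spark-style socket.

    Assumption: the peer sends newline-terminated UTF-8 text.  The default
    host and port are the pcp2spark defaults documented on this page.
    """
    with socket.create_connection((host, port)) as conn:
        with conn.makefile("r", encoding="utf-8") as stream:
            for count, line in enumerate(stream, start=1):
                yield line.rstrip("\n")
                # Stop early if the caller asked for a bounded number of lines.
                if max_lines is not None and count >= max_lines:
                    break
```

       Iterating over read_samples(max_lines=5) would print at most five lines and then close the socket.
       As noted above, closing the socket from the client side causes pcp2spark to exit.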

       For an example Apache Spark worker job that connects to a pcp2spark instance on a given address/port
       and pulls in PCP metric data, see the example provided in the PCP examples directory for pcp2spark
       (often provided by the PCP development package) or the online version at
       https://github.com/performancecopilot/pcp/blob/master/src/pcp2spark/.

CONFIGURATION FILE

       pcp2spark uses a configuration file whose overall syntax is described in pmrep.conf(5).  The following
       options are common with pmrep.conf:  version,  source,  speclocal,  derived,  header,  globals,  samples,
       interval,  type,  type_prefer, ignore_incompat, names_change, instances, live_filter, rank, limit_filter,
       limit_filter_force,  invert_filter,  predicate,  omit_flat,  precision,   precision_force,   count_scale,
       count_scale_force,  space_scale,  space_scale_force,  time_scale, time_scale_force.  The output option is
       recognized but ignored for pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify the address on which pcp2spark will listen  for  connections  from  an  Apache  Spark  worker
           thread.  Corresponding command line option is -g.  Default is 127.0.0.1.

       spark_port (integer)
           Specify the port to run pcp2spark on.  Corresponding command line option is -p.  Default is 44325.
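
       As an illustration of the two options above, the following Python sketch parses a small configuration
       fragment and falls back to the documented defaults.  It assumes the options live in an [options]
       section as described for pmrep.conf(5); the helper name listen_address and the sample address and port
       are hypothetical.

```python
import configparser

# Sample fragment; 192.0.2.10 is a documentation-only address.
SAMPLE_CONF = """
[options]
spark_server = 192.0.2.10
spark_port = 50000
"""


def listen_address(conf_text):
    """Return (server, port) from an [options] section.

    Missing keys, or a missing section, fall back to the defaults
    documented on this page: 127.0.0.1 and 44325.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    server = parser.get("options", "spark_server", fallback="127.0.0.1")
    port = parser.getint("options", "spark_port", fallback=44325)
    return server, port
```

       The same values can be given on the command line with -g and -p, which take precedence over the
       configuration file.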

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names change event during sampling.  These events
            occur when a PMDA discovers new metrics sometime after starting up, and informs running client tools
            like  pcp2spark.   Valid  values  for  action are update (refresh metrics being sampled), ignore (do
            nothing - the default behaviour) and abort (exit the program if such an event happens).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At least one metric must be found for  the
            tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A positive integer will include instances
            with  values  at  or  above  the limit in reporting.  A negative integer will include instances with
            values at or below the limit in reporting.  A value of  zero  performs  no  limit  filtering.   This
            option will not override possible per-metric specifications.  See also -J and -N.
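
            The sign convention can be sketched in Python as follows.  This is an illustration, not the
            pcp2spark implementation; the function name apply_limit is hypothetical, and comparing against
            the absolute value of a negative limit is an assumption about how "at or below the limit" is
            meant.

```python
def apply_limit(values, limit):
    """Filter an {instance: value} mapping in the manner described for -8.

    limit > 0 keeps values at or above limit.
    limit < 0 keeps values at or below abs(limit) (assumed interpretation).
    limit == 0 performs no filtering.
    """
    if limit > 0:
        return {inst: v for inst, v in values.items() if v >= limit}
    if limit < 0:
        return {inst: v for inst, v in values.items() if v <= abs(limit)}
    return dict(values)
```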

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of Performance Co-Pilot (PCP) archive log files
            identified  by  the archive argument, which is a comma-separated list of names, each of which may be
            the base name of an archive or the name of a directory containing one or more archives.

       -A align, --align=align
            Force the initial sample to be aligned on the boundary of a  natural  time  unit  align.   Refer  to
            PCPIntro(1) for a complete description of the syntax for align.

       --archive-folio=folio
            Read  metric  source  archives  from the PCP archive folio created by tools like pmchart(1) or, less
            often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale for space (byte) metrics, possible values include bytes, Kbytes, KB, Mbytes, MB,  and  so
            forth.    This   option   will   not   override   possible   per-metric  specifications.   See  also
            pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify  the  config  file  to  use.   The  default  is  the  first  found   of:   ./pcp2spark.conf,
            $HOME/.pcp2spark.conf,  $HOME/pcp/pcp2spark.conf, and $PCP_SYSCONF_DIR/pcp2spark.conf.  For details,
            see the above section and pmrep.conf(5).
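
            The search order can be sketched in Python as follows.  The function name find_config is
            hypothetical, and the sketch assumes that HOME and PCP_SYSCONF_DIR are available in the mapping
            passed as env; on a real system PCP_SYSCONF_DIR is normally defined in /etc/pcp.conf.

```python
import os


def find_config(cwd=".", env=None):
    """Return the first existing pcp2spark.conf in the documented order, or None."""
    env = os.environ if env is None else env
    candidates = [os.path.join(cwd, "pcp2spark.conf")]
    home = env.get("HOME")
    if home:
        candidates.append(os.path.join(home, ".pcp2spark.conf"))
        candidates.append(os.path.join(home, "pcp", "pcp2spark.conf"))
    sysconf = env.get("PCP_SYSCONF_DIR")
    if sysconf:
        candidates.append(os.path.join(sysconf, "pcp2spark.conf"))
    # The first candidate that exists wins; later ones are not consulted.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```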

       --container=container
            Fetch performance metrics from the specified container, either local or remote (see -h).

       -C, --check
            Exit before reporting any values, but after parsing  the  configuration  and  metrics  and  printing
            possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify  derived  performance metrics.  If derived starts with a slash (``/'') or with a dot (``.'')
            it will be interpreted as a derived metrics configuration file, otherwise it will be interpreted  as
            comma-  or  semicolon-separated  derived metric expressions.  For details see pmLoadDerivedConfig(3)
            and pmRegisterDerived(3).

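       -g server, --spark-server=server
            Address on which pcp2spark will listen for connections from an Apache Spark worker thread (see
            spark_server in the CONFIGURATION FILE section).  Default is 127.0.0.1.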

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than from the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Report only the listed instances from current instances (if present, see also -j).  By  default  all
            instances,  present  and future, are reported.  This is a global option that is used for all metrics
            unless a metric-specific instance definition is provided as part of a metricspec.  By default
            single-valued ``flat'' metrics without multiple instances are still reported as usual; use -v to
            change this.  Please refer to pmrep(1) for more details on this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics (that is, their type is unsupported or
            they cannot be scaled as requested) will cause pcp2spark to terminate with an error  message.   With
            this  option  all  incompatible metrics are silently omitted from reporting.  This may be especially
            useful when requesting non-leaf nodes of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all filtered instances even if processes
            are restarted at some point (unlike without live filtering).  Live filtering over a very large
            number of instances adds some internal overhead, so use it with care.  See also -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-valued metrics.   A  positive  integer  will
            include  highest  valued  instances  in  reporting.   A  negative integer will include lowest valued
            instances in reporting.  A value of zero performs no ranking.  Ranking does not imply sorting
            (see -6 in pmrep(1)).  See also -8.
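
            A minimal Python sketch of the ranking rule follows.  It is an illustration rather than the
            pcp2spark implementation, and the function name rank_instances is hypothetical.  The result
            keeps the original instance order, reflecting that ranking selects instances but does not sort
            them.

```python
import heapq


def rank_instances(values, rank):
    """Keep the abs(rank) highest (rank > 0) or lowest (rank < 0) valued instances.

    values is an {instance: value} mapping; rank == 0 performs no ranking.
    """
    if rank == 0:
        return dict(values)
    pick = heapq.nlargest if rank > 0 else heapq.nsmallest
    selected = pick(abs(rank), values.items(), key=lambda item: item[1])
    keep = {inst for inst, _ in selected}
    # Preserve the input order: ranking selects instances, it does not sort them.
    return {inst: v for inst, v in values.items() if inst in keep}
```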

       -K spec, --spec-local=spec
            When  fetching  metrics  from a local context (see -L), the -K option may be used to control the DSO
            PMDAs that should be made accessible.  The  spec  argument  conforms  to  the  syntax  described  in
            pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local host without PMCD.  See also -K.

       -n, --invert-filter
            Perform  ranking before live filtering.  By default instance live filtering (when requested, see -j)
            happens before instance ranking (when requested, see -J).  With this option the  logic  is  inverted
            and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify  a  comma-separated list of predicate filter reference metrics.  By default ranking (see -J)
            happens for each metric individually.  With predicates, ranking  is  done  only  for  the  specified
            predicate metrics.  When reporting, the rest of the metrics sharing the same instance domain (see
            PCPIntro(1)) as the predicate will include only the highest/lowest ranking instances of the
            corresponding predicate.  Ranking does not imply sorting (see -6 in pmrep(1)).

            So  for  example,  using  proc.memory.rss  (resident memory size of process) as the predicate metric
            together with proc.io.total_bytes and mem.util.used as metrics to be reported,  only  the  processes
            using  most/least  (as  per  -J)  memory  will  be  included  when  reporting total bytes written by
            processes.  Since mem.util.used is a single-valued metric (thus not sharing the same instance domain
            as the process-related metrics), it will be reported as usual.
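
            The example above can be sketched in Python as follows.  The function name apply_predicate and
            the data layout are hypothetical: set-valued metrics are modelled as {instance: value} mappings
            that are assumed to share the predicate's instance domain, and single-valued metrics as plain
            numbers.

```python
def apply_predicate(metrics, predicate, rank):
    """Restrict set-valued metrics to the instances ranked by the predicate metric.

    rank > 0 keeps the highest valued predicate instances, rank < 0 the
    lowest, and rank == 0 leaves everything untouched.
    """
    if rank == 0:
        return dict(metrics)
    ranked = sorted(metrics[predicate].items(), key=lambda kv: kv[1], reverse=rank > 0)
    keep = {inst for inst, _ in ranked[:abs(rank)]}
    result = {}
    for name, values in metrics.items():
        if isinstance(values, dict):
            # Set-valued metric: keep only the instances selected by the predicate.
            result[name] = {i: v for i, v in values.items() if i in keep}
        else:
            # Single-valued metric: reported as usual.
            result[name] = values
    return result
```

            With proc.memory.rss as the predicate and a rank of 2, proc.io.total_bytes would be reduced to
            the two processes with the largest resident memory, while mem.util.used would pass through
            unchanged.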

       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin within the time window (see -S  and  -T).
            Refer to PCPIntro(1) for a complete description of the syntax for origin.

       -p port, --spark-port=port
            Port on which pcp2spark will listen (see spark_port in the CONFIGURATION FILE section).  Default
            is 44325.

       -P precision, --precision=precision
            Use  precision  for numeric non-integer output values.  The default is to use 3 decimal places (when
            applicable).  This option will not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count x 10^-1, count,  count  x  10,  count  x
            10^2,  and  so forth from 10^-8 to 10^7.  (These values are currently space-sensitive.)  This option
            will not override possible per-metric specifications.  See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output raw metric values; do not convert cumulative counters to rates.  This option will override
            possible per-metric specifications.

       -R, --raw-prefer
            Like -r but this option will not override per-metric specifications.

       -s samples, --samples=samples
            The samples argument defines the number of samples to be retrieved and reported.  If samples is 0 or
            -s  is not specified, pcp2spark will sample and report continuously (in real time mode) or until the
            end of the set of PCP archives (in archive mode).  See also -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be restricted to those records logged at  or  after
            starttime.  Refer to PCPIntro(1) for a complete description of the syntax for starttime.

       -t interval, --interval=interval
            Set  the  reporting  interval  to  something other than the default 1 second.  The interval argument
            follows the syntax described in PCPIntro(1), and in the simplest form may  be  an  unsigned  integer
            (the implied units in this case are seconds).  See also the -T option.

       -T endtime, --finish=endtime
            When  reporting archived metrics, the report will be restricted to those records logged before or at
            endtime.  Refer to PCPIntro(1) for a complete description of the syntax for endtime.

            When used to define the runtime before pcp2spark will exit, if no samples is given (see -s) then the
            number of reported samples depends on interval (see -t).  If samples is given then interval will  be
            adjusted  to  allow  reporting  of samples during runtime.  In case all of -T, -s, and -t are given,
            endtime determines the actual time pcp2spark will run.

       -v, --omit-flat
            Omit single-valued ``flat'' metrics from reporting; only consider set-valued metrics (i.e., metrics
            with multiple values) for reporting.  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include nanosec, ns, microsec, us, millisec, ms, and so
            forth  up  to hour, hr.  This option will not override possible per-metric specifications.  See also
            pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory names used  by
       PCP.   On  each  installation, the file /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       mkaf(1),  PCPIntro(1),  pcp(1),  pcp2elasticsearch(1),  pcp2graphite(1),  pcp2influxdb(1),   pcp2json(1),
       pcp2xlsx(1),     pcp2xml(1),    pcp2zabbix(1),    pmcd(1),    pminfo(1),    pmrep(1),    pmGetOptions(3),
       pmSpecLocalPMDA(3),  pmLoadDerivedConfig(3),  pmParseUnitsStr(3),  pmRegisterDerived(3),   LOGARCHIVE(5),
       pcp.conf(5), PMNS(5) and pmrep.conf(5).

Performance Co-Pilot                                   PCP                                          PCP2SPARK(1)