Ubuntu Manpage: pcp2spark - pcp-to-spark metrics exporter

Provided by: pcp-export-pcp2spark_5.3.7-1_amd64

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark   [-5CGHIjLmnrRvV?]    [-4   action]  [-8|-9  limit]  [-a  archive]  [-A  align]
       [--archive-folio  folio]  [-b|-B  space-scale]   [-c   config]   [--container   container]
       [--daemonize]  [-e  derived]  [-g server] [-h host] [-i instances] [-J rank] [-K spec] [-N
       predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-scale]  [-s  samples]  [-S
       starttime] [-t interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION

pcp2spark is a customizable performance metrics exporter tool from PCP to Apache Spark.
Any available performance metric, live or archived, system and/or application, can be
selected for exporting using either command line arguments or a configuration file.

pcp2spark acts as a bridge. It provides a network socket stream on a given address/port;
an Apache Spark worker task can connect to that socket and pull the configured PCP metrics
from pcp2spark, consuming them through the streaming extensions of the Apache Spark API.

pcp2spark is a close relative of pmrep(1). Refer to pmrep(1) for the metricspec
description accepted on the pcp2spark command line. See pmrep.conf(5) for a description of
the overall syntax of the pcp2spark.conf configuration file. This page describes
pcp2spark-specific options and configuration file differences from pmrep.conf(5). pmrep(1)
also lists some usage examples, most of which are applicable to pcp2spark as well.

Only the command line options listed on this page are supported; other options recognized
by pmrep(1) are not supported.

Options set via environment variables (see pmGetOptions(3)) override the corresponding built-in
default values (if any). Configuration file options override the corresponding
environment variables (if any). Command line options override the corresponding
configuration file options (if any).
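
As an illustration of this precedence order, the following minimal Python sketch layers
four hypothetical option sources so that the higher-precedence source wins. The
dictionaries and most of their values are assumptions made up for the example (only the
defaults 127.0.0.1, 44325 and the 1 second interval come from this page), and the sketch
does not reflect how pcp2spark itself is implemented.

```python
from collections import ChainMap

# Hypothetical option layers, listed from lowest to highest precedence.
builtin_defaults = {"spark_server": "127.0.0.1", "spark_port": 44325, "interval": "1s"}
environment = {"interval": "5s"}            # e.g. taken from an environment variable
config_file = {"spark_port": 50000}         # e.g. read from pcp2spark.conf
command_line = {"spark_server": "0.0.0.0"}  # e.g. parsed from -g

# ChainMap looks keys up left to right, so the highest-precedence layer goes first.
effective = ChainMap(command_line, config_file, environment, builtin_defaults)

print(dict(effective))
```

Here the command line supplies spark_server, the configuration file supplies spark_port,
and the environment supplies interval, since no higher layer sets it.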

GENERAL USAGE

       In a typical setup, the user configures pcp2spark with the PCP metrics to export and
       then starts pcp2spark.  pcp2spark then listens on the given address/port and waits for
       an Apache Spark worker thread to be started and to connect to it.

       When an Apache Spark worker thread has connected, pcp2spark will begin streaming PCP metric
       data to Apache Spark until the worker thread completes or the connection is interrupted.
       If the connection is interrupted or the socket is closed by the Apache Spark worker
       thread, pcp2spark will exit.

       For an example Apache Spark worker job which connects to a pcp2spark instance on a
       given address/port and pulls in PCP metric data, see the example provided in the PCP
       examples directory for pcp2spark (often provided by the PCP development package) or the
       online version at https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/.
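
       The client side of this exchange can be sketched with plain Python sockets standing in
       for the Apache Spark worker.  This is a minimal sketch and not the example shipped with
       PCP: the function name pull_lines is hypothetical, and the sketch assumes the stream can
       be read as newline-delimited UTF-8 text, which this page does not specify.  The host
       and port defaults are the documented pcp2spark defaults.

```python
import socket


def pull_lines(host="127.0.0.1", port=44325, max_lines=5, timeout=10.0):
    """Connect to a listening pcp2spark and return up to max_lines lines of text."""
    lines = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # makefile() gives a buffered, line-oriented view of the byte stream.
        with conn.makefile("r", encoding="utf-8", newline="\n") as stream:
            for line in stream:
                lines.append(line.rstrip("\n"))
                if len(lines) >= max_lines:
                    break
    # Leaving the with-blocks closes the socket on the client side.
    return lines
```

       With pcp2spark already listening, pull_lines() would return the first five lines it
       receives and then close the connection, after which pcp2spark exits as described above.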

CONFIGURATION FILE

       pcp2spark  uses  a configuration file with overall syntax described in pmrep.conf(5).  The
       following options are common with pmrep.conf: version, source, speclocal, derived, header,
       globals,  samples,  interval, type, type_prefer, ignore_incompat, names_change, instances,
       live_filter, rank, limit_filter, limit_filter_force, invert_filter, predicate,  omit_flat,
       include_labels,  precision,  precision_force, count_scale, count_scale_force, space_scale,
       space_scale_force, time_scale, time_scale_force.  The  output  option  is  recognized  but
       ignored for pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify  the  address  on  which  pcp2spark will listen for connections from an Apache
           Spark worker thread.  Corresponding command line option is -g.  Default is 127.0.0.1.

       spark_port (integer)
           Specify the port on which pcp2spark will listen for connections.  Corresponding
           command line option is -p.  Default is 44325.
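
       To make the two options concrete, the sketch below parses a small hypothetical
       configuration fragment with Python's configparser.  The [options] section name follows
       the pmrep.conf(5) convention and is an assumption here, as is the use of configparser,
       which may not accept every valid pmrep.conf(5) file; the values are the documented
       defaults.

```python
import configparser

# Hypothetical pcp2spark.conf fragment; the values shown are the documented defaults.
CONFIG_TEXT = """
[options]
spark_server = 127.0.0.1
spark_port = 44325
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

options = parser["options"]
spark_server = options.get("spark_server", fallback="127.0.0.1")  # string option
spark_port = options.getint("spark_port", fallback=44325)         # integer option

print(spark_server, spark_port)
```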

OPTIONS

The available command line options are:

-0 precision, --precision-force=precision
Like -P but this option will override per-metric specifications.

-4 action, --names-change=action
Specify which action to take on receiving a metric names change event during
sampling. These events occur when a PMDA discovers new metrics sometime after
starting up, and informs running client tools like pcp2spark. Valid values for
action are update (refresh metrics being sampled), ignore (do nothing - the default
behaviour) and abort (exit the program if such an event happens).

-5, --ignore-unknown
Silently ignore any metric name that cannot be resolved. At least one metric must be
found for the tool to start.

-8 limit, --limit-filter=limit
Limit results to instances with values above/below limit. A positive integer will
include instances with values at or above the limit in reporting. A negative integer
will include instances with values at or below the limit in reporting. A value of
zero performs no limit filtering. This option will not override possible per-metric
specifications. See also -J and -N.

-9 limit, --limit-filter-force=limit
Like -8 but this option will override per-metric specifications.
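
The limit semantics described for -8 can be sketched as follows. This is an illustrative
reading, not the tool's code: the function name limit_filter and the dictionary shape are
hypothetical, and the sketch assumes a negative limit is compared by its magnitude.

```python
def limit_filter(values, limit):
    """Filter instances by value threshold, following the -8 description.

    values: dict mapping instance name -> numeric value (hypothetical shape).
    limit > 0: keep values at or above limit.
    limit < 0: keep values at or below the magnitude of limit (assumption).
    limit == 0: no filtering.
    """
    if limit > 0:
        return {inst: v for inst, v in values.items() if v >= limit}
    if limit < 0:
        return {inst: v for inst, v in values.items() if v <= abs(limit)}
    return dict(values)


disk_reads = {"sda": 120, "sdb": 5, "sdc": 40}
print(limit_filter(disk_reads, 40))    # keeps sda and sdc
print(limit_filter(disk_reads, -40))   # keeps sdb and sdc
print(limit_filter(disk_reads, 0))     # keeps everything
```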

-a archive, --archive=archive
Performance metric values are retrieved from the set of Performance Co-Pilot (PCP)
archive log files identified by the archive argument, which is a comma-separated list
of names, each of which may be the base name of an archive or the name of a directory
containing one or more archives.

-A align, --align=align
Force the initial sample to be aligned on the boundary of a natural time unit align.
Refer to PCPIntro(1) for a complete description of the syntax for align.

--archive-folio=folio
Read metric source archives from the PCP archive folio created by tools like
pmchart(1) or, less often, manually with mkaf(1).

-b scale, --space-scale=scale
Unit/scale for space (byte) metrics, possible values include bytes, Kbytes, KB,
Mbytes, MB, and so forth. This option will not override possible per-metric
specifications. See also pmParseUnitsStr(3).

-B scale, --space-scale-force=scale
Like -b but this option will override per-metric specifications.

-c config, --config=config
Specify the config file or directory to use. If config is a directory, all files
under it ending in .conf will be included. The default is the first found of:
./pcp2spark.conf, $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and
$PCP_SYSCONF_DIR/pcp2spark.conf. For details, see the CONFIGURATION FILE section above
and pmrep.conf(5).
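
The lookup order above can be sketched in Python. The function names default_config and
config_files are hypothetical, and two details are assumptions: the directory case is
treated as non-recursive, and the matching files are returned in sorted order.

```python
import os
from pathlib import Path


def default_config(cwd=".", environ=os.environ):
    """Return the first existing default config path, or None if none exists."""
    home = Path(environ.get("HOME", "~")).expanduser()
    candidates = [
        Path(cwd) / "pcp2spark.conf",
        home / ".pcp2spark.conf",
        home / "pcp" / "pcp2spark.conf",
    ]
    sysconf = environ.get("PCP_SYSCONF_DIR")
    if sysconf:
        candidates.append(Path(sysconf) / "pcp2spark.conf")
    return next((p for p in candidates if p.is_file()), None)


def config_files(config):
    """Expand a -c argument: a directory yields its *.conf files, a file yields itself."""
    path = Path(config)
    if path.is_dir():
        return sorted(path.glob("*.conf"))
    return [path]
```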

--container=container
Fetch performance metrics from the specified container, either local or remote (see
-h).

-C, --check
Exit before reporting any values, but after parsing the configuration and metrics and
printing possible headers.

--daemonize
Daemonize on startup.

-e derived, --derived=derived
Specify derived performance metrics. If derived starts with a slash (``/'') or with
a dot (``.'') it will be interpreted as a derived metrics configuration file,
otherwise it will be interpreted as comma- or semicolon-separated derived metric
expressions. For details see pmLoadDerivedConfig(3) and pmRegisterDerived(3).
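
The rule for interpreting the derived argument can be sketched as below. The function name
parse_derived and the derived metric names in the example are hypothetical. How commas and
semicolons interact is not stated on this page; the sketch assumes a semicolon, when
present, is the separator, so that commas inside an expression are left alone.

```python
def parse_derived(spec):
    """Classify a -e argument as a config file path or a list of expressions."""
    if spec.startswith("/") or spec.startswith("."):
        return ("file", spec)
    # Assumption: prefer ';' as the separator when present, otherwise split on ','.
    separator = ";" if ";" in spec else ","
    expressions = [part.strip() for part in spec.split(separator) if part.strip()]
    return ("expressions", expressions)


print(parse_derived("./derived.conf"))
print(parse_derived("my.io = disk.all.read + disk.all.write; my.mem = 2 * mem.util.used"))
```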

-g server, --spark-server=server
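Address on which pcp2spark listens for connections from an Apache Spark worker thread
(see spark_server in the CONFIGURATION FILE section). The default is 127.0.0.1.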

-G, --no-globals
Do not include global metrics in reporting (see pmrep.conf(5)).

-h host, --host=host
Fetch performance metrics from pmcd(1) on host, rather than from the default
localhost.

-H, --no-header
Do not print any headers.

-i instances, --instances=instances
Retrieve and report only the specified metric instances. By default all instances,
present and future, are reported.

Refer to pmrep(1) for a complete description of this option.

-I, --ignore-incompat
Ignore incompatible metrics. By default incompatible metrics (that is, their type is
unsupported or they cannot be scaled as requested) will cause pcp2spark to terminate
with an error message. With this option all incompatible metrics are silently
omitted from reporting. This may be especially useful when requesting non-leaf nodes
of the PMNS tree for reporting.

-j, --live-filter
Perform instance live filtering. This allows capturing all named instances even if
processes are restarted at some point (unlike without live filtering). Performing
live filtering over a huge number of instances will add some internal overhead so a
bit of user caution is advised. See also -n.

-J rank, --rank=rank
Limit results to highest/lowest ranked instances of set-valued metrics. A positive
integer will include highest valued instances in reporting. A negative integer will
include lowest valued instances in reporting. A value of zero performs no ranking.
Ranking does not imply sorting, see -6 in pmrep(1). See also -8.
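
A minimal sketch of the ranking semantics follows. The function name rank_filter and the
sample data are hypothetical; the result keeps the original instance order to reflect that
ranking does not imply sorting.

```python
import heapq


def rank_filter(values, rank):
    """Keep the |rank| highest (rank > 0) or lowest (rank < 0) valued instances.

    values: dict mapping instance name -> numeric value (hypothetical shape).
    A rank of zero performs no ranking and keeps every instance.
    """
    if rank == 0:
        return dict(values)
    pick = heapq.nlargest if rank > 0 else heapq.nsmallest
    keep = set(pick(abs(rank), values, key=values.get))
    # Rebuild in the original order: selection only, no sorting.
    return {inst: v for inst, v in values.items() if inst in keep}


rss = {"1 init": 10, "42 java": 900, "77 bash": 3, "90 postgres": 400}
print(rank_filter(rss, 2))    # {'42 java': 900, '90 postgres': 400}
print(rank_filter(rss, -1))   # {'77 bash': 3}
```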

-K spec, --spec-local=spec
When fetching metrics from a local context (see -L), the -K option may be used to
control the DSO PMDAs that should be made accessible. The spec argument conforms to
the syntax described in pmSpecLocalPMDA(3). More than one -K option may be used.

-L, --local-PMDA
Use a local context to collect metrics from DSO PMDAs on the local host without PMCD.
See also -K.

-m, --include-labels
Include metric labels in the output.

-n, --invert-filter
Perform ranking before live filtering. By default instance live filtering (when
requested, see -j) happens before instance ranking (when requested, see -J). With
this option the logic is inverted and ranking happens before live filtering.

-N predicate, --predicate=predicate
Specify a comma-separated list of predicate filter reference metrics. By default
ranking (see -J) happens for each metric individually. With predicates, ranking is
done only for the specified predicate metrics. When reporting, the rest of the metrics
sharing the same instance domain (see PCPIntro(1)) as the predicate will include only
the highest/lowest ranking instances of the corresponding predicate. Ranking does
not imply sorting, see -6 in pmrep(1).

So for example, using proc.memory.rss (resident memory size of process) as the
predicate metric together with proc.io.total_bytes and mem.util.used as metrics to be
reported, only the processes using most/least (as per -J) memory will be included
when reporting total bytes written by processes. Since mem.util.used is a single-
valued metric (thus not sharing the same instance domain as the process related
metrics), it will be reported as usual.
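
The proc.memory.rss example can be sketched as follows. The function name apply_predicate,
the data layout, and the sample values are hypothetical; the sketch only illustrates the
idea of ranking on one metric and restricting other metrics in the same instance domain to
the selected instances.

```python
def apply_predicate(predicate_values, other_metrics, rank):
    """Restrict metrics sharing the predicate's instance domain to its ranked instances.

    predicate_values: dict instance -> value for the predicate metric.
    other_metrics: dict metric name -> (dict instance -> value), same instance domain.
    rank: positive for highest valued, negative for lowest valued, zero for no ranking.
    """
    ordered = sorted(predicate_values, key=predicate_values.get, reverse=rank > 0)
    keep = set(ordered[:abs(rank)]) if rank else set(predicate_values)
    return {
        name: {inst: v for inst, v in values.items() if inst in keep}
        for name, values in other_metrics.items()
    }


rss = {"101 sshd": 50, "202 java": 900, "303 bash": 5}
io_bytes = {"101 sshd": 1000, "202 java": 77, "303 bash": 12}
print(apply_predicate(rss, {"proc.io.total_bytes": io_bytes}, 1))
# {'proc.io.total_bytes': {'202 java': 77}}
```

A single-valued metric such as mem.util.used would simply not be passed through this
filter, since it does not share the instance domain of the predicate.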

-O origin, --origin=origin
When reporting archived metrics, start reporting at origin within the time window
(see -S and -T). Refer to PCPIntro(1) for a complete description of the syntax for
origin.

-p port, --spark-port=port
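Port on which pcp2spark listens for connections (see spark_port in the CONFIGURATION
FILE section). The default is 44325.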

-P precision, --precision=precision
Use precision for numeric non-integer output values. The default is to use 3 decimal
places (when applicable). This option will not override possible per-metric
specifications.

-q scale, --count-scale=scale
Unit/scale for count metrics, possible values include count x 10^-1, count, count x
10, count x 10^2, and so forth from 10^-8 to 10^7. (These values are currently
space-sensitive.) This option will not override possible per-metric specifications.
See also pmParseUnitsStr(3).

-Q scale, --count-scale-force=scale
Like -q but this option will override per-metric specifications.

-r, --raw
Output raw metric values; do not convert cumulative counters to rates. This option
will override possible per-metric specifications.

-R, --raw-prefer
Like -r but this option will not override per-metric specifications.
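
The difference between raw and rate-converted output for a cumulative counter can be
sketched as below. The function name counter_to_rate and the sample values are
hypothetical, and the sketch ignores details such as counter wrap and unit scaling.

```python
def counter_to_rate(prev_value, prev_time, curr_value, curr_time):
    """Convert two samples of a cumulative counter into a per-second rate."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_value - prev_value) / elapsed


# Two hypothetical samples of a cumulative counter, taken 2 seconds apart.
raw_prev, raw_curr = 1000, 1500
print(raw_curr)                                        # raw value: 1500
print(counter_to_rate(raw_prev, 0.0, raw_curr, 2.0))   # rate: 250.0
```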

-s samples, --samples=samples
The samples argument defines the number of samples to be retrieved and reported. If
samples is 0 or -s is not specified, pcp2spark will sample and report continuously
(in real time mode) or until the end of the set of PCP archives (in archive mode).
See also -T.

-S starttime, --start=starttime
When reporting archived metrics, the report will be restricted to those records
logged at or after starttime. Refer to PCPIntro(1) for a complete description of the
syntax for starttime.

-t interval, --interval=interval
Set the reporting interval to something other than the default 1 second. The
interval argument follows the syntax described in PCPIntro(1), and in the simplest
form may be an unsigned integer (the implied units in this case are seconds). See
also the -T option.

-T endtime, --finish=endtime
When reporting archived metrics, the report will be restricted to those records
logged before or at endtime. Refer to PCPIntro(1) for a complete description of the
syntax for endtime.

When endtime is used to define the runtime before pcp2spark exits and samples is not
given (see -s), the number of reported samples depends on interval (see -t). If
samples is given, interval will be adjusted to allow reporting of samples during
runtime. If all of -T, -s, and -t are given, endtime determines the actual time
pcp2spark will run.
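
One plausible reading of this interaction is sketched below. The function name
plan_sampling is hypothetical, and the exact arithmetic is an assumption: the sketch
places the first sample at time zero and, when samples is given, the last sample at the
end of the runtime. The tool's actual rounding may differ.

```python
def plan_sampling(runtime, interval=1.0, samples=None):
    """Return (samples, interval) for a run bounded by runtime seconds.

    Assumption: the first sample is taken at time zero, so N samples span
    N - 1 intervals.
    """
    if samples is None:
        # The sample count follows from the interval.
        return int(runtime // interval) + 1, interval
    # The interval is adjusted so that all samples fit within the runtime;
    # any interval passed in is ignored, since the runtime takes priority.
    return samples, runtime / max(samples - 1, 1)


print(plan_sampling(60, interval=10))   # (7, 10)
print(plan_sampling(60, samples=4))     # (4, 20.0)
```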

-v, --omit-flat
Report only set-valued metrics with instances (e.g. disk.dev.read) and omit single-
valued ``flat'' metrics without instances (e.g. kernel.all.sysfork). See -i and -I.

-V, --version
Display version number and exit.

-y scale, --time-scale=scale
Unit/scale for time metrics, possible values include nanosec, ns, microsec, us,
millisec, ms, and so forth up to hour, hr. This option will not override possible
per-metric specifications. See also pmParseUnitsStr(3).

-Y scale, --time-scale-force=scale
Like -y but this option will override per-metric specifications.

-?, --help
Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory
       names used by PCP.  On each installation, the file /etc/pcp.conf contains the local values
       for these variables.  The $PCP_CONF  variable  may  be  used  to  specify  an  alternative
       configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).