Provided by: pcp_5.0.3-1_amd64

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark   [-5CGHIjLnrRvV?]    [-4   action]   [-8|-9  limit]  [-a  archive]  [-A  align]
       [--archive-folio  folio]  [-b|-B  space-scale]   [-c   config]   [--container   container]
       [--daemonize]  [-e  derived]  [-g server] [-h host] [-i instances] [-J rank] [-K spec] [-N
       predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-scale]  [-s  samples]  [-S
       starttime] [-t interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark  is  a  customizable performance metrics exporter tool from PCP to Apache Spark.
       Any available performance metric, live or archived,  system  and/or  application,  can  be
       selected for exporting using either command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a given address/port
       to which an Apache Spark worker task can connect and pull the configured PCP metrics,
       using the streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Please refer to pmrep(1) for a description of
       the metricspec accepted on the pcp2spark command line and to pmrep.conf(5) for the overall
       syntax of the pcp2spark.conf configuration file.  This page describes pcp2spark specific
       options and configuration file differences with pmrep.conf(5).  pmrep(1) also lists some
       usage examples, most of which are applicable to pcp2spark as well.

       Only the command line options listed on this page are supported; other options recognized
       by pmrep(1) are not supported.

       Options set via environment variables (see pmGetOptions(3)) override the corresponding built-in
       default   values   (if  any).   Configuration  file  options  override  the  corresponding
       environment  variables  (if  any).   Command  line  options  override  the   corresponding
       configuration file options (if any).
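
       The precedence order above can be pictured as a chain of lookups.  The following is only
       an illustrative sketch, not how pcp2spark is implemented; the environment, configuration
       file and command line values are made up for the example, and only the defaults for
       spark_server, spark_port and interval are taken from this page.

```python
from collections import ChainMap

# Built-in defaults as documented on this page.
builtin_defaults = {"spark_server": "127.0.0.1", "spark_port": 44325, "interval": "1s"}

# Hypothetical values from the other three sources (illustration only).
environment = {"interval": "5s"}
config_file = {"spark_port": 50000, "interval": "2s"}
command_line = {"spark_port": 44400}

# ChainMap searches its maps left to right, so the highest-priority
# source (the command line) is listed first.
effective = ChainMap(command_line, config_file, environment, builtin_defaults)


def resolve(name):
    """Return the value that wins for the given option name."""
    return effective[name]
```

       With these sample values the command line wins for spark_port, the configuration file
       wins over the environment for interval, and spark_server falls through to its default.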

GENERAL USAGE

       A general setup for pcp2spark involves configuring it for the PCP metrics to export and
       then starting the pcp2spark application.  pcp2spark will then listen on the given
       address/port and wait for an Apache Spark worker thread to start and connect.

       Once an Apache Spark worker thread has connected, pcp2spark will stream PCP metric
       data to Apache Spark until the worker thread completes or the connection is interrupted.
       If the connection is interrupted or the socket is closed by the Apache Spark worker
       thread, pcp2spark will exit.

       For an example Apache Spark worker job which connects to a pcp2spark instance on a
       given address/port and pulls in PCP metric data, please see the example provided in the PCP
       examples directory for pcp2spark (often provided by the PCP development package) or the
       online version at https://github.com/performancecopilot/pcp/blob/master/src/pcp2spark/.
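
       To check that a running pcp2spark is streaming data before involving Apache Spark, a plain
       socket client can stand in for the worker.  The sketch below is not a Spark job and is not
       the example shipped with PCP.  It assumes that records arrive as newline-terminated text,
       which this page does not specify; the function name read_records is hypothetical.

```python
import socket


def read_records(host="127.0.0.1", port=44325, max_records=None):
    """Connect to a listening pcp2spark and yield received lines of text.

    Assumption: records are newline-terminated UTF-8 text.  The defaults
    match the documented spark_server and spark_port defaults.
    """
    with socket.create_connection((host, port)) as sock:
        # makefile() provides line-oriented reads on top of the TCP stream.
        with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
            for count, line in enumerate(stream, start=1):
                yield line.rstrip("\n")
                if max_records is not None and count >= max_records:
                    break
```

       For example, list(read_records(max_records=5)) would collect the first five records.
       Since pcp2spark exits when its peer closes the socket, it has to be restarted after
       each such test connection.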

CONFIGURATION FILE

       pcp2spark  uses  a configuration file with overall syntax described in pmrep.conf(5).  The
       following options are common with pmrep.conf: version, source, speclocal, derived, header,
       globals,  samples,  interval, type, type_prefer, ignore_incompat, names_change, instances,
       live_filter, rank, limit_filter, limit_filter_force, invert_filter, predicate,  omit_flat,
       precision,      precision_force,      count_scale,     count_scale_force,     space_scale,
       space_scale_force, time_scale, time_scale_force.  The  output  option  is  recognized  but
       ignored for pmrep.conf compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify  the  address  on  which  pcp2spark will listen for connections from an Apache
           Spark worker thread.  Corresponding command line option is -g.  Default is 127.0.0.1.

       spark_port (integer)
           Specify the port on which pcp2spark will listen.  Corresponding command line option
           is -p.  Default is 44325.
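
       As a rough illustration of these two options, the sketch below parses a small sample
       configuration and falls back to the documented defaults.  It assumes the options live in
       an [options] section as described in pmrep.conf(5); the sample text and the helper name
       listen_address are hypothetical, and pcp2spark itself does not necessarily parse its
       configuration this way.

```python
import configparser

# Hypothetical pcp2spark.conf content (see pmrep.conf(5) for the real syntax).
SAMPLE_CONF = """
[options]
spark_server = 127.0.0.1
spark_port = 44325
interval = 2s
"""


def listen_address(conf_text):
    """Return (server, port), using the documented defaults when unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    opts = parser["options"] if parser.has_section("options") else {}
    server = opts.get("spark_server", "127.0.0.1")
    port = int(opts.get("spark_port", 44325))
    return server, port
```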

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify  which  action  to  take  on  receiving  a  metric  names change event during
            sampling.  These events occur when  a  PMDA  discovers  new  metrics  sometime  after
            starting  up,  and  informs  running  client  tools like pcp2spark.  Valid values for
            action are update (refresh metrics being sampled), ignore (do nothing -  the  default
            behaviour) and abort (exit the program if such an event happens).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At least one metric must be
            found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A  positive  integer  will
            include instances with values at or above the limit in reporting.  A negative integer
            will include instances with values at or below the limit in reporting.   A  value  of
            zero  performs no limit filtering.  This option will not override possible per-metric
            specifications.  See also -J and -N.

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of  Performance  Co-Pilot  (PCP)
            archive log files identified by the archive argument, which is a comma-separated list
            of names, each of which may be the base name of an archive or the name of a directory
            containing one or more archives.

       -A align, --align=align
            Force  the initial sample to be aligned on the boundary of a natural time unit align.
            Refer to PCPIntro(1) for a complete description of the syntax for align.

       --archive-folio=folio
            Read metric source archives  from  the  PCP  archive  folio  created  by  tools  like
            pmchart(1) or, less often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale  for  space  (byte)  metrics,  possible  values include bytes, Kbytes, KB,
            Mbytes, MB, and  so  forth.   This  option  will  not  override  possible  per-metric
            specifications.  See also pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify the config file to use.  The default is the first found of: ./pcp2spark.conf,
            $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and $PCP_SYSCONF_DIR/pcp2spark.conf.
            For details, see the CONFIGURATION FILE section above and pmrep.conf(5).
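
            The search order can be sketched as follows.  This is an illustration only; the
            helper name find_config is hypothetical, and the /etc/pcp fallback for
            $PCP_SYSCONF_DIR is an assumption (the real value comes from /etc/pcp.conf).

```python
import os


def find_config(cwd=".", home=None, sysconf_dir=None):
    """Return the first existing pcp2spark.conf candidate, or None."""
    if home is None:
        home = os.path.expanduser("~")
    if sysconf_dir is None:
        # Assumption: fall back to /etc/pcp when the variable is not set.
        sysconf_dir = os.environ.get("PCP_SYSCONF_DIR", "/etc/pcp")
    candidates = [
        os.path.join(cwd, "pcp2spark.conf"),
        os.path.join(home, ".pcp2spark.conf"),
        os.path.join(home, "pcp", "pcp2spark.conf"),
        os.path.join(sysconf_dir, "pcp2spark.conf"),
    ]
    # The first file found wins; later candidates are not consulted.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```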

       --container=container
            Fetch  performance  metrics from the specified container, either local or remote (see
            -h).

       -C, --check
            Exit before reporting any values, but after parsing the configuration and metrics and
            printing possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify  derived performance metrics.  If derived starts with a slash (``/'') or with
            a dot (``.'') it will  be  interpreted  as  a  derived  metrics  configuration  file,
            otherwise  it  will  be  interpreted  as comma- or semicolon-separated derived metric
            expressions.  For details see pmLoadDerivedConfig(3) and pmRegisterDerived(3).
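
            The rule for interpreting the derived argument can be sketched as below.  This is a
            simplified illustration, not the actual parser: it splits on every comma and
            semicolon, so an expression that itself contains a comma would be split wrongly.
            The helper name classify_derived and the sample metric names used with it are
            hypothetical.

```python
import re


def classify_derived(derived):
    """Classify a -e argument as a config file path or inline expressions."""
    # A leading slash or dot marks a derived metrics configuration file.
    if derived.startswith(("/", ".")):
        return ("file", derived)
    # Otherwise treat it as comma- or semicolon-separated expressions.
    # Simplification: commas inside an expression are not handled.
    expressions = [e.strip() for e in re.split(r"[;,]", derived) if e.strip()]
    return ("expressions", expressions)
```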

       -g server, --spark-server=server
            Address on which pcp2spark listens for connections from an Apache Spark worker
            thread (see spark_server above).  Default is 127.0.0.1.

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics  from  pmcd(1)  on  host,  rather  than  from  the  default
            localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Report  only  the  listed instances from current instances (if present, see also -j).
            By default all instances, present and future, are reported.  This is a global  option
            that is used for all metrics unless a metric-specific instance definition is provided
            as part of a metricspec.  By default single-valued ``flat'' metrics without multiple
            instances are still reported as usual; use -v to change this.  Please refer to
            pmrep(1) for more details on this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics (that is, their type is
            unsupported  or they cannot be scaled as requested) will cause pcp2spark to terminate
            with an error message.  With  this  option  all  incompatible  metrics  are  silently
            omitted from reporting.  This may be especially useful when requesting non-leaf nodes
            of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all filtered  instances  even
            if processes are restarted at some point (unlike without live filtering).  Performing
            live filtering over a huge amount of instances will add some internal overhead  so  a
            bit of user caution is advised.  See also -n.

       -J rank, --rank=rank
            Limit  results  to highest/lowest ranked instances of set-valued metrics.  A positive
            integer will include highest valued instances in reporting.  A negative integer  will
            include  lowest  valued instances in reporting.  A value of zero performs no ranking.
            Ranking does not imply sorting (see -6 in pmrep(1)).  See also -8.
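
            The ranking semantics can be illustrated with a short sketch.  It is not the
            pcp2spark implementation; the helper name rank_instances and the sample instance
            names used with it are hypothetical.  Note that the selected instances keep their
            original order, reflecting that ranking does not imply sorting.

```python
def rank_instances(values, rank):
    """Keep the abs(rank) highest (rank > 0) or lowest (rank < 0) instances.

    values maps instance name to numeric value; a rank of zero keeps all.
    """
    if rank == 0:
        return dict(values)
    # Order instance names by value, highest first for a positive rank.
    ordered = sorted(values, key=values.get, reverse=(rank > 0))
    keep = set(ordered[:abs(rank)])
    # Rebuild in the original order rather than in ranked order.
    return {inst: val for inst, val in values.items() if inst in keep}
```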

       -K spec, --spec-local=spec
            When fetching metrics from a local context (see -L), the -K option  may  be  used  to
            control  the DSO PMDAs that should be made accessible.  The spec argument conforms to
            the syntax described in pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local host without PMCD.
            See also -K.

       -n, --invert-filter
            Perform  ranking  before  live  filtering.   By default instance live filtering (when
            requested, see -j) happens before instance ranking (when requested,  see  -J).   With
            this option the logic is inverted and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify  a  comma-separated  list  of predicate filter reference metrics.  By default
            ranking (see -J) happens for each metric individually.  With predicates,  ranking  is
            done only for the specified predicate metrics.  When reporting, the rest of the metrics
            sharing the same instance domain (see PCPIntro(1)) as the predicate will include only
            the highest/lowest ranking instances of the corresponding predicate.  Ranking does
            not imply sorting (see -6 in pmrep(1)).

            So for example, using proc.memory.rss  (resident  memory  size  of  process)  as  the
            predicate metric together with proc.io.total_bytes and mem.util.used as metrics to be
            reported, only the processes using most/least (as per -J)  memory  will  be  included
            when  reporting  total  bytes written by processes.  Since mem.util.used is a single-
            valued metric (thus not sharing the  same  instance  domain  as  the  process-related
            metrics), it will be reported as usual.
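
            The example above can be sketched in code.  This is an illustration under stated
            assumptions, not the actual implementation: instance domains are represented by
            simple labels, the helper name apply_predicate is hypothetical, and any values
            passed to it in a test or demonstration are made up.

```python
def apply_predicate(metrics, indoms, predicate, rank):
    """Rank the predicate metric and filter metrics sharing its instance domain.

    metrics maps metric name to {instance: value}; indoms maps metric name
    to an instance domain label (None for single-valued metrics).
    """
    pred_values = metrics[predicate]
    ordered = sorted(pred_values, key=pred_values.get, reverse=(rank > 0))
    keep = set(ordered[:abs(rank)]) if rank else set(pred_values)
    result = {}
    for name, values in metrics.items():
        if indoms[name] == indoms[predicate]:
            # Same instance domain: report only the predicate's top instances.
            result[name] = {i: v for i, v in values.items() if i in keep}
        else:
            # Different (or no) instance domain: report as usual.
            result[name] = dict(values)
    return result
```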

       -O origin, --origin=origin
            When  reporting  archived  metrics,  start reporting at origin within the time window
            (see -S and -T).  Refer to PCPIntro(1) for a complete description of the  syntax  for
            origin.

       -p port, --spark-port=port
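            Port on which pcp2spark listens for connections from an Apache Spark worker thread
            (see spark_port above).  Default is 44325.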

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The default is to use 3 decimal
            places  (when  applicable).   This  option  will  not  override  possible  per-metric
            specifications.

       -q scale, --count-scale=scale
            Unit/scale  for  count metrics, possible values include count x 10^-1, count, count x
            10, count x 10^2, and so forth from 10^-8  to  10^7.   (These  values  are  currently
            space-sensitive.)   This option will not override possible per-metric specifications.
            See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output raw metric values; do not convert cumulative counters to rates.  This option
            will override possible per-metric specifications.
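
            For context, the rate conversion that -r disables amounts to dividing the change in
            a counter by the elapsed time between two samples.  The sketch below is a
            simplified illustration that ignores counter wrap-around and unit scaling; the
            helper name counter_rate is hypothetical.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Return the per-second rate between two samples of a cumulative counter."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Simplification: counter wrap-around is not handled here.
    return (curr_value - prev_value) / elapsed
```

            For instance, a counter going from 1000 to 1600 over 2 seconds corresponds to a
            rate of 300 per second, whereas with -r the raw values 1000 and 1600 are reported.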

       -R, --raw-prefer
            Like -r but this option will not override per-metric specifications.

       -s samples, --samples=samples
            The  samples argument defines the number of samples to be retrieved and reported.  If
            samples is 0 or -s is not specified, pcp2spark will sample  and  report  continuously
            (in  real  time  mode) or until the end of the set of PCP archives (in archive mode).
            See also -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report  will  be  restricted  to  those  records
            logged at or after starttime.  Refer to PCPIntro(1) for a complete description of the
            syntax for starttime.

       -t interval, --interval=interval
            Set the reporting interval to  something  other  than  the  default  1  second.   The
            interval  argument  follows  the syntax described in PCPIntro(1), and in the simplest
            form may be an unsigned integer (the implied units in this case  are  seconds).   See
            also the -T option.

       -T endtime, --finish=endtime
            When  reporting  archived  metrics,  the  report  will be restricted to those records
            logged before or at endtime.  Refer to PCPIntro(1) for a complete description of  the
            syntax for endtime.

            When  used  to  define the runtime before pcp2spark will exit, if no samples is given
            (see -s) then the number of reported  samples  depends  on  interval  (see  -t).   If
            samples  is given then interval will be adjusted to allow reporting of samples during
            runtime.  In case all of -T, -s, and -t are given, endtime determines the actual time
            pcp2spark will run.

       -v, --omit-flat
            Omit single-valued ``flat'' metrics from reporting; only consider set-valued metrics
            (i.e., metrics with multiple values) for reporting.  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible  values  include  nanosec,  ns,  microsec,  us,
            millisec,  ms,  and  so forth up to hour, hr.  This option will not override possible
            per-metric specifications.  See also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory
       names used by PCP.  On each installation, the file /etc/pcp.conf contains the local values
       for these variables.  The $PCP_CONF  variable  may  be  used  to  specify  an  alternative
       configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       mkaf(1),  PCPIntro(1),  pcp(1),  pcp2elasticsearch(1),  pcp2graphite(1),  pcp2influxdb(1),
       pcp2json(1),  pcp2xlsx(1),  pcp2xml(1),  pcp2zabbix(1),  pmcd(1),   pminfo(1),   pmrep(1),
       pmGetOptions(3),     pmSpecLocalPMDA(3),    pmLoadDerivedConfig(3),    pmParseUnitsStr(3),
       pmRegisterDerived(3), LOGARCHIVE(5), pcp.conf(5), PMNS(5) and pmrep.conf(5).