NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark   [-5CGHIjLmnrRvV?]    [-4   action]  [-8|-9  limit]  [-a  archive]  [-A  align]
       [--archive-folio  folio]  [-b|-B  space-scale]   [-c   config]   [--container   container]
       [--daemonize]  [-e  derived]  [-g server] [-h host] [-i instances] [-J rank] [-K spec] [-N
       predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-scale]  [-s  samples]  [-S
       starttime] [-t interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark is a customizable performance metrics exporter from PCP to Apache Spark.  Any
       available performance metric, live or archived, system and/or application, can be
       selected for exporting using either command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a given address/port
       from which an Apache Spark worker task can pull the configured PCP metrics using the
       streaming extensions of the Apache Spark API.

       pcp2spark is a close relative of pmrep(1).  Refer to pmrep(1) for the description of the
       metricspec accepted on the pcp2spark command line.  See pmrep.conf(5) for a description
       of the pcp2spark.conf configuration file syntax.  This page describes pcp2spark-specific
       options and configuration file differences from pmrep.conf(5).  pmrep(1) also lists some
       usage examples, most of which are applicable with pcp2spark as well.

       Only the command line options listed on this page are supported; other options available
       for pmrep(1) are not supported.

       Options set via environment variables (see pmGetOptions(3)) override the corresponding
       built-in default values (if any).  Configuration file options override the corresponding
       environment variables (if any).  Command line options override the corresponding
       configuration file options (if any).

GENERAL USAGE

       A typical setup involves configuring pcp2spark with the PCP metrics to export and then
       starting the pcp2spark application.  pcp2spark then listens on the given address/port,
       waiting for an Apache Spark worker thread to connect.
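
       For example, a pcp2spark instance exporting two metrics every ten seconds on the default
       address/port could be started with a command like the following (the metric names are
       illustrative only):

           $ pcp2spark -t 10 mem.util.used kernel.all.load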

       When an Apache Spark worker thread has connected, pcp2spark begins streaming PCP metric
       data to Apache Spark until the worker thread completes or the connection is interrupted.
       If the connection is interrupted or the socket is closed by the Apache Spark worker
       thread, pcp2spark will exit.

       For an example Apache Spark worker job which connects to a pcp2spark instance on a given
       address/port and pulls in PCP metric data, see the example provided in the PCP examples
       directory for pcp2spark (often shipped in the PCP development package) or the online
       version at https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/.
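
       The shipped example is the authoritative reference; as a rough illustration only, a
       minimal Apache Spark Streaming job in Python reading from pcp2spark might look like the
       sketch below (the file name, application name, batch interval, and address/port are
       assumptions to be adapted locally):

           # spark_worker_sketch.py - hypothetical file name; assumes pcp2spark is already
           # listening on the default address/port (127.0.0.1:44325)
           from pyspark import SparkContext
           from pyspark.streaming import StreamingContext

           sc = SparkContext(appName="pcp2spark-sketch")   # illustrative application name
           ssc = StreamingContext(sc, 5)                    # 5 second batch interval (assumed)

           # connect to the pcp2spark socket and print each received batch of metric lines
           lines = ssc.socketTextStream("127.0.0.1", 44325)
           lines.pprint()

           ssc.start()
           ssc.awaitTermination()

       Such a job would typically be submitted with spark-submit once pcp2spark is up and
       listening.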

CONFIGURATION FILE

       pcp2spark uses a configuration file with syntax described in pmrep.conf(5).  The following
       options are common with pmrep.conf: version, source, speclocal, derived, header,  globals,
       samples,   interval,   type,   type_prefer,   ignore_incompat,   names_change,  instances,
       live_filter, rank, limit_filter, limit_filter_force, invert_filter, predicate,  omit_flat,
       include_labels,  precision,  precision_force, count_scale, count_scale_force, space_scale,
       space_scale_force, time_scale, time_scale_force.  The rest of the pmrep.conf  options  are
       recognized but ignored for compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify  the  address  on  which  pcp2spark will listen for connections from an Apache
           Spark worker thread.  Corresponding command line option is -g.  Defaults to 127.0.0.1.

       spark_port (integer)
           Specify the port on  which  pcp2spark  will  listen  for  connections.   Corresponding
           command line option is -p.  Defaults to 44325.
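
       As an illustration, a small pcp2spark configuration file using the pmrep.conf(5) syntax
       might combine the general and pcp2spark specific options with a metricset, along the
       lines of the following sketch (option placement follows pmrep.conf(5); the metricset
       name, metrics, and values are examples only):

           [options]
           version = 1
           samples = 0
           interval = 10
           spark_server = 127.0.0.1
           spark_port = 44325

           [spark-metrics]
           mem.util.used = memused
           kernel.all.load = load

       The metricset could then be selected on the command line as pcp2spark :spark-metrics
       (see pmrep(1) for details on metricsets).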

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify  which  action  to  take  on  receiving  a  metric  names change event during
            sampling.  These events occur when  a  PMDA  discovers  new  metrics  sometime  after
            starting  up,  and  informs  running  client  tools like pcp2spark.  Valid values for
            action are update (refresh metrics being sampled), ignore (do nothing -  the  default
            behaviour) and abort (exit the program if such an event occurs).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At least one metric must be
            found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A  positive  integer  will
            include instances with values at or above the limit in reporting.  A negative integer
            will include instances with values at or below the limit in reporting.   A  value  of
            zero  performs no limit filtering.  This option will not override possible per-metric
            specifications.  See also -J and -N.

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of  Performance  Co-Pilot  (PCP)
            archive  files identified by the archive argument, which is a comma-separated list of
            names, each of which may be the base name of an archive or the name  of  a  directory
            containing one or more archives.

       -A align, --align=align
            Force  the initial sample to be aligned on the boundary of a natural time unit align.
            Refer to PCPIntro(1) for a complete description of the syntax for align.

       --archive-folio=folio
            Read metric source archives  from  the  PCP  archive  folio  created  by  tools  like
            pmchart(1) or, less often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale  for  space  (byte)  metrics,  possible  values include bytes, Kbytes, KB,
            Mbytes, MB, and  so  forth.   This  option  will  not  override  possible  per-metric
            specifications.  See also pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
             Specify the config file or directory to use.  If config is a directory, all files
             in it ending in .conf will be included.  The default is the first found of:
             ./pcp2spark.conf, $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and
             $PCP_SYSCONF_DIR/pcp2spark.conf.  For details, see the above section and
             pmrep.conf(5).

       --container=container
            Fetch  performance  metrics from the specified container, either local or remote (see
            -h).

       -C, --check
            Exit before reporting any values, but after parsing the configuration and metrics and
            printing possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify  derived performance metrics.  If derived starts with a slash (``/'') or with
            a dot (``.'') it will be interpreted as a PCP  derived  metrics  configuration  file,
            otherwise  it  will  be  interpreted  as comma- or semicolon-separated derived metric
            expressions.  For complete description of derived metrics  and  PCP  derived  metrics
            configuration    files    see    pmLoadDerivedConfig(3)   and   pmRegisterDerived(3).
            Alternatively, using  pmrep.conf(5)  configuration  syntax  allows  defining  derived
            metrics as part of metricsets.

            In  case  of  issues  with derived metrics, review the aforementioned manual pages in
            detail and ensure all the required  metrics  are  available,  especially  when  using
            archives.   Use  -Dderive  to  see additional debug information about parsing derived
            metrics.
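
             For instance, a derived metric combining two existing memory metrics could be
             defined and exported directly on the command line (the derived metric name and
             expression here are illustrative):

                  $ pcp2spark -e "mem.util.allcache = mem.util.bufmem + mem.util.cached" \
                        mem.util.allcache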

       -g server, --spark-server=server
             Address on which pcp2spark listens for Apache Spark worker connections (see the
             spark_server configuration file option above).  Defaults to 127.0.0.1.

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch performance metrics  from  pmcd(1)  on  host,  rather  than  from  the  default
            localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Retrieve  and  report only the specified metric instances.  By default all instances,
            present and future, are reported.

            Refer to pmrep(1) for complete description of this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics (that is, their type is
            unsupported  or they cannot be scaled as requested) will cause pcp2spark to terminate
            with an error message.  With  this  option  all  incompatible  metrics  are  silently
            omitted from reporting.  This may be especially useful when requesting non-leaf nodes
            of the PMNS tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all named instances  even  if
            processes  are  restarted  at some point (unlike without live filtering).  Performing
            live filtering over a huge number of instances will add some internal overhead  so  a
            bit of user caution is advised.  See also -n.

       -J rank, --rank=rank
            Limit  results  to highest/lowest ranked instances of set-valued metrics.  A positive
            integer will include highest valued instances in reporting.  A negative integer  will
            include  lowest  valued instances in reporting.  A value of zero performs no ranking.
            Ranking does not imply sorting, see -6.  See also -8.

       -K spec, --spec-local=spec
            When fetching metrics from a local context (see -L), the -K option  may  be  used  to
            control  the DSO PMDAs that should be made accessible.  The spec argument conforms to
            the syntax described in pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local host without PMCD.
            See also -K.

       -m, --include-labels
            Include PCP metric labels in the output.

       -n, --invert-filter
            Perform  ranking  before  live  filtering.   By default instance live filtering (when
            requested, see -j) happens before instance ranking (when requested,  see  -J).   With
            this option the logic is inverted and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify  a  comma-separated  list  of predicate filter reference metrics.  By default
            ranking (see -J) happens for each metric individually.  With predicates,  ranking  is
            done  only  for the specified predicate metrics.  When reporting, rest of the metrics
            sharing the same instance domain (see PCPIntro(1)) as the predicate will include only
            the  highest/lowest  ranking  instances of the corresponding predicate.  Ranking does
            not imply sorting, see -6.

             For example, using proc.memory.rss (the resident memory size of a process) as the
             predicate metric together with proc.io.total_bytes and mem.util.used as the
             metrics to be reported, only the processes using the most/least memory (as per -J)
             will be included when reporting total bytes written by processes.  Since
             mem.util.used is a single-valued metric (thus not sharing the same instance domain
             as the process-related metrics), it will be reported as usual.
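
             A corresponding command line for the above example, limiting the report to the
             three highest memory consumers (the rank value here is arbitrary), might be:

                  $ pcp2spark -J 3 -N proc.memory.rss \
                        proc.memory.rss proc.io.total_bytes mem.util.used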

       -O origin, --origin=origin
            When  reporting  archived  metrics,  start reporting at origin within the time window
            (see -S and -T).  Refer to PCPIntro(1) for a complete description of the  syntax  for
            origin.

       -p port, --spark-port=port
             Port on which pcp2spark listens for connections (see the spark_port configuration
             file option above).  Defaults to 44325.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The default is to use 3 decimal
            places  (when  applicable).   This  option  will  not  override  possible  per-metric
            specifications.

       -q scale, --count-scale=scale
            Unit/scale  for  count metrics, possible values include count x 10^-1, count, count x
            10, count x 10^2, and so forth from 10^-8  to  10^7.   (These  values  are  currently
            space-sensitive.)   This option will not override possible per-metric specifications.
            See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output raw metric values, do not convert cumulative counters to rates.   This  option
            will override possible per-metric specifications.

       -R, --raw-prefer
            Like -r but this option will not override per-metric specifications.

       -s samples, --samples=samples
            The  samples argument defines the number of samples to be retrieved and reported.  If
            samples is 0 or -s is not specified, pcp2spark will sample  and  report  continuously
            (in  real  time  mode) or until the end of the set of PCP archives (in archive mode).
            See also -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report  will  be  restricted  to  those  records
            logged at or after starttime.  Refer to PCPIntro(1) for a complete description of the
            syntax for starttime.

       -t interval, --interval=interval
            Set the reporting interval to  something  other  than  the  default  1  second.   The
            interval  argument  follows  the syntax described in PCPIntro(1), and in the simplest
            form may be an unsigned integer (the implied units in this case  are  seconds).   See
            also the -T option.

       -T endtime, --finish=endtime
            When  reporting  archived  metrics,  the  report  will be restricted to those records
            logged before or at endtime.  Refer to PCPIntro(1) for a complete description of  the
            syntax for endtime.

             When used to define the runtime before pcp2spark exits: if samples is not given
             (see -s), the number of reported samples depends on interval (see -t); if samples
             is given, interval will be adjusted to allow reporting of that many samples during
             the runtime.  If all of -T, -s, and -t are given, endtime determines the actual
             time pcp2spark will run.

       -v, --omit-flat
            Report  only  set-valued metrics with instances (e.g. disk.dev.read) and omit single-
            valued ``flat'' metrics without instances (e.g.  kernel.all.sysfork).  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible  values  include  nanosec,  ns,  microsec,  us,
            millisec,  ms,  and  so forth up to hour, hr.  This option will not override possible
            per-metric specifications.  See also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

       $PCP_SYSCONF_DIR/pmrep/*.conf
            system provided default pmrep configuration files

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory
       names used by PCP.  On each installation, the file /etc/pcp.conf contains the local values
       for these variables.  The $PCP_CONF  variable  may  be  used  to  specify  an  alternative
       configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       PCPIntro(1),  mkaf(1),  pcp(1),  pcp2elasticsearch(1),  pcp2graphite(1),  pcp2influxdb(1),
       pcp2json(1),  pcp2xlsx(1),  pcp2xml(1),  pcp2zabbix(1),  pmcd(1),   pminfo(1),   pmrep(1),
       pmGetOptions(3),    pmLoadDerivedConfig(3),    pmParseUnitsStr(3),   pmRegisterDerived(3),
       pmSpecLocalPMDA(3), LOGARCHIVE(5), pcp.conf(5), pmrep.conf(5) and PMNS(5).