lunar (1) pcp2spark.1.gz

Provided by: pcp-export-pcp2spark_6.0.3-1_amd64

NAME

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS

       pcp2spark   [-5CGHIjLmnrRvV?]    [-4   action]  [-8|-9  limit]  [-a  archive]  [-A  align]
       [--archive-folio  folio]  [-b|-B  space-scale]   [-c   config]   [--container   container]
       [--daemonize]  [-e  derived]  [-g server] [-h host] [-i instances] [-J rank] [-K spec] [-N
       predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-scale]  [-s  samples]  [-S
       starttime] [-t interval] [-T endtime] [-y|-Y time-scale] metricspec [...]

DESCRIPTION

       pcp2spark  is  a  customizable performance metrics exporter tool from PCP to Apache Spark.
       Any available performance metric, live or archived,  system  and/or  application,  can  be
       selected for exporting using either command line arguments or a configuration file.

       pcp2spark acts as a bridge: it provides a network socket stream on a given address/port
       to which an Apache Spark worker task can connect and pull the configured PCP metrics from
       pcp2spark, using the streaming extensions of the Apache Spark API.
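       The following Python sketch shows the client side of this bridge: it connects to a
       listening pcp2spark instance at the default spark_server/spark_port address and yields
       the text records it receives.  It assumes each sample arrives as one newline-terminated
       record (the line-oriented framing that Spark's socketTextStream consumes); this page does
       not specify the wire format, and the helper names are hypothetical.

```python
import socket
from typing import Iterator

# Defaults documented for the spark_server (-g) and spark_port (-p) options.
DEFAULT_SERVER = "127.0.0.1"
DEFAULT_PORT = 44325


def connect(host: str = DEFAULT_SERVER, port: int = DEFAULT_PORT) -> socket.socket:
    """Open a TCP connection to a listening pcp2spark instance."""
    return socket.create_connection((host, port))


def iter_records(sock: socket.socket, bufsize: int = 4096) -> Iterator[str]:
    """Yield newline-delimited UTF-8 records until the peer closes the socket.

    ASSUMPTION: records are newline-terminated; the framing is not
    specified on the pcp2spark manual page.
    """
    pending = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # the sender closed the connection
            break
        pending += chunk
        # Emit every complete line; keep the partial tail for the next recv().
        *complete, pending = pending.split(b"\n")
        for line in complete:
            if line:
                yield line.decode("utf-8")
    if pending:  # final record without a trailing newline
        yield pending.decode("utf-8")
```

       Wrapping connect() in a with statement closes the socket automatically; per GENERAL
       USAGE, closing it from the worker side makes pcp2spark exit.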

       pcp2spark is a close relative of pmrep(1).  Refer to pmrep(1) for the metricspec
       description accepted on the pcp2spark command line.  See pmrep.conf(5) for a description
       of the pcp2spark.conf configuration file syntax.  This page describes pcp2spark-specific
       options and the configuration file differences from pmrep.conf(5).  pmrep(1) also lists
       some usage examples, most of which are applicable to pcp2spark as well.

       Only the command line options listed on this page are supported; other options available
       for pmrep(1) are not supported.

       Options set via environment variables (see pmGetOptions(3)) override the corresponding
       built-in default values (if any).  Configuration file options override the corresponding
       environment variables (if any).  Command line options override the corresponding
       configuration file options (if any).
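       The precedence order above can be modelled, as a rough sketch, with Python's
       collections.ChainMap, which looks a key up in each mapping in turn and returns the first
       match.  The function name and the idea of one dictionary per source are illustrative
       assumptions, not a description of how pcp2spark is implemented internally.

```python
from collections import ChainMap
from typing import Mapping


def effective_options(builtin: Mapping[str, str],
                      environment: Mapping[str, str],
                      config_file: Mapping[str, str],
                      command_line: Mapping[str, str]) -> ChainMap:
    """Merge option sources; maps listed first in the ChainMap take precedence.

    Order (highest first): command line, configuration file,
    environment, built-in defaults.
    """
    return ChainMap(dict(command_line), dict(config_file),
                    dict(environment), dict(builtin))
```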

GENERAL USAGE

       A typical setup for using pcp2spark involves configuring pcp2spark with the PCP metrics
       to export and then starting the pcp2spark application.  pcp2spark then listens on the
       given address/port and waits for a connection from an Apache Spark worker thread.  Once
       the worker thread is started, it connects to pcp2spark.

       When an Apache Spark worker thread has connected, pcp2spark begins streaming PCP metric
       data to Apache Spark until the worker thread completes or the connection is interrupted.
       If the connection is interrupted or the socket is closed by the Apache Spark worker
       thread, pcp2spark exits.

       For an example Apache Spark worker job that connects to a pcp2spark instance on a given
       address/port and pulls in PCP metric data, see the pcp2spark example in the PCP examples
       directory (often provided by the PCP development package) or the online version at
       https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/.

CONFIGURATION FILE

       pcp2spark uses a configuration file with syntax described in pmrep.conf(5).  The following
       options are common with pmrep.conf: version, source, speclocal, derived, header,  globals,
       samples,   interval,   type,   type_prefer,   ignore_incompat,   names_change,  instances,
       live_filter, rank, limit_filter, limit_filter_force, invert_filter, predicate,  omit_flat,
       include_labels,  precision,  precision_force, count_scale, count_scale_force, space_scale,
       space_scale_force, time_scale, time_scale_force.  The rest of the pmrep.conf  options  are
       recognized but ignored for compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify  the  address  on  which  pcp2spark will listen for connections from an Apache
           Spark worker thread.  Corresponding command line option is -g.  Defaults to 127.0.0.1.

       spark_port (integer)
           Specify the port on  which  pcp2spark  will  listen  for  connections.   Corresponding
           command line option is -p.  Defaults to 44325.
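       A minimal sketch of reading these two options from a pcp2spark.conf fragment with
       Python's configparser, falling back to the documented defaults.  It assumes the options
       live in the [options] section used by pmrep.conf(5); the value 0.0.0.0 (the usual
       all-interfaces bind address) is only an illustration, and command line -g/-p overrides
       are not modelled.

```python
import configparser
from typing import Tuple

# Illustrative pcp2spark.conf fragment (values are examples only).
EXAMPLE_CONF = """
[options]
spark_server = 0.0.0.0
spark_port = 44325
"""


def spark_endpoint(text: str) -> Tuple[str, int]:
    """Return (spark_server, spark_port) from config text, with defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("options"):
        return "127.0.0.1", 44325
    opts = parser["options"]
    server = opts.get("spark_server", "127.0.0.1")
    port = opts.getint("spark_port", 44325)
    return server, port
```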

OPTIONS

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric specifications.

       -4 action, --names-change=action
            Specify  which  action  to  take  on  receiving  a  metric  names change event during
            sampling.  These events occur when a PMDA discovers new metrics some time after
            starting  up,  and  informs  running  client  tools like pcp2spark.  Valid values for
            action are update (refresh metrics being sampled), ignore (do nothing -  the  default
            behaviour) and abort (exit the program if such an event occurs).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At least one metric must be
            found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A  positive  integer  will
            include instances with values at or above the limit in reporting.  A negative integer
            will include instances with values at or below the limit in reporting.   A  value  of
            zero  performs no limit filtering.  This option will not override possible per-metric
            specifications.  See also -J and -N.
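            As an illustration only, the limit filter can be sketched in Python over a
            hypothetical mapping of instance names to values.  The sketch assumes that for a
            negative limit the threshold is its absolute value (so -50 keeps values at or below
            50); this page does not spell that detail out.

```python
from typing import Dict


def limit_filter(values: Dict[str, float], limit: float) -> Dict[str, float]:
    """Keep instances whose value passes the limit filter.

    limit > 0:  keep values >= limit
    limit < 0:  keep values <= abs(limit)  (ASSUMPTION: magnitude is the threshold)
    limit == 0: no filtering
    """
    if limit > 0:
        return {name: v for name, v in values.items() if v >= limit}
    if limit < 0:
        threshold = abs(limit)
        return {name: v for name, v in values.items() if v <= threshold}
    return dict(values)
```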

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric specifications.

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of  Performance  Co-Pilot  (PCP)
            archive log files identified by the archive argument, which is a comma-separated list
            of names, each of which may be the base name of an archive or the name of a directory
            containing one or more archives.

       -A align, --align=align
            Force  the initial sample to be aligned on the boundary of a natural time unit align.
            Refer to PCPIntro(1) for a complete description of the syntax for align.

       --archive-folio=folio
            Read metric source archives  from  the  PCP  archive  folio  created  by  tools  like
            pmchart(1) or, less often, manually with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale  for  space  (byte)  metrics,  possible  values include bytes, Kbytes, KB,
            Mbytes, MB, and  so  forth.   This  option  will  not  override  possible  per-metric
            specifications.  See also pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric specifications.

       -c config, --config=config
            Specify the config file or directory to use.  If config is a directory, all files
            in it ending in .conf will be included.  The default is the first found of:
            ./pcp2spark.conf, $HOME/.pcp2spark.conf, $HOME/pcp/pcp2spark.conf, and
            $PCP_SYSCONF_DIR/pcp2spark.conf.  For details, see the CONFIGURATION FILE section
            above and pmrep.conf(5).
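            A Python sketch of the lookup described above.  The function names are
            hypothetical, home and sysconf_dir are passed in explicitly (real PCP tools take
            PCP_SYSCONF_DIR from pcp.conf(5)), and reading directory entries in sorted order is
            an assumption.

```python
import os
from typing import List, Optional


def find_default_config(cwd: str, home: str, sysconf_dir: str) -> Optional[str]:
    """Return the first existing default pcp2spark.conf candidate, or None."""
    candidates = [
        os.path.join(cwd, "pcp2spark.conf"),
        os.path.join(home, ".pcp2spark.conf"),
        os.path.join(home, "pcp", "pcp2spark.conf"),
        os.path.join(sysconf_dir, "pcp2spark.conf"),
    ]
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


def config_files(config: str) -> List[str]:
    """Expand a -c argument: a directory contributes all of its *.conf files."""
    if os.path.isdir(config):
        return sorted(
            os.path.join(config, name)
            for name in os.listdir(config)
            if name.endswith(".conf")
        )
    return [config]
```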

       --container=container
            Fetch  performance  metrics from the specified container, either local or remote (see
            -h).

       -C, --check
            Exit before reporting any values, but after parsing the configuration and metrics and
            printing possible headers.

       --daemonize
            Daemonize on startup.

       -e derived, --derived=derived
            Specify  derived performance metrics.  If derived starts with a slash (``/'') or with
            a dot (``.'') it will be interpreted as a PCP  derived  metrics  configuration  file,
            otherwise  it  will  be  interpreted  as comma- or semicolon-separated derived metric
            expressions.  For a complete description of derived metrics and PCP derived metrics
            configuration files, see pmLoadDerivedConfig(3) and pmRegisterDerived(3).
            Alternatively, using  pmrep.conf(5)  configuration  syntax  allows  defining  derived
            metrics as part of metricsets.
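            The rule for interpreting derived can be sketched in Python as below.  The function
            name is hypothetical, and the split on commas and semicolons is deliberately naive:
            it would break an expression that itself contains a comma.

```python
import re
from typing import List, Tuple


def classify_derived(derived: str) -> Tuple[str, List[str]]:
    """Return ("file", [path]) or ("expressions", [expr, ...]) for a -e argument."""
    # A leading slash or dot means a derived metrics configuration file.
    if derived.startswith(("/", ".")):
        return "file", [derived]
    # Otherwise: comma- or semicolon-separated expressions (naive split).
    parts = [part.strip() for part in re.split(r"[,;]", derived)]
    return "expressions", [part for part in parts if part]
```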

       -g server, --spark-server=server
            Address on which pcp2spark listens for connections from an Apache Spark worker
            thread (see spark_server).  Defaults to 127.0.0.1.

       -G, --no-globals
            Do not include global metrics in reporting (see pmrep.conf(5)).

       -h host, --host=host
            Fetch  performance  metrics  from  pmcd(1)  on  host,  rather  than  from the default
            localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Retrieve and report only the specified metric instances.  By default  all  instances,
            present and future, are reported.

            Refer to pmrep(1) for complete description of this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible metrics (that is, their type is
            unsupported or they cannot be scaled as requested) will cause pcp2spark to  terminate
            with  an  error  message.   With  this  option  all incompatible metrics are silently
            omitted from reporting.  This may be especially useful when requesting non-leaf nodes
            of the PMNS tree for reporting.

       -j, --live-filter
            Perform  instance  live filtering.  This allows capturing all named instances even if
            processes are restarted at some point (unlike without  live  filtering).   Performing
            live  filtering  over a huge number of instances will add some internal overhead so a
            bit of user caution is advised.  See also -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-valued metrics.   A  positive
            integer  will include highest valued instances in reporting.  A negative integer will
            include lowest valued instances in reporting.  A value of zero performs  no  ranking.
            Ranking does not imply sorting (see -6 in pmrep(1)).  See also -8.
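            A Python sketch of rank selection over a hypothetical mapping of instance names to
            values.  It keeps the selected instances in their original order to reflect that
            ranking selects instances but does not sort them; breaking ties by original order
            is an assumption.

```python
from typing import Dict


def rank_instances(values: Dict[str, float], rank: int) -> Dict[str, float]:
    """Keep the |rank| highest (rank > 0) or lowest (rank < 0) valued instances."""
    if rank == 0:
        return dict(values)  # zero: no ranking
    # sorted() is stable (also with reverse=True), so ties keep original order.
    ordered = sorted(values, key=values.__getitem__, reverse=rank > 0)
    keep = set(ordered[:abs(rank)])
    # Ranking selects but does not sort: preserve the original instance order.
    return {name: value for name, value in values.items() if name in keep}
```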

       -K spec, --spec-local=spec
            When  fetching  metrics  from  a local context (see -L), the -K option may be used to
            control the DSO PMDAs that should be made accessible.  The spec argument conforms  to
            the syntax described in pmSpecLocalPMDA(3).  More than one -K option may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the local host without PMCD.
            See also -K.

       -m, --include-labels
            Include PCP metric labels in the output.

       -n, --invert-filter
            Perform ranking before live filtering.  By  default  instance  live  filtering  (when
            requested,  see  -j)  happens before instance ranking (when requested, see -J).  With
            this option the logic is inverted and ranking happens before live filtering.

       -N predicate, --predicate=predicate
            Specify a comma-separated list of predicate filter  reference  metrics.   By  default
            ranking  (see  -J) happens for each metric individually.  With predicates, ranking is
            done only for the specified predicate metrics.  When reporting, the rest of the
            metrics sharing the same instance domain (see PCPIntro(1)) as the predicate will
            include only the highest/lowest ranking instances of the corresponding predicate.
            Ranking does not imply sorting (see -6 in pmrep(1)).

            For example, using proc.memory.rss (resident memory size of a process) as the
            predicate metric together with proc.io.total_bytes and mem.util.used as metrics to be
            reported,  only  the  processes  using most/least (as per -J) memory will be included
            when reporting total bytes written by processes.  Since mem.util.used  is  a  single-
            valued  metric  (thus  not  sharing  the  same instance domain as the process related
            metrics), it will be reported as usual.
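            The example above can be sketched in Python.  Which metrics share the predicate's
            instance domain is passed in explicitly here as a simplifying assumption (in PCP the
            instance domain is part of each metric's descriptor), and heapq.nlargest and
            heapq.nsmallest pick the top or bottom instances by predicate value.  The function
            name and data layout are hypothetical.

```python
import heapq
from typing import Dict, Set


def predicate_filter(predicate: Dict[str, float], rank: int,
                     metrics: Dict[str, Dict[str, float]],
                     same_indom: Set[str]) -> Dict[str, Dict[str, float]]:
    """Restrict metrics sharing the predicate's instance domain to its ranked instances."""
    if rank == 0:
        return {name: dict(values) for name, values in metrics.items()}
    pick = heapq.nlargest if rank > 0 else heapq.nsmallest
    chosen = set(pick(abs(rank), predicate, key=predicate.__getitem__))
    result = {}
    for name, values in metrics.items():
        if name in same_indom:
            # Same instance domain: keep only the predicate's chosen instances.
            result[name] = {i: v for i, v in values.items() if i in chosen}
        else:
            # Different (or no) instance domain, e.g. single-valued: report as usual.
            result[name] = dict(values)
    return result
```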

       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin  within  the  time  window
            (see  -S  and -T).  Refer to PCPIntro(1) for a complete description of the syntax for
            origin.

       -p port, --spark-port=port
            Port on which pcp2spark listens for connections from an Apache Spark worker thread
            (see spark_port).  Defaults to 44325.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The default is to use 3 decimal
            places  (when  applicable).   This  option  will  not  override  possible  per-metric
            specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count x 10^-1, count,  count  x
            10,  count  x  10^2,  and  so  forth from 10^-8 to 10^7.  (These values are currently
            space-sensitive.)  This option will not override possible per-metric  specifications.
            See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric specifications.

       -r, --raw
            Output  raw  metric values, do not convert cumulative counters to rates.  This option
            will override possible per-metric specifications.

       -R, --raw-prefer
            Like -r but this option will not override per-metric specifications.

       -s samples, --samples=samples
            The samples argument defines the number of samples to be retrieved and reported.   If
            samples  is  0  or -s is not specified, pcp2spark will sample and report continuously
            (in real time mode) or until the end of the set of PCP archives  (in  archive  mode).
            See also -T.

       -S starttime, --start=starttime
            When  reporting  archived  metrics,  the  report  will be restricted to those records
            logged at or after starttime.  Refer to PCPIntro(1) for a complete description of the
            syntax for starttime.

       -t interval, --interval=interval
            Set  the  reporting  interval  to  something  other  than  the default 1 second.  The
            interval argument follows the syntax described in PCPIntro(1), and  in  the  simplest
            form  may  be  an unsigned integer (the implied units in this case are seconds).  See
            also the -T option.

       -T endtime, --finish=endtime
            When reporting archived metrics, the report  will  be  restricted  to  those  records
            logged  before or at endtime.  Refer to PCPIntro(1) for a complete description of the
            syntax for endtime.

            When -T is used to define how long pcp2spark runs before exiting and samples is not
            given (see -s), the number of reported samples depends on interval (see -t).  If
            samples is given, interval will be adjusted so that the requested number of samples
            is reported within the runtime.  If all of -T, -s, and -t are given, endtime
            determines the actual time pcp2spark will run.
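            The arithmetic can be sketched in Python, with runtime being the duration implied by
            -T in seconds.  The function name is hypothetical, and the plain division used here
            is an assumption: this page does not say how fractional results are rounded or
            whether a sample is taken at the very start of the run.

```python
from typing import Optional, Tuple


def plan_run(runtime: float, samples: Optional[int], interval: float) -> Tuple[int, float]:
    """Return (samples, interval) for a run bounded by runtime seconds (-T).

    samples None or 0 is treated as "-s not given".
    """
    if samples:
        # -s given: endtime wins, so the interval is adjusted to fit the samples.
        return samples, runtime / samples
    # No -s: the sample count follows from the interval (-t, default 1 second).
    return int(runtime // interval), interval
```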

       -v, --omit-flat
            Report only set-valued metrics with instances (e.g. disk.dev.read) and  omit  single-
            valued ``flat'' metrics without instances (e.g.  kernel.all.sysfork).  See -i and -I.

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale  for  time  metrics,  possible  values  include nanosec, ns, microsec, us,
            millisec, ms, and so forth up to hour, hr.  This option will  not  override  possible
            per-metric specifications.  See also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric specifications.

       -?, --help
            Display usage message and exit.

FILES

       pcp2spark.conf
            pcp2spark configuration file (see -c)

       $PCP_SYSCONF_DIR/pmrep/*.conf
            system provided default pmrep configuration files

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory
       names used by PCP.  On each installation, the file /etc/pcp.conf contains the local values
       for  these  variables.   The  $PCP_CONF  variable  may  be  used to specify an alternative
       configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       PCPIntro(1),  mkaf(1),  pcp(1),  pcp2elasticsearch(1),  pcp2graphite(1),  pcp2influxdb(1),
       pcp2json(1),   pcp2xlsx(1),   pcp2xml(1),  pcp2zabbix(1),  pmcd(1),  pminfo(1),  pmrep(1),
       pmGetOptions(3),   pmLoadDerivedConfig(3),    pmParseUnitsStr(3),    pmRegisterDerived(3),
       pmSpecLocalPMDA(3), LOGARCHIVE(5), pcp.conf(5), pmrep.conf(5) and PMNS(5).