Provided by: pcp-import-sheet2pcp_5.0.3-1_all

NAME

       sheet2pcp - import spreadsheet data and create a PCP archive

SYNOPSIS

       sheet2pcp [-h host] [-Z timezone] infile mapfile outfile

DESCRIPTION

       sheet2pcp is intended to read a data spreadsheet (infile) and translate it into a Performance Co-Pilot
       (PCP) archive with the basename outfile.

       The input spreadsheet can be in any of the common formats, provided the  appropriate  Perl  modules  have
       been  installed (see the CAVEATS section below).  The spreadsheet must be ``normalized'' so that each row
       contains data for the same time interval, and one of the columns contains the date and time for the  data
       in each row.

       The  resultant  PCP  archive may be used with all the PCP client tools to graph subsets of the data using
       pmchart(1), perform data reduction and reporting, filter with the PCP inference engine pmie(1), etc.

       The mapfile controls the import process and defines the data mapping from the  spreadsheet  columns  onto
       the  PCP  data  model.   The  file  is  written  in XML and conforms to the syntax defined in the MAPPING
       CONFIGURATION section below.

       A series of physical files will be created with the prefix outfile.  These are outfile.0 (the performance
       data),  outfile.meta  (the  metadata  that  describes the performance data) and outfile.index (a temporal
       index to improve efficiency of replay operations for the archive).  If any of these files exists already,
       then sheet2pcp will not overwrite them and will exit with an error message.

       The -h option is an alternative to the hostname attribute of the <sheet> element in mapfile described
       below.  If both are specified, the value from mapfile is used.

       The -Z option is an alternative to the timezone attribute of the <sheet> element in mapfile described
       below.  If both are specified, the value from mapfile is used.

       sheet2pcp  is  a  Perl  script  that  uses  the  PCP::LogImport Perl wrapper around the PCP libpcp_import
       library, and as such could be used as  an  example  to  develop  new  tools  to  import  other  types  of
       performance data and create PCP archives.

MAPPING CONFIGURATION

       The mapfile contains specifications in standard XML format.

       The  whole  specification  is  wrapped  in  a  <sheet> ... </sheet> element.  The  sheet tag supports the
       following optional attributes:

       heading   Specifies the number of heading rows to skip at the start of the spreadsheet before  processing
                 data.  Example: heading=``1''.

       hostname  Set  the  source  hostname  in the PCP archive (the default is to use the hostname of the local
                 host).  Example: hostname=``some.where.com''.

       timezone  Set the source timezone in the PCP archive (the default is to use UTC).  The timezone must have
                 the format +HHMM (for hours and minutes East of UTC) or -HHMM (for hours and minutes West of
                 UTC).  Note in particular that neither the zoneinfo (aka Olson) format, e.g. Europe/Paris, nor
                 the POSIX TZ format, e.g. EST+5, is allowed.  Example: timezone=``+1100''.

       datefmt   The  format  of  the  date imported from the spreadsheet may be specified as a concatenation of
                 values that specify the order of the year (Y), month (M) and day (D) fields  in  a  date.   The
                 supported variants are DMY (the default), MDY and YMD.  Example: datefmt=``YMD''.
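
       As an illustration of the attributes above, the following sketch shows a <sheet> element that sets all
       four of them, reusing the values from the individual attribute examples.  The metric specifications
       and column elements that would normally appear inside the element are elided here.

             <sheet heading="1" hostname="some.where.com" timezone="+1100" datefmt="YMD">
                 <!-- metric specifications and column elements go here -->
             </sheet>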

       A  <sheet>  element  contains  one or more metric specifications of the form <metric>metricname</metric>.
       The metric tag supports the following optional attributes:

       pmid      The Performance Metrics Identifier (PMID), specified as 3 numbers separated by periods (.) to
                 set the domain, cluster and item fields of the PMID; see PMNS(5) for more details of PMIDs.  If
                 omitted, the PMID will be automatically assigned by pmiAddMetric(3).  The value PM_ID_NULL may
                 be used to explicitly nominate the default behaviour.  Examples: pmid=``60.0.2'',
                 pmid=``PM_ID_NULL''.

       indom     Each metric may have one or more values.  If a metric always has one value, it is singular  and
                 the  Instance  Domain should be set to PM_INDOM_NULL.  Otherwise indom should be specified as 2
                 numbers separated by a period (.)  to set the domain and ordinal fields of the Instance Domain.
                 Examples: indom=``PM_INDOM_NULL'', indom=``60.3'', indom=``PMI_DOMAIN.4''.

                 More  than  one  metric can share the same Instance Domain when the metrics have defined values
                 over similar sets of instances, e.g. all  the  metrics  for  each  network  interface.   It  is
                 standard  practice  for the domain field to be the same for the pmid and the indom; if the pmid
                 attribute is missing, then the domain field  for  the  indom  should  be  the  reserved  domain
                 PMI_DOMAIN.

                 If  the  indom  attribute  is  omitted  then  the  default  Instance  Domain  for the metric is
                 PM_INDOM_NULL.

       units     The scale and dimension of the metric values along the axes of space, time and count (events,
                 messages, packets, etc.) are specified with a 6-tuple.  These values are passed to the
                 pmiUnits(3) function to generate a pmUnits structure.  Refer to pmLookupDesc(3) for a full
                 description of all the fields of this structure.  The default is to assign no scale or
                 dimension to the metric, i.e. units=``0,0,0,0,0,0''.  Examples:
                 units=``0,1,0,0,PM_TIME_MSEC,0'' (milliseconds), units=``1,-1,0,PM_SPACE_MBYTE,PM_TIME_SEC,0''
                 (Mbytes/sec), units=``0,1,-1,0,PM_TIME_USEC,PM_COUNT_ONE'' (microseconds/event).

       type      Defines the data type for the metric.  Refer to pmLookupDesc(3) for a full description  of  the
                 possible   type   values;   the   default  is  PM_TYPE_FLOAT.   Examples:  type=``PM_TYPE_32'',
                 type=``PM_TYPE_U64'', type=``PM_TYPE_STRING''.

       sem       Defines the semantics of the metric.  Refer to pmLookupDesc(3) for a full description of the
                 possible values; the default is PM_SEM_INSTANT.  Examples: sem=``PM_SEM_COUNTER'',
                 sem=``PM_SEM_DISCRETE''.
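
       The following sketch, offered as an illustration rather than a tested configuration, combines the
       metric attributes above in two specifications.  hinv.ncpu is treated as a singular metric that relies
       on the defaults for pmid, indom and units.  sample.disk.throughput is a hypothetical metric name (as
       is the choice of ordinal 4); it omits pmid and so uses the reserved domain PMI_DOMAIN for its
       Instance Domain.

             <!-- singular metric: PMID auto-assigned, indom defaults to PM_INDOM_NULL -->
             <metric type="PM_TYPE_U32" sem="PM_SEM_DISCRETE">hinv.ncpu</metric>
             <!-- units 6-tuple: dimensions for space, time and count (1,-1,0),
                  followed by the scale for each of those axes (Mbytes/sec) -->
             <metric indom="PMI_DOMAIN.4" units="1,-1,0,PM_SPACE_MBYTE,PM_TIME_SEC,0"
                 type="PM_TYPE_FLOAT" sem="PM_SEM_INSTANT">
             sample.disk.throughput</metric>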

       The remaining specifications define the data columns in order  using  exactly  one  <datetime></datetime>
       element, one or more <data>metricspec</data> elements and one or more <skip></skip> elements.

       The <datetime> element defines the column in which a date and time will be found to form the timestamp in
       the PCP archive for all the data in the corresponding row of the spreadsheet.

       For the <data> element, a metricspec consists of a  metric  name  (as  defined  in  an  earlier  <metric>
       element),   optionally  followed  by  an  instance  name  that  is  enclosed  by  square  brackets,  e.g.
       <data>hinv.ncpu</data>, <data>kernel.all.load[1 minute]</data>.

       The <skip> element marks a column that should be skipped when preparing data for the PCP archive.

       The order of the <datetime>, <data> and <skip> elements matches the order of columns in the  spreadsheet.
       If the number of elements is not the same as the number of columns, a warning is issued, and the extra
       elements or columns generate no metric values in the output archive.
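
       For instance, assuming a hypothetical spreadsheet whose four columns hold a timestamp, a free-text
       comment, a CPU count and a 1 minute load average, the column elements might be sketched as follows.
       The metric names are the ones used in the metricspec examples above and are assumed to have been
       defined by earlier <metric> elements.

             <datetime></datetime>                     <!-- column 1: date and time -->
             <skip></skip>                             <!-- column 2: ignored -->
             <data>hinv.ncpu</data>                    <!-- column 3 -->
             <data>kernel.all.load[1 minute]</data>    <!-- column 4 -->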

   EXAMPLE
       The mapfile ...

             <?xml version="1.0" encoding="UTF-8"?>
             <sheet heading="1">
                 <!-- simple example -->
                 <metric pmid="60.0.2" indom="60.0" units="0,1,0,0,PM_TIME_MSEC,0"
                     type="PM_TYPE_U64" sem="PM_SEM_COUNTER">
                 kernel.percpu.cpu.sys</metric>
                 <datetime></datetime>
                 <skip></skip>
                 <data>kernel.percpu.cpu.sys[cpu0]</data>
                 <data>kernel.percpu.cpu.sys[cpu1]</data>
             </sheet>

       could be used for a spreadsheet in which the first few rows are ...

             Date;"Status";"SysTime - 0";"SysTime - 1";
             26/01/2001 14:05:22;"Some Busy";0.750;0.133
             26/01/2001 14:05:37;"OK";0.150;0.273
             26/01/2001 14:05:52;"All Busy";0.733;0.653

CAVEATS

       Only the first sheet from infile will be processed.

       Additional Perl modules must be installed for the various spreadsheet formats, although these are checked
       for at run-time so only the modules required for the specific types of spreadsheets you wish to process
       need be installed:

       *.csv Spreadsheets in the Comma Separated Values (CSV) format require Text::CSV_XS(3pm).

       *.sxc or *.ods
             OpenOffice documents require Spreadsheet::ReadSXC(3pm), which in turn requires Archive::Zip(3pm).

       *.xls Classical Microsoft Office documents require Spreadsheet::ParseExcel(3pm), which in  turn  requires
             OLE::Storage_Lite(3pm).

       *.xlsx
             Microsoft OpenXML documents require Spreadsheet::XLSX(3pm).  sheet2pcp does not appear to work with
             OpenXML documents saved from OpenOffice.

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the file and directory names used  by
       PCP.   On  each  installation, the file /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf(5).

       For environment variables affecting PCP tools, see pmGetOptions(3).

SEE ALSO

       pmchart(1),   pmie(1),    pmlogger(1),    sed(1),    pmiAddMetric(3),    pmLookupDesc(3),    pmiUnits(3),
       Archive::Zip(3pm),   Date::Format(3pm),  Date::Parse(3pm),  PCP::LogImport(3pm),  OLE::Storage_Lite(3pm),
       Spreadsheet::ParseExcel(3pm),   Spreadsheet::ReadSXC(3pm),   Spreadsheet::XLSX(3pm),   Text::CSV_XS(3pm),
       XML::TokeParser(3pm) and LOGIMPORT(3).