Provided by: likwid_3.1.3+dfsg1-1_amd64

NAME

       likwid-perfctr - configure and read out hardware performance counters on x86 cpus

SYNOPSIS

       likwid-perfctr  [-vhHVmaeiMoO] [-c/-C <core_list>] [-g <performance_group> or <performance_event_string>]
       [-t <frequency>] [-S <time>] [-s <skip_mask>] [-o <output_file>]

DESCRIPTION

       likwid-perfctr is a lightweight command line application to configure and read out  hardware  performance
       monitoring  data  on  supported  x86  processors.  It can measure either as a wrapper, without changing the
       measured application, or with marker API functions inside the code, which turn the  counters  on  and  off.
       There  are  preconfigured  groups  with  useful  event  sets  and  derived  metrics.  Additionally,
       arbitrary events can be measured with custom event sets.  The  marker  API  can  measure  multiple  named
       regions. Results are accumulated over multiple calls.  The following x86 processors are supported:

       •      Intel Core 2: all variants. Counters: PMC[0-1], FIXC[0-2]

       •      Intel Nehalem: Counters: PMC[0-3], FIXC[0-2], UPMC[0-7]

       •      Intel Nehalem EX: Counters: PMC[0-3], FIXC[0-2], MBOX[0-1]C[0-5], BBOX[0-1]C[0-3],
              RBOX[0-1]C[0-7], WBOX[0-5], UBOX0, SBOX[0-1]C[0-3], CBOX[0-9]C[0-4]

       •      Intel Westmere: Counters: PMC[0-3], FIXC[0-2], UPMC[0-7]

       •      Intel Westmere EX: Counters: PMC[0-3], FIXC[0-2], MBOX[0-1]C[0-5], BBOX[0-1]C[0-3],
              RBOX[0-1]C[0-7], WBOX[0-5], UBOX0, SBOX[0-1]C[0-3], CBOX[0-9]C[0-4]

       •      Intel Sandy Bridge: full RAPL support. Counters: PMC[0-3], FIXC[0-2], PWR[0-3]

       •      Intel Sandy Bridge EP: partial support for uncore, full RAPL support. Counters: PMC[0-3],
              FIXC[0-2], PWR[0-3], MBOX[0-3]C[0-3]

       •      Intel Ivy Bridge: full RAPL support. Counters: PMC[0-3], FIXC[0-2], PWR[0-3]

       •      Intel Ivy Bridge EP: partial support for uncore, full RAPL support. Counters: PMC[0-3], FIXC[0-2],
              PWR[0-3], CBOX[0-9]C[0-3], MBOX[0-3]C[0-3], MBOX[0-3]FIX

       •      Intel Haswell: full RAPL support. Counters: PMC[0-3], FIXC[0-2], PWR[0-3]

       •      Intel Haswell EP: no uncore support, full RAPL support. Counters: PMC[0-3], FIXC[0-2], PWR[0-3]

       •      Intel Atom Silvermont: full RAPL support. Counters: PMC[0-1], FIXC[0-2], PWR[0-1]

       •      Intel Pentium M: Banias and Dothan variants. Counters: PMC[0-1]

       •      Intel P6: Tested on P3.

       •      AMD K8: all variants. Counters: PMC[0-3]

       •      AMD K10: Barcelona, Shanghai, Istanbul, MagnyCours based processors. Counters: PMC[0-3]

OPTIONS

       -v     prints version information to standard output, then exits.

       -h     prints a help message to standard output, then exits.

       -H     prints group help message (use together with -g switch).

       -V     verbose output during execution for debugging.

       -m     run in marker API mode

       -a     print available performance groups for current processor.

       -e     print available counters and performance events of current processor.

       -o  <filename>
              store all output to a file instead of stdout. The filename supports the  following  placeholders:
              %j  for  PBS_JOBID, %r for the MPI rank (only Intel MPI at the moment), %h for the hostname and %p
              for the process ID.  The placeholders must be separated by underscores, e.g.,  -o  test_%h_%p.  You
              must  specify  a  suffix  to  the filename. With the suffix txt the output is printed as is to the
              file. Other suffixes trigger a filter on the output.  Available  filters  are  csv  (comma  separated
              values) and xml at the moment.
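
              The placeholder and suffix rules above can be sketched in C as  follows.  This  is  only  an
              illustration  under  stated  assumptions,  not  the  code  likwid-perfctr  itself  uses:  the
              functions expand_output_name() and output_suffix() are hypothetical, and the caller is assumed
              to supply the job ID, rank, hostname and PID (for instance from getenv("PBS_JOBID"),
              gethostname() and getpid()).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of likwid: expands the placeholders
 * %j (job ID), %r (MPI rank), %h (hostname) and %p (process ID) in tmpl
 * and writes the result to out. All string arguments must be non-NULL.
 * Returns 0 on success, -1 if out is too small. */
int expand_output_name(const char *tmpl, const char *jobid, const char *rank,
                       const char *host, long pid, char *out, size_t outlen)
{
    assert(tmpl != NULL && jobid != NULL && rank != NULL);
    assert(host != NULL && out != NULL);

    size_t used = 0;
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (const char *p = tmpl; *p != '\0'; p++) {
        char piece[64];
        const char *text = piece;

        if (p[0] == '%' && p[1] == 'j') {
            text = jobid; p++;
        } else if (p[0] == '%' && p[1] == 'r') {
            text = rank; p++;
        } else if (p[0] == '%' && p[1] == 'h') {
            text = host; p++;
        } else if (p[0] == '%' && p[1] == 'p') {
            snprintf(piece, sizeof piece, "%ld", pid); p++;
        } else {
            /* ordinary character: copy it unchanged */
            piece[0] = *p;
            piece[1] = '\0';
        }

        size_t len = strlen(text);
        if (used + len + 1 > outlen)
            return -1;
        memcpy(out + used, text, len + 1);
        used += len;
    }
    return 0;
}

/* Returns the suffix after the last '.', or NULL if there is none.
 * The suffix (txt, csv, xml) is what selects the output filter. */
const char *output_suffix(const char *name)
{
    assert(name != NULL);
    const char *dot = strrchr(name, '.');
    return (dot != NULL && dot[1] != '\0') ? dot + 1 : NULL;
}
```

              With host "node1" and PID 1234, the template test_%h_%p.txt would expand to
              test_node1_1234.txt, and the suffix txt means the output is written unfiltered.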

       -O     Do not print tables for results, use easily parseable CSV instead.

       -i     print cpuid information about processor and on Intel Performance Monitoring features, then exit.

       -c  <processor_list>
              specify a numerical list of processors. The list may contain multiple items, separated by  commas,
              and ranges, for example 0,3,9-11.

       -C  <processor_list>
              specify a numerical list of processors. The list may contain multiple items, separated by  commas,
              and  ranges,  for  example  0,3,9-11.  This variant will also pin the threads to the cores. Logical
              numberings can also be used.
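
              To make the list syntax concrete, here is a minimal C sketch that expands a list such as
              0,3,9-11  into  individual  processor  IDs.  The  function  parse_processor_list()  is
              hypothetical and only covers plain numerical lists; it does not model the logical
              numberings accepted by -C, and it is not likwid's own parser.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not likwid's own parser: expands a list such as
 * "0,3,9-11" into ids[]. Returns the number of IDs written, or -1 on
 * malformed input or if more than max IDs would be needed. */
int parse_processor_list(const char *list, int *ids, int max)
{
    assert(list != NULL && ids != NULL);

    int count = 0;
    const char *p = list;

    while (*p != '\0') {
        char *end;

        /* every item must start with a digit */
        if (*p < '0' || *p > '9')
            return -1;
        long first = strtol(p, &end, 10);
        long last = first;
        p = end;

        /* optional range part: "-<last>" */
        if (*p == '-') {
            p++;
            if (*p < '0' || *p > '9')
                return -1;
            last = strtol(p, &end, 10);
            p = end;
        }
        if (last < first)
            return -1;

        for (long id = first; id <= last; id++) {
            if (count >= max)
                return -1;
            ids[count++] = (int)id;
        }

        if (*p == ',') {
            p++;
            if (*p == '\0')
                return -1;      /* trailing comma */
        } else if (*p != '\0') {
            return -1;          /* unexpected character */
        }
    }
    return count;
}
```

              For the list 0,3,9-11 this sketch yields the five IDs 0, 3, 9, 10 and 11.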

       -g  <performance group> or <performance event set string>
              specify which performance group to measure. This can be one of the tags output with the  -a  flag.
              Alternatively,  a  custom  event  set  can be specified as a comma separated list of events. Each
              event has the format eventId:register, with the register being one of the  performance  counter
              registers supported by the architecture.
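
              The eventId:register format can be illustrated with a short C sketch that splits a custom
              event string into its event and register parts. The struct event_spec and the function
              parse_event_string() are hypothetical names chosen for this example; the sketch only checks
              the syntax and does not verify that an event or register exists on a given processor.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 64

struct event_spec {
    char event[NAME_LEN];    /* e.g. "INSTR_RETIRED_ANY" */
    char counter[NAME_LEN];  /* e.g. "FIXC0" */
};

/* Hypothetical helper: splits "EVENT:REG,EVENT:REG,..." into specs[].
 * Returns the number of entries, or -1 if an item has no ':' separator,
 * an empty or overlong part, or if more than max items are present. */
int parse_event_string(const char *str, struct event_spec *specs, int max)
{
    assert(str != NULL && specs != NULL);

    int count = 0;
    const char *p = str;

    while (*p != '\0') {
        size_t item_len = strcspn(p, ",");            /* length up to next ',' */
        const char *colon = memchr(p, ':', item_len);

        if (colon == NULL || count >= max)
            return -1;

        size_t ev_len = (size_t)(colon - p);
        size_t reg_len = item_len - ev_len - 1;
        if (ev_len == 0 || reg_len == 0 ||
            ev_len >= NAME_LEN || reg_len >= NAME_LEN)
            return -1;

        memcpy(specs[count].event, p, ev_len);
        specs[count].event[ev_len] = '\0';
        memcpy(specs[count].counter, colon + 1, reg_len);
        specs[count].counter[reg_len] = '\0';
        count++;

        p += item_len;
        if (*p == ',')
            p++;
    }
    return count;
}
```

              A string without any colon, such as TLB, is rejected by this sketch; a caller could treat
              that case as a performance group tag instead of a custom event set.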

       -t  <frequency of measurements>
              timeline mode for time-resolved measurements. Possible suffixes are 's' and 'ms', e.g. 100ms.  The
              output has the format:

              <Event> <Timestamp> <Result thread0> <Result thread1> ...
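
              As an illustration of the suffix rule, the following C sketch converts an  interval  argument
              such  as  100ms  or  2s  into  milliseconds.  The  function parse_interval_ms() is
              hypothetical; in particular, rejecting a value without a suffix is an assumption of this
              sketch and may differ from what likwid-perfctr accepts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts "<number>ms" or "<number>s" into
 * milliseconds. Returns -1 if the argument does not start with a digit
 * or carries a suffix other than "s" or "ms". */
long parse_interval_ms(const char *arg)
{
    assert(arg != NULL);

    char *end;
    if (*arg < '0' || *arg > '9')
        return -1;

    long value = strtol(arg, &end, 10);
    if (strcmp(end, "ms") == 0)
        return value;
    if (strcmp(end, "s") == 0)
        return value * 1000;
    return -1;
}
```

              Under these assumptions, 300ms maps to 300 and 2s maps to 2000.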

       -S  <time_in_seconds>
              stethoscope mode with duration in seconds. Can  be  used  to  measure  an  application  from  the
              outside.

EXAMPLE

       Because likwid-perfctr measures on processors and not on single applications, it is necessary  to  ensure
       that  processes and threads are pinned to dedicated resources. You can either pin the application yourself
       or use the built-in pin functionality.

       1.  As wrapper with performance group:

       likwid-perfctr -C 0-2 -g TLB ./cacheBench -n 2 -l 1048576 -i 100 -t Stream

       The parent process is pinned to processor 0, Thread 0 to processor 1 and Thread 1 to processor 2.

       2.  As wrapper with custom event set on AMD:

       likwid-perfctr -C 0-4 -g INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3 ./myApp

       This specifies that the event  INSTRUCTIONS_RETIRED_SSE  is  measured  on  counter  PMC0  and  the  event
       CPU_CLOCKS_UNHALTED on counter PMC3.  It is possible to calculate the runtime of all threads based on the
       CPU_CLOCKS_UNHALTED event. If you want this, you have to include this event in your custom  event  string
       as shown above.

       3.  As wrapper with custom event set on Intel:

       likwid-perfctr -C 0 -g INSTR_RETIRED_ANY:FIXC0,CPU_CLK_UNHALTED_CORE:FIXC1 ./myApp

       On Intel processors fixed events are  measured  on  dedicated  counters.  These  are  INSTR_RETIRED_ANY,
       CPU_CLK_UNHALTED_CORE  and  CPU_CLK_UNHALTED_REF.  If you configure these fixed counters, likwid-perfctr
       will calculate the runtime and CPI metrics for your run.
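
       The derived metrics mentioned above can be sketched in C as follows. CPI is by definition  the  number
       of  unhalted  core  cycles  divided by the number of retired instructions. The runtime formula (core
       cycles divided by the clock frequency) is an assumption about how such a value can  be  derived  from
       the  counters  and  is  not taken from the likwid sources; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per instruction: CPU_CLK_UNHALTED_CORE / INSTR_RETIRED_ANY. */
double cycles_per_instruction(uint64_t core_cycles, uint64_t instructions)
{
    assert(instructions > 0);
    return (double)core_cycles / (double)instructions;
}

/* Assumed runtime estimate in seconds: unhalted core cycles divided by
 * the clock frequency in Hz. Time spent halted is not counted. */
double runtime_seconds(uint64_t core_cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)core_cycles / clock_hz;
}
```

       For example, 3000 core cycles and 2000 retired instructions give a CPI of 1.5.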

       4.  Using  the marker API to measure only parts of your code (this can be used both with groups or custom
           event sets):

       likwid-perfctr -m -C 0-4 -g INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3 ./cacheBench

       You have to link your code against liblikwid.a/.so and use the marker  API  calls.   The  following  code
       snippet shows the necessary calls:

       #include <likwid.h>

       /* only one thread calls init */
       if (threadId == 0)
       {
           likwid_markerInit();
       }
       /* if you want to measure a threaded application
        * you have to call likwid_markerThreadInit() for
        * preparation, example with OpenMP */
       #pragma omp parallel
       {
            likwid_markerThreadInit();
       }
       BARRIER;
       likwid_markerStartRegion("Benchmark");
       /* your code to be measured is here.*/

       likwid_markerStopRegion("Benchmark");
       BARRIER;
       /* again only one thread can close the markers */
       if (threadId == 0)
       {
           likwid_markerClose();
       }

       5.  Using likwid in timeline mode:

       likwid-perfctr -c 0-3 -g FLOPS_DP -t 300ms ./myApp > out.txt

       This will read out the counters every 300ms on physical cores 0-3 and write the results to  out.txt.  The
       processes  are  not  pinned  to  the  CPUs  0-3.  For  timeline mode there is a frontend application,
       likwid-scope, which enables live plotting of selected events. For more code examples have a look  at  the
       likwid wiki pages.

       6.  Using likwid in stethoscope mode:

       likwid-perfctr -c 0-3 -g FLOPS_DP -S 2s

       This  will  start  the counters and read them out after 2s on physical cores 0-3 and write the results to
       stdout. The processes are not pinned to the CPUs 0-3.

AUTHOR

       Written by Jan Treibig <jan.treibig@gmail.com>.

BUGS

       Report bugs at <http://code.google.com/p/likwid/issues/list>.

SEE ALSO

       likwid-topology(1), likwid-features(1), likwid-pin(1), likwid-bench(1)

likwid-3                                            12.2.2014                                  LIKWID-PERFCTR(1)