Provided by: likwid_3.1.3+dfsg1-1_amd64

NAME

       likwid-perfctr - configure and read out hardware performance counters on x86 CPUs

SYNOPSIS

       likwid-perfctr    [-vhHVmaeiMoO]    [-c/-C   <core_list>]   [-g   <performance_group>   or
       <performance_event_string>]  [-t   <frequency>]   [-S   <time>]   [-s   <skip_mask>]   [-o
       <output_file>]

DESCRIPTION

       likwid-perfctr is a lightweight command line application to configure and read out
       hardware performance monitoring data on supported x86 processors. It can measure either
       as a wrapper, without changing the measured application, or with marker API functions
       inside the code, which turn the counters on and off. There are preconfigured groups with
       useful event sets and derived metrics. Additionally, arbitrary events can be measured
       with custom event sets. The marker API can measure multiple named regions. Results are
       accumulated over multiple calls. The following x86 processors are supported:

       •      Intel Core 2: all variants. Counters: PMC[0-1], FIXC[0-2]

       •      Intel Nehalem: Counters: PMC[0-3], FIXC[0-2], UPMC[0-7]

       •      Intel Nehalem EX: Counters: PMC[0-3], FIXC[0-2], MBOX[0-1]C[0-5], BBOX[0-1]C[0-3],
              RBOX[0-1]C[0-7], WBOX[0-5], UBOX0, SBOX[0-1]C[0-3], CBOX[0-9]C[0-4]

       •      Intel Westmere: Counters: PMC[0-3], FIXC[0-2], UPMC[0-7]

       •      Intel Westmere EX: Counters: PMC[0-3], FIXC[0-2], MBOX[0-1]C[0-5], BBOX[0-1]C[0-3],
              RBOX[0-1]C[0-7], WBOX[0-5], UBOX0, SBOX[0-1]C[0-3], CBOX[0-9]C[0-4]

       •      Intel Sandy Bridge: full RAPL support. Counters: PMC[0-3], FIXC[0-2], PWR[0-3]

       •      Intel Sandy Bridge EP: partial support for uncore, full RAPL support. Counters:
              PMC[0-3], FIXC[0-2], PWR[0-3], MBOX[0-3]C[0-3]

       •      Intel Ivy Bridge: full RAPL support. Counters: PMC[0-3], FIXC[0-2], PWR[0-3]

       •      Intel Ivy Bridge EP: partial support for uncore, full RAPL support. Counters:
              PMC[0-3], FIXC[0-2], PWR[0-3], CBOX[0-9]C[0-3], MBOX[0-3]C[0-3], MBOX[0-3]FIX

       •      Intel Haswell: full RAPL support. Counters: PMC[0-3], FIXC[0-2], PWR[0-3]

       •      Intel Haswell EP: no uncore support, full RAPL support. Counters: PMC[0-3],
              FIXC[0-2], PWR[0-3]

       •      Intel Atom Silvermont: full RAPL support. Counters: PMC[0-1], FIXC[0-2], PWR[0-1]

       •      Intel Pentium M: Banias and Dothan variants. Counters: PMC[0-1]

       •      Intel P6: Tested on P3.

       •      AMD K8: all variants. Counters: PMC[0-3]

       •      AMD K10: Barcelona, Shanghai, Istanbul, MagnyCours based processors. Counters:
              PMC[0-3]

OPTIONS

       -v     prints version information to standard output, then exits.

       -h     prints a help message to standard output, then exits.

       -H     prints group help message (use together with -g switch).

       -V     verbose output during execution for debugging.

       -m     run in marker API mode

       -a     print available performance groups for current processor.

       -e     print available counters and performance events of current processor.

       -o  <filename>
              store all output to a file instead of stdout. The following placeholders are
              supported in the filename: %j for PBS_JOBID, %r for the MPI rank (only Intel MPI at
              the moment), %h for the hostname and %p for the process pid. The placeholders must
              be separated by underscores, e.g., -o test_%h_%p. You must specify a suffix for the
              filename. With txt the output is written to the file as is. Other suffixes trigger
              a filter on the output. Available filters are csv (comma separated values) and xml
              at the moment.

       -O     Do not print tables for results, use easily parseable CSV instead.

       -i     print  cpuid  information  about  processor  and  on  Intel  Performance Monitoring
              features, then exit.

       -c  <processor_list>
              specify a numerical list of  processors.  The  list  may  contain  multiple  items,
              separated by comma, and ranges. For example 0,3,9-11.

       -C  <processor_list>
              specify  a  numerical  list  of  processors.  The  list may contain multiple items,
              separated by comma, and ranges. For example 0,3,9-11. This variant will also pin
              the threads to the cores. Logical numbering can also be used.

       -g  <performance group> or <performance event set string>
              specify which performance group to measure. This can be one of the tags output with
              the -a flag.  Also a custom event set can be specified by a comma separated list of
              events. Each event has the format eventId:register, with the register being one
              of the performance counter registers supported by the architecture.

       -t  <frequency of measurements>
              timeline mode for time resolved measurements, possible suffixes 's' and  'ms'  like
              100ms. The output has the format:

       <Event> <Timestamp> <Result thread0> <Result thread1> ...

       -S  <time_in_seconds>
              stethoscope mode with duration in seconds. Can be used to measure an application
              from the outside.
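
       The filename rules of the -o option can be illustrated with a short Python sketch. It
       is not how likwid-perfctr itself is implemented: the function names are hypothetical,
       the MPI rank is passed in by the caller because this page does not say how the rank is
       determined, and an unset PBS_JOBID is replaced by an empty string as an assumption.

```python
import os
import socket


def expand_output_name(template, rank="0", environ=None):
    """Replace %j, %r, %h and %p in an output filename template.

    Hypothetical helper, not part of likwid. The rank is supplied by
    the caller; an unset PBS_JOBID becomes an empty string (assumption).
    """
    environ = os.environ if environ is None else environ
    values = {
        "%j": environ.get("PBS_JOBID", ""),
        "%r": str(rank),
        "%h": socket.gethostname(),
        "%p": str(os.getpid()),
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


def output_suffix(filename):
    """Return the filename suffix (txt, csv, xml, ...) that selects the output form."""
    suffix = os.path.splitext(filename)[1].lstrip(".")
    if not suffix:
        raise ValueError("the output filename needs a suffix")
    return suffix


if __name__ == "__main__":
    name = expand_output_name("test_%h_%p.csv")
    print(name, output_suffix(name))
```

       Calling expand_output_name("test_%h_%p.csv") yields a name built from the local
       hostname and the pid of the calling process, and output_suffix() then reports csv.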

EXAMPLE

       Because likwid-perfctr measures on processors and not on single applications, it is
       necessary to ensure that processes and threads are pinned to dedicated resources. You
       can either pin the application yourself or use the built-in pin functionality.

       1.  As wrapper with performance group:

       likwid-perfctr -C 0-2 -g TLB ./cacheBench -n 2 -l 1048576 -i 100 -t Stream

       The parent process is pinned to processor 0, Thread 0 to  processor  1  and  Thread  1  to
       processor 2.
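
       The processor list syntax accepted by -c and -C (comma separated items and ranges, such
       as 0-2 here or 0,3,9-11) can be expanded in a few lines of Python. This is only an
       illustrative sketch: expand_processor_list is a hypothetical name, not part of likwid,
       and it handles only the plain numerical form, not the logical numbering that -C also
       accepts.

```python
def expand_processor_list(spec):
    """Expand a list such as "0,3,9-11" into [0, 3, 9, 10, 11].

    Hypothetical helper, not part of likwid. Only plain numerical
    items and inclusive ranges are handled.
    """
    processors = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            if last < first:
                raise ValueError("descending range: " + item)
            # Ranges are inclusive on both ends.
            processors.extend(range(first, last + 1))
        else:
            processors.append(int(item))
    return processors


if __name__ == "__main__":
    print(expand_processor_list("0,3,9-11"))
```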

       2.  As wrapper with custom event set on AMD:

       likwid-perfctr -C 0-4 -g INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3 ./myApp

       The event INSTRUCTIONS_RETIRED_SSE is measured on counter PMC0 and the event
       CPU_CLOCKS_UNHALTED on counter PMC3. It is possible to calculate the runtime of all
       threads based on the CPU_CLOCKS_UNHALTED event. If you want this, you have to include
       this event in your custom event string as shown above.
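
       A custom event string is a comma separated list of eventId:register pairs. The
       following Python sketch splits such a string and checks whether a given event is
       included, for instance before relying on the runtime calculation. The function names
       are hypothetical and the code does not check that the events or registers exist on the
       current processor (use -e for that).

```python
def parse_event_set(event_string):
    """Split "EVENT:REGISTER,..." into a list of (event, register) pairs.

    Hypothetical helper, not part of likwid; no validation against
    the events and counters of the actual processor is done.
    """
    pairs = []
    for entry in event_string.split(","):
        event, sep, register = entry.strip().partition(":")
        if not sep or not event or not register:
            raise ValueError("expected eventId:register, got %r" % entry)
        pairs.append((event, register))
    return pairs


def includes_event(pairs, event_name):
    """Return True if event_name is part of the parsed event set."""
    return any(event == event_name for event, _ in pairs)


if __name__ == "__main__":
    events = parse_event_set(
        "INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3")
    print(events, includes_event(events, "CPU_CLOCKS_UNHALTED"))
```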

       3.  As wrapper with custom event set on Intel:

       likwid-perfctr -C 0 -g INSTR_RETIRED_ANY:FIXC0,CPU_CLK_UNHALTED_CORE:FIXC1 ./myApp

       On Intel processors fixed events are measured on dedicated counters. These are
       INSTR_RETIRED_ANY, CPU_CLK_UNHALTED_CORE and CPU_CLK_UNHALTED_REF. If you configure
       these fixed counters, likwid-perfctr will calculate the runtime and CPI metrics for your
       run.
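
       The arithmetic behind these two metrics is simple. The Python sketch below assumes that
       CPI is the number of unhalted core cycles divided by the number of retired instructions
       and that the runtime is approximated by the unhalted core cycles divided by the core
       clock frequency. The exact formulas used by likwid-perfctr may differ; the function
       name, the counter values and the 2.0 GHz clock are made up for illustration.

```python
def derive_metrics(instr_retired_any, cpu_clk_unhalted_core, clock_hz):
    """Return (CPI, runtime in seconds) from two fixed-counter values.

    Assumptions (not taken from likwid itself):
      CPI     = unhalted core cycles / retired instructions
      runtime = unhalted core cycles / core clock frequency
    """
    if instr_retired_any <= 0 or clock_hz <= 0:
        raise ValueError("instruction count and clock must be positive")
    cpi = cpu_clk_unhalted_core / instr_retired_any
    runtime_s = cpu_clk_unhalted_core / clock_hz
    return cpi, runtime_s


if __name__ == "__main__":
    # Invented counter values for a single core at an assumed 2.0 GHz.
    cpi, runtime_s = derive_metrics(4_000_000_000, 6_000_000_000, 2.0e9)
    print("CPI:", cpi, "runtime [s]:", runtime_s)
```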

       4.  Using  the  marker  API to measure only parts of your code (this can be used both with
           groups or custom event sets):

       likwid-perfctr  -m  -C   0-4   -g   INSTRUCTIONS_RETIRED_SSE:PMC0,CPU_CLOCKS_UNHALTED:PMC3
       ./cacheBench

       You have to link your code against liblikwid.a/.so and use the marker API calls. The
       following code snippet shows the necessary calls:

       #include <likwid.h>

       /* only one thread calls init */
       if (threadId == 0)
       {
           likwid_markerInit();
       }
       /* if you want to measure a threaded application
        * you have to call likwid_markerThreadInit() for
        * preparation, example with OpenMP */
       #pragma omp parallel
       {
            likwid_markerThreadInit();
       }
       BARRIER;
       likwid_markerStartRegion("Benchmark");
       /* your code to be measured is here.*/

       likwid_markerStopRegion("Benchmark");
       BARRIER;
       /* again only one thread can close the markers */
       if (threadId == 0)
       {
           likwid_markerClose();
       }

       5.  Using likwid in timeline mode:

       likwid-perfctr -c 0-3 -g FLOPS_DP -t 300ms ./myApp > out.txt

       This will read out the counters every 300ms on physical cores 0-3 and write the results
       to out.txt. The processes are not pinned to the CPUs 0-3. For timeline mode there is a
       frontend application, likwid-scope, which enables live plotting of selected events. For
       more code examples, have a look at the likwid wiki pages.
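
       Timeline output follows the format given for the -t option: <Event> <Timestamp> <Result
       thread0> <Result thread1> and so on. A file such as out.txt can be post-processed with
       a small Python sketch like the one below. It assumes whitespace separated fields with a
       numeric timestamp and numeric results, and it skips every line that does not fit, since
       the redirected file may also contain output of the application. The function names and
       the sample lines are invented for illustration.

```python
def parse_timeline_line(line):
    """Parse one "<Event> <Timestamp> <Result thread0> ..." line.

    Assumes whitespace separated fields; the event field is kept as
    a string, timestamp and results are converted to float.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected event, timestamp and at least one result")
    event = fields[0]
    timestamp = float(fields[1])
    results = [float(value) for value in fields[2:]]
    return event, timestamp, results


def read_timeline(lines):
    """Collect all lines that parse and skip the rest (e.g. program output)."""
    records = []
    for line in lines:
        try:
            records.append(parse_timeline_line(line))
        except ValueError:
            continue
    return records


if __name__ == "__main__":
    # Invented sample lines for four measured cores, not real likwid output.
    sample = [
        "starting myApp",
        "1 0.3 1200 1180 1215 1190",
        "1 0.6 2400 2390 2410 2380",
    ]
    for event, timestamp, results in read_timeline(sample):
        print(event, timestamp, sum(results))
```

       Application output that happens to look like a timeline line would be picked up as
       well, so a real script may need a stricter filter.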

       6.  Using likwid in stethoscope mode:

       likwid-perfctr -c 0-3 -g FLOPS_DP -S 2s

       This  will  start  the counters and read them out after 2s on physical cores 0-3 and write
       the results to stdout. The processes are not pinned to the CPUs 0-3.

AUTHOR

       Written by Jan Treibig <jan.treibig@gmail.com>.

BUGS

       Report bugs at <http://code.google.com/p/likwid/issues/list>.

SEE ALSO

       likwid-topology(1), likwid-features(1), likwid-pin(1), likwid-bench(1)