Provided by: likwid_3.1.3+dfsg1-1_amd64

NAME

       likwid-bench - low-level benchmark suite and microbenchmarking framework

SYNOPSIS

       likwid-bench  [-hap]  [-l  <testname>]  [-i  <iterations>] [-g <number_of_workgroups>] [-t
       <testname>] [-w <workgroup_expression>]

DESCRIPTION

       likwid-bench is a benchmark suite of low-level (assembly) benchmarks that measure
       bandwidth and instruction throughput for specific instruction code on x86 systems. The
       benchmark codes currently included cover common data access patterns such as load and
       store, as well as calculations such as vector triad and sum. likwid-bench includes
       architecture-specific benchmarks for x86, x86_64, and x86 for Intel Xeon Phi
       coprocessors. The performance values can either be calculated by likwid-bench itself or
       measured with performance counters by using likwid-perfctr as a wrapper to likwid-bench.
       The latter requires building likwid-bench with instrumentation, which can be enabled in
       config.mk.

OPTIONS

       -h     prints a help message to standard output, then exits.

       -a     list available benchmark codes for the current system.

       -p     list available thread domains.

       -l  <testname>
              list properties of a benchmark code.

       -i  <iterations>
              number of iterations to perform inside the benchmark code.

       -t  <testname>
              name of the benchmark code to run (mandatory).

       -g  <number_of_workgroups>
              specify the number of workgroups to perform the benchmark code on (mandatory).

       -w  <workgroup_expression>
              specify the affinity domain, thread count and data set size for the current
              benchmarking run (mandatory).
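
       The listing options above are typically used before a run to find out which benchmark
       codes and thread domains exist. The following is a minimal sketch; the run helper is
       hypothetical and only echoes each command unless likwid-bench is actually installed. The
       test name copy is taken from the EXAMPLE section.

```shell
# Hypothetical helper: print the command, and execute it only if the
# program named in the first argument is installed on this machine.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    fi
}

run likwid-bench -a        # list benchmark codes available on this system
run likwid-bench -p        # list available thread domains
run likwid-bench -l copy   # list properties of the copy benchmark code
```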

WORKGROUP SYNTAX

       <thread_domain>:<size> [:<num_threads>[:<chunk_size>:<stride>]]  [-<streamId>:<domain_id>]
       with size in kB, MB or GB. The thread domain is where the threads are placed. The size is
       the total data set size for the benchmark. num_threads specifies how many threads are
       used. Threads are always placed using a compact policy in likwid-bench, which means that
       by default all SMT threads are used. Optionally, similar to the expression-based syntax
       in likwid-pin, a chunk size and stride can be provided. The placement of every stream
       (that is, every array) can also be controlled. By default all arrays are placed in the
       same thread domain the threads are running in. To place the data in a different domain,
       specify the target domain for every stream of a benchmark case (the total number of
       streams can be obtained with the -l option). Multiple streams are comma separated.
       Either no placement is given or all streams have to be placed explicitly. Please refer
       to the Wiki pages at http://code.google.com/p/likwid/wiki/LikwidBench for further details
       and examples on usage.
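
       As an illustration of the syntax above, the sketch below assembles a workgroup
       expression from its parts. The function workgroup_expr is a hypothetical helper and not
       part of likwid-bench; it only concatenates fields and does not validate them.

```shell
# Hypothetical helper: build a workgroup expression from its fields.
# Usage: workgroup_expr <thread_domain> <size> [<num_threads> [<chunk_size> <stride>]]
workgroup_expr() {
    wg="$1:$2"
    if [ -n "${3:-}" ]; then
        wg="$wg:$3"
        # chunk size and stride are only meaningful as a pair
        if [ -n "${4:-}" ] && [ -n "${5:-}" ]; then
            wg="$wg:$4:$5"
        fi
    fi
    printf '%s\n' "$wg"
}

workgroup_expr S0 100kB        # prints S0:100kB
workgroup_expr S0 1GB 2 1 2    # prints S0:1GB:2:1:2
```

       The result can be passed to the -w option, for example
       -w "$(workgroup_expr S0 1GB 2 1 2)".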

EXAMPLE

       1.  Run the copy benchmark with 1000 iterations on socket 0 with a total data set size  of
           100kB.

       likwid-bench -t copy -i 1000 -g 1 -w S0:100kB

       Since no num_threads is given in the workgroup expression, each core of socket 0 gets
       one thread. The workload is split up between all threads.

       2.  Run the triad benchmark code with 100 iterations with 2 threads on socket 0 and a
           data size of 1 GB.

       likwid-bench -t triad -i 100 -g 1 -w S0:1GB:2:1:2

       Assuming  socket  0  has  4  SMT  threads, one thread is assigned to each physical core of
       socket 0.
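
       The effect of the chunk size and stride can be pictured with the sketch below. It
       assumes that hardware threads within the domain are numbered compactly from 0 with SMT
       siblings adjacent, and that selection takes chunk_size consecutive entries and then
       advances by stride. Both the helper select_indices and this reading of the syntax are
       assumptions for illustration, not taken from the likwid-bench sources.

```shell
# Hypothetical helper: list the positions in the thread domain that a
# <num_threads>:<chunk_size>:<stride> selection would pick (assumed semantics).
# chunk_size must be at least 1.
select_indices() {
    num=$1 chunk=$2 stride=$3
    picked=0 base=0 out=""
    while [ "$picked" -lt "$num" ]; do
        off=0
        # take up to <chunk> consecutive entries starting at <base>
        while [ "$off" -lt "$chunk" ] && [ "$picked" -lt "$num" ]; do
            out="${out:+$out }$((base + off))"
            off=$((off + 1))
            picked=$((picked + 1))
        done
        base=$((base + stride))
    done
    printf '%s\n' "$out"
}

select_indices 2 1 2   # prints 0 2
```

       Under these assumptions, positions 0 and 2 belong to different physical cores when each
       core has two hardware threads, which matches the behaviour described for S0:1GB:2:1:2.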

       3.  Run the update benchmark with 1000 iterations on socket 0 with a workload of 100kB and
           on socket 1 with the same workload.

       likwid-bench -t update -i 1000 -g 2 -w S0:100kB -w S1:100kB

       The results of both workgroups are combined for the output. Hence the workload in each
       workgroup expression should have the same size.
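
       When several workgroups are used, the value given to -g has to match the number of -w
       options and every workgroup should use the same size. The sketch below prints such a
       command line; bench_cmd is a hypothetical helper that only builds the string and does
       not run likwid-bench.

```shell
# Hypothetical helper: print a likwid-bench command line with one workgroup
# per thread domain, keeping -g equal to the number of -w options.
# Usage: bench_cmd <testname> <iterations> <size> <thread_domain>...
bench_cmd() {
    test_name=$1 iters=$2 size=$3
    shift 3
    cmd="likwid-bench -t $test_name -i $iters -g $#"
    for domain in "$@"; do
        cmd="$cmd -w $domain:$size"
    done
    printf '%s\n' "$cmd"
}

bench_cmd update 1000 100kB S0 S1
# prints: likwid-bench -t update -i 1000 -g 2 -w S0:100kB -w S1:100kB
```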

       4.  Run the update benchmark but measure the memory traffic with likwid-perfctr. The
           option INSTRUMENT_BENCH in config.mk must be set to true at compile time to use
           this feature.

       likwid-perfctr -C E:S0:4 -g MEM -m likwid-bench -t update -i 1000 -g 1 -w S0:100kB

       likwid-perfctr will configure and start the performance counters on socket 0 with 4
       threads prior to the execution of likwid-bench. The performance counters are read right
       before and after running the benchmarking code to minimize the interference caused by
       the measurement.

       5.  Run the copy benchmark and place the data on another socket.

       likwid-bench -t copy -i 50 -g 1 -w S0:1GB:10:1:2-0:S1,1:S1

       Streams 0 and 1 are both placed in thread domain S1, which is socket 1. This can be
       verified because the initialization threads print where they are running.
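
       Since all streams have to be listed once any placement is given, the placement suffix
       can be generated from the stream count reported by the -l option. The sketch below uses
       a hypothetical helper, stream_placement, that puts every stream in the same domain; the
       count of 2 for the copy benchmark is taken from the example above.

```shell
# Hypothetical helper: place streams 0..count-1 in a single thread domain.
# Usage: stream_placement <domain_id> <number_of_streams>
stream_placement() {
    domain=$1 count=$2
    i=0 out=""
    while [ "$i" -lt "$count" ]; do
        # append "<streamId>:<domain_id>", comma separated
        out="${out:+$out,}$i:$domain"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

printf 'likwid-bench -t copy -i 50 -g 1 -w S0:1GB:10:1:2-%s\n' "$(stream_placement S1 2)"
# prints: likwid-bench -t copy -i 50 -g 1 -w S0:1GB:10:1:2-0:S1,1:S1
```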

AUTHOR

       Written by Jan Treibig <jan.treibig@gmail.com>.

BUGS

       Report bugs at <http://code.google.com/p/likwid/issues/list>.

SEE ALSO

       likwid-perfctr(1), likwid-pin(1), likwid-topology(1), likwid-features(1),
       likwid-setFrequencies(1)