Provided by: likwid_3.1.3+dfsg1-1_amd64

NAME

       likwid-bench - low-level benchmark suite and microbenchmarking framework

SYNOPSIS

       likwid-bench  [-hap]  [-l  <testname>]  [-i <iterations>] [-g <number_of_workgroups>] [-t <testname>] [-w
       <workgroup_expression>]

DESCRIPTION

       likwid-bench is a suite of low-level (assembly) benchmarks that measure bandwidth and instruction
       throughput for specific instruction code on x86 systems. The included benchmark codes cover common
       data access patterns such as load and store as well as calculations such as vector triad and sum.
       likwid-bench includes architecture-specific benchmarks for x86, x86_64 and x86 for Intel Xeon Phi
       coprocessors. The performance values can either be calculated by likwid-bench itself or measured with
       performance counters by using likwid-perfctr as a wrapper to likwid-bench. The latter requires building
       likwid-bench with instrumentation, which can be enabled in config.mk.

OPTIONS

       -h     prints a help message to standard output, then exits.

       -a     list available benchmark codes for the current system.

       -p     list available thread domains.

       -l  <testname>
              list properties of a benchmark code.

       -i  <iterations>
              number of iterations to perform inside the benchmark code.

       -t  <testname>
              Name of the benchmark code to run (mandatory).

       -g  <number_of_workgroups>
              specify the number of workgroups to run the benchmark code on (mandatory).

       -w  <workgroup_expression>
              Specify  the  affinity  domain,  thread  count  and data set size for the current benchmarking run
              (mandatory).

WORKGROUP SYNTAX

       <thread_domain>:<size> [:<num_threads>[:<chunk_size>:<stride>]] [-<streamId>:<domain_id>]  with  size  in
       kB, MB or GB. thread_domain is the domain in which the threads are placed. size is the total data set
       size for the benchmark. num_threads specifies how many threads are used. Threads are always placed using
       a compact policy in likwid-bench, which means that by default all SMT threads are used. Optionally,
       similar to the expression-based syntax in likwid-pin, a chunk size and stride can be provided.
       Optionally, the placement of every stream (that is, every array) can be controlled. By default all
       arrays are placed in the same thread domain the threads are running in. To place the data in a
       different domain, the target domain can be specified for every stream of a benchmark case (the total
       number of streams can be obtained with the -l option). Multiple streams are comma separated. Either no
       placement is provided or all streams have to be explicitly placed. Please refer to the Wiki pages at
       http://code.google.com/p/likwid/wiki/LikwidBench for further details and examples on usage.
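
       The structure of a workgroup expression can be illustrated with a small parser. The following is only
       a sketch based on the syntax description above and is not part of LIKWID; the names Workgroup and
       parse_workgroup are hypothetical, and the size is kept as a number plus unit because this page does
       not state how the units are converted to bytes.

```python
import re
from typing import Dict, NamedTuple, Optional


class Workgroup(NamedTuple):
    """Fields of one workgroup expression (hypothetical helper type)."""
    domain: str
    size_value: int
    size_unit: str
    num_threads: Optional[int]
    chunk_size: Optional[int]
    stride: Optional[int]
    streams: Dict[int, str]


def parse_workgroup(expr: str) -> Workgroup:
    """Split <domain>:<size>[:<n>[:<chunk>:<stride>]][-<id>:<domain>,...]."""
    # Everything after the first '-' is the optional stream placement list.
    thread_part, has_streams, stream_part = expr.partition("-")
    fields = thread_part.split(":")
    # Chunk size and stride may only be given together, so 4 fields is invalid.
    if len(fields) not in (2, 3, 5):
        raise ValueError(f"malformed workgroup expression: {expr!r}")

    size = re.fullmatch(r"(\d+)(kB|MB|GB)", fields[1])
    if size is None:
        raise ValueError(f"size must be a number followed by kB, MB or GB: {fields[1]!r}")

    num_threads = int(fields[2]) if len(fields) >= 3 else None
    chunk_size = int(fields[3]) if len(fields) == 5 else None
    stride = int(fields[4]) if len(fields) == 5 else None

    streams: Dict[int, str] = {}
    if has_streams:
        # Stream placements are comma separated pairs of <streamId>:<domain_id>.
        for item in stream_part.split(","):
            stream_id, placement = item.split(":")
            streams[int(stream_id)] = placement

    return Workgroup(fields[0], int(size.group(1)), size.group(2),
                     num_threads, chunk_size, stride, streams)


if __name__ == "__main__":
    print(parse_workgroup("S0:1GB:10:1:2-0:S1,1:S1"))
```

       The sketch does not check that every stream of a benchmark is placed, since the number of streams is
       only known to likwid-bench itself (see the -l option).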

EXAMPLE

       1.  Run the copy benchmark with 1000 iterations on socket 0 with a total data set size of 100kB.

       likwid-bench -t copy -i 1000 -g 1 -w S0:100kB

       Since no num_threads is given in the workgroup expression, each core of socket 0 gets one thread. The
       workload is split up between all threads.

       2.  Run the triad benchmark code with 100 iterations with 2 threads on socket 0 and a data size of 1
           GB.

       likwid-bench -t triad -i 100 -g 1 -w S0:1GB:2:1:2

       Assuming socket 0 has 4 SMT threads, one thread is assigned to each physical core of socket 0.
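
       The effect of the chunk size and stride in this example can be sketched as follows. This is an
       illustration only, not LIKWID code: it assumes that the hardware threads of a domain are listed in
       compact order with SMT siblings next to each other, and that chunk_size consecutive entries are taken
       at every stride-th position. The function select_threads and the thread labels are hypothetical.

```python
from typing import List


def select_threads(domain_threads: List[str], num_threads: int,
                   chunk_size: int = 1, stride: int = 1) -> List[str]:
    """Pick num_threads entries, taking chunk_size entries every stride entries."""
    if chunk_size < 1 or stride < chunk_size:
        raise ValueError("expected 1 <= chunk_size <= stride")
    selected: List[str] = []
    for start in range(0, len(domain_threads), stride):
        for hw_thread in domain_threads[start:start + chunk_size]:
            if len(selected) == num_threads:
                return selected
            selected.append(hw_thread)
    return selected


if __name__ == "__main__":
    # Assumed layout of socket 0: two physical cores with two SMT threads each.
    socket0 = ["core0-smt0", "core0-smt1", "core1-smt0", "core1-smt1"]
    # Corresponds to the ':2:1:2' part of the expression S0:1GB:2:1:2.
    print(select_threads(socket0, num_threads=2, chunk_size=1, stride=2))
```

       Under these assumptions the selection skips the second SMT thread of each core, which matches the
       statement that one thread is assigned to each physical core.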

       3.  Run the update benchmark with 1000 iterations on socket 0 with a workload of 100kB and  on  socket  1
           with the same workload.

       likwid-bench -t update -i 1000 -g 2 -w S0:100kB -w S1:100kB

       The results of both workgroups are combined for the output. Hence the workload in each workgroup
       expression should have the same size.
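
       When likwid-bench is driven from a script, the value passed to -g has to match the number of -w
       options. A minimal sketch of a helper that keeps the two consistent is shown below; the function
       build_command is hypothetical and uses only the options documented on this page.

```python
from typing import List


def build_command(test: str, iterations: int, workgroups: List[str]) -> List[str]:
    """Assemble a likwid-bench argument list with -g derived from the -w count."""
    if not workgroups:
        raise ValueError("at least one workgroup expression is required")
    argv = ["likwid-bench", "-t", test, "-i", str(iterations),
            "-g", str(len(workgroups))]
    for expression in workgroups:
        # One -w option per workgroup expression.
        argv += ["-w", expression]
    return argv


if __name__ == "__main__":
    print(" ".join(build_command("update", 1000, ["S0:100kB", "S1:100kB"])))
```

       The resulting list can be passed to subprocess.run() on a system where likwid-bench is installed.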

       4.  Run the update benchmark but measure the memory traffic with likwid-perfctr. The option
           INSTRUMENT_BENCH in config.mk needs to be true at compile time to use that feature.

       likwid-perfctr -C E:S0:4 -g MEM -m likwid-bench -t update -i 1000 -g 1 -w S0:100kB

       likwid-perfctr will configure and start the performance counters on socket 0 with 4 threads prior to the
       execution of likwid-bench. The performance counters are read right before and after running the
       benchmarking code to minimize interference from the measurement.

       5.  Run the copy benchmark and place the data on the other socket.

       likwid-bench -t copy -i 50 -g 1 -w S0:1GB:10:1:2-0:S1,1:S1

       Streams 0 and 1 are placed in thread domain S1, which is socket 1. This can be verified because the
       initialization threads print where they are running.

AUTHOR

       Written by Jan Treibig <jan.treibig@gmail.com>.

BUGS

       Report bugs at <http://code.google.com/p/likwid/issues/list>.

SEE ALSO

       likwid-perfctr(1), likwid-pin(1), likwid-topology(1), likwid-features(1), likwid-setFrequencies(1)

likwid-3                                            12.2.2014                                    LIKWID-BENCH(1)