Provided by: likwid_4.3.3+dfsg1-1_amd64


       likwid-bench - low-level benchmark suite and microbenchmarking framework


       likwid-bench  [-hap]  [-t  <testname>]  [-s  <min_time>]  [-w  <workgroup_expression>] [-l
       <testname>] [-d <delimiter>] [-i <iterations>]


       likwid-bench is a suite of low-level (assembly) benchmarks that measure bandwidths and
       instruction throughput for specific instruction sequences on x86 systems. The included
       benchmark codes cover common data access patterns such as load and store as well as
       calculations such as vector triad and sum.  likwid-bench includes architecture specific
       benchmarks for x86, x86_64 and x86 for Intel Xeon Phi coprocessors.  The performance
       values can either be calculated by likwid-bench itself or measured with performance
       counters by using likwid-perfctr as a wrapper to likwid-bench.  This requires building
       likwid-bench with instrumentation enabled in


       -h     prints a help message to standard output, then exits.

       -a     list available benchmark codes for the current system.

       -p     list available thread domains.

       -s <min_time>
              Run the benchmark for at least <min_time> seconds.  The number of iterations is
              determined using this value. Default: 1 second.

       -t <testname>
              Name of the benchmark code to run (mandatory).

       -w <workgroup_expression>
              Specify the affinity domain, thread  count  and  data  set  size  for  the  current
              benchmarking run (mandatory).

       -l <testname>
              list properties of a benchmark code.

       -i <iterations>
              Set the number of iterations per thread (optional).


       <thread_domain>:<size>  [:<num_threads>[:<chunk_size>:<stride>]] [-<streamId>:<domain_id>]
       with size in kB, MB or GB. The <thread_domain> defines where the threads are placed.
       <size> is the total data set size for the benchmark; the vectors allocated in memory sum
       up to this size.  <num_threads> specifies how many threads are used in the
       <thread_domain>.  Threads are always placed using a compact policy in likwid-bench.  This
       means that by default all SMT threads are used. Optionally, similar to the expression
       based syntax in likwid-pin, a <chunk_size> and <stride> can be provided. The placement of
       every stream (array, vector) can also be controlled. By default all arrays are placed in
       the same <thread_domain> the threads are running in. To place the data in a different
       domain, specify the target domain for every stream of a benchmark case (the total number
       of streams can be obtained with the -l option). Multiple streams are comma separated.
       Either no placement is given, or all streams have to be placed explicitly. Please refer
       to the Wiki pages on
       for further details and examples on usage.
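
       The syntax above can be illustrated with a small parser. This is only a sketch in
       Python, not part of likwid-bench: the names Workgroup and parse_workgroup are
       hypothetical, and the decimal unit factors (1 kB = 1000 bytes) are an assumption that
       should be checked against the installed version.

```python
import re
from typing import Dict, NamedTuple, Optional


class Workgroup(NamedTuple):
    """Hypothetical container for the fields of one -w expression."""
    domain: str
    size_bytes: int
    num_threads: Optional[int]
    chunk_size: Optional[int]
    stride: Optional[int]
    streams: Dict[int, str]  # stream id -> domain the stream is placed in


# Assumption: decimal units; likwid-bench itself may interpret them differently.
_UNITS = {"kB": 1000, "MB": 1000 ** 2, "GB": 1000 ** 3}


def parse_workgroup(expr: str) -> Workgroup:
    """Split <domain>:<size>[:<n>[:<chunk>:<stride>]][-<id>:<domain>,...]."""
    # Everything after the first '-' is the optional stream placement list.
    main, _, placement = expr.partition("-")
    fields = main.split(":")
    if len(fields) not in (2, 3, 5):
        raise ValueError(f"malformed workgroup expression: {expr!r}")

    match = re.fullmatch(r"(\d+)(kB|MB|GB)", fields[1])
    if match is None:
        raise ValueError(f"size must be given in kB, MB or GB: {fields[1]!r}")
    size_bytes = int(match.group(1)) * _UNITS[match.group(2)]

    num_threads = int(fields[2]) if len(fields) >= 3 else None
    chunk_size = int(fields[3]) if len(fields) == 5 else None
    stride = int(fields[4]) if len(fields) == 5 else None

    streams: Dict[int, str] = {}
    if placement:
        # Multiple streams are comma separated, each as <streamId>:<domain_id>.
        for item in placement.split(","):
            stream_id, stream_domain = item.split(":")
            streams[int(stream_id)] = stream_domain

    return Workgroup(fields[0], size_bytes, num_threads, chunk_size, stride, streams)
```

       For instance, parse_workgroup("S0:1GB:10:1:2-0:S1,1:S1") yields thread domain S0, 10
       threads, chunk size 1, stride 2, and both streams placed in S1.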


       1.  Run the copy benchmark on socket 0 ( S0 ) with a total data set size of 100kB.

       likwid-bench -t copy -w S0:100kB

       Since no <num_threads> is given in the workgroup expression, each core of socket 0 gets one
       thread.  The  workload  is  split  up  between all threads and the number of iterations is
       determined automatically.

       2.  Run the triad benchmark code with exactly 100 iterations per thread, using 2 threads
           on socket 0 ( S0 ) and a data size of 1GB.

       likwid-bench -t triad -i 100 -w S0:1GB:2:1:2

       Assuming  socket 0 ( S0 ) has 2 physical cores with SMT enabled, hence in total 4 hardware
       threads, one thread is assigned to each physical core of socket 0.
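
       The effect of <chunk_size> and <stride> in this example can be sketched as follows.
       This is an illustrative Python model, not the likwid-bench implementation: the function
       name select_hw_threads is hypothetical, and it assumes the hardware threads of the
       domain are listed compactly, with the SMT siblings of each core next to each other.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_hw_threads(hw_threads: Sequence[T], num_threads: int,
                      chunk_size: int = 1, stride: int = 1) -> List[T]:
    """Pick num_threads entries: chunk_size consecutive ones every stride entries."""
    selected: List[T] = []
    for start in range(0, len(hw_threads), stride):
        for hw_thread in hw_threads[start:start + chunk_size]:
            if len(selected) == num_threads:
                return selected
            selected.append(hw_thread)
    if len(selected) < num_threads:
        raise ValueError("domain has too few hardware threads for this selection")
    return selected


# Assumed compact ordering for a socket with 2 cores and 2 SMT threads per core.
socket0 = ["core0-smt0", "core0-smt1", "core1-smt0", "core1-smt1"]

# S0:1GB:2:1:2 -> 2 threads, chunk size 1, stride 2: one thread per physical core.
print(select_hw_threads(socket0, 2, chunk_size=1, stride=2))
```

       With the default chunk size and stride of 1, the same model fills both SMT threads of
       the first core before moving on to the next one.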

       3.  Run the update benchmark on socket 0 ( S0 ) with a workload of 100kB and on socket 1 (
           S1 ) with the same workload.

       likwid-bench -t update -w S0:100kB -w S1:100kB

       The results of both workgroups are combined for the output. Hence the workload in each
       workgroup expression should have the same size.

       4.  Run the update benchmark and measure the memory traffic with likwid-perfctr.  The option
           INSTRUMENT_BENCH in needs to be true at compile time to use that feature.

       likwid-perfctr -c E:S0:4 -g MEM -m likwid-bench -t update -w S0:100kB

       likwid-perfctr will configure and start the performance counters on socket 0 ( S0 ) with 4
       threads prior to the execution of likwid-bench.  The performance counters are  read  right
       before  and  after  running  the  benchmarking  code  to minimize the interferences of the

       5.  Run the copy benchmark and place the data on another socket

       likwid-bench -t copy -w S0:1GB:10:1:2-0:S1,1:S1

       Stream ids 0 and 1 are placed in thread domain S1, which is socket 1. This can be
       verified because the initialization threads print where they are running.


       Written by Thomas Roehl <>.


       Report Bugs on <>.


       likwid-perfctr(1), likwid-pin(1), likwid-topology(1), likwid-setFrequencies(1)