Ubuntu Manpage: likwid-bench - low-level benchmark suite and microbenchmarking framework


NAME

       likwid-bench - low-level benchmark suite and microbenchmarking framework

SYNOPSIS

       likwid-bench  [-hap]  [-l  <testname>]  [-i <iterations>] [-g <number_of_workgroups>] [-t <testname>] [-w
       <workgroup_expression>]

DESCRIPTION

       likwid-bench is a benchmark suite of low-level (assembly) benchmarks that measure bandwidth and
       instruction throughput for specific instruction code on x86 systems. The included benchmark codes
       cover common data access patterns such as load and store as well as calculations such as vector
       triad and sum. likwid-bench includes architecture-specific benchmarks for x86, x86_64 and x86 for
       Intel Xeon Phi coprocessors. The performance values can either be calculated by likwid-bench or
       measured with performance counters by using likwid-perfctr as a wrapper to likwid-bench. The
       latter requires building likwid-bench with instrumentation, which can be enabled in config.mk.

OPTIONS

       -h     prints a help message to standard output, then exits.

       -a     list available benchmark codes for the current system.

       -p     list available thread domains.

       -l  <testname>
              list properties of a benchmark code.

       -i  <iterations>
              number of iterations to perform inside the benchmark code.

       -t  <testname>
              name of the benchmark code to run (mandatory).

       -g  <number_of_workgroups>
              number of workgroups to run the benchmark code on (mandatory).

       -w  <workgroup_expression>
              affinity domain, thread count and data set size for the current benchmarking run
              (mandatory).

WORKGROUP SYNTAX

       <thread_domain>:<size>[:<num_threads>[:<chunk_size>:<stride>]][-<streamId>:<domain_id>] with size in
       kB, MB or GB. The thread domain is where the threads are placed. Size is the total data set size for
       the benchmark. num_threads specifies how many threads are used. Threads are always placed using a
       compact policy in likwid-bench. This means that by default all SMT threads are used. Optionally,
       similar to the expression-based syntax in likwid-pin, a chunk size and stride can be provided.
       Optionally, the placement can be controlled for every stream (a stream is an array). By default all
       arrays are placed in the same thread domain the threads are running in. To place the data in a
       different domain, the target domain can be specified for every stream of a benchmark case (the
       total number of streams can be obtained with the -l option). Multiple streams are comma separated.
       Either no placement is given or all streams have to be placed explicitly. Please refer to the Wiki
       pages on http://code.google.com/p/likwid/wiki/LikwidBench for further details and examples on usage.
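
To make the structure of a workgroup expression easier to see, the following Python sketch splits an expression into the fields named above. It is only an illustration based on the syntax line in this section, not the parser used by likwid-bench. The names Workgroup and parse_workgroup are hypothetical, and the sketch assumes that thread domain names contain no '-' character.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Workgroup:
    domain: str                        # thread domain, e.g. "S0"
    size: str                          # data set size including unit, e.g. "1GB"
    num_threads: Optional[int] = None  # None: not given in the expression
    chunk_size: Optional[int] = None
    stride: Optional[int] = None
    streams: Dict[int, str] = field(default_factory=dict)  # stream id -> domain


def parse_workgroup(expr: str) -> Workgroup:
    """Split a workgroup expression into its fields (hypothetical helper)."""
    # Everything after the first '-' is the optional stream placement list.
    threads_part, _, streams_part = expr.partition("-")
    fields = threads_part.split(":")
    # Allowed shapes: domain:size, domain:size:threads,
    # domain:size:threads:chunk:stride (chunk and stride only come as a pair).
    if len(fields) not in (2, 3, 5):
        raise ValueError(f"unexpected number of fields in {expr!r}")
    wg = Workgroup(domain=fields[0], size=fields[1])
    if len(fields) >= 3:
        wg.num_threads = int(fields[2])
    if len(fields) == 5:
        wg.chunk_size, wg.stride = int(fields[3]), int(fields[4])
    if streams_part:
        # Multiple streams are comma separated, each as <streamId>:<domain_id>.
        for item in streams_part.split(","):
            stream_id, domain = item.split(":")
            wg.streams[int(stream_id)] = domain
    return wg


print(parse_workgroup("S0:100kB"))
print(parse_workgroup("S0:1GB:10:1:2-0:S1,1:S1"))
```

The second call uses an expression with explicit stream placement: both streams end up mapped to domain S1, while the threads themselves stay in S0.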

EXAMPLE

1. Run the copy benchmark with 1000 iterations on socket 0 with a total data set size of 100kB.

likwid-bench -t copy -i 1000 -g 1 -w S0:100kB

Since no num_threads is given in the workgroup expression, each core of socket 0 gets one thread. The
workload is split up between all threads.

2. Run the triad benchmark code with 100 iterations with 2 threads on socket 0 and a data size of
1 GB.

likwid-bench -t triad -i 100 -g 1 -w S0:1GB:2:1:2

Assuming socket 0 has 4 SMT threads (2 physical cores with 2 SMT threads each), the chunk size of 1 and
stride of 2 assign one thread to each physical core of socket 0.
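
The effect of chunk size and stride in this example can be sketched in Python. The selection rule used here (take chunk_size consecutive entries, then start the next chunk stride entries later, without wrapping around) is an assumption modelled on the expression-based syntax of likwid-pin, and select_hw_threads and the socket0 list are hypothetical names for illustration only.

```python
from typing import List


def select_hw_threads(domain: List[str], num_threads: int,
                      chunk_size: int = 1, stride: int = 1) -> List[str]:
    """Pick num_threads entries from domain, chunk_size at a time,
    with consecutive chunks starting stride entries apart."""
    if chunk_size < 1 or stride < 1:
        raise ValueError("chunk_size and stride must be at least 1")
    selected: List[str] = []
    start = 0
    while len(selected) < num_threads and start < len(domain):
        for entry in domain[start:start + chunk_size]:
            if len(selected) < num_threads:
                selected.append(entry)
        start += stride
    if len(selected) < num_threads:
        # Assumption: the sketch does not wrap around at the end of the domain.
        raise ValueError("thread domain too small for this selection")
    return selected


# Hypothetical socket 0: 2 physical cores with 2 SMT threads each,
# listed compactly so that SMT siblings are adjacent.
socket0 = ["core0-smt0", "core0-smt1", "core1-smt0", "core1-smt1"]

# Corresponds to the thread part of -w S0:1GB:2:1:2
print(select_hw_threads(socket0, 2, chunk_size=1, stride=2))
```

Under these assumptions the two threads land on core0-smt0 and core1-smt0, that is, one per physical core.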

3. Run the update benchmark with 1000 iterations on socket 0 with a workload of 100kB and on socket 1
with the same workload.

likwid-bench -t update -i 1000 -g 2 -w S0:100kB -w S1:100kB

The results of both workgroups are combined for the output. Hence the workload in each workgroup
expression should have the same size.
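
When such command lines are generated by a script, the workgroup count and the size check can be handled in one place. The following Python sketch only assembles the argument list from the options documented above; build_bench_command is a hypothetical helper, and rejecting differing sizes is a design choice of this sketch rather than documented likwid-bench behaviour.

```python
import shlex
from typing import List, Sequence


def build_bench_command(test: str, iterations: int,
                        workgroups: Sequence[str]) -> List[str]:
    """Assemble a likwid-bench argument list (hypothetical helper)."""
    if not workgroups:
        raise ValueError("at least one workgroup expression is required")
    # The size is the second ':'-separated field; strip a trailing stream
    # placement part if the expression ends right after the size.
    sizes = {wg.split(":")[1].split("-")[0] for wg in workgroups}
    if len(sizes) > 1:
        raise ValueError(f"workgroups use different sizes: {sorted(sizes)}")
    # -g is derived from the number of -w expressions so the two always match.
    cmd = ["likwid-bench", "-t", test, "-i", str(iterations),
           "-g", str(len(workgroups))]
    for wg in workgroups:
        cmd += ["-w", wg]
    return cmd


cmd = build_bench_command("update", 1000, ["S0:100kB", "S1:100kB"])
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a system where likwid-bench is installed.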

4. Run the update benchmark but measure the memory traffic with likwid-perfctr. The option
INSTRUMENT_BENCH in config.mk needs to be true at compile time to use that feature.

likwid-perfctr -C E:S0:4 -g MEM -m likwid-bench -t update -i 1000 -g 1 -w S0:100kB

likwid-perfctr configures and starts the performance counters on socket 0 with 4 threads prior to the
execution of likwid-bench. The performance counters are read right before and after running the
benchmarking code to minimize interference from the measurement.

5. Run the copy benchmark and place the data on the other socket.

likwid-bench -t copy -i 50 -g 1 -w S0:1GB:10:1:2-0:S1,1:S1

Streams 0 and 1 are placed in thread domain S1, which is socket 1. This can be verified because the
initialization threads print where they are running.

AUTHOR

       Written by Jan Treibig <jan.treibig@gmail.com>.

BUGS

       Report bugs on <http://code.google.com/p/likwid/issues/list>.