Provided by: likwid_4.3.1+dfsg1-1_amd64 

NAME
likwid-bench - low-level benchmark suite and microbenchmarking framework
SYNOPSIS
likwid-bench [-hap] [-t <testname>] [-s <min_time>] [-w <workgroup_expression>] [-l <testname>] [-d
<delimiter>] [-i <iterations>]
DESCRIPTION
likwid-bench is a benchmark suite of low-level (assembly) benchmarks for measuring bandwidth and
instruction throughput of specific instruction code on x86 systems. The included benchmark codes cover
common data access patterns such as load and store as well as calculations such as vector triad and
sum. likwid-bench includes architecture-specific benchmarks for x86, x86_64 and x86 for Intel Xeon Phi
coprocessors. The performance values can either be calculated by likwid-bench itself or measured with
performance counters by using likwid-perfctr as a wrapper around likwid-bench. The latter requires
building likwid-bench with instrumentation enabled in config.mk.
OPTIONS
-h Print a help message to standard output, then exit.
-a List available benchmark codes for the current system.
-p List available thread domains.
-s <min_time>
Run the benchmark for at least <min_time> seconds. The number of iterations is determined from
this value. Default: 1 second.
-t <testname>
Name of the benchmark code to run (mandatory).
-w <workgroup_expression>
Specify the affinity domain, thread count and data set size for the current benchmarking run
(mandatory).
-l <testname>
List properties of a benchmark code.
-i <iterations>
Set the number of iterations per thread (optional).
WORKGROUP SYNTAX
<thread_domain>:<size> [:<num_threads>[:<chunk_size>:<stride>]] [-<streamId>:<domain_id>] with <size>
given in kB, MB or GB. The <thread_domain> defines where the threads are placed. <size> is the total
data set size for the benchmark; the allocated vectors in memory sum up to this size. <num_threads>
specifies how many threads are used in the <thread_domain>. Threads are always placed using a compact
policy in likwid-bench, which means that by default all SMT threads are used. Optionally, similar to the
expression-based syntax in likwid-pin, a <chunk_size> and <stride> can be provided. The placement of
every stream (array, vector) can also be controlled. By default all arrays are placed in the same
<thread_domain> the threads are running in. To place the data of a stream in a different domain, append
-<streamId>:<domain_id> to the expression (the total number of streams of a benchmark can be obtained
with the -l option). Multiple streams are comma separated. Either no placement is given or all
streams have to be placed explicitly. Please refer to the Wiki pages at
http://code.google.com/p/likwid/wiki/LikwidBench for further details and usage examples.
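The grammar above can be made concrete with a small parser. The following is only a sketch in Python
that follows the syntax as written in this section; the names Workgroup and parse_workgroup are
illustrative and not part of LIKWID, and likwid-bench itself may accept more forms than this pattern
does.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Pattern derived from the WORKGROUP SYNTAX text; not taken from the LIKWID sources.
_WORKGROUP_RE = re.compile(
    r"^(?P<domain>[A-Za-z]+\d*)"                # thread domain, e.g. S0
    r":(?P<size>\d+(?:kB|MB|GB))"               # total data set size with unit
    r"(?::(?P<threads>\d+)"                     # optional number of threads
    r"(?::(?P<chunk>\d+):(?P<stride>\d+))?)?"   # optional chunk size and stride
    r"(?:-(?P<streams>\d+:[A-Za-z]+\d*(?:,\d+:[A-Za-z]+\d*)*))?$"  # optional stream placement
)


@dataclass
class Workgroup:
    """Hypothetical container for the fields of one workgroup expression."""
    domain: str
    size: str
    num_threads: Optional[int] = None
    chunk_size: Optional[int] = None
    stride: Optional[int] = None
    streams: Dict[int, str] = field(default_factory=dict)


def _to_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_workgroup(expr: str) -> Workgroup:
    """Split a workgroup expression into its parts or raise ValueError."""
    match = _WORKGROUP_RE.match(expr)
    if match is None:
        raise ValueError(f"not a valid workgroup expression: {expr!r}")
    streams: Dict[int, str] = {}
    if match.group("streams"):
        # Streams are comma separated, each written as <streamId>:<domain_id>.
        for item in match.group("streams").split(","):
            stream_id, domain_id = item.split(":")
            streams[int(stream_id)] = domain_id
    return Workgroup(
        domain=match.group("domain"),
        size=match.group("size"),
        num_threads=_to_int(match.group("threads")),
        chunk_size=_to_int(match.group("chunk")),
        stride=_to_int(match.group("stride")),
        streams=streams,
    )
```

For instance, parse_workgroup("S0:1GB:10:1:2-0:S1,1:S1") yields domain S0, size 1GB, 10 threads,
chunk size 1, stride 2, and both streams placed in S1.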
EXAMPLE
1. Run the copy benchmark on socket 0 (S0) with a total data set size of 100kB.
likwid-bench -t copy -w S0:100kB
Since no <num_threads> is given in the workgroup expression, each core of socket 0 gets one thread. The
workload is split up between all threads and the number of iterations is determined automatically.
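How the data set is divided can be estimated with a little arithmetic. The sketch below rests on
several assumptions that are not stated in this page: decimal units (1 kB = 1000 bytes), two streams
for the copy benchmark (to be confirmed with likwid-bench -l copy), four threads, and an even split
without any alignment or rounding that likwid-bench may apply. The function names are illustrative.

```python
# Assumed unit factors; likwid-bench may interpret the units differently.
UNIT_FACTORS = {"kB": 1000, "MB": 1000 ** 2, "GB": 1000 ** 3}


def size_to_bytes(size: str) -> int:
    """Convert a size such as '100kB' into bytes using the assumed factors."""
    for unit, factor in UNIT_FACTORS.items():
        if size.endswith(unit):
            return int(size[: -len(unit)]) * factor
    raise ValueError(f"unknown unit in size: {size!r}")


def bytes_per_thread_per_stream(size: str, num_streams: int, num_threads: int) -> int:
    """Even split: the streams sum up to the total size, each thread gets an equal part."""
    per_stream = size_to_bytes(size) // num_streams
    return per_stream // num_threads


# 100kB, 2 streams (assumed for copy), 4 threads (hypothetical socket)
share = bytes_per_thread_per_stream("100kB", num_streams=2, num_threads=4)
```

Under these assumptions each thread works on 12500 bytes of every stream.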
2. Run the triad benchmark code with exactly 100 iterations per thread, using 2 threads on socket 0
(S0) and a data set size of 1GB.
likwid-bench -t triad -i 100 -w S0:1GB:2:1:2
Assuming socket 0 (S0) has 2 physical cores with SMT enabled, hence 4 hardware threads in total, the
chunk size of 1 and stride of 2 assign one thread to each physical core of socket 0.
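The effect of <chunk_size> and <stride> can be illustrated with a short selection routine. This is a
sketch of the idea only, assuming that the hardware threads of a domain are listed in compact order
(SMT siblings next to each other) and that chunks of <chunk_size> entries are taken every <stride>
entries; it does not reproduce LIKWID's implementation, and the labels in the list are made up.

```python
from typing import List, Sequence


def select_threads(domain: Sequence[str], count: int, chunk: int, stride: int) -> List[str]:
    """Pick `count` entries from `domain`, `chunk` at a time, advancing by `stride`."""
    if count < 0 or chunk < 1 or stride < 1:
        raise ValueError("count must be >= 0, chunk and stride must be >= 1")
    selected: List[str] = []
    start = 0
    while len(selected) < count:
        for offset in range(chunk):
            if len(selected) == count:
                break
            index = start + offset
            if index >= len(domain):
                # This sketch does not wrap around at the end of the domain.
                raise ValueError("expression selects more hardware threads than available")
            selected.append(domain[index])
        start += stride
    return selected


# Hypothetical socket with 2 cores and 2 SMT threads each, in compact order.
socket0 = ["core0/smt0", "core0/smt1", "core1/smt0", "core1/smt1"]
picked = select_threads(socket0, count=2, chunk=1, stride=2)
```

With the values from S0:1GB:2:1:2 the routine picks the first hardware thread of each core, which
matches the placement described above.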
3. Run the update benchmark on socket 0 (S0) with a workload of 100kB and on socket 1 (S1) with the
same workload.
likwid-bench -t update -w S0:100kB -w S1:100kB
The results of both workgroups are combined for the output. Hence the data set size should be the same
in every workgroup expression.
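When such runs are scripted, the command line can be assembled programmatically. The following sketch
only builds the argument list and does not start likwid-bench; the helper names are illustrative, only
the options -t, -i and -w documented above are used, and shlex.join requires Python 3.8 or newer.

```python
import shlex
from typing import List, Optional, Sequence


def workgroup_size(expr: str) -> str:
    """Return the <size> field, i.e. the second colon-separated field of the expression."""
    # Cut off a stream placement suffix such as '-0:S1' if it follows the size directly.
    return expr.split(":")[1].split("-")[0]


def same_size(workgroups: Sequence[str]) -> bool:
    """Check the recommendation that all workgroups use the same data set size."""
    return len({workgroup_size(wg) for wg in workgroups}) <= 1


def build_bench_command(test: str, workgroups: Sequence[str],
                        iterations: Optional[int] = None) -> List[str]:
    """Build an argument list with one -w option per workgroup expression."""
    argv = ["likwid-bench", "-t", test]
    if iterations is not None:
        argv += ["-i", str(iterations)]
    for wg in workgroups:
        argv += ["-w", wg]
    return argv


groups = ["S0:100kB", "S1:100kB"]
command = build_bench_command("update", groups)
command_line = shlex.join(command)
```

The resulting list can be passed to subprocess.run on a system where likwid-bench is installed;
same_size(groups) can be checked beforehand to catch mismatched data set sizes.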
4. Run the update benchmark and measure the memory traffic with likwid-perfctr. The option
INSTRUMENT_BENCH in config.mk must be set to true at compile time to use this feature.
likwid-perfctr -c E:S0:4 -g MEM -m likwid-bench -t update -w S0:100kB
likwid-perfctr configures and starts the performance counters on socket 0 (S0) with 4 threads before
likwid-bench is executed. The performance counters are read right before and after the benchmark code
runs, which minimizes interference with the measurement.
5. Run the copy benchmark and place the data on another socket.
likwid-bench -t copy -w S0:1GB:10:1:2-0:S1,1:S1
Streams 0 and 1 are placed in thread domain S1, which is socket 1. This can be verified because the
initialization threads print where they are running.
AUTHOR
Written by Thomas Roehl <thomas.roehl@googlemail.com>.
BUGS
Report bugs at <https://github.com/RRZE-HPC/likwid/issues>.
SEE ALSO
likwid-perfctr(1), likwid-pin(1), likwid-topology(1), likwid-setFrequencies(1)
likwid-4 22.12.2017 LIKWID-BENCH(1)