NAME

       likwid-pin - pin a sequential or threaded application to dedicated processors

SYNOPSIS

       likwid-pin [-vhSpqim] [-V <verbosity>] [-c/-C <corelist>] [-s <skip_mask>] [-d <delim>]

DESCRIPTION

likwid-pin is a command line application to pin a sequential or multithreaded application
to dedicated processors. It can be used as a replacement for taskset. In contrast to
taskset, individual processors are specified instead of an affinity mask. For
multithreaded applications based on the pthread library, the pthread_create library call
is overloaded through LD_PRELOAD and each created thread is pinned to a dedicated
processor as specified in <corelist>.

By default, every created thread is pinned to the next processor in the list, in the
order of the calls to pthread_create. It is possible to skip single threads (see the -s
option).

The OpenMP implementations of the GCC and ICC compilers are explicitly supported. Clang's
OpenMP backend should also work as it is built on top of Intel's OpenMP runtime library.
Others may also work. likwid-pin sets the environment variable OMP_NUM_THREADS for you if
it is not already present. It will set as many threads as there are processors in the pin
expression. Be aware that with pthreads the parent thread is always pinned. If you
create, for example, 4 threads with pthread_create and do not use the parent process as a
worker, you still have to provide num_threads+1 processor IDs.
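
The two rules above (OMP_NUM_THREADS is only set when absent, and the parent always
takes one processor ID) can be illustrated with a small Python model. This is a sketch of
the documented behaviour only, not likwid-pin's implementation; the function names
count_cpus and apply_thread_default are hypothetical, and only the physical list format
such as 0,3,9-11 is handled.

```python
def count_cpus(corelist):
    """Count the processors named by a physical list such as '0,3,9-11'."""
    total = 0
    for item in corelist.split(","):
        if "-" in item:
            first, last = item.split("-")
            total += int(last) - int(first) + 1
        else:
            total += 1
    return total


def apply_thread_default(env, corelist):
    """Return a copy of env with OMP_NUM_THREADS set only if it was absent."""
    result = dict(env)
    result.setdefault("OMP_NUM_THREADS", str(count_cpus(corelist)))
    return result


# A pthread program that creates 4 workers and leaves the parent idle
# still needs 4 + 1 processor IDs, because the parent is always pinned.
workers = 4
needed_ids = workers + 1
```

With an empty environment and the list 0-3 the model sets OMP_NUM_THREADS to 4; a value
that is already present is left unchanged.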

likwid-pin supports different numberings for pinning. See section CPU EXPRESSION for
details.

For applications where the first touch policy on NUMA systems cannot be employed,
likwid-pin can be used to turn on interleaved memory placement. This can significantly
improve the performance of memory-bound multithreaded codes. All NUMA nodes the user
pinned threads to are used for interleaving.

OPTIONS

       -h,--help
              prints a help message to standard output, then exits.

       -v,--version
              prints version information to standard output, then exits.

       -V, --verbose <level>
              verbose output during execution for debugging: 0 for errors only, 1 for
              informational output, 2 for detailed output and 3 for developer output.

       -c,-C <cpu expression>
              specify a numerical list of processors. The list may contain multiple items,
              separated by commas, and ranges, for example 0,3,9-11. Other formats are
              available, see the CPU EXPRESSION section.

       -s, --skip <skip_mask>
              specify the skip mask as a hexadecimal number. For each set bit the
              corresponding thread is skipped.

       -S,--sweep
              All ccNUMA memory domains belonging to the specified thread list will be
              cleaned before the run. Can solve file buffer cache problems on Linux.

       -p     prints the available thread domains for logical pinning

       -i     set NUMA memory policy to interleave across all NUMA nodes involved in pinning

       -m     set NUMA memory policy to membind for all NUMA nodes involved in pinning

       -d <delim>
              usable with -p to specify the CPU delimiter in the cpulist

       -q,--quiet
              silent execution without output

CPU EXPRESSION

1. The most intuitive CPU selection method is a comma-separated list of physical CPU IDs.
An example for this is 0,2 which schedules the threads on CPU cores 0 and 2. The
physical numbering also allows the usage of ranges like 0-2 which results in the list
0,1,2.

2. The CPUs can be selected by their indices inside an affinity domain. The affinity
domain is optional; if it is not given, Likwid assumes the domain 'N' for the whole node.
The format is L:<indexlist> for selecting the CPUs inside domain 'N' or
L:<domain>:<indexlist> for selecting the CPUs inside the given domain. Assume a
virtual affinity domain 'P' that contains the CPUs 0,4,1,5,2,6,3,7. After sorting it
to have physical cores first we get 0,1,2,3,4,5,6,7. The logical numbering L:P:0-2
then selects 0,1,2 from the physical-cores-first list.

3. The expression syntax enables selection according to a selection function with
variable input parameters. The format is either E:<affinity domain>:<numberOfThreads>
to use the first <numberOfThreads> threads in affinity domain <affinity domain> or
E:<affinity domain>:<numberOfThreads>:<chunksize>:<stride> to use <numberOfThreads>
threads with <chunksize> threads selected in a row while skipping <stride> threads in
affinity domain <affinity domain>. Examples are E:N:4:1:2 for selecting the first four
physical CPUs on a system with 2 SMT threads per core, or E:P:4:2:4 for choosing the
first two threads in affinity domain P, skipping 2 threads and then selecting two more
threads. The resulting CPU list for the virtual affinity domain P is 0,4,2,6.

4. The last format schedules the threads not only in a single affinity domain but
distributes them evenly over all available affinity domains of the same kind. In
contrast to the other formats, the selection is done using the physical cores first
and then the SMT threads. The format is <affinity domain without number>:scatter, for
example M:scatter to schedule the threads evenly in all available memory affinity
domains. Assuming the two socket domains S0 = 0,4,1,5 and S1 = 2,6,3,7, the expression
S:scatter results in the CPU list 0,2,1,3,4,6,5,7.
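
The logical, expression and scatter formats above can be reproduced with a short Python
model. It is a sketch under stated assumptions, not Likwid's implementation: the function
names are hypothetical, a domain is given as a plain list with the SMT siblings of each
core adjacent (as in the example domains P, S0 and S1), and <stride> is read as the
distance between the starts of two consecutive chunks.

```python
from itertools import zip_longest


def physical_cores_first(domain, threads_per_core):
    """Reorder a domain so that the first SMT thread of every core comes first."""
    ordered = []
    for k in range(threads_per_core):
        ordered.extend(domain[k::threads_per_core])
    return ordered


def select_by_expression(domain, count, chunk=1, stride=1):
    """Model E:<domain>:<count>:<chunk>:<stride> on an ordered CPU list."""
    if chunk < 1 or stride < 1:
        raise ValueError("chunk and stride must be positive")
    selected = []
    start = 0
    while len(selected) < count and start < len(domain):
        for cpu in domain[start:start + chunk]:
            if len(selected) < count:
                selected.append(cpu)
        start += stride  # assumption: stride counts from the chunk start
    return selected


def scatter(domains, threads_per_core):
    """Model <domain kind>:scatter: round-robin over the domains, cores first."""
    ordered = [physical_cores_first(d, threads_per_core) for d in domains]
    result = []
    for group in zip_longest(*ordered):
        result.extend(cpu for cpu in group if cpu is not None)
    return result
```

For P = [0,4,1,5,2,6,3,7] with 2 SMT threads per core, physical_cores_first(P, 2)[0:3]
gives 0,1,2 as for L:P:0-2, select_by_expression(P, 4, 2, 4) gives 0,4,2,6, and
scatter([[0,4,1,5], [2,6,3,7]], 2) gives 0,2,1,3,4,6,5,7, matching the lists stated above.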

EXAMPLE

       1.   For standard pthread application:

       likwid-pin -c 0,2,4-6 ./myApp

       The  parent  process  is  pinned to processor 0 which is likely to be thread 0 in ./myApp.
       Thread 1 is pinned to processor 2, thread 2 to processor 4, thread 3 to  processor  5  and
       thread 4 to processor 6. If more threads are created than specified in the processor list,
       these threads are pinned to processor 0 as fallback.

       2.   For selection of CPUs inside of a CPUset  only  the  logical  numbering  is  allowed.
            Assuming CPUset 0,4,1,5:

       likwid-pin -c L:1,3 ./myApp

       This command pins ./myApp on CPU 4 and the thread started by ./myApp on CPU 5.

       3.   A common use-case for the numbering by expression is pinning of an application on the
            Intel Xeon Phi coprocessor with its 60 cores each having 4 SMT threads.

       likwid-pin -c E:N:60:1:4 ./myApp

       This command schedules one thread per physical CPU core for ./myApp.
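
       The thread placement described in example 1, together with the effect of a skip
       mask, can be modelled in a few lines of Python. This is only a sketch of the
       documented behaviour: the function name assign_threads is hypothetical, and it is
       an assumption that bit 0 of the mask refers to the first thread created with
       pthread_create (the parent is not counted) and that a skipped thread does not
       consume an entry of the processor list.

```python
def assign_threads(cpus, created_threads, skip_mask=0x0, fallback=0):
    """Return the processor of the parent followed by one entry per created
    thread. None marks a thread that is skipped and therefore left unpinned."""
    placement = [cpus[0]]  # the parent process is always pinned first
    next_slot = 1
    for index in range(created_threads):
        if skip_mask & (1 << index):
            # assumption: bit <index> skips the thread from the index-th call
            placement.append(None)
            continue
        if next_slot < len(cpus):
            placement.append(cpus[next_slot])
            next_slot += 1
        else:
            # list exhausted: the text names processor 0 as the fallback
            placement.append(fallback)
    return placement
```

       For the list 0,2,4-6 and four created threads the model returns 0,2,4,5,6 as in
       example 1. A fifth created thread falls back to processor 0, and a skip mask of
       0x1 leaves the first created thread unpinned.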

IMPORTANT NOTICE

       The detection of shepherd threads works for Intel's/LLVM OpenMP runtime (>=12.0), for
       GCC's OpenMP runtime as well as for PGI's OpenMP runtime. If you encounter problems
       with pinning, please set a proper skip mask to skip the undetected shepherd threads.
       Intel OpenMP runtime 11.0/11.1 requires a skip mask of 0x1.

AUTHOR

       Written by Thomas Gruber <thomas.roehl@googlemail.com>.

BUGS

       Report bugs at <https://github.com/RRZE-HPC/likwid/issues>.