Provided by: likwid_3.1.3+dfsg1-1_amd64
NAME
likwid-pin - pin a sequential or threaded application to dedicated processors
SYNOPSIS
likwid-pin [-vhqipS] [-c <core_list>] [-s <skip_mask>] [-d <delimiter>] <command>
DESCRIPTION
likwid-pin is a command line application to pin a sequential or multithreaded application to dedicated processors. It can be used as a replacement for taskset(1). Unlike taskset, individual processors are specified rather than an affinity mask.

For multithreaded applications based on the pthread library, the pthread_create library call is overloaded through LD_PRELOAD, and each created thread is pinned to a dedicated processor as specified in <core_list>. By default every generated thread is pinned to a core in the order of the calls to pthread_create. Single threads can be skipped with the -s command line option. For OpenMP, the gcc and icc compilers are explicitly supported; others may also work. likwid-pin sets the environment variable OMP_NUM_THREADS if it is not already present, using as many threads as there are entries in the pin expression. Be aware that with pthreads the parent thread is always pinned: if you create, for example, 4 threads with pthread_create and do not use the parent process as a worker, you still have to provide num_threads+1 processor IDs.

likwid-pin supports different numberings for pinning. By default the physical numbering of the cores is used; this is the numbering that likwid-topology(1) also reports. Logical numbering inside the node or inside the sockets can be used as well. With the prefix N (e.g. -c N:0-6) the cores are numbered logically across the whole node, with physical cores first. If a system has, for example, 8 cores with 16 SMT threads, -c N:0-7 selects all physical cores, while -c N:0-15 selects all physical cores plus all SMT threads. With S<id> you can use logical numbering inside a socket; again, physical cores come first. Different domains can be mixed, separated by @. E.g. with -c S0:0-3@S2:2-3 you pin threads 0-3 to logical cores 0-3 on socket 0 and threads 4-5 to logical cores 2-3 on socket 2.

For applications where a first-touch policy on NUMA systems cannot be employed, likwid-pin can be used to turn on interleaved memory placement. This can significantly speed up the performance of memory-bound multithreaded codes. All NUMA nodes the user pinned threads to are used for interleaving.
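As a sketch of the OMP_NUM_THREADS behavior described above, the thread count likwid-pin derives from a pin expression is simply the number of processor IDs the expression contains. The following shell fragment counts the entries of the list 0,2,4-6 the same way (the range 4-6 contributes three IDs); the list itself is just an illustrative value:

```shell
# Count the processor IDs in a comma-separated pin list with ranges.
# Example list "0,2,4-6": entries 0 and 2 count once each, 4-6 counts 3.
count=0
for item in $(echo "0,2,4-6" | tr ',' ' '); do
  case "$item" in
    *-*) lo=${item%-*}; hi=${item#*-}          # expand a range lo-hi
         count=$((count + hi - lo + 1)) ;;
    *)   count=$((count + 1)) ;;               # single processor ID
  esac
done
echo "$count"    # prints 5
```

With `likwid-pin -c 0,2,4-6 ./myApp` (application name hypothetical), likwid-pin would therefore set OMP_NUM_THREADS=5 unless the variable is already present in the environment.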
OPTIONS
-v     prints version information to standard output, then exits.

-h     prints a help message to standard output, then exits.

-c <processor_list> OR <thread_expression> OR <scatter_policy>
       specify a numerical list of processors. The list may contain multiple
       items, separated by commas, and ranges, e.g. 0,3,9-11. You can also use
       logical numberings, either within a node (N), a socket (S<id>) or a
       NUMA domain (M<id>). likwid-pin also supports logical pinning within a
       cpuset with the L prefix. If you omit this option, likwid-pin pins the
       threads to the processors of the node, physical cores first. See below
       for details on using a thread expression or a scatter policy.

-s <skip_mask>
       specify the skip mask as a hex number. For each set bit the
       corresponding thread is skipped.

-S     all ccNUMA memory domains belonging to the specified thread list are
       cleaned before the run. Can solve file buffer cache problems on Linux.

-p     prints the available thread domains for logical pinning. If used in
       combination with -c, the physical processor IDs are printed to stdout.

-i     sets the NUMA memory policy to interleave, spanning all NUMA nodes
       involved in the pinning.

-q     silent execution without output.

-d <delimiter>
       sets the delimiter used to output the physical processor list
       (-p and -c).
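The skip mask of -s is a plain bit mask over thread IDs: bit i set means thread i is left unpinned. A minimal sketch of building such a mask with shell arithmetic (the application name ./myApp is a placeholder):

```shell
# Bit i of the skip mask corresponds to thread i; a set bit skips that thread.
# To skip threads 1 and 3, set bits 1 and 3:
mask=$(( (1 << 1) | (1 << 3) ))
printf '0x%x\n' "$mask"    # prints 0xa
# Hypothetical usage, skipping threads 1 and 3 of a five-entry pin list:
# likwid-pin -s 0xa -c 0,2,1,3,5 ./myApp
```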
EXAMPLE
1. For a standard pthread application:

       likwid-pin -c 0,2,4-6 ./myApp

   The parent process is pinned to processor 0, thread 0 to processor 2, thread 1 to processor 4, thread 2 to processor 5 and thread 3 to processor 6. If more threads are created than specified in the processor list, these threads are pinned to processor 0 as a fallback.

2. For gcc OpenMP, as many IDs must be specified in the processor list as there are threads:

       OMP_NUM_THREADS=4 likwid-pin -c 0,2,1,3 ./myApp

3. Full control over the pinning can be achieved by specifying a skip mask. For example, the following command skips the pinning of thread 1:

       OMP_NUM_THREADS=4 likwid-pin -s 0x1 -c 0,2,1,3 ./myApp

4. The -c switch supports the definition of threads in a specific affinity domain such as a NUMA node or a cache group. The available affinity domains can be retrieved with the -p switch and no further options on the command line. The common affinity domains are N (whole node), SX (socket X), CX (cache group X) and MX (memory domain X). Multiple affinity domains can be given, separated by @. To pin 2 threads on each socket of a 2-socket system:

       OMP_NUM_THREADS=4 likwid-pin -c S0:0-1@S1:0-1 ./myApp

5. Another form of the -c switch pins the threads according to an expression like E:N:4:1:2. The syntax is E:<thread domain>:<number of threads>(:<chunk size>:<stride>). The following example pins 8 threads with 2 SMT threads per core on an SMT-4 machine:

       OMP_NUM_THREADS=8 likwid-pin -c E:N:8:2:4 ./myApp

6. The last alternative for the -c switch is the automatic scattering of threads over affinity domains. For example, to scatter the threads over all memory domains in a system:

       OMP_NUM_THREADS=4 likwid-pin -c M:scatter ./myApp
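As a sketch of how an E expression selects hardware threads, assuming the thread IDs inside the chosen domain are numbered block-wise (SMT siblings consecutive, as in the SMT-4 example above): E:N:8:2:4 takes chunks of 2 consecutive IDs and advances the chunk start by the stride of 4, i.e. the first 2 SMT threads of each core. The selection can be reproduced with shell arithmetic:

```shell
# Sketch of the selection made by E:<domain>:<nthreads>:<chunk>:<stride>
# for E:N:8:2:4, assuming block-wise numbering inside the domain.
nthreads=8; chunk=2; stride=4
ids=""; picked=0; base=0
while [ "$picked" -lt "$nthreads" ]; do
  off=0
  while [ "$off" -lt "$chunk" ]; do     # take <chunk> consecutive IDs
    ids="$ids $((base + off))"
    off=$((off + 1)); picked=$((picked + 1))
  done
  base=$((base + stride))               # advance start by <stride>
done
echo $ids    # prints: 0 1 4 5 8 9 12 13
```

On an SMT-4 machine with block-wise numbering these are the first two hardware threads of cores 0-3, matching the "2 SMT threads per core" wording of example 5.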
AUTHOR
Written by Jan Treibig <jan.treibig@gmail.com>.
BUGS
Report bugs at <http://code.google.com/p/likwid/issues/list>.
SEE ALSO
taskset(1), likwid-perfctr(1), likwid-features(1), likwid-powermeter(1), likwid-setFrequencies(1)