Provided by: likwid_3.1.3+dfsg1-1_amd64

NAME

       likwid-pin - pin a sequential or threaded application to dedicated processors

SYNOPSIS

       likwid-pin [-vhqipS] [-c <core_list>] [-s <skip_mask>] [-d <delimiter>]

DESCRIPTION

       likwid-pin is a command line application to pin a sequential or multithreaded application
       to dedicated processors. It can be used as a replacement for taskset(1). Unlike taskset,
       it takes a list of individual processors rather than an affinity mask. For multithreaded
       applications based on the pthread library, the pthread_create library call is overloaded
       through LD_PRELOAD, and each created thread is pinned to a dedicated processor as
       specified in core_list.

       By default, each created thread is pinned to the next core in the list, in the order of
       the calls to pthread_create. Individual threads can be skipped with the -s command line
       option.
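
       To make the skip mask easier to read, the following sketch lists which threads a given
       mask would leave unpinned. It relies on the description of -s (each set bit skips the
       corresponding thread) and assumes that bit i belongs to the i-th call to pthread_create,
       counting from 0. skipped_threads is a hypothetical helper and not part of likwid.

```shell
# Sketch only: print the indices of the threads a skip mask would skip.
# Assumption: bit i of the mask corresponds to the i-th pthread_create call,
# counting from 0. skipped_threads is a hypothetical helper name.
skipped_threads() {
    mask=$(( $1 ))          # accepts hex input such as 0x5
    nthreads=$2             # number of threads created by the application
    i=0
    out=""
    while [ "$i" -lt "$nthreads" ]; do
        if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
            out="${out:+$out }$i"
        fi
        i=$(( i + 1 ))
    done
    echo "$out"
}

skipped_threads 0x5 4    # prints "0 2"
```

       The mask 0x5 has bits 0 and 2 set, so under the stated assumption the first and the
       third created thread would not be pinned.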

       For OpenMP, the gcc and icc compilers are explicitly supported. Others may also work.
       likwid-pin sets the environment variable OMP_NUM_THREADS for you if it is not already
       set, using as many threads as there are entries in the pin expression. Be aware that
       with pthreads the parent thread is always pinned. If you create, for example, 4 threads
       with pthread_create and do not use the parent process as a worker, you still have to
       provide num_threads+1 processor ids.
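
       The num_threads+1 rule can be illustrated with a small sketch that prints which
       processor each thread would receive from an already expanded processor list. It follows
       the behaviour described in this page: the parent takes the first id and the created
       threads take the following ids in order. It assumes that the fallback for surplus
       threads is the first id of the list, which is processor 0 in the EXAMPLE section.
       show_pinning is a hypothetical helper and not part of likwid.

```shell
# Sketch only: show which processor each thread would get.
# Arguments: number of threads created with pthread_create, followed by the
# processor ids. show_pinning is a hypothetical helper name.
show_pinning() {
    ncreated=$1
    shift
    first=$1                            # the parent takes the first id
    echo "parent -> $first"
    shift
    t=0
    while [ "$t" -lt "$ncreated" ]; do
        if [ "$#" -gt 0 ]; then
            echo "thread $t -> $1"      # next unused id in the list
            shift
        else
            echo "thread $t -> $first"  # assumed fallback: first id of the list
        fi
        t=$(( t + 1 ))
    done
}

show_pinning 4 0 2 4 5 6
```

       With 4 created threads and 5 ids, every thread gets its own processor. Dropping one id
       from the list would send the last thread to the fallback processor.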

       likwid-pin supports different numberings for pinning. By default, the physical numbering
       of the cores is used. This is the same numbering that likwid-topology(1) reports. Logical
       numbering within the node or within a socket can also be used. With the N prefix (e.g.
       -c N:0-6) the cores are numbered logically over the whole node, with physical cores
       first. For example, on a system with 8 cores and 16 SMT threads, -c N:0-7 selects all
       physical cores and -c N:0-15 selects all physical cores and all SMT threads. With the S
       prefix you specify logical numbering inside a socket; again, physical cores come first.
       You can mix different domains by separating them with @. For example,
       -c S0:0-3@S2:2-3 pins threads 0-3 to logical cores 0-3 on socket 0 and threads 4-5 to
       logical cores 2-3 on socket 2.
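
       The way threads are distributed over mixed domains can be traced with the sketch below.
       It only splits the expression at @ and : and counts threads upwards. Translating a
       logical core into a physical processor id needs the machine topology and is left out.
       thread_domains is a hypothetical helper and handles a single range or number per
       domain.

```shell
# Sketch only: print the domain and logical core each thread is assigned to.
# thread_domains is a hypothetical helper name; it does not query the topology.
thread_domains() {
    rest=$1
    t=0
    while [ -n "$rest" ]; do
        case $rest in
            *@*) part=${rest%%@*}; rest=${rest#*@} ;;   # take the next domain
            *)   part=$rest; rest="" ;;                 # last domain
        esac
        domain=${part%%:*}
        range=${part#*:}
        lo=${range%-*}      # a single number yields lo = hi
        hi=${range#*-}
        while [ "$lo" -le "$hi" ]; do
            echo "thread $t -> $domain logical core $lo"
            t=$(( t + 1 ))
            lo=$(( lo + 1 ))
        done
    done
}

thread_domains S0:0-3@S2:2-3
```

       For S0:0-3@S2:2-3 the output ends with "thread 5 -> S2 logical core 3", matching the
       description above.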

       For applications where a first touch policy on NUMA systems cannot be employed, likwid-pin
       can turn on interleaved memory placement. This can significantly improve the performance
       of memory bound multithreaded codes. All NUMA nodes to which the user pinned threads are
       used for interleaving.

OPTIONS

       -v     prints version information to standard output, then exits.

       -h     prints a help message to standard output, then exits.

       -c  <processor_list> OR <thread_expression> OR <scatter policy>
              Specify a numerical list of processors. The list may contain multiple items,
              separated by commas, and ranges, for example 0,3,9-11. You can also use logical
              numberings, either within a node (N), a socket (S<id>) or a NUMA domain (M<id>).
              likwid-pin also supports logical pinning within a cpuset with the L prefix. If you
              omit this option, likwid-pin pins the threads to the processors on the node
              with physical cores first. See below for details on using a thread expression or
              scatter policy.

       -s  <skip_mask>
              Specify the skip mask as a hexadecimal number. For each set bit, the corresponding
              thread is skipped.

       -S     Clean all ccNUMA memory domains belonging to the specified thread list before the
              run. This can solve file buffer cache problems on Linux.

       -p     prints  the  available  thread  domains for logical pinning. If used in combination
              with -c, the physical processor IDs are printed to stdout.

       -i     set the NUMA memory policy to interleave, spanning all NUMA nodes involved in pinning

       -q     silent execution without output

       -d  <delimiter>
              set delimiter used to output the physical processor list (-p & -c)
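
       The numerical list format accepted by -c (items separated by commas, with optional
       ranges) can be expanded by hand as sketched below. This only illustrates the notation.
       expand_core_list is a hypothetical helper, not a likwid command, and it does not handle
       the N, S, M or L prefixes.

```shell
# Sketch only: expand a processor list such as 0,3,9-11 into single ids.
# expand_core_list is a hypothetical helper name.
expand_core_list() {
    out=""
    old_ifs=$IFS
    IFS=,
    set -- $1               # split the list at commas
    IFS=$old_ifs
    for item in "$@"; do
        case $item in
            *-*) first=${item%-*}; last=${item#*-} ;;   # range a-b
            *)   first=$item; last=$item ;;             # single id
        esac
        while [ "$first" -le "$last" ]; do
            out="${out:+$out }$first"
            first=$(( first + 1 ))
        done
    done
    echo "$out"
}

expand_core_list 0,3,9-11    # prints "0 3 9 10 11"
```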

EXAMPLE

       1.  For a standard pthread application:

       likwid-pin -c 0,2,4-6 ./myApp

       The parent process is pinned to processor 0, thread 0 to processor 2, thread 1 to
       processor 4, thread 2 to processor 5 and thread 3 to processor 6. If more threads are
       created than there are entries in the processor list, the extra threads are pinned to
       processor 0 as a fallback.

       2.  For gcc OpenMP, as many ids must be specified in the processor list as there are
           threads:

       OMP_NUM_THREADS=4 likwid-pin -c 0,2,1,3 ./myApp

       3.  Full control over the pinning can be achieved by specifying a skip mask. For example,
           the following command skips the pinning of thread 1:

       OMP_NUM_THREADS=4 likwid-pin -s 0x1 -c 0,2,1,3 ./myApp

       4.  The -c switch supports the definition of threads in a specific affinity domain such
           as a NUMA node or cache group. The available affinity domains can be retrieved with
           the -p switch and no further option on the command line. The common affinity domains
           are N (whole node), SX (socket X), CX (cache group X) and MX (memory group X).
           Multiple affinity domains can be combined, separated by @. To pin 2 threads on each
           socket of a 2-socket system:

       OMP_NUM_THREADS=4 likwid-pin -c S0:0-1@S1:0-1 ./myApp

       5.  Another form of the -c argument pins the threads according to an expression like
           E:N:4:1:2. The syntax is E:<thread domain>:<number of threads>(:<chunk
           size>:<stride>). The following example pins 8 threads with 2 SMT threads per core on
           an SMT 4 machine:

       OMP_NUM_THREADS=8 likwid-pin -c E:N:8:2:4 ./myApp

       6.  The last alternative for the -c switch is  the  automatic  scattering  of  threads  on
           affinity  domains.   For  example  to scatter the threads over all memory domains in a
           system:

       OMP_NUM_THREADS=4 likwid-pin -c M:scatter ./myApp
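
       The chunk size and stride of the thread expression in example 5 can be pictured with
       the sketch below. It prints positions in the thread domain's processor list, not
       physical processor ids. The interpretation is an assumption that this page does not
       spell out: <chunk size> consecutive entries are taken, then selection continues
       <stride> entries after the start of the previous chunk, until <number of threads>
       entries are chosen. expr_positions is a hypothetical helper and not part of likwid.

```shell
# Sketch only: positions selected by E:<domain>:<n>:<chunk>:<stride>.
# Assumption: take <chunk> consecutive entries, then advance by <stride> from
# the start of that chunk. expr_positions is a hypothetical helper name and
# expects chunk and stride to be at least 1.
expr_positions() {
    n=$1; chunk=$2; stride=$3
    out=""
    count=0
    base=0
    while [ "$count" -lt "$n" ]; do
        j=0
        while [ "$j" -lt "$chunk" ] && [ "$count" -lt "$n" ]; do
            out="${out:+$out }$(( base + j ))"
            j=$(( j + 1 ))
            count=$(( count + 1 ))
        done
        base=$(( base + stride ))
    done
    echo "$out"
}

expr_positions 8 2 4    # prints "0 1 4 5 8 9 12 13"
```

       Under this reading, E:N:8:2:4 uses the first 2 of every 4 entries, which corresponds to
       2 SMT threads per core if the 4 SMT threads of a core are adjacent in the domain's list.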

AUTHOR

       Written by Jan Treibig <jan.treibig@gmail.com>.

BUGS

       Report bugs at <http://code.google.com/p/likwid/issues/list>.

SEE ALSO

       taskset(1), likwid-perfctr(1), likwid-features(1), likwid-powermeter(1),
       likwid-setFrequencies(1)