Provided by: likwid_5.0.1+dfsg1-1_amd64

NAME

       likwid-pin - pin a sequential or threaded application to dedicated processors

SYNOPSIS

       likwid-pin [-vhSpqim] [-V <verbosity>] [-c/-C <corelist>] [-s <skip_mask>] [-d <delim>]

DESCRIPTION

       likwid-pin is a command line application to pin a sequential or multithreaded application
       to dedicated processors. It can be used as a replacement for taskset. Unlike taskset, it
       takes a list of individual processors instead of an affinity mask. For multithreaded
       applications based on the pthread library, the pthread_create library call is overloaded
       through LD_PRELOAD and each created thread is pinned to a dedicated processor as specified
       in the processor list.

       By default, every created thread is pinned to a core in the order of the calls to
       pthread_create. It is possible to skip individual threads with the -s option.

       The OpenMP implementations of the GCC and ICC compilers are explicitly supported. Clang's
       OpenMP backend should also work as it is built on top of Intel's OpenMP runtime library.
       Others may also work. likwid-pin sets the environment variable OMP_NUM_THREADS for you if
       it is not already present. It is set to the number of threads in the pin expression. Be
       aware that with pthreads the parent thread is always pinned. If you create, for example, 4
       threads with pthread_create and do not use the parent process as a worker, you still have
       to provide num_threads+1 processor IDs.
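
       The two rules above can be sketched in a few lines of Python. This is only an
       illustration of the described behaviour, not likwid-pin's implementation; the function
       names omp_environment and processor_ids_needed are hypothetical.

```python
def omp_environment(env, pin_list):
    """Return a copy of env with OMP_NUM_THREADS set only if it is missing.

    The value is the number of entries in the pin expression (pin_list).
    """
    new_env = dict(env)
    new_env.setdefault("OMP_NUM_THREADS", str(len(pin_list)))
    return new_env


def processor_ids_needed(num_created_threads):
    """The parent thread is always pinned, so it needs its own processor ID."""
    return num_created_threads + 1


print(omp_environment({}, [0, 1, 2, 3])["OMP_NUM_THREADS"])
print(processor_ids_needed(4))
```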

       likwid-pin  supports  different  numberings  for  pinning.  See section CPU EXPRESSION for
       details.

       For applications where the first-touch policy on NUMA systems cannot be employed,
       likwid-pin can be used to turn on interleaved memory placement. This can significantly
       improve the performance of memory-bound multithreaded codes. All NUMA nodes the user
       pinned threads to are used for interleaving.

OPTIONS

       -h,--help
              prints a help message to standard output, then exits.

       -v,--version
              prints version information to standard output, then exits.

       -V, --verbose <level>
              verbose output during execution for debugging: 0 for errors only, 1 for
              informational output, 2 for detailed output and 3 for developer output.

       -c,-C <cpu expression>
              specify a numerical list of processors. The list may contain multiple items,
              separated by commas, and ranges. For example 0,3,9-11. Other formats are available,
              see the CPU EXPRESSION section.

       -s, --skip <skip_mask>
              Specify the skip mask as a hexadecimal number. For each set bit the corresponding
              thread is skipped.

       -S,--sweep
              All  ccNUMA  memory  domains belonging to the specified thread list will be cleaned
              before the run. Can solve file buffer cache problems on Linux.

       -p     prints the available thread domains for logical pinning

       -i     set NUMA memory policy to interleave across all NUMA nodes involved in pinning

       -m     set NUMA memory policy to membind across all NUMA nodes involved in pinning

       -d <delim>
              usable with -p to specify the CPU delimiter in the cpulist

       -q,--quiet
              silent execution without output
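
       As a rough illustration of the list syntax accepted by -c, the following Python sketch
       expands a comma-separated list with ranges into individual processor IDs. The function
       name expand_cpu_list is hypothetical and the code does not reflect how likwid-pin parses
       its arguments internally.

```python
def expand_cpu_list(expr):
    """Expand a list such as "0,3,9-11" into [0, 3, 9, 10, 11]."""
    cpus = []
    for item in expr.split(","):
        if "-" in item:
            # A range "a-b" includes both end points.
            first, last = (int(x) for x in item.split("-"))
            cpus.extend(range(first, last + 1))
        else:
            cpus.append(int(item))
    return cpus


print(expand_cpu_list("0,3,9-11"))  # [0, 3, 9, 10, 11]
```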

CPU EXPRESSION

       1.  The most intuitive CPU selection method is a comma-separated list of physical CPU IDs.
           An  example  for  this  is  0,2 which schedules the threads on CPU cores 0 and 2.  The
           physical numbering also allows the usage of ranges like 0-2 which results in the  list
           0,1,2.

       2.  The CPUs can be selected by their indices inside of an affinity domain. The affinity
           domain is optional and if not given, Likwid assumes the domain 'N' for the whole node.
           The format is L:<indexlist> for selecting the CPUs inside of domain 'N' or
           L:<domain>:<indexlist> for selecting the CPUs inside the given domain. Assume a
           virtual affinity domain 'P' that contains the CPUs 0,4,1,5,2,6,3,7. After sorting it
           to have physical cores first we get 0,1,2,3,4,5,6,7. The logical numbering L:P:0-2
           results in the selection 0,1,2 from the physical-cores-first list.

       3.  The expression syntax enables selection according to a selection function with
           variable input parameters. The format is either E:<affinity domain>:<numberOfThreads>
           to use the first <numberOfThreads> threads in affinity domain <affinity domain> or
           E:<affinity domain>:<numberOfThreads>:<chunksize>:<stride> to use <numberOfThreads>
           threads, with <chunksize> threads selected in a row and a new chunk starting every
           <stride> threads in affinity domain <affinity domain>. Examples are E:N:4:1:2 for
           selecting the first four physical CPUs on a system with 2 SMT threads per core or
           E:P:4:2:4 for choosing the first two threads in affinity domain P, skipping 2 threads
           and selecting two threads again. The resulting CPU list for the virtual affinity
           domain P = 0,4,1,5,2,6,3,7 is 0,4,2,6.

       4.  The last format schedules the threads not only in a single affinity domain but
           distributes them evenly over all available affinity domains of the same kind. In
           contrast to the other formats, the selection is done using the physical cores first
           and then the SMT threads. The format is <affinity domain without number>:scatter, like
           M:scatter to schedule the threads evenly in all available memory affinity domains.
           Assuming the two socket domains S0 = 0,4,1,5 and S1 = 2,6,3,7, the expression
           S:scatter results in the CPU list 0,2,1,3,4,6,5,7.
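
       The domain-based formats above can be modelled with a short Python sketch that
       reproduces the examples given in this section. It is an illustration under stated
       assumptions, not Likwid's implementation: all function names are hypothetical, SMT
       siblings are assumed to be adjacent in a domain list (as in 0,4,1,5,2,6,3,7 with 2 SMT
       threads per core), and domain 'N' is assumed to list the CPUs in the same order as the
       virtual domain 'P'.

```python
def physical_first(domain, smt=2):
    """Reorder a domain so that one hardware thread per core comes first.

    Assumes the SMT siblings of a core are adjacent in the domain list.
    """
    return [cpu for t in range(smt) for cpu in domain[t::smt]]


def select_logical(domain, indices, smt=2):
    """L:<domain>:<indexlist>: index into the physical-cores-first list."""
    ordered = physical_first(domain, smt)
    return [ordered[i] for i in indices]


def select_expression(domain, count, chunk=1, stride=1):
    """E:<domain>:<count>[:<chunk>:<stride>]: take <chunk> CPUs in a row,
    then start the next chunk <stride> positions after the previous one."""
    selected = []
    start = 0
    while len(selected) < count and start < len(domain):
        for cpu in domain[start:start + chunk]:
            if len(selected) < count:
                selected.append(cpu)
        start += stride
    return selected


def select_scatter(domains, smt=2):
    """<domain type>:scatter: round-robin over all domains of one kind,
    physical cores first, then SMT threads."""
    ordered = [physical_first(d, smt) for d in domains]
    result = []
    for i in range(max(len(d) for d in ordered)):
        for d in ordered:
            if i < len(d):
                result.append(d[i])
    return result


P = [0, 4, 1, 5, 2, 6, 3, 7]
print(select_logical(P, [0, 1, 2]))                      # L:P:0-2   -> [0, 1, 2]
print(select_expression(P, 4, chunk=1, stride=2))        # E:N:4:1:2 -> [0, 1, 2, 3]
print(select_expression(P, 4, chunk=2, stride=4))        # E:P:4:2:4 -> [0, 4, 2, 6]
print(select_scatter([[0, 4, 1, 5], [2, 6, 3, 7]]))      # S:scatter -> [0, 2, 1, 3, 4, 6, 5, 7]
```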

EXAMPLE

       1.   For a standard pthread application:

       likwid-pin -c 0,2,4-6 ./myApp

       The parent process is pinned to processor 0, which is likely to be thread 0 in ./myApp.
       Thread 1 is pinned to processor 2, thread 2 to processor 4, thread 3 to  processor  5  and
       thread 4 to processor 6. If more threads are created than specified in the processor list,
       these threads are pinned to processor 0 as fallback.
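
       The assignment described here can be sketched in Python as follows. The function name
       assign_processors is hypothetical, and the sketch assumes that the fallback is the first
       entry of the processor list, which is processor 0 in this example.

```python
def assign_processors(processors, num_threads):
    """Map thread numbers (0 = parent process) to processors in list order.

    Threads beyond the end of the list fall back to the first entry
    (an assumption; in the example above this is processor 0).
    """
    fallback = processors[0]
    return [
        processors[i] if i < len(processors) else fallback
        for i in range(num_threads)
    ]


# likwid-pin -c 0,2,4-6 with a parent process and six created threads
print(assign_processors([0, 2, 4, 5, 6], 7))  # [0, 2, 4, 5, 6, 0, 0]
```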

       2.   For the selection of CPUs inside of a CPUset, only the logical numbering is allowed.
            Assuming CPUset 0,4,1,5:

       likwid-pin -c L:1,3 ./myApp

       This command pins ./myApp to CPU 4 and the thread started by ./myApp to CPU 5.

       3.   A common use case for the numbering by expression is pinning an application on the
            Intel Xeon Phi coprocessor with its 60 cores, each having 4 SMT threads.

       likwid-pin -c E:N:60:1:4 ./myApp

       This command schedules one thread per physical CPU core for ./myApp.

IMPORTANT NOTICE

       The detection of shepherd threads works for Intel's/LLVM OpenMP runtime (>=12.0), for
       GCC's OpenMP runtime as well as for PGI's OpenMP runtime. If you encounter problems with
       pinning, please set a proper skip mask to skip the undetected shepherd threads. Intel
       OpenMP runtime 11.0/11.1 requires a skip mask of 0x1.
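
       To make the effect of a skip mask concrete, the following Python sketch decodes a
       hexadecimal mask into the threads that would be skipped. It assumes that bit i of the
       mask corresponds to the i-th thread created through pthread_create, counting from 0;
       this bit order is not stated on this page. The function name skipped_threads is
       hypothetical.

```python
def skipped_threads(skip_mask, num_created):
    """Return the indices of created threads whose bit is set in the mask.

    Assumption: bit i corresponds to the i-th pthread_create call (from 0).
    """
    mask = int(skip_mask, 16)
    return [i for i in range(num_created) if mask & (1 << i)]


print(skipped_threads("0x1", 4))  # [0]
print(skipped_threads("0x5", 4))  # [0, 2]
```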

AUTHOR

       Written by Thomas Gruber <thomas.roehl@googlemail.com>.

BUGS

       Report bugs at <https://github.com/RRZE-HPC/likwid/issues>.

SEE ALSO

       taskset(1), likwid-perfctr(1), likwid-features(1), likwid-topology(1)