NAME

       likwid-pin - pin a sequential or threaded application to dedicated processors

SYNOPSIS

       likwid-pin [-vhSpqim] [-V <verbosity>] [-c/-C <corelist>] [-s <skip_mask>] [-d <delim>] <command>

DESCRIPTION

       likwid-pin is a command line application to pin a sequential or multithreaded
       application to dedicated processors. It can be used as a replacement for taskset.
       In contrast to taskset, it takes a list of single processors rather than an affinity
       mask. For multithreaded applications based on the pthread library, the pthread_create
       library call is overloaded through LD_PRELOAD and each created thread is pinned to a
       dedicated processor as specified in <corelist>.
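
       As an illustration (./myApp stands for any hypothetical multithreaded binary),
       compare the following two invocations:

       taskset -c 0,2 ./myApp
       likwid-pin -c 0,2 ./myApp

       taskset lets the process and all of its threads float between CPUs 0 and 2, whereas
       likwid-pin fixes the parent process on CPU 0 and the first created thread on CPU 2.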

       By default, every generated thread is pinned to a processor in the order of calls to
       pthread_create. With the -s option it is possible to skip single threads (see the
       examples at the end of the OPTIONS section).

       The OpenMP implementations of the GCC and ICC compilers are explicitly supported.
       Clang's OpenMP backend should also work, as it is built on top of Intel's OpenMP
       runtime library. Other runtimes may work as well. likwid-pin sets the environment
       variable OMP_NUM_THREADS for you if it is not already present, using as many threads
       as are present in the pin expression. Be aware that with pthreads the parent thread
       is always pinned: if you create, for example, 4 threads with pthread_create and do
       not use the parent process as a worker, you still have to provide num_threads+1
       processor ids.
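
       For example, for a hypothetical OpenMP binary ./myOmpApp, the following invocation
       pins four threads to the first four processors and, if the variable is not already
       set, exports OMP_NUM_THREADS=4:

       likwid-pin -c 0-3 ./myOmpApp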

       likwid-pin  supports  different  numberings  for  pinning.  See section CPU EXPRESSION for
       details.

       For applications where the first touch policy on NUMA systems cannot be employed,
       likwid-pin can be used to turn on interleaved memory placement. This can
       significantly improve the performance of memory-bound multithreaded codes. All NUMA
       nodes the user pinned threads to are used for interleaving.
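
       For instance, assuming CPUs 0-7 span two NUMA domains on a hypothetical machine, the
       following invocation pins eight threads and interleaves memory placement across both
       domains:

       likwid-pin -i -c 0-7 ./myApp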

OPTIONS

       -h,--help
              prints a help message to standard output, then exits.

       -v,--version
              prints version information to standard output, then exits.

       -V, --verbose <level>
              verbose output during execution for debugging. 0 for errors only, 1 for
              informational output, 2 for detailed output and 3 for developer output.

       -c,-C <cpu expression>
              specify a numerical list of processors. The list may contain multiple items,
              separated by comma, and ranges. For example 0,3,9-11. Other formats are
              available, see the CPU EXPRESSION section.

       -s, --skip <skip_mask>
              Specify skip mask as HEX number. For each set bit the corresponding thread is
              skipped (see the examples at the end of this section).

       -S,--sweep
              All ccNUMA memory domains belonging to the specified thread list will be
              cleaned before the run. This can solve file buffer cache problems on Linux.

       -p     prints the available thread domains for logical pinning

       -i     set the NUMA memory policy to interleave across all NUMA nodes involved in pinning

       -m     set the NUMA memory policy to membind across all NUMA nodes involved in pinning

       -d <delim>
              usable with -p to specify the CPU delimiter in the cpulist

       -q,--quiet
              silent execution without output
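
       As an illustration of combining these options (./myApp is again a hypothetical
       binary), the first invocation below prints the available thread domains, and the
       second sweeps the involved ccNUMA domains before the run and skips the first created
       thread via the skip mask, as required e.g. by the Intel OpenMP runtime 11.0/11.1
       (see IMPORTANT NOTICE):

       likwid-pin -p
       likwid-pin -S -s 0x1 -c 0-4 ./myApp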

CPU EXPRESSION

       1.  The most intuitive CPU selection method is a comma-separated list of  hardware  thread
           IDs.  An example for this is 0,2 which schedules the threads on hardware threads 0 and
           2.  The physical numbering also allows the usage of ranges like 0-2 which  results  in
           the list 0,1,2.

       2.  The CPUs can be selected by their indices inside of an affinity domain. The
           affinity domain is optional and if not given, Likwid assumes the domain 'N' for
           the whole node. The format is L:<indexlist> for selecting the CPUs inside of
           domain 'N' or L:<domain>:<indexlist> for selecting the CPUs inside the given
           domain. Assume a virtual affinity domain 'P' that contains the CPUs
           0,4,1,5,2,6,3,7. After sorting it to have physical hardware threads first, we
           get 0,1,2,3,4,5,6,7. The logical numbering L:P:0-2 results in the selection
           0,1,2 from this physical-hardware-threads-first list.

       3.  The expression syntax enables the selection according to a selection function
           with variable input parameters. The format is either
           E:<affinity domain>:<numberOfThreads> to use the first <numberOfThreads> threads
           in affinity domain <affinity domain>, or
           E:<affinity domain>:<numberOfThreads>:<chunksize>:<stride> to use
           <numberOfThreads> threads, selecting <chunksize> threads in a row with a stride
           of <stride> threads between the beginnings of consecutive chunks in affinity
           domain <affinity domain>. Examples are E:N:4:1:2 for selecting the first four
           physical CPUs on a system with 2 hardware threads per CPU core, or E:P:4:2:4 for
           choosing the first two threads in affinity domain P, skipping the next two and
           selecting again two threads. The resulting CPU list for virtual affinity domain
           P is 0,4,2,6 (see the example invocations after this list).

       4.  The last format schedules the threads not only in a single affinity domain but
           distributes them evenly over all available affinity domains of the same kind. In
           contrast to the other formats, the selection is done using the physical hardware
           threads first and then the virtual hardware threads (aka SMT threads). The
           format is <affinity domain without number>:scatter, like M:scatter to schedule
           the threads evenly in all available memory affinity domains. Assuming the two
           socket domains S0 = 0,4,1,5 and S1 = 2,6,3,7, the expression S:scatter results
           in the CPU list 0,2,1,3,4,6,5,7.
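
       As concrete invocations (using the virtual affinity domain 'P' from above and a
       hypothetical binary ./myApp), the logical and expression formats are passed to -c
       like any other CPU list:

       likwid-pin -c L:P:0-2 ./myApp
       likwid-pin -c E:P:4:2:4 ./myApp

       The first command selects the CPUs 0,1,2 and the second the CPUs 0,4,2,6, as derived
       above.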

EXAMPLE

       1.   For a standard pthread application:

       likwid-pin -c 0,2,4-6 ./myApp

       The parent process is pinned to processor 0 which is likely to be  thread  0  in  ./myApp.
       Thread  1  is  pinned to processor 2, thread 2 to processor 4, thread 3 to processor 5 and
       thread 4 to processor 6. If more threads are created than specified in the processor list,
       these threads are pinned to processor 0 as fallback.

       2.   For  selection  of  CPUs  inside  of  a CPUset only the logical numbering is allowed.
            Assuming CPUset 0,4,1,5:

       likwid-pin -c L:1,3 ./myApp

       This command pins ./myApp on CPU 4 and the thread started by ./myApp on CPU 5.

       3.   A common use-case for the numbering by expression is pinning of an application on the
            Intel Xeon Phi coprocessor with its 60 cores each having 4 SMT threads.

       likwid-pin -c E:N:60:1:4 ./myApp

       This command schedules one application thread per physical CPU core for ./myApp.
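
       4.   For distributing the threads of a hypothetical ./myApp evenly across all socket
            domains:

       likwid-pin -c S:scatter ./myApp

       With the two socket domains from the CPU EXPRESSION section, S0 = 0,4,1,5 and
       S1 = 2,6,3,7, this results in the CPU list 0,2,1,3,4,6,5,7.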

IMPORTANT NOTICE

       The detection of shepherd threads works for the Intel/LLVM OpenMP runtime (>=12.0),
       for GCC's OpenMP runtime as well as for PGI's OpenMP runtime. If you encounter
       problems with pinning, please set a proper skip mask to skip the not-detected
       shepherd threads. The Intel OpenMP runtime 11.0/11.1 requires a skip mask of 0x1.

AUTHOR

       Written by Thomas Gruber <thomas.roehl@googlemail.com>.

BUGS

       Report bugs at <https://github.com/RRZE-HPC/likwid/issues>.

SEE ALSO

       taskset(1), likwid-perfctr(1), likwid-features(1), likwid-topology(1)