Provided by: hwloc-nox_2.11.2-1build1_amd64

NAME

       hwloc-calc - Operate on cpu mask strings and objects

SYNOPSIS

       hwloc-calc [topology options] [options] <location1> [<location2> [...] ]

       Note  that  hwloc(7)  provides  a  detailed  explanation  of the hwloc system and of valid
       <location> formats; it should be read before reading this man page.

TOPOLOGY OPTIONS

       All topology options must be given before all other options.

       --no-smt, --no-smt=<N>
                 Only keep the first PU per core in the input locations.  If  <N>  is  specified,
                 keep  the <N>-th instead, if any.  PUs are ordered by physical index during this
                 filtering.

                 Note that this option is applied  after  searching  locations.   Hence  --no-smt
                 pu:2-5  will  first select the PUs #2 to #5 in the machine before keeping one of
                 them per core.  To instead get PUs #2 to #5 after filtering one per core, you
                 should combine invocations:

                   hwloc-calc --restrict $(hwloc-calc --no-smt all) pu:2-5

       --cpukind <n>, --cpukind <infoname>=<infovalue>
                 Only keep PUs whose CPU kind matches.  Either a single CPU kind is specified as an
                 index, or the info attribute name-value will select matching kinds.

                 When specified by index, it corresponds to hwloc  ranking  of  CPU  kinds  which
                 returns  energy-efficient  cores  first, and high-performance power-hungry cores
                 last.  The full list of CPU kinds may be seen with lstopo --cpukinds.

                 Note that this option is applied after searching locations.  Hence  --cpukind  0
                 core:1  will  return  the  second  core  of  the machine if it is of kind 0, and
                 nothing otherwise.  To instead get the second core among those of kind 0, you
                 should combine invocations:

                   hwloc-calc --restrict $(hwloc-calc --cpukind 0 all) core:1

       --restrict <cpuset>
                 Restrict the topology to the given cpuset.  This removes some PUs and their now-
                 child-less parents.

                 This is  useful  when  combining  invocations  to  filter  some  objects  before
                 selecting among them.

                 Beware  that restricting the PUs in a topology may change the logical indexes of
                 many objects, including NUMA nodes.

       --restrict nodeset=<nodeset>
                 Restrict the topology to the given nodeset  (unless  --restrict-flags  specifies
                 something  different).   This  removes  some NUMA nodes and their now-child-less
                 parents.

                 Beware that restricting the NUMA nodes in a  topology  may  change  the  logical
                 indexes of many objects, including PUs.

       --restrict-flags <flags>
                 Enforce  flags  when  restricting  the  topology.  Flags may be given as numeric
                 values  or  as  a  comma-separated  list  of  flag  names  that  are  passed  to
                 hwloc_topology_restrict().   Those  names may be substrings of actual flag names
                 as long as a single one matches, for instance bynodeset,memless.  The default is
                 0 (or none).

       --disallowed
                 Include objects disallowed by administrative limitations.

       -i <path>, --input <path>
                 Read  the  topology from <path> instead of discovering the topology of the local
                 machine.

                 If <path> is a file, it may be an XML file exported by a previous hwloc program.
                 If <path> is "-", the standard input may be used as an XML file.

                 On  Linux, <path> may be a directory containing the topology files gathered from
                 another machine topology with hwloc-gather-topology.

                 On x86, <path> may be a directory containing a cpuid dump gathered with
                 hwloc-gather-cpuid.

                 When  the  archivemount  program  is  available,  <path>  may  also be a tarball
                 containing such Linux or x86 topology files.

       -i <specification>, --input <specification>
                 Simulate a fake hierarchy (instead of discovering  the  topology  on  the  local
                 machine).  If  <specification>  is  "node:2 pu:3", the topology will contain two
                 NUMA nodes with 3 processing units in each of them.  The <specification>  string
                 must end with a number of PUs.

       --if <format>, --input-format <format>
                 Enforce the input in the given format, among xml, fsroot, cpuid and synthetic.

OPTIONS

       All these options must be given after all topology options above.

       -p --physical
                 Use OS/physical indexes instead of logical indexes for both input and output.

       -l --logical
                 Use  logical  indexes  instead  of physical/OS indexes for both input and output
                 (default).

       --pi --physical-input
                 Use OS/physical indexes instead of logical indexes for input.

       --li --logical-input
                 Use logical indexes instead of physical/OS indexes for input (default).

       --po --physical-output
                 Use OS/physical indexes instead of logical indexes for output.

       --lo --logical-output
                 Use logical indexes instead of physical/OS indexes for output  (default,  except
                 for cpusets which are always physical).

       -n --nodeset
                 Interpret  both  input  and  output  sets  as nodesets instead of CPU sets.  See
                 --nodeset-output and --nodeset-input below for details.

       --no --nodeset-output
                 Report nodesets instead of CPU sets.  This  output  is  more  precise  than  the
                 default  CPU  set  output  when  memory  locality  matters  because  it properly
                 describes CPU-less NUMA nodes, as well as NUMA nodes that are local to multiple
                 CPUs.

       --ni --nodeset-input
                 Interpret input sets as nodesets instead of CPU sets.

       --oo --object-output
                 When  reporting  object  indexes  (e.g.  with -I or --local-memory), this option
                 prefixes these indexes with types (e.g. Core:0 instead of 0).

       -N --number-of <type|depth>
                 Report the number of objects of the given type or depth that intersect  the  CPU
                 set.   This  is  convenient  for  finding  how many cores, NUMA nodes or PUs are
                 available in a machine.

                 When combined with --nodeset or  --nodeset-output,  the  nodeset  is  considered
                 instead  of  the  CPU  set  for  finding  matching objects.  This is useful when
                 reporting the output as a number or set of NUMA nodes.

                 <type> may contain a filter to select specific objects among the type.  For
                 instance  -N  "numa[hbm]"  counts NUMA nodes marked with subtype "HBM", while -N
                 "numa[mcdram]" only counts MCDRAM NUMA nodes on KNL.

                 If an OS device subtype such as gpu is given instead of osdev, only the OS
                 devices of that subtype will be counted.

       -I --intersect <type|depth>
                 Find  the  list of objects of the given type or depth that intersect the CPU set
                 and report the comma-separated list of their indexes instead  of  the  cpu  mask
                 string.  This may be used for determining the list of objects above or below the
                 input objects.

                 When combined with --physical, the list is convenient to pass to external  tools
                 such  as  taskset or numactl --physcpubind or --membind.  This is different from
                 --largest since the latter requires  that  all  reported  objects  are  strictly
                 included inside the input objects.

                 When  combined  with  --nodeset  or  --nodeset-output, the nodeset is considered
                 instead of the CPU set for  finding  matching  objects.   This  is  useful  when
                 reporting the output as a number or set of NUMA nodes.

                 <type> may contain a filter to select specific objects among the type.  For
                 instance -I "numa[hbm]" lists NUMA nodes marked with subtype "HBM", while -I
                 "numa[mcdram]" only lists MCDRAM NUMA nodes on KNL.

                 If an OS device subtype such as gpu is given instead of osdev, only the OS
                 devices of that subtype will be returned.

                 If combined with --object-output, object indexes are prefixed with  types  (e.g.
                 Core:0 instead of 0).

       -H --hierarchical <type1>.<type2>...
                 Find  the  list of objects of type <type2> that intersect the CPU set and report
                 the space-separated list of their hierarchical indexes with respect to  <type1>,
                 <type2>,  etc.   For  instance,  if  package.core  is given, the output would be
                 Package:1.Core:2 Package:2.Core:3 if the input contains the third  core  of  the
                 second package and the fourth core of the third package.

                 Only normal CPU-side object types should be used.

                 NUMA  nodes  may  be  used  but  they  may  cause  redundancy  in  the output on
                 heterogeneous memory platform. For instance, on a platform with  both  DRAM  and
                 HBM memory on a package, the first core will be considered both as first core of
                 first NUMA node (DRAM) and as first core of second NUMA node (HBM).

       --largest Report (in a human readable format) the list of largest  objects  which  exactly
                 include  all input objects (by looking at their CPU sets).  None of these output
                 objects intersect each other, and the sum of them is exactly equivalent  to  the
                 input. No larger object is included in the input.

                 This  is  different  from --intersect where reported objects may not be strictly
                 included in the input.

       --local-memory
                 Report the list of NUMA nodes that are local to the input objects.

                 This option is similar to -I numa but the way nodes are selected  is  different:
                 The  selection  performed  by  --local-memory  may  be precisely configured with
                 --local-memory-flags, while -I numa just selects  all  nodes  that  are  somehow
                 local to any of the input objects.

                 If  combined  with --object-output, object indexes are prefixed with types (e.g.
                 NUMANode:0 instead of 0).

       --local-memory-flags <flags>
                 Change the flags used to select local NUMA nodes.  Flags may be given as numeric
                 values  or  as  a  comma-separated  list  of  flag  names  that  are  passed  to
                 hwloc_get_local_numanode_objs().  Those names may be substrings of  actual  flag
                 names  as  long  as  a single one matches.  The default is 3 (or smaller,larger)
                 which means NUMA nodes are displayed if their locality  either  contains  or  is
                 contained in the locality of the given object.

                 This option enables --local-memory.

       --best-memattr <name>
                 Enable  the  listing of local memory nodes with --local-memory, but only display
                 the local nodes that have the best value  for  the  memory  attribute  given  by
                 <name> (or as an index).

                 If  the  memory  attribute  values depend on the initiator, the hwloc-calc input
                 objects are used as the initiator.

                 Standard attribute names are Capacity, Locality, Bandwidth,  and  Latency.   All
                 existing attributes in the current topology may be listed with

                     $ lstopo --memattrs

                 If  combined  with  --object-output,  the object index is prefixed with its type
                 (e.g. NUMANode:0 instead of 0).

                 <name> may be suffixed with flags to tune  the  selection  of  best  nodes,  for
                 instance  as  bandwidth,strict,default.   default means that all local nodes are
                 reported if no best could be found.  strict means that nodes are  selected  only
                 if  their  performance  is  the  best  for  all the input CPUs. On a dual-socket
                 machine with HBM in each socket, both HBMs are the best for their local  socket,
                 but  not for the remote socket.  Hence both HBM are also considered best for the
                 entire machine by default, but none if strict.

       --sep <sep>
                 Change the field separator in the output.   By  default,  a  space  is  used  to
                 separate output objects (for instance when --hierarchical or --largest is given)
                 while a comma is used to separate indexes  (for  instance  when  --intersect  is
                 given).

       --single  Singlify the output to a single CPU.

       --cpuset-output-format             <hwloc|list|taskset|systemd-dbus-api>             --cof
       <hwloc|list|taskset|systemd-dbus-api>
                 Change the format of displayed CPU set strings.  By default, the  hwloc-specific
                 format is used.  If list is given, the output is a comma-separated list of numbers
                 or ranges, e.g. 2,4-5,8.  If taskset is given, the output is compatible with the
                 taskset  program (replaces the former --taskset option).  If systemd-dbus-api is
                 given, the output is compatible with systemd's D-Bus API, e.g.  "AllowedCPUs  ay
                 0x0002 0x78 0x04" for the CPU set list "3-6,10".

                 This  option  has  no  impact  on  the  format  of  input  CPU  set strings, see
                 --cpuset-input-format.

       --cpuset-input-format <hwloc|list|taskset> --cif <hwloc|list|taskset>
                 Change the format of input CPU set strings.  By default, the tool tries to guess
                 the  type  automatically  between  hwloc,  list or taskset formats.  This option
                 forces the parsing format to avoid ambiguity for instance when  "1,3,5"  may  be
                 parsed as a hwloc cpuset "0x1,0x00000003,0x00000005" or as list "1-1,3-3,5-5".

                 This  option  has  no  impact  on  the  format  of  output  CPU set strings, see
                 --cpuset-output-format.

       -q --quiet
                 Hide non-fatal error messages.  It mostly includes locations  pointing  to  non-
                 existing objects.

       -v --verbose
                 Verbose output.

       --version Report version and exit.

       -h --help Display help message and exit.
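
       The relation between the list format and the hwloc mask format described under
       --cpuset-output-format can be illustrated without hwloc itself.  The following is a
       minimal sketch, not part of hwloc: list_to_mask is a hypothetical helper, and it only
       handles CPU numbers below 32 so that the result fits in a single 32-bit word, whereas
       hwloc-calc supports arbitrarily large sets.

```shell
# Convert a CPU list such as "2,4-5,8" into one hexadecimal mask word.
# Sketch only: CPU numbers must be below 32.
list_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        first=${range%-*}    # "4-5" -> 4, "2" -> 2
        last=${range#*-}     # "4-5" -> 5, "2" -> 2
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$(( mask | (1 << cpu) ))
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    printf '0x%08x\n' "$mask"
}

list_to_mask "2,4-5,8"    # prints 0x00000134
```

       Bits 2, 4, 5 and 8 are set in the result, which is the same set that the list
       describes.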

DESCRIPTION

       hwloc-calc  generates  and manipulates CPU mask strings or objects.  Both input and output
       may be either objects (with physical or logical indexes),  CPU  lists  (with  physical  or
       logical  indexes),  or  CPU  mask  strings  (always  physically  indexed).  Input location
       specification is described in hwloc(7).

       If objects or CPU mask strings are given on the command-line,  they  are  combined  and  a
       single output is printed.  If no object or CPU mask strings are given on the command-line,
       the program will read the standard input.  It will combine multiple objects  or  CPU  mask
       strings that are given on the same line of the standard input, with spaces as
       separators.  Different input lines will be processed separately.

       Command-line arguments and options are processed in order.  First  topology  configuration
       options should be given.  Then, for instance, changing the type of input indexes with --li
       or changing the input topology with -i only affects the processing of the following
       arguments.

       NOTE: It is highly recommended that you read the hwloc(7) overview page before reading
       this man page.  Most of the concepts described in hwloc(7) directly apply to the
       hwloc-calc utility.

EXAMPLES

       hwloc-calc's operation is best described through several examples.

       To display the (physical) CPU mask corresponding to the second package:

           $ hwloc-calc package:1
           0x000000f0

       To display the (physical) CPU mask corresponding to the third package, excluding its
       even-numbered logical processors:

           $ hwloc-calc package:2 ~PU:even
           0x00000c00

       To display the (physical) CPU mask of the entire topology except the third package:

           $ hwloc-calc all ~package:2
           0x0000f0ff

       To combine two (physical) CPU masks:

           $ hwloc-calc 0x0000ffff 0xff000000
           0xff00ffff
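
       The set arithmetic behind these examples is plain bitwise logic on the masks.  The
       sketch below reproduces it with shell arithmetic only.  combine_masks and exclude_mask
       are hypothetical helper names, the value 0x00000f00 for the excluded package is an
       assumption about the example machine, and the approach only works for masks that fit
       in one machine word.

```shell
# OR two masks together, like "hwloc-calc 0x0000ffff 0xff000000".
combine_masks() {
    printf '0x%08x\n' $(( $1 | $2 ))
}

# Clear the bits of the second mask from the first, like "all ~<location>".
exclude_mask() {
    printf '0x%08x\n' $(( $1 & ~$2 ))
}

combine_masks 0x0000ffff 0xff000000    # prints 0xff00ffff
# Assumption: the excluded package covers PUs 8 to 11 (mask 0x00000f00).
exclude_mask  0x0000ffff 0x00000f00    # prints 0x0000f0ff
```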

Examples of listing or counting objects

       To display the list of logical numbers of processors included in the second package:

           $ hwloc-calc --intersect PU package:1
           4,5,6,7

       To bind GNU OpenMP threads logically over the whole  machine,  we  need  to  use  physical
       number output instead:

           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
           $ echo $GOMP_CPU_AFFINITY
           0,4,1,5,2,6,3,7

       To  display the list of NUMA nodes, by physical indexes, that intersect a given (physical)
       CPU mask:

           $ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
           0,2

       To find how many cores are in  the  second  CPU  kind  (those  cores  are  likely  higher-
       performance and more power-hungry than cores of the first kind):

           $ hwloc-calc --cpukind 1 -N core all
           4

       To convert a cpu mask to human-readable output, the -H option can be used to emit a space-
       delimited list of locations:

           $ echo 0x000000f0 | hwloc-calc -q -H package.core
           Package:1.Core:0 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3

       To use some other character (e.g., a comma) instead of spaces in  output,  use  the  --sep
       option:

           $ echo 0x000000f0 | hwloc-calc -q -H package.core --sep ,
           Package:1.Core:0,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3
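
       When a script consumes such hierarchical output, each location can be split with shell
       parameter expansion.  This is a minimal sketch: parse_hier is a hypothetical helper,
       and it assumes exactly two levels of the form Type:index, as produced by -H
       package.core with the default space separator.

```shell
# Print "<package index> <core index>" for each Package:N.Core:M location.
parse_hier() {
    for loc in $1; do
        pkg=${loc%%.*}     # e.g. "Package:1"
        core=${loc#*.}     # e.g. "Core:2"
        printf '%s %s\n' "${pkg#*:}" "${core#*:}"
    done
}

parse_hier "Package:1.Core:2 Package:2.Core:3"
# prints:
# 1 2
# 2 3
```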

       To synthesize a set of cores into the largest objects on a 2-node 2-package 2-core machine:

           $ hwloc-calc core:0 --largest
           Core:0
           $ hwloc-calc core:0-1 --largest
           Package:0
           $ hwloc-calc core:4-7 --largest
           L3Cache:1
           $ hwloc-calc core:2-6 --largest
           Package:1 Package:2 Core:6
           $ hwloc-calc pack:2 --largest
           Package:2
           $ hwloc-calc package:2-3 --largest
           L3Cache:1

       To get the set of first threads of all cores:

           $ hwloc-calc core:all.pu:0
           0xffff0000
           $ hwloc-calc --no-smt all -I pu
           0,2,4,6,8,10,12,14

Examples of listing or counting NUMA nodes

       To display the list of NUMA nodes, by physical indexes, whose locality is exactly equal to
       a Package:

           $ hwloc-calc --local-memory-flags 0 --physical-output pack:1
           4,7

       To display the best-capacity NUMA node(s), by physical indexes, whose locality is  exactly
       equal to a Package:

           $ hwloc-calc --local-memory-flags 0 --best-memattr capacity --physical-output pack:1
           4

       To find the number of NUMA nodes with subtype "HBM":

           $ hwloc-calc -N "numa[hbm]" all
           4

       To  find  the  number  of NUMA nodes in memory tier 1 (DRAM nodes on a server with HBM and
       DRAM):

           $ hwloc-calc -N "numa[tier=1]" all
           4

       To find the NUMA node of subtype MCDRAM (on KNL) near a PU:

           $ hwloc-calc -I "numa[mcdram]" pu:157
           1

Examples with physical and logical indexes

       Converting object logical indexes (default) from/to physical/OS indexes may  be  performed
       with  --intersect  combined with either --physical-output (logical to physical conversion)
       or --physical-input (physical to logical):

           $ hwloc-calc --physical-output PU:2 --intersect PU
           3
           $ hwloc-calc --physical-input PU:3 --intersect PU
           2

       One should add --nodeset when converting indexes of memory objects to make sure  a  single
       NUMA node index is returned on platforms with heterogeneous memory:

           $ hwloc-calc --nodeset --physical-output node:2 --intersect node
           3
           $ hwloc-calc --nodeset --physical-input node:3 --intersect node
           2

       To combine both physical and logical indexes as input:

           $ hwloc-calc PU:2 --physical-input PU:3
           0x0000000c

Examples with I/O devices

       To display the set of CPUs near network interface eth0:

           $ hwloc-calc os=eth0
           0x00005555

       To display the indexes of packages near PCI device whose bus ID is 0000:01:02.0:

           $ hwloc-calc pci=0000:01:02.0 --intersect Package
           1

       OS devices may also be filtered by subtype. In this example, there are 9 OS devices in the
       system, 4 of them are near NUMA node #1, and only 2 of these are CoProcessors:

           $ utils/hwloc/hwloc-calc -I osdev all
           0,1,2,3,4,5,6,7,8
           $ utils/hwloc/hwloc-calc -I osdev node:1
           5,6,7,8
           $ utils/hwloc/hwloc-calc -I coproc node:1
           7,8

Examples with other tools

       To make GNU OpenMP use exactly one thread per core, and in logical core order:

           $ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
           $ echo $OMP_NUM_THREADS
           4
           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
           $ echo $GOMP_CPU_AFFINITY
           0,2,1,3
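
       In a launcher script that must also run on machines without hwloc, the core count can
       be wrapped with a fallback.  This is only a sketch: count_cores is a hypothetical
       helper, and the getconf fallback counts online hardware threads rather than cores, so
       it may overestimate on SMT machines.

```shell
# Count cores with hwloc-calc when installed; otherwise fall back to the
# number of online processors, and finally to 1.
count_cores() {
    if command -v hwloc-calc >/dev/null 2>&1; then
        hwloc-calc --number-of core all
    elif n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) && [ -n "$n" ]; then
        echo "$n"
    else
        echo 1
    fi
}

OMP_NUM_THREADS=$(count_cores)
export OMP_NUM_THREADS
echo "$OMP_NUM_THREADS"
```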

       To export a bitmask in a format accepted by the resctrl Linux subsystem (for
       configuring cache partitioning, etc.), apply a sed regexp to the output of hwloc-calc:

           $ hwloc-calc pack:all.core:7-9.pu:0
           0x00000380,,0x00000380   <this format cannot be given to resctrl>
           $ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
           00000380,0,00000380
           # echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
           # cat /sys/fs/resctrl/test/cpus
           00000000,00000380,00000000,00000380    <the modified bitmask was correctly parsed by resctrl>
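
       The sed expression can be tried on a literal string without hwloc or resctrl.  The
       sketch below wraps it in a hypothetical to_resctrl_mask helper and shows why the
       ,, substitution appears twice: a single pass cannot fill two adjacent empty fields,
       because the comma it would need has already been consumed by the previous match.

```shell
# Strip the 0x prefixes and fill empty comma-separated fields with 0.
to_resctrl_mask() {
    sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
}

echo '0x00000380,,0x00000380'  | to_resctrl_mask   # prints 00000380,0,00000380
echo '0x00000001,,,0x00000002' | to_resctrl_mask   # prints 00000001,0,0,00000002
```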

Example of use of the systemd-dbus-api cpuset output format

       hwloc-calc allows one to generate the very cryptic AllowedCPUs string, which the D-Bus API
       of  systemd  expects,  from  other  supported  CPU set representations. This is especially
       useful when the systemd-run command, which understands CPU sets given as lists, cannot be
       used.

       First, create a systemd slice:

           $ busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartUnit ss my_slice.slice fail

       Then, configure the CPU set of the slice, using hwloc-calc to translate the syntax:

           $ busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager SetUnitProperties 'sba(sv)' my_slice.slice 1 1 $(hwloc-calc pu:0 pu:31 pu:32 pu:63 pu:64 pu:77 --cpuset-output-format systemd-dbus-api)

       Finally, add the current process to the slice:

           $ busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartTransientUnit 'ssa(sv)a(sa(sv))' my_scope.scope fail 3 Delegate b 1 PIDs au 1 $$ Slice s my_slice.slice 0

       More info in the org.freedesktop.systemd1(5) manual page.
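
       To check a generated AllowedCPUs value by eye, the byte array can be decoded back into
       CPU numbers.  This sketch is based on the "0x78 0x04" example given for the
       systemd-dbus-api format, where bit b of byte i stands for CPU number 8*i+b.
       dbus_bytes_to_cpus is a hypothetical helper; it takes only the data bytes, not the
       leading "ay" type or the byte count.

```shell
# Decode AllowedCPUs data bytes into a comma-separated list of CPU numbers.
dbus_bytes_to_cpus() {
    index=0
    cpus=
    for byte in "$@"; do
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( ($byte >> $bit) & 1 )) -eq 1 ]; then
                cpus="${cpus:+$cpus,}$(( $index * 8 + $bit ))"
            fi
            bit=$(( $bit + 1 ))
        done
        index=$(( $index + 1 ))
    done
    printf '%s\n' "$cpus"
}

dbus_bytes_to_cpus 0x78 0x04    # prints 3,4,5,6,10
```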

RETURN VALUE

       Upon  successful  execution, hwloc-calc displays the (physical) CPU mask string, (physical
       or logical) object list, or (physical or logical) object number list.  The return value is
       0.

       hwloc-calc  will return nonzero if any kind of error occurs, such as (but not limited to):
       failure to parse the command line.
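
       Scripts can rely on this return value to fall back to a default when hwloc-calc fails
       or is not installed.  The following is a minimal sketch: calc_or_default and the
       HWLOC_CALC variable are hypothetical names introduced here so that the command can be
       overridden, and the default mask is an arbitrary choice.

```shell
# HWLOC_CALC is a hypothetical override used only in this sketch.
HWLOC_CALC=${HWLOC_CALC:-hwloc-calc}

# Print the output of hwloc-calc for the given arguments, or the default
# (first argument) if the command fails or prints nothing.
calc_or_default() {
    default=$1
    shift
    if out=$("$HWLOC_CALC" "$@" 2>/dev/null) && [ -n "$out" ]; then
        printf '%s\n' "$out"
    else
        printf '%s\n' "$default"
    fi
}

calc_or_default 0x00000001 package:1
```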

SEE ALSO

       hwloc(7), lstopo(1), hwloc-info(1)