Provided by: hwloc-nox_2.12.0-1_amd64

NAME

       hwloc-calc - Operate on cpu mask strings and objects

SYNOPSIS

       hwloc-calc [topology options] [options] <location1> [<location2> [...] ]

       Note  that  hwloc(7) provides a detailed explanation of the hwloc system and of valid <location> formats;
       it should be read before reading this man page.

TOPOLOGY OPTIONS

       All topology options must be given before all other options.

       --no-smt, --no-smt=<N>
                 Only keep the first PU per core in the input locations.  If <N> is specified, keep  the  <N>-th
                 instead, if any.  PUs are ordered by physical index during this filtering.

                 Note  that  this option is applied after searching locations.  Hence --no-smt pu:2-5 will first
                 select the PUs #2 to #5 in the machine before keeping one of them per core.  To rather get  PUs
                 #2 to #5 after filtering one per core, you should combine invocations:

                   hwloc-calc --restrict $(hwloc-calc --no-smt all) pu:2-5

       --cpukind <n>, --cpukind <infoname>=<infovalue>
                 Only keep PUs whose CPU kind matches.  Either a single CPU kind is specified as an index, or
                 the info attribute name-value pair selects matching kinds.

                 When specified by index, it corresponds to hwloc ranking of CPU  kinds  which  returns  energy-
                 efficient  cores  first,  and  high-performance  power-hungry cores last.  The full list of CPU
                 kinds may be seen with lstopo --cpukinds.

                 Note that this option is applied after searching locations.   Hence  --cpukind  0  core:1  will
                 return the second core of the machine if it is of kind 0, and nothing otherwise.  To rather get
                 the second core among those of kind 0, you should combine invocations:

                   hwloc-calc --restrict $(hwloc-calc --cpukind 0 all) core:1
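
       The two-step pattern shown for --no-smt and --cpukind can be wrapped in a small shell function.  This
       is only a sketch: select_after_filter and the HWLOC_CALC override variable are hypothetical names
       introduced here for illustration and are not part of hwloc.

```shell
# select_after_filter FILTER LOCATION
# Hypothetical helper: compute the cpuset kept by a topology filter
# (e.g. "--no-smt" or "--cpukind 0"), then restrict the topology to that
# cpuset before searching LOCATION.
# HWLOC_CALC is an assumed override for the binary name (default: hwloc-calc).
select_after_filter() {
    calc=${HWLOC_CALC:-hwloc-calc}
    # $1 is intentionally unquoted so that "--cpukind 0" splits into two words.
    kept=$($calc $1 all) || return 1
    # Second invocation: the location is searched in the restricted topology.
    $calc --restrict "$kept" "$2"
}
```

       For instance, select_after_filter "--cpukind 0" core:1 would run the same two commands as the
       --cpukind example above, and select_after_filter --no-smt pu:2-5 the same as the --no-smt one.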

       --default-nodes
                 Only keep NUMA nodes that are considered default nodes on heterogeneous memory platforms.  This
                 usually  includes  DRAM  memory nodes (or nodes of the same memory tier) rather than nodes with
                 specific characteristics (HBM, NVM, CXL, etc).

                 This option is useful for splitting the topology by NUMA  domain  when  binding  one  task  per
                 domain  even  if  some  NUMA domains have the same locality (e.g. one DRAM and one HBM node per
                 socket).

                 See hwloc_topology_get_default_nodeset() for details.

       --restrict <cpuset>
                 Restrict the topology to the given cpuset.  This removes  some  PUs  and  their  now-child-less
                 parents.

                 This is useful when combining invocations to filter some objects before selecting among them.

                 Beware  that  restricting the PUs in a topology may change the logical indexes of many objects,
                 including NUMA nodes.

       --restrict nodeset=<nodeset>
                 Restrict the topology  to  the  given  nodeset  (unless  --restrict-flags  specifies  something
                 different).  This removes some NUMA nodes and their now-child-less parents.

                 Beware  that  restricting  the  NUMA nodes in a topology may change the logical indexes of many
                 objects, including PUs.

       --restrict-flags <flags>
                 Enforce flags when restricting the topology.  Flags may be given as  numeric  values  or  as  a
                 comma-separated  list  of flag names that are passed to hwloc_topology_restrict().  Those names
                 may be substrings of actual  flag  names  as  long  as  a  single  one  matches,  for  instance
                 bynodeset,memless.  The default is 0 (or none).

       --disallowed
                 Include objects disallowed by administrative limitations.

       -i <path>, --input <path>
                 Read the topology from <path> instead of discovering the topology of the local machine.

                 If <path> is a file, it may be an XML file exported by a previous hwloc program.  If <path> is
                 "-", the standard input may be used as an XML file.

                 On Linux, <path> may be a directory containing the topology files gathered from another machine
                 with hwloc-gather-topology.

                 On x86, <path> may be a directory containing a cpuid dump gathered with hwloc-gather-cpuid.

                 When  the archivemount program is available, <path> may also be a tarball containing such Linux
                 or x86 topology files.

       -i <specification>, --input <specification>
                 Simulate a fake hierarchy (instead of discovering  the  topology  on  the  local  machine).  If
                 <specification>  is  "node:2  pu:3", the topology will contain two NUMA nodes with 3 processing
                 units in each of them.  The <specification> string must end with a number of PUs.

       --if <format>, --input-format <format>
                 Enforce the input in the given format, among xml, fsroot, cpuid and synthetic.

OUTPUT CONVERSION OPTIONS

       By default, the output is a CPU set (or nodeset).  These options convert this  set  into  objects,  count
       them, etc.

       All these options must be given after all topology options above.

       -N --number-of <type|depth>
              Report  the  number  of  objects  of  the given type or depth that intersect the CPU set.  This is
              convenient for finding how many cores, NUMA nodes or PUs are available in a machine.

              <type> may contain a filter to select specific objects among the type. For instance -N "numa[hbm]"
              counts NUMA nodes marked with subtype "HBM", while -N "numa[mcdram]" only counts MCDRAM NUMA nodes
              on KNL.

              If an OS device subtype such as gpu is given instead of osdev, only the OS devices of that
              subtype will be counted.

              Special  values  such  as  cpukind and memorytier may be given to return the number of cpukinds or
              memory tiers matching the input location.

       -I --intersect <type|depth>
              Find the list of objects of the given type or depth that intersect the  CPU  set  and  report  the
              comma-separated  list  of  their  indexes  instead  of  the cpu mask string.  This may be used for
              determining the list of objects above or below the input objects.

              When combined with --physical, the list is convenient to pass to external tools such as taskset or
              numactl  --physcpubind  or  --membind.  This is different from --largest since the latter requires
              that all reported objects are strictly included inside the input objects.
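
              As a sketch of how such a list might be handed to taskset, the helper below prints the physical
              PU indexes that intersect the given locations.  The names physical_pus, HWLOC_CALC and
              ./my_program are hypothetical and only serve the illustration.  It uses --physical-output rather
              than --physical so that the input locations are still interpreted with logical indexes.

```shell
# physical_pus LOCATION...
# Hypothetical helper: print the comma-separated physical (OS) indexes of
# the PUs that intersect the given locations.
# HWLOC_CALC is an assumed override for the binary name (default: hwloc-calc).
physical_pus() {
    ${HWLOC_CALC:-hwloc-calc} --physical-output --intersect PU "$@"
}

# Possible use, assuming taskset from util-linux is installed:
#   taskset -c "$(physical_pus package:1)" ./my_program
```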

              <type> may contain a filter to select specific objects among the type. For instance -I "numa[hbm]"
              lists NUMA nodes marked with subtype "HBM", while -I "numa[mcdram]" only lists MCDRAM NUMA nodes
              on KNL.  Note that this filter applies when selecting objects, but not when outputting them, e.g.
              MCDRAM NUMA node #3 is outputted as 7 (NUMA node #7) instead of 3.

              If an OS device subtype such as gpu is given instead of osdev, only the OS devices of that subtype
              will be returned.

              Special values such as cpukind and memorytier may be given to return the list of cpukind or memory
              tier indexes matching the input location.

              If  combined  with --object-output, object indexes are prefixed with types (e.g. Core:0 instead of
              0).

       -H --hierarchical <type1>.<type2>...
              Find the list of objects of type <type2> that intersect the CPU set and report the space-separated
              list  of  their  hierarchical  indexes  with  respect  to <type1>, <type2>, etc.  For instance, if
              package.core is given, the output would be Package:1.Core:2 Package:2.Core:3 if the input contains
              the third core of the second package and the fourth core of the third package.

              Only normal CPU-side object types should be used.

              NUMA nodes may be used but they may cause redundancy in the output on heterogeneous memory
              platforms. For instance, on a platform with both DRAM and HBM memory on a package, the first core
              will be considered both as first core of first NUMA node (DRAM) and as first core of second NUMA
              node (HBM).

       --largest
              Report (in a human readable format) the list of largest objects which exactly  include  all  input
              objects  (by  looking  at their CPU sets).  None of these output objects intersect each other, and
              the sum of them is exactly equivalent to the input. No larger object is included in the input.

              This is different from --intersect where reported objects may not  be  strictly  included  in  the
              input.

       --local-memory
              Report the list of NUMA nodes that are local to the input objects.

              This  option  is  similar  to  -I  numa but the way nodes are selected is different: The selection
              performed by --local-memory may be precisely configured with --local-memory-flags, while  -I  numa
              just selects all nodes that are somehow local to any of the input objects.

              If  combined with --object-output, object indexes are prefixed with types (e.g. NUMANode:0 instead
              of 0).

       --local-memory-flags
              Change the flags used to select local NUMA nodes.  Flags may be given as numeric values  or  as  a
              comma-separated  list  of  flag  names  that are passed to hwloc_get_local_numanode_objs().  Those
              names may be substrings of actual flag names as long as a single one matches.  The  default  is  3
              (or  smaller,larger)  which means NUMA nodes are displayed if their locality either contains or is
              contained in the locality of the given object.

              This option enables --local-memory.

       --best-memattr <name>
              Enable the listing of local memory nodes with --local-memory, but only  display  the  local  nodes
              that have the best value for the memory attribute given by <name> (or as an index).

              If  the  memory attribute values depend on the initiator, the hwloc-calc input objects are used as
              the initiator.

              Standard attribute names are Capacity, Locality, Bandwidth, and Latency.  All existing  attributes
              in the current topology may be listed with

                  $ lstopo --memattrs

              If  combined  with  --object-output,  the  object index is prefixed with its type (e.g. NUMANode:0
              instead of 0).

              <name> may be suffixed  with  flags  to  tune  the  selection  of  best  nodes,  for  instance  as
              bandwidth,strict,default.   default  means  that  all local nodes are reported if no best could be
              found.  strict means that nodes are selected only if their performance is the  best  for  all  the
              input CPUs.  On a dual-socket machine with HBM in each socket, both HBMs are the best for their
              local socket, but not for the remote socket.  Hence both HBMs are also considered best for the
              entire machine by default, but none if strict.

INPUT / OUTPUT SET AND OBJECT OPTIONS

       These options configure how objects and CPU/node sets are parsed on input and formatted on output.

       All these options must be given after all topology options above.

       -p --physical
                 Use OS/physical indexes instead of logical indexes for both input and output.

       -l --logical
                 Use logical indexes instead of physical/OS indexes for both input and output (default).

       --pi --physical-input
                 Use OS/physical indexes instead of logical indexes for input.

       --li --logical-input
                 Use logical indexes instead of physical/OS indexes for input (default).

       --po --physical-output
                 Use OS/physical indexes instead of logical indexes for output.

       --lo --logical-output
                 Use  logical  indexes  instead  of  physical/OS indexes for output (default, except for cpusets
                 which are always physical).

       -n --nodeset
                 Interpret both input and output sets as nodesets instead of CPU sets.  See --nodeset-output and
                 --nodeset-input below for details.

       --no --nodeset-output
                 Report  nodesets  instead  of  CPU  sets.  This output is more precise than the default CPU set
                 output when memory locality matters because it properly describes CPU-less NUMA nodes, as  well
                 as NUMA-nodes that are local to multiple CPUs.

       --ni --nodeset-input
                 Interpret input sets as nodesets instead of CPU sets.

FORMATTING OPTIONS

       All these options must be given after all topology options above.

       --oo --object-output
              When reporting object indexes (e.g. with -I or --local-memory), this option prefixes these indexes
              with types (e.g. Core:0 instead of 0).

       --sep <sep>
              Change the field separator in the output.  By default, a space is used to separate output  objects
              (for instance when --hierarchical or --largest is given) while a comma is used to separate indexes
              (for instance when --intersect is given).

       --single
              Singlify the output to a single CPU.

       --cpuset-output-format <hwloc|list|taskset|systemd-dbus-api> --cof <hwloc|list|taskset|systemd-dbus-api>
              Change the format of displayed bitmap strings (CPU set or nodeset).  By default, the hwloc-specific
              format is used.  If list is given, the output is a comma-separated list of numbers or ranges,
              e.g. 2,4-5,8.  If taskset is given, the output is compatible with the taskset program (replaces
              the former --taskset option).  If systemd-dbus-api is given, the output is compatible with
              systemd's D-Bus API, e.g. "ay 0x0002 0x78 0x04" for the CPU set list "3-6,10".

              For  convenience,  --nodeset-output-format  (or  --nof)  behaves  the  same   but   also   implies
              --nodeset-output.

              This option has no impact on the format of input CPU set strings, see --cpuset-input-format.
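
              To make the systemd-dbus-api example above easier to read, the following sketch rebuilds that
              string from a list in plain shell: a byte count followed by one byte per 8 CPUs, least significant
              byte first.  The helper name list_to_dbus is hypothetical, and the layout is inferred only from the
              single example given above; hwloc-calc itself remains the authoritative way to produce this
              format.

```shell
# list_to_dbus LIST
# Hypothetical helper: convert a list such as "3-6,10" into a string shaped
# like the systemd-dbus-api example ("ay <byte count> <byte>...").
list_to_dbus() {
    max=-1
    bits=""
    old_ifs=$IFS; IFS=,
    for range in $1; do
        first=${range%-*}   # "3-6" -> 3, "10" -> 10
        last=${range#*-}    # "3-6" -> 6, "10" -> 10
        i=$first
        while [ "$i" -le "$last" ]; do
            bits="$bits $i"
            if [ "$i" -gt "$max" ]; then max=$i; fi
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
    nbytes=$((max / 8 + 1))
    out=$(printf 'ay 0x%04x' "$nbytes")
    b=0
    while [ "$b" -lt "$nbytes" ]; do
        byte=0
        # Set bit (i mod 8) of byte (i div 8) for every index in the list.
        for i in $bits; do
            if [ $((i / 8)) -eq "$b" ]; then
                byte=$((byte | (1 << (i % 8))))
            fi
        done
        out="$out $(printf '0x%02x' "$byte")"
        b=$((b + 1))
    done
    echo "$out"
}
```

              Calling list_to_dbus 3-6,10 yields "ay 0x0002 0x78 0x04": CPUs 3 to 6 set bits 3 to 6 of the first
              byte (0x78) and CPU 10 sets bit 2 of the second byte (0x04).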

       --cpuset-input-format <hwloc|list|taskset> --cif <hwloc|list|taskset>
              Change  the  format  of  input bitmap strings (CPU set or nodeset).  By default, the tool tries to
              guess the type automatically between hwloc, list or  taskset  formats.   This  option  forces  the
              parsing  format  to  avoid  ambiguity  for  instance  when "1,3,5" may be parsed as a hwloc cpuset
              "0x1,0x00000003,0x00000005" or as list "1-1,3-3,5-5".

              This option has no impact on the format of output CPU set strings, see --cpuset-output-format.
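
              The ambiguity is easier to see by expanding the hwloc interpretation by hand.  The sketch below
              assumes the hwloc string format is a comma-separated sequence of 32-bit hexadecimal words, most
              significant first, with empty words meaning zero; hwloc_mask_to_indexes is a hypothetical helper
              name, not an hwloc command.

```shell
# hwloc_mask_to_indexes MASK
# Hypothetical helper: list the bit indexes set in an hwloc-style mask such
# as "0x1,0x00000003,0x00000005" (32-bit words, most significant first).
hwloc_mask_to_indexes() {
    old_ifs=$IFS; IFS=,
    set -- $1               # split the mask into its 32-bit words
    IFS=$old_ifs
    result=""
    base=$(( ($# - 1) * 32 ))   # index of bit 0 of the leftmost word
    for word in "$@"; do
        word=${word#0x}
        value=$(( 0x${word:-0} ))   # an empty word counts as zero
        chunk=""
        bit=0
        while [ "$bit" -lt 32 ]; do
            if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
                chunk="$chunk${chunk:+,}$((base + bit))"
            fi
            bit=$((bit + 1))
        done
        # Later words hold lower indexes, so they go in front of the result.
        if [ -n "$chunk" ]; then result="$chunk${result:+,}$result"; fi
        base=$((base - 32))
    done
    echo "$result"
}
```

              With this reading, hwloc_mask_to_indexes 1,3,5 prints 0,2,32,33,64, whereas the list
              interpretation of the same string is simply CPUs 1, 3 and 5.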

       -q --quiet
              Hide non-fatal error messages.  It mostly includes locations pointing to non-existing objects.

       -v --verbose
              Verbose output.

       --version
              Report version and exit.

       -h --help
              Display help message and exit.

DESCRIPTION

       hwloc-calc generates and manipulates CPU mask strings or objects.  Both input and output  may  be  either
       objects  (with  physical  or  logical indexes), CPU lists (with physical or logical indexes), or CPU mask
       strings (always physically indexed).  Input location specification is described in hwloc(7).

       If objects or CPU mask strings are given on the command-line, they are combined and a  single  output  is
       printed.   If  no  object  or  CPU  mask strings are given on the command-line, the program will read the
       standard input.  It will combine multiple objects or CPU mask strings that are given on the same line  of
       the standard input line with spaces as separators.  Different input lines will be processed separately.

       Command-line arguments and options are processed in order.  First, topology configuration options should
       be given.  Then, for instance, changing the type of input indexes with --li or changing the input
       topology with -i only affects the processing of the following arguments.

       NOTE:  It  is  highly  recommended that you read the hwloc(7) overview page before reading this man page.
       Most of the concepts described in hwloc(7) directly apply to the hwloc-calc utility.

EXAMPLES

       hwloc-calc's operation is best described through several examples.

       To display the (physical) CPU mask corresponding to the second package:

           $ hwloc-calc package:1
           0x000000f0

       To display the (physical) CPU mask corresponding to the third package, excluding its even-numbered
       logical processors:

           $ hwloc-calc package:2 ~PU:even
           0x00000c00

       To display the (physical) CPU mask of the entire topology except the third package:

           $ hwloc-calc all ~package:2
           0x0000f0ff

       To combine two (physical) CPU masks:

           $ hwloc-calc 0x0000ffff 0xff000000
           0xff00ffff

Examples of listing or counting objects

       To display the list of logical numbers of processors included in the second package:

           $ hwloc-calc --intersect PU package:1
           4,5,6,7

       To  bind  GNU  OpenMP  threads  logically  over  the whole machine, we need to use physical number output
       instead:

           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
           $ echo $GOMP_CPU_AFFINITY
           0,4,1,5,2,6,3,7

       To display the list of NUMA nodes, by physical indexes, that intersect a given (physical) CPU mask:

           $ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
           0,2

       To find how many cores are in the second CPU kind (those cores are  likely  higher-performance  and  more
       power-hungry than cores of the first kind):

           $ hwloc-calc --cpukind 1 -N core all
           4

       To  convert a cpu mask to human-readable output, the -H option can be used to emit a space-delimited list
       of locations:

           $ echo 0x000000f0 | hwloc-calc -q -H package.core
           Package:1.Core:0 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3

       To use some other character (e.g., a comma) instead of spaces in output, use the --sep option:

           $ echo 0x000000f0 | hwloc-calc -q -H package.core --sep ,
           Package:1.Core:0,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3

       To synthesize a set of cores into largest objects on a 2-node 2-package 2-core machine:

           $ hwloc-calc core:0 --largest
           Core:0
           $ hwloc-calc core:0-1 --largest
           Package:0
           $ hwloc-calc core:4-7 --largest
           L3Cache:1
           $ hwloc-calc core:2-6 --largest
           Package:1 Package:2 Core:6
           $ hwloc-calc pack:2 --largest
           Package:2
           $ hwloc-calc package:2-3 --largest
           L3Cache:1

       To get the set of first threads of all cores:

           $ hwloc-calc core:all.pu:0
           0xffff0000
           $ hwloc-calc --no-smt all -I pu
           0,2,4,6,8,10,12,14

       To get the number of cpukinds inside a package:

           $ hwloc-calc -N cpukind package:0
           2

Examples of listing or counting NUMA nodes

       To display the list of NUMA nodes, by physical indexes, whose locality is exactly equal to a Package:

           $ hwloc-calc --local-memory-flags 0 --physical-output pack:1
           4,7

       To display the list of default NUMA nodes, by logical indexes, in the entire machine:

           $ hwloc-calc --default-nodes -I numa all
           0,2,4,6

       To display the best-capacity NUMA node(s), by physical indexes, whose locality  is  exactly  equal  to  a
       Package:

           $ hwloc-calc --local-memory-flags 0 --best-memattr capacity --physical-output pack:1
           4

       To find the number of NUMA nodes with subtype "HBM":

           $ hwloc-calc -N "numa[hbm]" all
           4

       To find the number of NUMA nodes in memory tier 1 (DRAM nodes on a server with HBM and DRAM):

           $ hwloc-calc -N "numa[tier=1]" all
           4

       To find the NUMA node of subtype MCDRAM (on KNL) near a PU:

           $ hwloc-calc -I "numa[mcdram]" --oo pu:157
           NUMANode:1

       To find the memory tier of a NUMA node:

           $ hwloc-calc -I memorytier node:2
           1

Examples with physical and logical indexes

       Converting object logical indexes (default) from/to physical/OS indexes may be performed with --intersect
       combined with either --physical-output (logical to physical conversion) or --physical-input (physical  to
       logical):

           $ hwloc-calc --physical-output PU:2 --intersect PU
           3
           $ hwloc-calc --physical-input PU:3 --intersect PU
           2

       This may also be used for converting indexes of memory objects, even with heterogeneous memory:

           $ hwloc-calc --physical-output node:2 --intersect node
           3
           $ hwloc-calc --physical-input node:3 --intersect node
           2
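
       The two conversions above follow the same shape for any object type, so they can be sketched as a pair
       of shell functions.  The names logical_to_physical, physical_to_logical and HWLOC_CALC are hypothetical
       and introduced here only for illustration.

```shell
# Hypothetical helpers: convert one object index between logical and
# physical/OS numbering, e.g. "logical_to_physical PU 2".
# HWLOC_CALC is an assumed override for the binary name (default: hwloc-calc).
logical_to_physical() {
    ${HWLOC_CALC:-hwloc-calc} --physical-output "$1:$2" --intersect "$1"
}

physical_to_logical() {
    ${HWLOC_CALC:-hwloc-calc} --physical-input "$1:$2" --intersect "$1"
}
```

       On the machine used in the examples above, logical_to_physical PU 2 would run the first command shown
       and physical_to_logical node 3 the last one.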

       To combine both physical and logical indexes as input:

           $ hwloc-calc PU:2 --physical-input PU:3
           0x0000000c

Examples with I/O devices

       To display the set of CPUs near network interface eth0:

           $ hwloc-calc os=eth0
           0x00005555

       To display the indexes of packages near PCI device whose bus ID is 0000:01:02.0:

           $ hwloc-calc pci=0000:01:02.0 --intersect Package
           1

       OS devices may also be filtered by subtype. In this example, there are 9 OS devices in the system, 4 of
       them are near NUMA node #1, and only 2 of these are CoProcessors:

           $ utils/hwloc/hwloc-calc -I osdev all
           0,1,2,3,4,5,6,7,8
           $ utils/hwloc/hwloc-calc -I osdev node:1
           5,6,7,8
           $ utils/hwloc/hwloc-calc -I coproc node:1
           7,8

Examples with other tools

       To make GNU OpenMP use exactly one thread per core, and in logical core order:

           $ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
           $ echo $OMP_NUM_THREADS
           4
           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
           $ echo $GOMP_CPU_AFFINITY
           0,2,1,3

       To export a bitmask in a format that is accepted by the resctrl Linux subsystem (for configuring cache
       partitioning, etc), apply a sed regexp to the output of hwloc-calc:

           $ hwloc-calc pack:all.core:7-9.pu:0
           0x00000380,,0x00000380   <this format cannot be given to resctrl>
           $ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
           00000380,0,00000380
           # echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
           # cat /sys/fs/resctrl/test/cpus
           00000000,00000380,00000000,00000380   <the modified bitmask was correctly parsed by resctrl>
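
       The sed expression above can be kept in a small filter function.  The name to_resctrl_mask is
       hypothetical.  The ,, substitution is applied twice because a single global pass does not rescan the
       text it has just replaced, so a run of several empty words (",,,") would otherwise keep one empty
       field.

```shell
# to_resctrl_mask
# Hypothetical filter: read an hwloc mask on stdin, drop the 0x prefixes and
# fill empty 32-bit words with 0, as in the sed command shown above.
to_resctrl_mask() {
    sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
}

# Possible use:
#   hwloc-calc pack:all.core:7-9.pu:0 | to_resctrl_mask
```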

Example of use of the systemd-dbus-api cpuset and nodeset output formats

       hwloc-calc  allows one to generate the very cryptic AllowedCPUs and AllowedMemoryNodes strings, which the
       D-Bus API of systemd expects, from other hwloc  representations.  This  is  especially  useful  when  the
       systemd-run command, which understands numeric lists, cannot be used.

       First, create a systemd slice:

           $ busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartUnit ss my_slice.slice fail

       Then, configure the CPU and Node sets of the slice, using hwloc-calc to translate the syntax:

           $ busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager SetUnitProperties 'sba(sv)' my_slice.slice 1 1 AllowedCPUs $(hwloc-calc pu:0 pu:31 pu:32 pu:63 pu:64 pu:77 --cpuset-output-format systemd-dbus-api)
           $ busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager SetUnitProperties 'sba(sv)' my_slice.slice 1 1 AllowedMemoryNodes $(hwloc-calc pu:0 pu:31 pu:32 pu:63 pu:64 pu:77 --nodeset-output-format systemd-dbus-api)

       Finally, add the current process to the slice:

           $ busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartTransientUnit 'ssa(sv)a(sa(sv))' my_scope.scope fail 3 Delegate b 1 PIDs au 1 $$ Slice s my_slice.slice 0

       More info in the org.freedesktop.systemd1(5) manual page.

RETURN VALUE

       Upon  successful  execution,  hwloc-calc  displays  the (physical) CPU mask string, (physical or logical)
       object list, or (physical or logical) object number list.  The return value is 0.

       hwloc-calc will return nonzero if any kind of error occurs, such as (but  not  limited  to):  failure  to
       parse the command line.

SEE ALSO

       hwloc(7), lstopo(1), hwloc-info(1)