Ubuntu Manpage: hwloc-calc - Operate on cpu mask strings and objects

NAME

       hwloc-calc - Operate on cpu mask strings and objects

SYNOPSIS

       hwloc-calc [topology options] [options] <location1> [<location2> [...] ]

       Note  that  hwloc(7)  provides  a  detailed  explanation  of the hwloc system and of valid
       <location> formats; it should be read before reading this man page.

TOPOLOGY OPTIONS

       All topology options must be given before all other options.

       --no-smt, --no-smt=<N>
                 Only keep the first PU per core in the input locations.  If  <N>  is  specified,
                 keep  the <N>-th instead, if any.  PUs are ordered by physical index during this
                 filtering.

       --cpukind <n>
                 --cpukind <infoname>=<infovalue> Only keep PUs whose CPU kind match.   Either  a
                 single  CPU  kind  is specified as an index, or the info name/value keypair will
                 select matching kinds.

                 When specified by index, it corresponds to hwloc  ranking  of  CPU  kinds  which
                 returns  energy-efficient  cores  first, and high-performance power-hungry cores
                 last.  The full list of CPU kinds may be seen with lstopo --cpukinds.

       --restrict <cpuset>
                 Restrict the topology to the given cpuset.

       --restrict nodeset=<nodeset>
                 Restrict the topology to the given nodeset,  unless  --restrict-flags  specifies
                 something different.

       --restrict-flags <flags>
                 Enforce  flags  when  restricting  the  topology.  Flags may be given as numeric
                 values  or  as  a  comma-separated  list  of  flag  names  that  are  passed  to
                 hwloc_topology_restrict().   Those  names may be substrings of actual flag names
                 as long as a single one matches, for instance bynodeset,memless.  The default is
                 0 (or none).

       --disallowed
                 Include objects disallowed by administrative limitations.

       -i <path>, --input <path>
                 Read  the  topology from <path> instead of discovering the topology of the local
                 machine.

                 If <path> is a file and XML support has been compiled in hwloc, it may be a  XML
                 file exported by a previous hwloc program.  If <path> is "-", the standard input
                 may be used as a XML file.

                 On Linux, <path> may be a directory containing the topology files gathered  from
                 another machine topology with hwloc-gather-topology.

                 On  x86,  <path> may be a directory containing a cpuid dump gathered with hwloc-
                 gather-cpuid.

                 When the archivemount program  is  available,  <path>  may  also  be  a  tarball
                 containing such Linux or x86 topology files.

       -i <specification>, --input <specification>
                 Simulate  a  fake  hierarchy  (instead  of discovering the topology on the local
                 machine). If <specification> is "node:2 pu:3", the  topology  will  contain  two
                 NUMA  nodes with 3 processing units in each of them.  The <specification> string
                 must end with a number of PUs.

       --if <format>, --input-format <format>
                 Enforce the input in the given format, among xml, fsroot, cpuid and synthetic.

OPTIONS

All these options must be given after all topology options above.

-p --physical
Use OS/physical indexes instead of logical indexes for both input and output.

-l --logical
Use logical indexes instead of physical/OS indexes for both input and output
(default).

--pi --physical-input
Use OS/physical indexes instead of logical indexes for input.

--li --logical-input
Use logical indexes instead of physical/OS indexes for input (default).

--po --physical-output
Use OS/physical indexes instead of logical indexes for output.

--lo --logical-output
Use logical indexes instead of physical/OS indexes for output (default, except
for cpusets which are always physical).

-n --nodeset
Interpret both input and output sets as nodesets instead of CPU sets. See
--nodeset-output and --nodeset-input below for details.

--no --nodeset-output
Report nodesets instead of CPU sets. This output is more precise than the
default CPU set output when memory locality matters because it properly
describes CPU-less NUMA nodes, as well as NUMA-nodes that are local to multiple
CPUs.

--ni --nodeset-input
Interpret input sets as nodesets instead of CPU sets.

-N --number-of <type|depth>
Report the number of objects of the given type or depth that intersect the CPU
set. This is convenient for finding how many cores, NUMA nodes or PUs are
available in a machine.

When combined with --nodeset or --nodeset-output, the nodeset is considered
instead of the CPU set for finding matching objects. This is useful when
reporting the output as a number or set of NUMA nodes.

If an OS device subtype such as gpu is given instead of osdev, only the os
devices of that subtype will be counted.

-I --intersect <type|depth>
Find the list of objects of the given type or depth that intersect the CPU set
and report the comma-separated list of their indexes instead of the cpu mask
string. This may be used for determining the list of objects above or below the
input objects.

When combined with --physical, the list is convenient to pass to external tools
such as taskset or numactl --physcpubind or --membind. This is different from
--largest since the latter requires that all reported objects are strictly
included inside the input objects.

If an OS device subtype such as gpu is given instead of osdev, only the os
devices of that subtype will be returned.

-H --hierarchical <type1>.<type2>...
Find the list of objects of type <type2> that intersect the CPU set and report
the space-separated list of their hierarchical indexes with respect to <type1>,
<type2>, etc. For instance, if package.core is given, the output would be
Package:1.Core:2 Package:2.Core:3 if the input contains the third core of the
second package and the fourth core of the third package.

Only normal CPU-side object types should be used.

NUMA nodes may be used but they may cause redundancy in the output on
heterogeneous memory platform. For instance, on a platform with both DRAM and
HBM memory on a package, the first core will be considered both as first core of
first NUMA node (DRAM) and as first core of second NUMA node (HBM).

--largest Report (in a human readable format) the list of largest objects which exactly
include all input objects (by looking at their CPU sets). None of these output
objects intersect each other, and the sum of them is exactly equivalent to the
input. No largest object is included in the input This is different from
--intersect where reported objects may not be strictly included in the input.

--local-memory
Report the list of NUMA nodes that are local to the input objects.

This option is similar to -I numa but the way nodes are selected is different:
The selection performed by --local-memory may be precisely configured with
--local-memory-flags, while -I numa just selects all nodes that are somehow
local to any of the input objects.

--local-memory-flags
Change the flags used to select local NUMA nodes. Flags may be given as numeric
values or as a comma-separated list of flag names that are passed to
hwloc_get_local_numanode_objs(). Those names may be substrings of actual flag
names as long as a single one matches. The default is 3 (or smaller,larger)
which means NUMA nodes are displayed if their locality either contains or is
contained in the locality of the given object.

This option enables --local-memory.

--best-memattr <name>
Enable the listing of local memory nodes with --local-memory, but only display
the local node that has the best value for the memory attribute given by <name>
(or as an index).

If the memory attribute values depend on the initiator, the hwloc-calc input
objects are used as the initiator.

Standard attribute names are Capacity, Locality, Bandwidth, and Latency. All
existing attributes in the current topology may be listed with

$ lstopo --memattrs

--sep <sep>
Change the field separator in the output. By default, a space is used to
separate output objects (for instance when --hierarchical or --largest is given)
while a comma is used to separate indexes (for instance when --intersect is
given).

--single Singlify the output to a single CPU.

--taskset Display CPU set strings in the format recognized by the taskset command-line
program instead of hwloc-specific CPU set string format. This option has no
impact on the format of input CPU set strings, both formats are always accepted.

-q --quiet
Hide non-fatal error messages. It mostly includes locations pointing to non-
existing objects.

-v --verbose
Verbose output.

--version Report version and exit.

-h --help Display help message and exit.

DESCRIPTION

       hwloc-calc  generates  and manipulates CPU mask strings or objects.  Both input and output
       may be either objects (with physical or logical indexes),  CPU  lists  (with  physical  or
       logical  indexes),  or  CPU  mask  strings  (always  physically  indexed).  Input location
       specification is described in hwloc(7).

       If objects or CPU mask strings are given on the command-line,  they  are  combined  and  a
       single output is printed.  If no object or CPU mask strings are given on the command-line,
       the program will read the standard input.  It will combine multiple objects  or  CPU  mask
       strings  that  are  given  on  the  same  line  of  the standard input line with spaces as
       separators.  Different input lines will be processed separately.

       Command-line arguments and options are processed in order.  First  topology  configuration
       options should be given.  Then, for instance, changing the type of input indexes with --li
       or changing the  input  topology  with  -i  only  affects  the  processing  the  following
       arguments.

       NOTE:  It  is  highly  recommended that you read the hwloc(7) overview page before reading
       this man page.  Most of the concepts described in hwloc(7) directly apply  to  the  hwloc-
       calc utility.

EXAMPLES

       hwloc-calc's operation is best described through several examples.

       To display the (physical) CPU mask corresponding to the second package:

           $ hwloc-calc package:1
           0x000000f0

       To  display the (physical) CPU mask corresponding to the third pacakge, excluding its even
       numbered logical processors:

           $ hwloc-calc package:2 ~PU:even
           0x00000c00

       To convert a cpu mask to human-readable output, the -H option can be used to emit a space-
       delimited list of locations:

           $ echo 0x000000f0 | hwloc-calc -H package.core
           Package:1.Core1 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3

       To  use  some  other  character (e.g., a comma) instead of spaces in output, use the --sep
       option:

           $ echo 0x000000f0 | hwloc-calc -H package.core --sep ,
           Package:1.Core1,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3

       To combine two (physical) CPU masks:

           $ hwloc-calc 0x0000ffff 0xff000000
           0xff00ffff

       To display the list of logical numbers of processors included in the second package:

           $ hwloc-calc --intersect PU package:1
           4,5,6,7

       To bind GNU OpenMP threads logically over the whole  machine,  we  need  to  use  physical
       number output instead:

           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
           $ echo $GOMP_CPU_AFFINITY
           0,4,1,5,2,6,3,7

       To  display the list of NUMA nodes, by physical indexes, that intersect a given (physical)
       CPU mask:

           $ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
           0,2

       To find how many cores are in  the  second  CPU  kind  (those  cores  are  likely  higher-
       performance and more power-hungry than cores of the first kind):

           $ hwloc-calc --cpukind 1 -N core all
           4

       To display the list of NUMA nodes, by physical indexes, whose locality is exactly equal to
       a Package:

           $ hwloc-calc --local-memory-flags 0 pack:1
           4,7

       To display the best-capacity NUMA node, by physical  indexe,  whose  locality  is  exactly
       equal to a Package:

           $ hwloc-calc --local-memory-flags 0 --best-memattr capacity pack:1
           4

       Converting  object  logical indexes (default) from/to physical/OS indexes may be performed
       with --intersect combined with either --physical-output (logical to  physical  conversion)
       or --physical-input (physical to logical):

           $ hwloc-calc --physical-output PU:2 --intersect PU
           3
           $ hwloc-calc --physical-input PU:3 --intersect PU
           2

       One  should  add --nodeset when converting indexes of memory objects to make sure a single
       NUMA node index is returned on platforms with heterogeneous memory:

           $ hwloc-calc --nodeset --physical-output node:2 --intersect node
           3
           $ hwloc-calc --nodeset --physical-input node:3 --intersect node
           2

       To display the set of CPUs near network interface eth0:

           $ hwloc-calc os=eth0
           0x00005555

       To display the indexes of packages near PCI device whose bus ID is 0000:01:02.0:

           $ hwloc-calc pci=0000:01:02.0 --intersect Package
           1

       To display the list of per-package cores that intersect the input:

           $ hwloc-calc 0x00003c00 --hierarchical package.core
           Package:2.Core:1 Package:3.Core:0

       To display the (physical) CPU mask of the entire topology except the third package:

           $ hwloc-calc all ~package:3
           0x0000f0ff

       To combine both physical and logical indexes as input:

           $ hwloc-calc PU:2 --physical-input PU:3
           0x0000000c

       To synthetize a set of cores into largest objects on a 2-node 2-package 2-core machine:

           $ hwloc-calc core:0 --largest
           Core:0
           $ hwloc-calc core:0-1 --largest
           Package:0
           $ hwloc-calc core:4-7 --largest
           NUMANode:1
           $ hwloc-calc core:2-6 --largest
           Package:1 Package:2 Core:6
           $ hwloc-calc pack:2 --largest
           Package:2
           $ hwloc-calc package:2-3 --largest
           NUMANode:1

       To get the set of first threads of all cores:

           $ hwloc-calc core:all.pu:0
           $ hwloc-calc --no-smt all

       This can also be very useful in order to make GNU OpenMP use exactly one thread per  core,
       and in logical core order:

           $ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
           $ echo $OMP_NUM_THREADS
           4
           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
           $ echo $GOMP_CPU_AFFINITY
           0,2,1,3

       To  export  bitmask  in  a  format  that is acceptable by the resctrl Linux subsystem (for
       configuring cache partitioning, etc), apply a sed regexp to the output of hwloc-calc:

           $ hwloc-calc pack:all.core:7-9.pu:0
           0x00000380,,0x00000380   <this format cannot be given to resctrl>
           $ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
           00000380,0,00000380
           # echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
           # cat /sys/fs/resctrl/test/cpus
           00000000,00000380,00000000,00000380   <the modified bitmask was  corrected  parsed  by
       resctrl>

       OS devices may also be filtered by subtype. In this example, there are 8 OS devices in the
       system, 4 of them are near NUMA node #1, and only 2 of these are CoProcessors:

           $ utils/hwloc/hwloc-calc -I osdev all
           0,1,2,3,4,5,6,7,8
           $ utils/hwloc/hwloc-calc -I osdev node:1
           5,6,7,8
           $ utils/hwloc/hwloc-calc -I coproc node:1
           7,8

RETURN VALUE

       Upon successful execution, hwloc-calc displays the (physical) CPU mask  string,  (physical
       or logical) object list, or (physical or logical) object number list.  The return value is
       0.

       hwloc-calc will return nonzero if any kind of error occurs, such as (but not limited  to):
       failure to parse the command line.