oracular (8) opahostadmin.8.gz

Provided by: opa-fastfabric_10.10.3.0.11-1ubuntu3_amd64 bug

NAME

       opahostadmin

       (Host)  Performs  a  number of multi-step host initialization and verification operations,
       including upgrading software or  firmware,  rebooting  hosts,  and  other  operations.  In
       general, operations performed by opahostadmin involve a login to one or more host systems.

Syntax

       opahostadmin [-c] [-i  ipoib_suffix] [-f  hostfile] [-h 'hosts']
       [-r  release] [-I  install_options] [-U  upgrade_options] [-d  dir]
       [-T  product] [-P  packages] [-m  netmask] [-S]  operation ...

Options

       --help

                 Produces full help text.

       -c

                 Overwrites the result files from any previous run before starting this run.

       -i ipoib_suffix

                 Specifies  the suffix to apply to host names to create IPoIB host names. Default
                 is -opa.

       -f hostfile

                 Specifies  the  file  with  the  names  of  hosts  in  a  cluster.  Default   is
                 /etc/opa/hosts file.

       -h hosts

                 Specifies the list of hosts to execute the operation against.

       -r release

                 Specifies  the  software  version  to load/upgrade to. Default is the version of
                 Intel(R) Omni-Path Software presently being run on the server.

       -d dir

                 Specifies the directory to retrieve  product. release.tgz for load or upgrade.

       -I install_options

                 Specifies the software install options.

       -U upgrade_options

                 Specifies the software upgrade options.

       -T product

                 Specifies the product type to install. Default is  IntelOPA-Basic.  <distro>  or
                 IntelOPA-IFS. <distro> where <distro> is the distribution and CPU.

       -P packages

                 Specifies the packages to install. Default is oftools ipoib psm_mpi

       -m netmask

                 Specifies the IPoIB netmask to use for configipoib operation.

       -S

                 Securely prompts for user password on remote system.

       operation

                 Performs the specified operation, which can be one or more of the following:

                 load      Starts initial installation of all hosts.

                 upgrade   Upgrades installation of all hosts.

                 configipoib
                           Creates ifcfg-ib1 using host IP address from /etc/hosts file.

                 reboot    Reboots hosts, ensures they go down and come back.

                 sacache   Confirms sacache has all hosts in it.

                 ipoibping Verifies this host can ping each host through IPoIB.

                 mpiperf   Verifies latency and bandwidth for each host.

                 mpiperfdeviation
                           Verifies  latency  and  bandwidth  for  each  host  against  a defined
                           threshold (or relative to average host performance).

Example

       opahostadmin -c reboot

       opahostadmin upgrade

       opahostadmin -h 'elrond arwen' reboot

       HOSTS='elrond arwen' opahostadmin reboot

Details

       opahostadmin provides detailed logging of its results.  During  each  run,  the  following
       files are produced:

       •      test.res : Appended with summary results of run.

       •      test.log : Appended with detailed results of run.

       •      save_tmp/ : Contains a directory per failed test with detailed logs.

       •      test_tmp*/ : Intermediate result files while test is running.

       The -c option removes all log files.

       Results  from  opahostadmin  are  grouped  into test suites, test cases, and test items. A
       given run of opahostadmin represents a single test suite. Within a  test  suite,  multiple
       test  cases  occur;  typically  one test case per host being operated on. Some of the more
       complex operations may have multiple test items per test case. Each test item represents a
       major step in the overall test case.

       Each  opahostadmin  run  appends  to test.res and test.log, and creates temporary files in
       test_tmp$PID in the current directory. test.res provides an overall summary of  operations
       performed  and their results. The same information is also displayed while opahostadmin is
       executing. test.log contains detailed information about what was performed, including  the
       specific  commands  executed  and  the  resulting output. The test_tmp directories contain
       temporary files which reflect tests in progress (or killed). The logs for any failures are
       logged  in the save_temp directory with a directory per failed test case. If the same test
       case fails more than once, save_temp retains  the  information  from  the  first  failure.
       Subsequent  runs  of  opahostadmin  are  appended  to test.log. Intel recommends reviewing
       failures  and  using  the  -c  option  to  remove  old  logs  before  subsequent  runs  of
       opahostadmin.

       opahostadmin  implicitly  performs  its  operations in parallel. However, as for the other
       tools, FF_MAX_PARALLEL can be exported to change the degree of parallelism. 1000  parallel
       operations is the default.

Environment Variables

       The following environment variables are also used by this command:

       HOSTS

                 List of hosts, used if -h option not supplied.

       HOSTS_FILE

                 File containing list of hosts, used in absence of -f and -h.

       FF_MAX_PARALLEL

                 Maximum concurrent operations are performed.

       FF_SERIALIZE_OUTPUT

                 Serialize output of parallel operations (yes or no).

       FF_TIMEOUT_MULT

                 Multiplier  for  all  timeouts associated with this command. Used if the systems
                 are slow for some reason.

opahostadmin Operation Details

       (Host) Intel recommends that you set up password SSH or SCP for use during this operation.
       Alternatively,  the -S option can be used to securely prompt for a password, in which case
       the same password is used for all hosts. Alternately, the  password  may  be  put  in  the
       environment or the opafastfabric.conf file using FF_PASSWORD and FF_ROOTPASS.

       load

                 Performs  an  initial  installation of Intel(R) Omni-Path Software on a group of
                 hosts. Any existing installation is uninstalled and existing configuration files
                 are removed. Subsequently, the hosts are installed with a default Intel(R) Omni-
                 Path Software configuration. The -I option  can  be  used  to  select  different
                 install  packages.  Default  is  oftools  ipoib mpi The -r option can be used to
                 specify a release to install other than the one  that  this  host  is  presently
                 running.  The   FF_PRODUCT.  FF_PRODUCT_VERSION.tgz file (for example, IntelOPA-
                 Basic. version.tgz) is expected to exist  in  the  directory  specified  by  -d.
                 Default  is  the  current working directory. The specified software is copied to
                 all the selected hosts and installed.

       upgrade

                 Upgrades all selected hosts  without  modifying  existing  configurations.  This
                 operation is comparable to the -U option when running ./INSTALL manually. The -r
                 option can be used to upgrade to a release different from this host. The default
                 is   to   upgrade   to   the   same  release  as  this  host.  The   FF_PRODUCT.
                 FF_PRODUCT_VERSION.tgz  file  (for  example,  IntelOPA-Basic.  version.tgz)   is
                 expected  to  exist in the directory specified by -d. The default is the current
                 working directory. The specified software is copied to all  the  end  nodes  and
                 installed.

       NOTE:  Only components that are currently installed are upgraded. This operation fails for
       hosts that do not have Intel(R) Omni-Path Software installed.

       configipoib

                 Creates a ifcfg-ib1 configuration file for each node using the IP address  found
                 using the resolver on the node. The standard Linux* resolver is used through the
                 host command. (If running OFA Delta, this option configures ifcfg-ib0 .)

                 If the host is not found, /etc/hosts on the  node  is  checked.  The  -i  option
                 specifies  an  IPoIB  suffix  to apply to the host name to create the IPoIB host
                 name for the node. The default suffix is -ib. The -m option specifies a  netmask
                 other  than the default for the given class of IP address, such as when dividing
                 a class A or B address into smaller IP subnets. IPoIB is configured for a static
                 IP  address  and is autostarted at boot. For the Intel(R) OP Software Stack, the
                 default  /etc/ipoib.cfg  file  is  used,  which  provides  a   redundant   IPoIB
                 configuration using both ports of the first HFI in the system.

       NOTE:  opahostadmin configipoib now supports DHCP (auto or static options) for configuring
       the IPoIB interface. You must specify these options in /etc/opa/opafastfabric.conf against
       the FF_IPOIB_CONFIG variable. If no options are found, the static IP configuration is used
       by default. If auto is specified, then one IP  address  from  either  static  or  dhcp  is
       chosen.  Static  is  used  if  the  IP  address  can  be obtained out of /etc/hosts or the
       resolver, otherwise DHCP is used.

       reboot

                 Reboots the given hosts and ensures they go down and come  back  up  by  pinging
                 them  during  the  reboot  process. The ping rate is slow (5 seconds), so if the
                 servers boot faster than this, false failures may be seen.

       sacache

                 Verifies the given hosts can properly communicate with the SA and any cached  SA
                 data that is up to date. To run this command, Intel(R) Omni-Path Fabric software
                 must be installed and running  on  the  given  hosts.  The  subnet  manager  and
                 switches  must  be up. If this test fails: opacmdall 'opasaquery -o desc' can be
                 run against any problem hosts.

       NOTE: This operation requires that the hosts being queried are specified by  a  resolvable
       TCP/IP host name. This operation FAILS if the selected hosts are specified by IP address.

       ipoibping

                 Verifies  IPoIB  basic  operation  by  ensuring that the host can ping all other
                 nodes through IPoIB. To run this command,  Intel(R)  Omni-Path  Fabric  software
                 must  be  installed,  IPoIB  must be configured and running on the host, and the
                 given hosts, the SM, and switches must be up.  The  -i  option  can  specify  an
                 alternate IPoIB hostname suffix.

       mpiperf

                 Verifies that MPI is operational and checks MPI end-to-end latency and bandwidth
                 between pairs of nodes (for example, 1-2, 3-4, 5-6). Use this to  verify  switch
                 latency/hops,  PCI  bandwidth,  and  overall  MPI performance. The test.res file
                 contains the results of each pair of nodes tested.

       NOTE: This option is available for the Intel(R) Omni-Path Fabric Host Software  OFA  Delta
       packaging, but is not presently available for other packagings of OFED.

              To  obtain  accurate  results,  this  test  should  be  run at a time when no other
              stressful  applications  (for  example,  MPI  jobs  or  high  stress  file   system
              operations) are running on the given hosts.

              Bandwidth  issues  typically  indicate  server  configuration  issues (for example,
              incorrect slot used, incorrect BIOS settings, or incorrect HFI  model),  or  fabric
              issues  (for  example,  symbol  errors,  incorrect  link width, or speed). Assuming
              opareport has previously been used to check for link errors and link speed  issues,
              the server configuration should be verified.

              Note  that  BIOS  settings  and  differences  between server models can account for
              10-20% differences in bandwidth. For more details about BIOS settings, consult  the
              documentation from the server supplier and/or the server PCI chipset manufacturer.

       mpiperfdeviation

                 Specifies  the enhanced version of mpiperf that verifies MPI performance. Can be
                 used to verify switch latency/hops, PCI bandwidth, and overall MPI  performance.
                 It  performs  assorted  pair-wise bandwidth and latency tests, and reports pairs
                 outside an acceptable tolerance range. The tool identifies specific  nodes  that
                 have  problems  and  provides  a  concise  summary of results. The test.res file
                 contains the results of each pair of nodes tested.

                 By default, concurrent mode is used to  quickly  analyze  the  fabric  and  host
                 performance.  Pairs  that  have  20% less bandwidth or 50% more latency than the
                 average pair are reported as failures.

                 The tool can be run in a sequential or a concurrent mode. Sequential  mode  runs
                 each  host  against a reference host. By default, the reference host is selected
                 based on the best performance from a quick  test  of  the  first  40  hosts.  In
                 concurrent  mode,  hosts are paired up and all pairs are run concurrently. Since
                 there may be fabric contention during such a run, any poor performing pairs  are
                 then rerun sequentially against the reference host.

                 Concurrent  mode  runs  the  tests  in the shortest amount of time, however, the
                 results could be slightly less accurate due to  switch  contention.  In  heavily
                 oversubscribed  fabric designs, if concurrent mode is producing unexpectedly low
                 performance, try sequential mode.

       NOTE: This option is available for the Intel(R) Omni-Path Fabric Host Software  OFA  Delta
       packaging, but is not presently available for other packagings of OFED.

              To  obtain  accurate  results,  this  test  should  be  run at a time when no other
              stressful applications (for example, MPI jobs, high stress file system  operations)
              are running on the given hosts.

              Bandwidth  issues  typically  indicate  server  configuration  issues (for example,
              incorrect slot used, incorrect BIOS settings, or incorrect HFI  model),  or  fabric
              issues  (for  example,  symbol  errors,  incorrect  link width, or speed). Assuming
              opareport has previously been used to check for link errors and link speed  issues,
              the server configuration should be verified.

              Note  that  BIOS  settings  and  differences  between server models can account for
              10-20% differences in bandwidth. A result 5-10% below the average is typically  not
              cause  for  serious  alarm, but may reflect limitations in the server design or the
              chosen BIOS settings.

              For more details about BIOS settings, consult the  documentation  from  the  server
              supplier and/or the server PCI chipset manufacturer.

              The  deviation  application  supports  a  number of parameters which allow for more
              precise control over the mode, benchmark and pass/fail criteria. The parameters  to
              use  can  be  selected  using  the  FF_DEVIATION_ARGS  configuration  parameter  in
              opafastfabric.conf

              Available parameters for deviation application:

              [-bwtol bwtol] [-bwdelta MBs] [-bwthres MBs]

              [-bwloop count] [-bwsize size] [-lattol latol]

              [-latdelta usec] [-latthres usec] [-latloop count]

              [-latsize size][-c] [-b] [-v] [-vv]

              [-h reference_host]

              -bwtol    Specifies the percent of  bandwidth  degradation  allowed  below  average
                        value.

              -bwbidir  Performs a bidirectional bandwidth test.

              -bwunidir Performs a unidirectional bandwidth test (Default).

              -bwdelta  Specifies  the  limit  in  MB/s  of  bandwidth  degradation allowed below
                        average value.

              -bwthres  Specifies the lower limit in MB/s of bandwidth allowed.

              -bwloop   Specifies the number of loops to execute each bandwidth test.

              -bwsize   Specifies the size of message to use for bandwidth test.

              -lattol   Specifies the percent of latency degradation allowed above average value.

              -latdelta Specifies the imit in &#181;sec  of  latency  degradation  allowed  above
                        average value.

              -latthres Specifies the lower limit in &#181;sec of latency allowed.

              -latloop  Specifies the number of loops to execute each latency test.

              -latsize  Specifies the size of message to use for latency test.

              -c        Runs test pairs concurrently instead of the default of sequential.

              -b        When  comparing results against tolerance and delta, uses best instead of
                        average.

              -v        Specifies the verbose output.

              -vv       Specifies the very verbose output.

              -h        Specifies the reference host to use for sequential pairing.

              Both bwtol and bwdelta must be exceeded to fail bandwidth test.

              When bwthres is supplied, bwtol and bwdelta are ignored.

              Both lattol and latdelta must be exceeded to fail latency test.

              When latthres is supplied, lattol and latdelta are ignored.

              For consistency with OSU benchmarks, MB/s is defined as 1000000 bytes/s.