NAME
opahostadmin (Host) Performs a number of multi-step host initialization and verification operations, including upgrading software or firmware, rebooting hosts, and other operations. In general, operations performed by opahostadmin involve a login to one or more host systems.
Syntax
opahostadmin [-c] [-i ipoib_suffix] [-f hostfile] [-h 'hosts'] [-r release] [-I install_options] [-U upgrade_options] [-d dir] [-T product] [-P packages] [-m netmask] [-S] operation ...
Options
--help
    Produces full help text.
-c
    Overwrites the result files from any previous run before starting this run.
-i ipoib_suffix
    Specifies the suffix to apply to host names to create IPoIB host names. Default is -opa.
-f hostfile
    Specifies the file with the names of hosts in a cluster. Default is the /etc/opa/hosts file.
-h hosts
    Specifies the list of hosts to execute the operation against.
-r release
    Specifies the software version to load/upgrade to. Default is the version of Intel(R) Omni-Path Software presently being run on the server.
-d dir
    Specifies the directory to retrieve product.release.tgz for load or upgrade.
-I install_options
    Specifies the software install options.
-U upgrade_options
    Specifies the software upgrade options.
-T product
    Specifies the product type to install. Default is IntelOPA-Basic.<distro> or IntelOPA-IFS.<distro>, where <distro> is the distribution and CPU.
-P packages
    Specifies the packages to install. Default is oftools ipoib psm_mpi.
-m netmask
    Specifies the IPoIB netmask to use for configipoib operation.
-S
    Securely prompts for user password on remote system.
operation
    Performs the specified operation, which can be one or more of the following:
    load
        Starts initial installation of all hosts.
    upgrade
        Upgrades installation of all hosts.
    configipoib
        Creates ifcfg-ib1 using host IP address from /etc/hosts file.
    reboot
        Reboots hosts, ensures they go down and come back.
    sacache
        Confirms sacache has all hosts in it.
    ipoibping
        Verifies this host can ping each host through IPoIB.
    mpiperf
        Verifies latency and bandwidth for each host.
    mpiperfdeviation
        Verifies latency and bandwidth for each host against a defined threshold (or relative to average host performance).
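The -i option describes IPoIB host names as the plain host name with a suffix appended. The naming rule can be sketched as follows. The helper ipoib_names is hypothetical and is not part of opahostadmin; the suffix is passed explicitly rather than relying on a default, and the host names are the ones used in the examples on this page.

```shell
#!/bin/sh
# Hypothetical helper: append an IPoIB suffix to each host name.
# It only illustrates the naming rule described for -i.
ipoib_names() {
    suffix=$1
    shift
    for host in "$@"; do
        printf '%s%s\n' "$host" "$suffix"
    done
}

# Prints one IPoIB host name per line for the given hosts.
ipoib_names -opa elrond arwen
```

These are the names that must resolve (through the resolver or /etc/hosts) for operations such as configipoib and ipoibping to find the IPoIB addresses.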
Example
opahostadmin -c reboot
opahostadmin upgrade
opahostadmin -h 'elrond arwen' reboot
HOSTS='elrond arwen' opahostadmin reboot
Details
opahostadmin provides detailed logging of its results. During each run, the following files are produced:

• test.res : Appended with summary results of run.
• test.log : Appended with detailed results of run.
• save_tmp/ : Contains a directory per failed test with detailed logs.
• test_tmp*/ : Intermediate result files while test is running.

The -c option removes all log files.

Results from opahostadmin are grouped into test suites, test cases, and test items. A given run of opahostadmin represents a single test suite. Within a test suite, multiple test cases occur; typically one test case per host being operated on. Some of the more complex operations may have multiple test items per test case. Each test item represents a major step in the overall test case.

Each opahostadmin run appends to test.res and test.log, and creates temporary files in test_tmp$PID in the current directory. test.res provides an overall summary of operations performed and their results. The same information is also displayed while opahostadmin is executing. test.log contains detailed information about what was performed, including the specific commands executed and the resulting output. The test_tmp directories contain temporary files which reflect tests in progress (or killed).

Logs for any failures are saved in the save_tmp directory, with one directory per failed test case. If the same test case fails more than once, save_tmp retains the information from the first failure. Subsequent runs of opahostadmin are appended to test.log. Intel recommends reviewing failures and using the -c option to remove old logs before subsequent runs of opahostadmin.

opahostadmin implicitly performs its operations in parallel. However, as for the other tools, FF_MAX_PARALLEL can be exported to change the degree of parallelism. The default is 1000 parallel operations.
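Because save_tmp/ holds one directory per failed test case, a quick review before the next run can be as simple as listing those directories. The following is a minimal sketch, assuming it is run from the directory where opahostadmin was invoked; the function name list_failed_cases is hypothetical, and the names of the directories inside save_tmp/ are whatever opahostadmin chose.

```shell
#!/bin/sh
# List the per-failure directories that earlier runs left in save_tmp/.
# Takes the run directory as an optional argument (default: current directory).
list_failed_cases() {
    dir=${1:-.}
    [ -d "$dir/save_tmp" ] || return 0      # no save_tmp/: nothing was saved
    for d in "$dir"/save_tmp/*/; do
        [ -d "$d" ] || continue             # glob did not match: no failures
        basename "$d"
    done
}

list_failed_cases .
```

Once the failures have been reviewed, a run with -c (for example, opahostadmin -c reboot) starts again with clean result files.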
Environment Variables
The following environment variables are also used by this command:

HOSTS
    List of hosts, used if the -h option is not supplied.
HOSTS_FILE
    File containing list of hosts, used in absence of -f and -h.
FF_MAX_PARALLEL
    Maximum number of concurrent operations to perform.
FF_SERIALIZE_OUTPUT
    Serialize output of parallel operations (yes or no).
FF_TIMEOUT_MULT
    Multiplier for all timeouts associated with this command. Used if the systems are slow for some reason.
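A typical way to use these variables is to export them in the shell before invoking the command. The sketch below is illustrative only: the values are assumptions chosen for the example, not recommendations, and the guard around the invocation is added here so that the snippet does nothing harmful on a machine where opahostadmin is not installed.

```shell
#!/bin/sh
# Illustrative values only; adjust for the cluster at hand.
export HOSTS='elrond arwen'       # host list, used because -h is not given
export FF_MAX_PARALLEL=100        # limit concurrent operations (default is 1000)
export FF_SERIALIZE_OUTPUT=yes    # serialize output of parallel operations
export FF_TIMEOUT_MULT=2          # multiply all timeouts by 2 for slow systems

# Run the operation only where the tool exists; otherwise show the command.
if command -v opahostadmin >/dev/null 2>&1; then
    opahostadmin ipoibping
else
    echo "would run: opahostadmin ipoibping (HOSTS=$HOSTS)"
fi
```

The same variables can be set for a single invocation by prefixing the command, as in the HOSTS example above.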
opahostadmin Operation Details
(Host) Intel recommends that you set up password-less SSH or SCP for use during this operation. Alternatively, the -S option can be used to securely prompt for a password, in which case the same password is used for all hosts. The password may also be put in the environment or the opafastfabric.conf file using FF_PASSWORD and FF_ROOTPASS.

load
    Performs an initial installation of Intel(R) Omni-Path Software on a group of hosts. Any existing installation is uninstalled and existing configuration files are removed. Subsequently, the hosts are installed with a default Intel(R) Omni-Path Software configuration.

    The -I option can be used to select different install packages. Default is oftools ipoib mpi. The -r option can be used to specify a release to install other than the one that this host is presently running.

    The FF_PRODUCT.FF_PRODUCT_VERSION.tgz file (for example, IntelOPA-Basic.version.tgz) is expected to exist in the directory specified by -d. Default is the current working directory. The specified software is copied to all the selected hosts and installed.

upgrade
    Upgrades all selected hosts without modifying existing configurations. This operation is comparable to the -U option when running ./INSTALL manually.

    The -r option can be used to upgrade to a release different from this host. The default is to upgrade to the same release as this host.

    The FF_PRODUCT.FF_PRODUCT_VERSION.tgz file (for example, IntelOPA-Basic.version.tgz) is expected to exist in the directory specified by -d. The default is the current working directory. The specified software is copied to all the end nodes and installed.

    NOTE: Only components that are currently installed are upgraded. This operation fails for hosts that do not have Intel(R) Omni-Path Software installed.

configipoib
    Creates an ifcfg-ib1 configuration file for each node using the IP address found using the resolver on the node. The standard Linux* resolver is used through the host command.
    (If running OFA Delta, this option configures ifcfg-ib0.) If the host is not found, /etc/hosts on the node is checked.

    The -i option specifies an IPoIB suffix to apply to the host name to create the IPoIB host name for the node. The default suffix is -ib. The -m option specifies a netmask other than the default for the given class of IP address, such as when dividing a class A or B address into smaller IP subnets.

    IPoIB is configured for a static IP address and is autostarted at boot. For the Intel(R) OP Software Stack, the default /etc/ipoib.cfg file is used, which provides a redundant IPoIB configuration using both ports of the first HFI in the system.

    NOTE: opahostadmin configipoib now supports DHCP (auto or static options) for configuring the IPoIB interface. You must specify these options in /etc/opa/opafastfabric.conf against the FF_IPOIB_CONFIG variable. If no options are found, the static IP configuration is used by default. If auto is specified, then one IP address from either static or dhcp is chosen. Static is used if the IP address can be obtained out of /etc/hosts or the resolver, otherwise DHCP is used.

reboot
    Reboots the given hosts and ensures they go down and come back up by pinging them during the reboot process. The ping rate is slow (5 seconds), so if the servers boot faster than this, false failures may be seen.

sacache
    Verifies that the given hosts can properly communicate with the SA and that any cached SA data is up to date.

    To run this command, Intel(R) Omni-Path Fabric software must be installed and running on the given hosts. The subnet manager and switches must be up. If this test fails, opacmdall 'opasaquery -o desc' can be run against any problem hosts.

    NOTE: This operation requires that the hosts being queried are specified by a resolvable TCP/IP host name. This operation FAILS if the selected hosts are specified by IP address.

ipoibping
    Verifies IPoIB basic operation by ensuring that the host can ping all other nodes through IPoIB.
    To run this command, Intel(R) Omni-Path Fabric software must be installed, IPoIB must be configured and running on the host, and the given hosts, the SM, and switches must be up. The -i option can specify an alternate IPoIB hostname suffix.

mpiperf
    Verifies that MPI is operational and checks MPI end-to-end latency and bandwidth between pairs of nodes (for example, 1-2, 3-4, 5-6). Use this to verify switch latency/hops, PCI bandwidth, and overall MPI performance. The test.res file contains the results of each pair of nodes tested.

    NOTE: This option is available for the Intel(R) Omni-Path Fabric Host Software OFA Delta packaging, but is not presently available for other packagings of OFED.

    To obtain accurate results, this test should be run at a time when no other stressful applications (for example, MPI jobs or high stress file system operations) are running on the given hosts.

    Bandwidth issues typically indicate server configuration issues (for example, incorrect slot used, incorrect BIOS settings, or incorrect HFI model), or fabric issues (for example, symbol errors, incorrect link width, or speed). Assuming opareport has previously been used to check for link errors and link speed issues, the server configuration should be verified.

    Note that BIOS settings and differences between server models can account for 10-20% differences in bandwidth. For more details about BIOS settings, consult the documentation from the server supplier and/or the server PCI chipset manufacturer.

mpiperfdeviation
    An enhanced version of mpiperf that verifies MPI performance. Can be used to verify switch latency/hops, PCI bandwidth, and overall MPI performance. It performs assorted pair-wise bandwidth and latency tests, and reports pairs outside an acceptable tolerance range. The tool identifies specific nodes that have problems and provides a concise summary of results. The test.res file contains the results of each pair of nodes tested.
    By default, concurrent mode is used to quickly analyze the fabric and host performance. Pairs that have 20% less bandwidth or 50% more latency than the average pair are reported as failures.

    The tool can be run in a sequential or a concurrent mode. Sequential mode runs each host against a reference host. By default, the reference host is selected based on the best performance from a quick test of the first 40 hosts. In concurrent mode, hosts are paired up and all pairs are run concurrently. Since there may be fabric contention during such a run, any poor performing pairs are then rerun sequentially against the reference host.

    Concurrent mode runs the tests in the shortest amount of time; however, the results could be slightly less accurate due to switch contention. In heavily oversubscribed fabric designs, if concurrent mode is producing unexpectedly low performance, try sequential mode.

    NOTE: This option is available for the Intel(R) Omni-Path Fabric Host Software OFA Delta packaging, but is not presently available for other packagings of OFED.

    To obtain accurate results, this test should be run at a time when no other stressful applications (for example, MPI jobs or high stress file system operations) are running on the given hosts.

    Bandwidth issues typically indicate server configuration issues (for example, incorrect slot used, incorrect BIOS settings, or incorrect HFI model), or fabric issues (for example, symbol errors, incorrect link width, or speed). Assuming opareport has previously been used to check for link errors and link speed issues, the server configuration should be verified.

    Note that BIOS settings and differences between server models can account for 10-20% differences in bandwidth. A result 5-10% below the average is typically not cause for serious alarm, but may reflect limitations in the server design or the chosen BIOS settings.
    For more details about BIOS settings, consult the documentation from the server supplier and/or the server PCI chipset manufacturer.

    The deviation application supports a number of parameters which allow for more precise control over the mode, benchmark, and pass/fail criteria. The parameters to use can be selected using the FF_DEVIATION_ARGS configuration parameter in opafastfabric.conf.

    Available parameters for deviation application:

    [-bwtol bwtol] [-bwdelta MBs] [-bwthres MBs] [-bwloop count] [-bwsize size] [-lattol latol] [-latdelta usec] [-latthres usec] [-latloop count] [-latsize size] [-c] [-b] [-v] [-vv] [-h reference_host]

    -bwtol
        Specifies the percent of bandwidth degradation allowed below average value.
    -bwbidir
        Performs a bidirectional bandwidth test.
    -bwunidir
        Performs a unidirectional bandwidth test (Default).
    -bwdelta
        Specifies the limit in MB/s of bandwidth degradation allowed below average value.
    -bwthres
        Specifies the lower limit in MB/s of bandwidth allowed.
    -bwloop
        Specifies the number of loops to execute each bandwidth test.
    -bwsize
        Specifies the size of message to use for bandwidth test.
    -lattol
        Specifies the percent of latency degradation allowed above average value.
    -latdelta
        Specifies the limit in µsec of latency degradation allowed above average value.
    -latthres
        Specifies the lower limit in µsec of latency allowed.
    -latloop
        Specifies the number of loops to execute each latency test.
    -latsize
        Specifies the size of message to use for latency test.
    -c
        Runs test pairs concurrently instead of the default of sequential.
    -b
        When comparing results against tolerance and delta, uses best instead of average.
    -v
        Produces verbose output.
    -vv
        Produces very verbose output.
    -h
        Specifies the reference host to use for sequential pairing.

    Both bwtol and bwdelta must be exceeded to fail the bandwidth test. When bwthres is supplied, bwtol and bwdelta are ignored.

    Both lattol and latdelta must be exceeded to fail the latency test. When latthres is supplied, lattol and latdelta are ignored.

    For consistency with OSU benchmarks, MB/s is defined as 1000000 bytes/s.
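The way bwtol, bwdelta, and bwthres combine can be made concrete with a small sketch. It is not the deviation application's code: the function bw_result is hypothetical, the sample numbers are made up, and treating "exceeded" as a strict comparison is an assumption. All bandwidth values are in MB/s.

```shell
#!/bin/sh
# Hypothetical model of the bandwidth pass/fail rule described above.
# Usage: bw_result measured average bwtol_percent bwdelta [bwthres]
bw_result() {
    awk -v bw="$1" -v avg="$2" -v tol="$3" -v delta="$4" -v thres="${5:-}" 'BEGIN {
        result = "PASS"
        if (thres != "") {
            # bwthres supplied: bwtol and bwdelta are ignored
            if (bw < thres) result = "FAIL"
        } else {
            drop = avg - bw
            # both the percentage tolerance and the absolute delta must be exceeded
            if (drop > avg * tol / 100 && drop > delta) result = "FAIL"
        }
        print result
    }'
}

bw_result 7900 10000 20 150        # FAIL: 2100 below average exceeds both 2000 (20%) and 150
bw_result 9000 10000 20 150        # PASS: 1000 below average is within the 20% tolerance
bw_result 9000 10000 20 150 9500   # FAIL: below the absolute threshold of 9500
```

The latency rule has the same shape with lattol, latdelta, and latthres, except that the comparison looks at how far a pair is above the average.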