Provided by: opa-fastfabric_10.10.3.0.11-1ubuntu3_amd64 bug

NAME

       opafindgood

       Checks  for hosts that are able to be pinged, accessed via SSH, and active on the Intel(R)
       Omni-Path Fabric. Produces a list of good hosts meeting all criteria.  Typically  used  to
       identify  good  hosts  to  undergo further testing and benchmarking during initial cluster
       staging and startup.

       The resulting good file lists each good host exactly once and can  be  used  as  input  to
       create  mpi_hosts  files  for running mpi_apps and the HFI-SW cable test. The files alive,
       running, active, good, and bad are created in the selected directory listing hosts passing
       each criteria.

       This  command assumes the Node Description for each host is based on the hostname-s output
       in conjunction with an optional hfi1_# suffix. When using a /etc/opa/hosts file that lists
       the hostnames, this assumption may not be correct.

       This  command  automatically  generates  the  file  FF_RESULT_DIR/punchlist.csv. This file
       provides a concise summary of the bad  hosts  found.  This  can  be  imported  into  Excel
       directly  as  a  *.csv  file.  Alternatively,  it  can  be  cut/pasted into Excel, and the
       Data/Text to Columns toolbar can be used to separate the information into multiple columns
       at the semicolons.

       A sample generated output is:

       # opafindgood

       3 hosts will be checked

       2 hosts are pingable (alive)

       2 hosts are ssh'able (running)

       2 total hosts have FIs active on one or more fabrics (active)

       No Quarantine Node Records Returned

       1 hosts are alive, running, active (good)

       2 hosts are bad (bad)

       Bad hosts have been added to /root/punchlist.csv

       # cat /root/punchlist.csv

       2015/10/04 11:33:22;phs1fnivd13u07n1 hfi1_0 p1 phs1swivd13u06 p16;Link errors

       2015/10/07 10:21:05;phs1swivd13u06;Switch not found in SA DB

       2015/10/09 14:36:48;phs1fnivd13u07n4;Doesn't ping

       2015/10/09 14:36:48;phs1fnivd13u07n3;No active port

       For  a  given  run,  a line is generated for each failing host. Hosts are reported exactly
       once for a given run. Therefore, a host that does not ping is NOT listed as can't ssh  nor
       No active port. There may be cases where ports could be active for hosts that do not ping,
       especially if Ethernet host names are used for the ping test. However, the  lack  of  ping
       often  implies there are other fundamental issues, such as PXE boot or inability to access
       DNS or DHCP to get proper host name and IP address. Therefore, reporting hosts that do not
       ping is typically of limited value.

       Note  that  opafindgood queries the SA for NodeDescriptions to determine hosts with active
       ports. As such, ports may be active for hosts that cannot be accessed via SSH or pinged.

       By default, opafindgood checks for and reports nodes that  are  quarantined  for  security
       reasons. To skip this, use the -Q option.

Syntax

       opafindgood [-R|-A|-Q] [-d  dir] [-f  hostfile] [-h 'hosts']
       [-t  portsfile] [-p  ports] [-T  timelimit]

Options

       --help

                 Produces full help text.

       -R

                 Skips the running test (SSH). Recommended if password-less SSH is not set up.

       -A

                 Skips  the  active  test.  Recommended  if Intel(R) Omni-Path Fabric software or
                 fabric is not up.

       -Q

                 Skips the quarantine test. Recommended if Intel(R) Omni-Path Fabric software  or
                 fabric is not up.

       -d dir

                 Specifies the directory in which to create alive, active, running, good, and bad
                 files. Default is /etc/opa directory.

       -f hostfile

                 Specifies the file with hosts in cluster. Default is /etc/opa/hosts directory.

       -h hosts

                 Specifies the list of hosts to ping.

       -t portsfile

                 Specifies the file with list of local HFI ports used  to  access  fabric(s)  for
                 analysis. Default is /etc/opa/ports file.

       -p ports

                 Specifies the list of local HFI ports used to access fabric(s) for analysis.

                 Default  is  first active port. The first HFI in the system is 1. The first port
                 on an HFI is 1. Uses the format hfi:port,
                 for example:

                 0:0       First active port in system.

                 0:y       Port y within system.

                 x:0       First active port on HFI x.

                 x:y       HFI x, port y.

       -T timelimit

                 Specifies the time limit in seconds for host to respond to SSH.  Default  is  20
                 seconds.

Environment Variables

       The following environment variables are also used by this command:

       HOSTS

                 List of hosts, used if -h option not supplied.

       HOSTS_FILE

                 File containing list of hosts, used in absence of -f and -h.

       PORTS

                 List of ports, used in absence of -t and -p.

       PORTS_FILE

                 File containing list of ports, used in absence of -t and -p.

       FF_MAX_PARALLEL

                 Maximum concurrent operations.

Examples

       opafindgood

       opafindgood -f allhosts

       opafindgood -h 'arwen elrond'

       HOSTS='arwen elrond' opafindgood

       HOSTS_FILE=allhosts opafindgood

       opafindgood -p '1:1 1:2 2:1 2:2'