Provided by: libfabric-dev_1.17.0-3ubuntu1_amd64

NAME

       fi_rxm - The RxM (RDM over MSG) Utility Provider

OVERVIEW

       The RxM provider (ofi_rxm) is a utility provider that supports FI_EP_RDM endpoints emulated over
       FI_EP_MSG endpoints of an underlying core provider.  FI_EP_RDM endpoints have a reliable datagram
       interface, and RxM emulates this by hiding the connection management of the underlying FI_EP_MSG
       endpoints from the user.  Additionally, RxM can hide the memory registration requirement of a core
       provider such as verbs if the application does not support it.

REQUIREMENTS

   Requirements for core provider
       The RxM provider requires the core provider to support the following features:

       • MSG endpoints (FI_EP_MSG)

       • RMA read/write (FI_RMA) - Used for implementing rendezvous protocol for large messages.

       • FI_OPT_CM_DATA_SIZE of at least 24 bytes.

   Requirements for applications
       Because RxM emulates RDM endpoints by hiding connection management, and connections are established only
       on demand (when the application first tries to send data), the first several data transfer calls can
       return -FI_EAGAIN.  Applications should be aware of this and retry until the operation succeeds.

       If  an  application  has chosen manual progress for data progress, it should also read the CQ so that the
       connection establishment progresses.  Not doing so would result in a stall.  See also the ERRORS  section
       in fi_msg(3).
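
       The retry requirement above can be sketched as a small loop.  This is only an illustration, not
       code from libfabric: send_with_retry, the callback types, and the fake_* helpers are hypothetical
       names, and plain EAGAIN from errno.h is used on the assumption that it matches FI_EAGAIN.  In a
       real application the try_send callback would wrap a call such as fi_send() and the progress
       callback would wrap a CQ read such as fi_cq_read().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback types. In a real application try_send would wrap
 * a data transfer call such as fi_send(), and progress would wrap a CQ
 * read such as fi_cq_read(). */
typedef ssize_t (*try_send_fn)(void *arg);
typedef void (*progress_fn)(void *arg);

/* Retry a transfer while it reports -EAGAIN, driving progress between
 * attempts. Returns the first result other than -EAGAIN, or -EAGAIN if
 * max_attempts is exhausted. */
static ssize_t send_with_retry(try_send_fn try_send, progress_fn progress,
                               void *arg, size_t max_attempts)
{
    ssize_t ret = -EAGAIN;

    for (size_t i = 0; i < max_attempts; i++) {
        ret = try_send(arg);
        if (ret != -EAGAIN)
            break;          /* success or a real error */
        progress(arg);      /* let connection setup advance */
    }
    return ret;
}

/* Stand-in state and callbacks used only to exercise the loop. */
struct fake_state {
    int eagain_left;     /* attempts that will still report -EAGAIN */
    int progress_calls;  /* how often progress was driven */
};

static ssize_t fake_send(void *arg)
{
    struct fake_state *st = arg;

    if (st->eagain_left > 0) {
        st->eagain_left--;
        return -EAGAIN;
    }
    return 0;
}

static void fake_progress(void *arg)
{
    struct fake_state *st = arg;

    st->progress_calls++;
}
```

       Bounding the number of attempts is a design choice of this sketch so that a stalled connection
       surfaces as an error instead of an endless loop; an application may prefer a time-based limit.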

SUPPORTED FEATURES

       The RxM provider currently supports FI_MSG, FI_TAGGED, FI_RMA and FI_ATOMIC capabilities.

       Endpoint types
              The provider supports only FI_EP_RDM.

       Endpoint capabilities
              The following data transfer interface is supported: FI_MSG, FI_TAGGED, FI_RMA, FI_ATOMIC.

       Progress
              The  RxM  provider  supports  both  FI_PROGRESS_MANUAL  and  FI_PROGRESS_AUTO.  Manual progress in
              general has better connection scale-up and lower CPU utilization since there’s no  separate  auto-
              progress thread.

       Addressing Formats
              FI_SOCKADDR, FI_SOCKADDR_IN

       Memory Region
              FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, FI_MR_PROV_KEY.  The application must set these MR mode bits
              if the core provider requires them.

LIMITATIONS

       When using the RxM provider, some limitations of the underlying MSG provider could also show up.  Please
       refer to the corresponding MSG provider man pages to find out about those limitations.

   Unsupported features
       RxM provider does not support the following features:

       • op_flags: FI_FENCE.

       • Scalable endpoints

       • Shared contexts

       • FABRIC_DIRECT

       • FI_MR_SCALABLE

       • Authorization keys

       • Application error data buffers

       • Multicast

       • FI_SYNC_ERR

       • Reporting unknown source addr data as part of completions

       • Triggered operations

   Progress limitations
       When  sending  large  messages,  an app doing an sread or waiting on the CQ file descriptor may not get a
       completion when reading the CQ after being woken up from the wait.  The app has to do sread  or  wait  on
       the  file  descriptor  again.   This  is  needed because RxM uses a rendezvous protocol for large message
       sends.  An app would get woken up from waiting on CQ fd when rendezvous protocol request completes but it
       would  have to wait again to get an ACK from the receiver indicating completion of large message transfer
       by remote RMA read.

   FI_ATOMIC limitations
       The FI_ATOMIC capability will only be listed in the fi_info if  the  fi_info  hints  parameter  specifies
       FI_ATOMIC.    If   FI_ATOMIC  is  requested,  message  order  FI_ORDER_RAR,  FI_ORDER_RAW,  FI_ORDER_WAR,
       FI_ORDER_WAW, FI_ORDER_SAR, and FI_ORDER_SAW cannot be supported.

   Miscellaneous limitations
       • RxM protocol peers must have the same endianness; otherwise connections will not complete successfully.
         Requiring this enables better run-time performance because byte order translations are avoided.

RUNTIME PARAMETERS

       The ofi_rxm provider checks for the following environment variables.

       FI_OFI_RXM_BUFFER_SIZE
              Defines  the  transmit  buffer  size  /  inject  size.   Messages  of size less than this would be
              transmitted via an eager protocol and those above would be transmitted via  a  rendezvous  or  SAR
              (Segmentation  And  Reassembly) protocol.  Transmit data would be copied up to this size (default:
              ~16k).

       FI_OFI_RXM_COMP_PER_PROGRESS
              Defines the maximum number of MSG provider CQ entries (default: 1) that would be read per progress
              (RxM CQ read).

       FI_OFI_RXM_ENABLE_DYN_RBUF
              Enables support for dynamic receive buffering, if supported by the message endpoint provider.
              This feature allows direct placement of received message data into application buffers,  bypassing
              RxM  bounce buffers.  This feature targets providers that provide internal network buffering, such
              as the tcp provider.  (default: false)

       FI_OFI_RXM_SAR_LIMIT
              Set this environment variable to control the  RxM  SAR  (Segmentation  And  Reassembly)  protocol.
              Messages of size greater than this (default: 128 KB) would be transmitted via rendezvous protocol.

       FI_OFI_RXM_USE_SRX
              Set  this  to  1  to  use  shared  receive context from MSG provider, or 0 to disable using shared
              receive context.  Shared receive contexts reduce overall memory usage, but may increase message
              latency.   If not set, verbs will not use shared receive contexts by default, but the tcp provider
              will.

       FI_OFI_RXM_TX_SIZE
              Defines default TX context size (default: 1024)

       FI_OFI_RXM_RX_SIZE
              Defines default RX context size (default: 1024)

       FI_OFI_RXM_MSG_TX_SIZE
              Defines FI_EP_MSG TX size that would be requested (default: 128).

       FI_OFI_RXM_MSG_RX_SIZE
              Defines FI_EP_MSG RX size that would be requested (default: 128).

       FI_UNIVERSE_SIZE
              Defines the expected number of ranks / peers an endpoint would communicate with (default: 256).

       FI_OFI_RXM_CM_PROGRESS_INTERVAL
              Defines the duration of time in microseconds between calls to RxM CM  progression  functions  when
              using  manual  progress.   Higher values may provide less noise for calls to fi_cq read functions,
              but may increase connection setup time (default: 10000)

       FI_OFI_RXM_CQ_EQ_FAIRNESS
              Defines the maximum number of message provider CQ entries that can be  consecutively  read  across
              progress calls without checking to see if the CM progress interval has been reached (default: 128)
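
       Reading the FI_OFI_RXM_BUFFER_SIZE and FI_OFI_RXM_SAR_LIMIT descriptions above together suggests a
       three-way choice of protocol by message size.  The sketch below only illustrates that reading; it
       is not the provider's implementation.  The enum, the function name, and the treatment of a message
       exactly equal to the buffer size are assumptions, and a given configuration may not use SAR at all.

```c
#include <stddef.h>

/* Hypothetical labels for the three protocols named in the text. */
enum xfer_proto {
    XFER_EAGER,
    XFER_SAR,
    XFER_RENDEZVOUS
};

/* Classify a message by size, following the parameter descriptions:
 *   - smaller than buffer_size          -> eager
 *   - larger than sar_limit             -> rendezvous
 *   - anything in between               -> SAR
 * Assumption: a message of exactly buffer_size bytes is not eager,
 * because the text says "less than". */
static enum xfer_proto pick_proto(size_t len, size_t buffer_size,
                                  size_t sar_limit)
{
    if (len < buffer_size)
        return XFER_EAGER;
    if (len > sar_limit)
        return XFER_RENDEZVOUS;
    return XFER_SAR;
}
```

       For example, with a buffer size of 16384 bytes and a SAR limit of 131072 bytes, pick_proto() would
       classify a 1024-byte message as eager, a 65536-byte message as SAR, and a 1048576-byte message as
       rendezvous.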

TUNING

   Bandwidth
       To   optimize  for  bandwidth,  ensure  you  use  higher  values  than  default  for  FI_OFI_RXM_TX_SIZE,
       FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE, FI_OFI_RXM_MSG_RX_SIZE subject to memory limits of the system
       and the tx and rx sizes supported by the MSG provider.

       FI_OFI_RXM_SAR_LIMIT is another knob that can be experimented with to optimize for bandwidth.

   Memory
       To conserve memory, ensure FI_UNIVERSE_SIZE is set to only what is required.  Similarly, check that the
       FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and FI_OFI_RXM_MSG_RX_SIZE environment
       variables are set to only the required values.
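
       One way to apply such settings from inside an application is to set the environment variables
       programmatically.  The sketch below assumes the variables are read when the provider is
       initialized, so it would have to run before the first libfabric call.  The helper apply_env, the
       env_setting type, and the values in bandwidth_profile are illustrative assumptions, not
       recommended settings.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdlib.h>

/* One environment variable and the value to give it. */
struct env_setting {
    const char *name;
    const char *value;
};

/* Apply each setting with setenv(), overwriting any existing value.
 * Returns 0 on success, or -1 as soon as one setenv() call fails. */
static int apply_env(const struct env_setting *settings, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (setenv(settings[i].name, settings[i].value, 1) != 0)
            return -1;
    }
    return 0;
}

/* Example values only: larger than the documented defaults
 * (1024 and 128), chosen purely for illustration. */
static const struct env_setting bandwidth_profile[] = {
    { "FI_OFI_RXM_TX_SIZE",     "4096" },
    { "FI_OFI_RXM_RX_SIZE",     "4096" },
    { "FI_OFI_RXM_MSG_TX_SIZE", "512"  },
    { "FI_OFI_RXM_MSG_RX_SIZE", "512"  },
};
```

       A caller would pass bandwidth_profile and its element count to apply_env() early in start-up.
       Setting the same variables in the shell or job launcher achieves the same effect without code
       changes.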

NOTES

       The data transfer API may return -FI_EAGAIN during on-demand connection setup of the core provider's
       FI_EP_MSG endpoints.  See fi_msg(3) for a detailed description of handling FI_EAGAIN.

TROUBLESHOOTING / KNOWN ISSUES

       If an RxM endpoint is expected to communicate with more peers than the default value of FI_UNIVERSE_SIZE
       (256), CQ overruns can happen.  To avoid this, set a higher value for FI_UNIVERSE_SIZE.  A CQ overrun
       can make a MSG endpoint unusable.

       At higher rank counts, there may be connection errors due to a node running out of memory.  The workaround
       is  to  use  shared  receive contexts for the MSG provider (FI_OFI_RXM_USE_SRX=1) or reduce eager message
       size  (FI_OFI_RXM_BUFFER_SIZE)  and  MSG   provider   TX/RX   queue   sizes   (FI_OFI_RXM_MSG_TX_SIZE   /
       FI_OFI_RXM_MSG_RX_SIZE).

SEE ALSO

       fabric(7), fi_provider(7), fi_getinfo(3)

AUTHORS

       OpenFabrics.