Provided by: libfabric-dev_2.1.0-1.1_amd64

NAME

       fi_domain - Open a fabric access domain

SYNOPSIS

              #include <rdma/fabric.h>

              #include <rdma/fi_domain.h>

              int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, void *context);

              int fi_domain2(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_domain **domain, uint64_t flags, void *context);

              int fi_close(struct fid *domain);

              int fi_domain_bind(struct fid_domain *domain, struct fid *eq,
                  uint64_t flags);

              int fi_open_ops(struct fid *domain, const char *name, uint64_t flags,
                  void **ops, void *context);

              int fi_set_ops(struct fid *domain, const char *name, uint64_t flags,
                  void *ops, void *context);

ARGUMENTS

       fabric Opened fabric instance under which the access domain is created.

       info   Fabric  information,  including  domain capabilities and attributes.  The struct fi_info must have
              been obtained using either fi_getinfo() or fi_dupinfo().

       domain An opened access domain.

       context
              User specified context associated with the domain.  This context is returned as part of any  asyn‐
              chronous event associated with the domain.

       eq     Event queue for asynchronous operations initiated on the domain.

       name   Name associated with an interface.

       ops    Fabric interface operations.

DESCRIPTION

       An  access  domain  typically refers to a physical or virtual NIC or hardware port; however, a domain may
       span across multiple hardware components for fail-over or data striping purposes.  A domain  defines  the
       boundary for associating different resources together.  Fabric resources belonging to the same domain may
       share resources.

   fi_domain
       Opens  a fabric access domain, also referred to as a resource domain.  Fabric domains are identified by a
       name.  The properties of the opened domain are specified using the info parameter.

   fi_domain2
       Similar to fi_domain, but accepts an extra parameter flags.  Mainly used for opening a peer domain.  See
       fi_peer(3).

   fi_open_ops
       fi_open_ops is used to open provider specific interfaces.  Provider interfaces may be used to access low-
       level  resources  and  operations that are specific to the opened resource domain.  The details of domain
       interfaces are outside the scope of this documentation.

   fi_set_ops
       fi_set_ops assigns callbacks that a provider should invoke in place of performing selected  tasks.   This
       allows  users  to  modify  or control a provider’s default behavior.  Conceptually, it allows the user to
       hook specific functions used by a provider and replace them with their own.

       The operations being modified are identified using a well-known character string, passed as the name  pa‐
       rameter.   The format of the ops parameter is dependent upon the name value.  The ops parameter will ref‐
       erence a structure containing the callbacks and other fields needed by the provider to invoke the  user’s
       functions.

       If  a  provider  accepts the override, it will return FI_SUCCESS.  If the override is unknown or not sup‐
       ported, the provider will return -FI_ENOSYS.  Overrides should be set prior to  allocating  resources  on
       the domain.

       The following fi_set_ops operations and corresponding callback structures are defined.

       FI_SET_OPS_HMEM_OVERRIDE  Heterogeneous Memory Overrides

       HMEM  override  allows  users to override HMEM related operations a provider may perform.  Currently, the
       scope of the HMEM override is to allow a user to define the memory movement functions a  provider  should
       use when accessing a user buffer.  The user-defined memory movement functions need to account for all the
       different HMEM iface types a provider may encounter.

       All objects allocated against a domain will inherit this override.

       The following is the HMEM override operation name and structure.

              #define FI_SET_OPS_HMEM_OVERRIDE "hmem_override_ops"

              struct fi_hmem_override_ops {
                  size_t  size;

                  ssize_t (*copy_from_hmem_iov)(void *dest, size_t size,
                      enum fi_hmem_iface iface, uint64_t device, const struct iovec *hmem_iov,
                      size_t hmem_iov_count, uint64_t hmem_iov_offset);

                  ssize_t (*copy_to_hmem_iov)(enum fi_hmem_iface iface, uint64_t device,
                      const struct iovec *hmem_iov, size_t hmem_iov_count,
                      uint64_t hmem_iov_offset, const void *src, size_t size);
              };

       All fields in struct fi_hmem_override_ops must be set (non-null) to a valid value.

       size   This should be set to sizeof(struct fi_hmem_override_ops).  The size field is used for forward
              and backward compatibility purposes.

       copy_from_hmem_iov
              Copy data from the device/hmem to host memory.  This function should return a negative fi_errno on
              error, or the number of bytes copied on success.

       copy_to_hmem_iov
              Copy data from host memory to the device/hmem.  This function should return a negative fi_errno on
              error, or the number of bytes copied on success.
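
       As an illustration of the two callbacks described above, the following is a minimal sketch that
       handles host memory only.  The names prefixed with sketch_ and the enum sketch_hmem_iface are
       stand-ins invented for this example so that it compiles without the libfabric headers; a real
       override would use enum fi_hmem_iface and would need to cover every iface type the provider may
       encounter.  The sketch assumes that hmem_iov_offset is a byte offset into the logical
       concatenation of the iovec segments.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Stand-in for enum fi_hmem_iface; only host memory is modeled here. */
enum sketch_hmem_iface { SKETCH_HMEM_SYSTEM = 0 };

/* Walk the iovec array, skipping `offset` bytes, and move up to `size`
 * bytes between the segments and the flat buffer `buf`. */
static ssize_t sketch_iov_copy(const struct iovec *iov, size_t count,
                               uint64_t offset, void *buf, size_t size,
                               int to_iov)
{
    size_t done = 0;

    assert(iov != NULL || count == 0);
    for (size_t i = 0; i < count && done < size; i++) {
        size_t len = iov[i].iov_len;

        if (offset >= len) {        /* offset lies beyond this segment */
            offset -= len;
            continue;
        }
        len -= (size_t) offset;
        if (len > size - done)
            len = size - done;

        char *seg = (char *) iov[i].iov_base + offset;
        if (to_iov)
            memcpy(seg, (char *) buf + done, len);
        else
            memcpy((char *) buf + done, seg, len);
        done += len;
        offset = 0;                 /* later segments start at byte 0 */
    }
    return (ssize_t) done;
}

/* Same parameter order as copy_from_hmem_iov in the structure above. */
ssize_t sketch_copy_from_hmem_iov(void *dest, size_t size,
    enum sketch_hmem_iface iface, uint64_t device,
    const struct iovec *hmem_iov, size_t hmem_iov_count,
    uint64_t hmem_iov_offset)
{
    (void) device;
    if (iface != SKETCH_HMEM_SYSTEM)
        return -ENOSYS;             /* a real override must cover every iface */
    return sketch_iov_copy(hmem_iov, hmem_iov_count, hmem_iov_offset,
                           dest, size, 0);
}

/* Same parameter order as copy_to_hmem_iov in the structure above. */
ssize_t sketch_copy_to_hmem_iov(enum sketch_hmem_iface iface, uint64_t device,
    const struct iovec *hmem_iov, size_t hmem_iov_count,
    uint64_t hmem_iov_offset, const void *src, size_t size)
{
    (void) device;
    if (iface != SKETCH_HMEM_SYSTEM)
        return -ENOSYS;
    return sketch_iov_copy(hmem_iov, hmem_iov_count, hmem_iov_offset,
                           (void *) src, size, 1);
}
```

       With the libfabric headers available, functions of this shape would be assigned to the
       copy_from_hmem_iov and copy_to_hmem_iov fields of struct fi_hmem_override_ops, with size set to
       sizeof(struct fi_hmem_override_ops), and the structure passed to fi_set_ops under the name
       FI_SET_OPS_HMEM_OVERRIDE before any resources are allocated on the domain.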

   fi_domain_bind
       Associates an event queue with the domain.  An event queue bound to a domain will be the default EQ asso‐
       ciated  with  asynchronous control events that occur on the domain or active endpoints allocated on a do‐
       main.  This includes CM events.  Endpoints may direct their control events to alternate  EQs  by  binding
       directly with the EQ.

       Deprecated: Binding an event queue to a domain with the FI_REG_MR flag indicates that the provider should
       perform all memory registration operations asynchronously, with the completion reported through the event
       queue.   If  an  event queue is not bound to the domain with the FI_REG_MR flag, then memory registration
       requests complete synchronously.

   fi_close
       The fi_close call is used to release all resources associated with a domain or  interface.   All  objects
       associated with the opened domain must be released prior to calling fi_close, otherwise the call will re‐
       turn -FI_EBUSY.

DOMAIN ATTRIBUTES

       The fi_domain_attr structure defines the set of attributes associated with a domain.

              struct fi_domain_attr {
                  struct fid_domain     *domain;
                  char                  *name;
                  enum fi_threading     threading;
                  enum fi_progress      progress;
                  enum fi_resource_mgmt resource_mgmt;
                  enum fi_av_type       av_type;
                  int                   mr_mode;
                  size_t                mr_key_size;
                  size_t                cq_data_size;
                  size_t                cq_cnt;
                  size_t                ep_cnt;
                  size_t                tx_ctx_cnt;
                  size_t                rx_ctx_cnt;
                  size_t                max_ep_tx_ctx;
                  size_t                max_ep_rx_ctx;
                  size_t                max_ep_stx_ctx;
                  size_t                max_ep_srx_ctx;
                  size_t                cntr_cnt;
                  size_t                mr_iov_limit;
                  uint64_t              caps;
                  uint64_t              mode;
                  uint8_t               *auth_key;
                  size_t                auth_key_size;
                  size_t                max_err_data;
                  size_t                mr_cnt;
                  uint32_t              tclass;
                  size_t                max_ep_auth_key;
                  uint32_t              max_group_id;
              };

   domain
       On  input to fi_getinfo, a user may set this to an opened domain instance to restrict output to the given
       domain.  On output from fi_getinfo, if no domain was specified, but the user has an  opened  instance  of
       the  named  domain,  this will reference the first opened instance.  If no instance has been opened, this
       field will be NULL.

       The domain instance returned by fi_getinfo should only be considered valid if the  application  does  not
       close any domain instances from another thread while fi_getinfo is being processed.

   Name
       The name of the access domain.

   Multi-threading Support (threading)
       The  threading  model specifies the level of serialization required of an application when using the lib‐
       fabric data transfer interfaces.  Control interfaces are always considered thread safe unless the control
       progress model is FI_PROGRESS_CONTROL_UNIFIED.  A thread safe control interface allows  multiple  threads
       to  progress  the  control  interface, and (depending on threading model selected) one or more threads to
       progress the data interfaces at the same time.  Applications which can guarantee serialization  in  their
       access of provider allocated resources and interfaces enable a provider to eliminate lower-level locks.

       FI_THREAD_COMPLETION
              The  completion threading model is best suited for multi-threaded applications using scalable end‐
              points which desire lockless operation.  Applications must serialize access to  all  objects  that
              are  associated by a common completion mechanism (for example, transmit and receive contexts bound
              to the same CQ or counter).  It is recommended that providers  which  support  scalable  endpoints
              support this threading model.

       Applications  wanting  to  leverage  FI_THREAD_COMPLETION should dedicate transmit contexts, receive con‐
       texts, completion queues, and counters to individual threads.

       FI_THREAD_DOMAIN
              The domain threading model is best suited for single-threaded applications and multi-threaded  ap‐
              plications  using standard endpoints which desire lockless operation.  Applications must serialize
              access to all objects under the same domain.  This includes endpoints, transmit and  receive  con‐
              texts, completion queues and counters, and registered memory regions.

       FI_THREAD_ENDPOINT (deprecated)
              The  endpoint threading model is similar to FI_THREAD_FID, but with the added restriction that se‐
              rialization is required when accessing the same endpoint, even if multiple  transmit  and  receive
              contexts are used.

       FI_THREAD_FID (deprecated)
              A  fabric  descriptor (FID) serialization model requires applications to serialize access to indi‐
              vidual fabric resources associated with data transfer operations and  completions.   For  endpoint
              access,  serialization  is  only  required  when  accessing the same endpoint data flow.  Multiple
              threads may initiate transfers on different transmit contexts of the same endpoint without
              serializing, and no serialization is required between the submission of data transmit requests and data
              receive operations.

       FI_THREAD_SAFE
              A  thread safe serialization model allows a multi-threaded application to access any allocated re‐
              sources through any  interface  without  restriction.   All  providers  are  required  to  support
              FI_THREAD_SAFE.

       FI_THREAD_UNSPEC
              This  value  indicates that no threading model has been defined.  It may be used on input hints to
              the fi_getinfo call.  When specified, providers will return a threading model that allows for  the
              greatest level of parallelism.
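
       One way an application might meet the FI_THREAD_DOMAIN requirement is to funnel every call that
       touches objects under a domain through a single lock.  The sketch below shows that idea with a
       POSIX mutex.  The types and functions prefixed with sketch_ are hypothetical helpers, not part of
       libfabric, and the domain pointer is kept as void * so the example compiles without the libfabric
       headers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper: one lock guards every object opened under a domain. */
struct sketch_domain_guard {
    pthread_mutex_t lock;
    void *domain;               /* would hold the struct fid_domain pointer */
};

/* Any operation on the domain or on an object opened under it. */
typedef int (*sketch_domain_fn)(void *domain, void *arg);

/* Returns 0 on success or a negative errno value. */
int sketch_guard_init(struct sketch_domain_guard *g, void *domain)
{
    assert(g != NULL);
    g->domain = domain;
    return -pthread_mutex_init(&g->lock, NULL);
}

/* Run fn while holding the per-domain lock, so that no two threads are
 * ever inside the domain's objects at the same time. */
int sketch_guard_call(struct sketch_domain_guard *g, sketch_domain_fn fn,
                      void *arg)
{
    int ret;

    assert(g != NULL && fn != NULL);
    pthread_mutex_lock(&g->lock);
    ret = fn(g->domain, arg);
    pthread_mutex_unlock(&g->lock);
    return ret;
}

/* Example operation: counts how many times it was invoked. */
int sketch_count_call(void *domain, void *arg)
{
    (void) domain;
    ++*(int *) arg;
    return 0;
}
```

       A single-threaded application satisfies the same requirement trivially and needs no lock at all.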

   Progress Models (progress)
       Progress  is  the  ability of the underlying implementation to complete processing of an asynchronous re‐
       quest.  In many cases, the processing of an asynchronous request requires the use of the host  processor.
       For  example,  a  received message may need to be matched with the correct buffer, or a timed out request
       may need to be retransmitted.  For performance reasons, it may be undesirable for the provider  to  allo‐
       cate a thread for this purpose, which will compete with the application threads.

       Control progress indicates the method that the provider uses to make progress on asynchronous control op‐
       erations.  Control operations are functions which do not directly involve the transfer of application da‐
       ta  between  endpoints.  They include address vector, memory registration, and connection management rou‐
       tines.

       Data progress indicates the method that the provider uses to make progress on data  transfer  operations.
       This  includes  message  queue, RMA, tagged messaging, and atomic operations, along with their completion
       processing.

       The progress field defines the behavior of both control and data operations.  For applications  that  re‐
       quire  compilation  portability  between the version 1 and version 2 libfabric series, the progress field
       may be referenced as data_progress.

       Progress frequently requires action being taken at both the transmitting and receiving sides of an opera‐
       tion.  This is often a requirement for reliable transfers, as a result of retry and acknowledgement  pro‐
       cessing.

       To balance between performance and ease of use, the following progress models are defined.

       FI_PROGRESS_AUTO
              This  progress model indicates that the provider will make forward progress on an asynchronous op‐
              eration without further intervention by the application.  When  FI_PROGRESS_AUTO  is  provided  as
              output  to  fi_getinfo  in  the absence of any progress hints, it often indicates that the desired
              functionality is implemented by the provider hardware or is a standard service  of  the  operating
              system.

       It is recommended that providers support FI_PROGRESS_AUTO.  However, if a provider does not natively sup‐
       port  automatic progress, forcing the use of FI_PROGRESS_AUTO may result in threads being allocated below
       the fabric interfaces.

       Note that prior versions of the library required providers to support FI_PROGRESS_AUTO.  However, in some
       cases progress threads cannot be blocked when communication is idle, which results in threads spinning in
       progress functions.  As a result, those providers only supported FI_PROGRESS_MANUAL.

       FI_PROGRESS_MANUAL
              This progress model indicates that the provider requires the use of an application thread to  com‐
              plete  an asynchronous request.  When manual progress is set, the provider will attempt to advance
              an asynchronous operation forward when the application attempts to wait on or read an event queue,
              completion queue, or counter where the completed operation will be reported.  Progress also occurs
              when the application processes a poll or wait set that has been associated with the event or  com‐
              pletion queue.

       Only  wait operations defined by the fabric interface will result in an operation progressing.  Operating
       system or external wait functions, such as select, poll, or pthread routines, cannot.

       Manual progress requirements not only apply to endpoints that initiate transmit operations, but  also  to
       endpoints  that  may  be the target of such operations.  This holds true even if the target endpoint will
       not generate completion events for the operations.  For example, an endpoint that acts purely as the tar‐
       get of RMA or atomic operations that uses manual  progress  may  still  need  application  assistance  to
       process received operations.
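
       The manual progress rules above imply that an application waiting for completions should keep
       calling into the fabric interfaces rather than sleeping in an operating system wait call.  The
       following sketch shows such a polling loop.  The function pointer type sketch_cq_read_fn is a
       stand-in that has the same shape as a completion queue read (returning the number of entries read
       or a negative errno value, with -EAGAIN meaning that nothing is available yet); the fake reader
       exists only to make the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Stand-in with the shape of a CQ read call: returns the number of
 * completions read, or a negative errno value (-EAGAIN when empty). */
typedef ssize_t (*sketch_cq_read_fn)(void *cq, void *buf, size_t count);

/* Poll until `want` completions have been read or an error occurs.
 * Under manual progress, each call into read_fn is what gives the
 * provider a chance to advance outstanding operations. */
ssize_t sketch_wait_completions(sketch_cq_read_fn read_fn, void *cq,
                                void *entry_buf, size_t want)
{
    size_t got = 0;

    assert(read_fn != NULL);
    while (got < want) {
        ssize_t ret = read_fn(cq, entry_buf, 1);

        if (ret > 0)
            got += (size_t) ret;
        else if (ret != -EAGAIN)
            return ret;         /* unexpected result: report to caller */
        /* -EAGAIN: nothing ready yet; reading again drives progress */
    }
    return (ssize_t) got;
}

/* Fake CQ for demonstration: reports "empty" until *countdown hits 0. */
ssize_t sketch_fake_cq_read(void *cq, void *buf, size_t count)
{
    int *countdown = cq;

    (void) buf;
    (void) count;
    if (*countdown > 0) {
        --*countdown;
        return -EAGAIN;
    }
    return 1;
}
```

       The same loop applies to an endpoint that is only the target of RMA or atomic operations: even
       though it expects no completions of its own, it may still need to poll so that incoming operations
       are processed.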

       FI_PROGRESS_CONTROL_UNIFIED
              This  progress model indicates that the user will synchronize progressing the data and control op‐
              erations themselves (i.e. this allows the control interface to NOT be thread  safe).   It  implies
              manual  progress,  and  when  combined with threading=FI_THREAD_DOMAIN/FI_THREAD_COMPLETION allows
              Libfabric to remove all locking in the critical data progress path.

       FI_PROGRESS_UNSPEC
              This value indicates that no progress model has been defined.  It may be used on  input  hints  to
              the fi_getinfo call.

   Resource Management (resource_mgmt)
       Resource management (RM) is provider and protocol support to protect against overrunning local and remote
       resources.   This  includes  local and remote transmit contexts, receive contexts, completion queues, and
       source and target data buffers.

       When enabled, applications are given some level of protection against overrunning provider queues and lo‐
       cal and remote data buffers.  Such support may be built directly into the hardware and/or network  proto‐
       col, but may also require that checks be enabled in the provider software.  By disabling resource manage‐
       ment,  an  application  assumes all responsibility for preventing queue and buffer overruns, but doing so
       may allow a provider to eliminate internal synchronization calls, such as atomic variables or locks.

       It should be noted that even if resource management is disabled, the provider implementation and protocol
       may still provide some level of protection against overruns.  However, such protection is not guaranteed.
       The following values for resource management are defined.

       FI_RM_DISABLED
              The provider is free to select an implementation and protocol that does not  protect  against  re‐
              source overruns.  The application is responsible for resource protection.

       FI_RM_ENABLED
              Resource management is enabled for this provider domain.

       FI_RM_UNSPEC
              This  value indicates that no resource management model has been defined.  It may be used on input
              hints to the fi_getinfo call.

       The behavior of the various resource management options depends on whether the endpoint  is  reliable  or
       unreliable,  as  well as provider and protocol specific implementation details, as shown in the following
       table.  The table assumes that all peers enable or disable RM the same.

       Resource        DGRAM EP      DGRAM EP      MSG EP        MSG EP        RDM EP        RDM EP
                       no RM         with RM       no RM         with RM       no RM         with RM
       ──────────────────────────────────────────────────────────────────────────────────────────────────
       Tx Ctx          undefined     EAGAIN        undefined     EAGAIN        undefined     EAGAIN
                       error                       error                       error
       Rx Ctx          undefined     EAGAIN        undefined     EAGAIN        undefined     EAGAIN
                       error                       error                       error
       Tx CQ           undefined     EAGAIN        undefined     EAGAIN        undefined     EAGAIN
                       error                       error                       error
       Rx CQ           undefined     EAGAIN        undefined     EAGAIN        undefined     EAGAIN
                       error                       error                       error
       Target EP       dropped       dropped       transmit      retried       transmit      retried
                                                   error                       error
       No Rx Buffer    dropped       dropped       transmit      retried       transmit      retried
                                                   error                       error
       Rx Buf Overrun  truncate or   truncate or   truncate or   truncate or   truncate or   truncate or
                       drop          drop          error         error         error         error
       Unmatched RMA   not           not           transmit      transmit      transmit      transmit
                       applicable    applicable    error         error         error         error
       RMA Overrun     not           not           transmit      transmit      transmit      transmit
                       applicable    applicable    error         error         error         error
       Unreachable EP  dropped       dropped       not           not           transmit      transmit
                                                   applicable    applicable    error         error

       The resource column indicates the resource being accessed by a data transfer operation.

       Tx Ctx / Rx Ctx
              Refers  to  the transmit/receive contexts when a data transfer operation is submitted.  When RM is
              enabled, attempting to submit a request will fail if the context is full.  If RM is  disabled,  an
              undefined  error  (provider  specific)  will occur.  Such errors should be considered fatal to the
              context, and applications must take steps to avoid queue overruns.

       Tx CQ / Rx CQ
              Refers to the completion queue associated with the Tx or Rx context when a  local  operation  com‐
              pletes.   When RM is disabled, applications must take care to ensure that completion queues do not
              get overrun.  When an overrun occurs, an undefined, but fatal, error will occur affecting all end‐
              points associated with the CQ.  Overruns can be avoided by sizing the CQs appropriately or by  de‐
              ferring the posting of a data transfer operation unless CQ space is available to store its comple‐
              tion.   When  RM  is enabled, providers may use different mechanisms to prevent CQ overruns.  This
              includes failing (returning -FI_EAGAIN) the posting of operations that could result  in  CQ  over‐
              runs,  or  internally retrying requests (which will be hidden from the application).  See notes at
              the end of this section regarding CQ resource management restrictions.

       Target EP / No Rx Buffer
              Target EP refers to resources associated with the endpoint that is the target of a transmit opera‐
              tion.  This includes the target endpoint’s receive queue, posted receive buffers (no Rx  buffers),
              the receive side completion queue, and other related packet processing queues.  The defined behav‐
              ior  is  that  seen  by  the  initiator of a request.  For FI_EP_DGRAM endpoints, if the target EP
              queues are unable to accept incoming messages, received messages will be  dropped.   For  reliable
              endpoints,  if  RM  is  disabled,  the  transmit operation will complete in error.  A provider may
              choose to return an error completion with the error code FI_ENORX for that transmit  operation  so
              that it can be retried.  If RM is enabled, the provider will internally retry the operation.

       Rx Buffer Overrun
              This  refers  to buffers posted to receive incoming tagged or untagged messages, with the behavior
              defined from the viewpoint of the sender.  The behavior for handling received  messages  that  are
              larger  than  the  buffers provided by the application is provider specific.  Providers may either
              truncate the message and report a successful completion, or fail the operation.  For datagram end‐
              points, failed sends will result in the message being dropped.  For reliable endpoints, send oper‐
              ations may complete successfully, yet be truncated at the receive side.  This can occur  when  the
              target  side  buffers received data until an application buffer is made available.  The completion
              status may also be dependent upon the completion model selected by the application (e.g.,
              FI_DELIVERY_COMPLETE versus FI_TRANSMIT_COMPLETE).

       Unmatched RMA / RMA Overrun
              Unmatched RMA and RMA overruns deal with the processing of RMA and atomic operations.  Unlike send
              operations,  RMA  operations that attempt to access a memory address that is either not registered
              for such operations, or attempt to access outside of the target memory region will fail, resulting
              in a transmit error.

       Unreachable EP
              Unreachable endpoint is a connectionless-specific scenario where transmit operations are issued to
              unreachable target endpoints.  Such scenarios include no-route-to-host or a down target NIC.  For
              FI_EP_DGRAM endpoints, transmit operations targeting an unreachable endpoint will be dropped.  For
              FI_EP_RDM endpoints, transmit operations targeting an unreachable endpoint will result in a
              transmit error.
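
       When RM is enabled, a full transmit or receive context is reported to the caller as EAGAIN rather
       than as a fatal error, so a common pattern is to process completions and then retry the
       submission.  The sketch below illustrates that pattern under stated assumptions: sketch_post_fn
       and sketch_progress_fn are hypothetical callbacks standing in for a data transfer call and a
       completion queue read, and struct sketch_queue is a toy model of a bounded context used only to
       exercise the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callbacks: post returns 0 or a negative errno value,
 * progress reaps completions so that queue space can be reused. */
typedef ssize_t (*sketch_post_fn)(void *ep, void *op);
typedef void (*sketch_progress_fn)(void *cq);

/* Retry a submission that fails with -EAGAIN, making progress between
 * attempts.  Gives up after max_tries attempts and returns the last result. */
ssize_t sketch_post_with_retry(sketch_post_fn post, void *ep, void *op,
                               sketch_progress_fn progress, void *cq,
                               unsigned max_tries)
{
    ssize_t ret = -EAGAIN;

    assert(post != NULL && progress != NULL);
    for (unsigned i = 0; i < max_tries; i++) {
        ret = post(ep, op);
        if (ret != -EAGAIN)
            break;              /* success or a non-retryable error */
        progress(cq);           /* reap completions to free queue space */
    }
    return ret;
}

/* Toy model of a bounded context, for demonstration only. */
struct sketch_queue {
    int used;
    int cap;
};

ssize_t sketch_fake_post(void *ep, void *op)
{
    struct sketch_queue *q = ep;

    (void) op;
    if (q->used == q->cap)
        return -EAGAIN;         /* context full: RM reports EAGAIN */
    q->used++;
    return 0;
}

void sketch_fake_progress(void *cq)
{
    struct sketch_queue *q = cq;

    if (q->used > 0)
        q->used--;              /* one operation completed */
}
```

       With RM disabled there is no EAGAIN to react to, so the application would instead have to track
       outstanding operations itself and never submit more than the context can hold.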

       When a resource management error occurs on a connected endpoint, the endpoint will transition into a
       disabled state and the connection will be torn down.  A disabled endpoint will drop any queued or
       inflight operations.

       The behavior of resource management errors on connectionless endpoints depends on the type of error.  If
       RM is disabled and one of the following errors occurs, the endpoint will be disabled: Tx Ctx, Rx Ctx, Tx
       CQ, or Rx CQ.  For other errors (Target EP, No Rx Buffer, etc.), the operation may fail, but the endpoint
       will remain enabled.  A disabled endpoint will drop or fail any queued or inflight operations.  In
       addition, a disabled endpoint must be re-enabled before it will accept new data transfer operations.

       There is one notable restriction on the protections offered by resource management.  This occurs when
       resource management is enabled on an endpoint that has been bound to completion queue(s) using the
       FI_SELECTIVE_COMPLETION flag.  Operations posted to such an endpoint may specify that a successful
       completion should not generate an entry on the corresponding completion queue (i.e., the operation leaves
       the FI_COMPLETION flag unset).  In such situations, the provider is not required to reserve an entry in the
       completion queue to handle the case where the operation fails and does generate a CQ entry, which would
       effectively require tracking the operation to completion.  Applications concerned with avoiding CQ
       overruns when errors occur must ensure that there is sufficient space in the CQ to report failed
       operations.  This can typically be achieved by sizing the CQ to at least the same size as the endpoint
       queue(s) that are bound to it.
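
       The sizing rule in the preceding paragraph amounts to summing the depths of every transmit and
       receive queue bound to a CQ.  A minimal sketch of that arithmetic follows.  The structure
       sketch_ep_queues and the function sketch_min_cq_size are hypothetical; in an application the
       queue depths would come from the transmit and receive attributes of each endpoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one endpoint's queues and CQ bindings. */
struct sketch_ep_queues {
    size_t tx_size;     /* transmit queue depth */
    size_t rx_size;     /* receive queue depth */
    int binds_tx;       /* non-zero if transmit completions go to this CQ */
    int binds_rx;       /* non-zero if receive completions go to this CQ */
};

/* Smallest CQ size that can hold one entry for every operation that may
 * be outstanding on the queues bound to it, including failed operations
 * posted without FI_COMPLETION. */
size_t sketch_min_cq_size(const struct sketch_ep_queues *eps, size_t n)
{
    size_t total = 0;

    assert(eps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (eps[i].binds_tx)
            total += eps[i].tx_size;
        if (eps[i].binds_rx)
            total += eps[i].rx_size;
    }
    return total;
}
```

       For example, one endpoint with a 128-entry transmit queue and a 256-entry receive queue both bound
       to the CQ, plus a second endpoint contributing only a 64-entry transmit queue, would call for a CQ
       of at least 128 + 256 + 64 = 448 entries.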

   AV Type (av_type)
       Specifies  the  type  of  address vectors that are usable with this domain.  For additional details on AV
       type, see fi_av(3).  The following values may be specified.

       FI_AV_MAP (deprecated)
              Only address vectors of type AV map are requested or supported.

       FI_AV_TABLE
              Only address vectors of type AV table (index-based addressing) are requested or supported.

       FI_AV_UNSPEC
              Any address vector format is requested and supported.

       Address vectors are only used by connectionless endpoints.  Applications that require the use of  a  spe‐
       cific  type of address vector should set the domain attribute av_type to the necessary value when calling
       fi_getinfo.  The value FI_AV_UNSPEC may be used to indicate that the provider can support either  address
       vector  format.   In this case, a provider may return FI_AV_UNSPEC to indicate that either format is sup‐
       portable, or may return another AV type to indicate the optimal AV type supported by this domain.

   Memory Registration Mode (mr_mode)
       Defines memory registration specific mode bits used with this domain.  Full details on  MR  mode  options
       are available in fi_mr(3).  The following values may be specified.

       FI_MR_ALLOCATED
              Indicates  that memory registration occurs on allocated data buffers, and physical pages must back
              all virtual addresses being registered.

       FI_MR_COLLECTIVE
              Requires data buffers passed to collective operations be explicitly registered for collective  op‐
              erations using the FI_COLLECTIVE flag.

       FI_MR_ENDPOINT
              Memory registration occurs at the endpoint level, rather than domain.

       FI_MR_LOCAL
              The  provider  is  optimized  around having applications register memory for locally accessed data
              buffers.  Data buffers used in send and receive operations and as the source buffer  for  RMA  and
              atomic  operations must be registered by the application for access domains opened with this capa‐
              bility.

       FI_MR_MMU_NOTIFY
              Indicates that the application is responsible for notifying the provider when the page tables ref‐
              erencing a registered memory region may have been updated.

       FI_MR_PROV_KEY
              Memory registration keys are selected and returned by the provider.

       FI_MR_RAW
              The provider requires additional setup as part of their memory registration process.  This mode is
              required by providers that use a memory key that is larger than 64 bits.

       FI_MR_RMA_EVENT
              Indicates that the memory regions associated with completion counters must be  explicitly  enabled
              after being bound to any counter.

       FI_MR_UNSPEC (deprecated)
              Defined for compatibility – library versions 1.4 and earlier.  Setting mr_mode to 0 indicates that
              FI_MR_BASIC or FI_MR_SCALABLE are requested and supported.

       FI_MR_VIRT_ADDR
              Registered memory regions are referenced by peers using the virtual address of the registered mem‐
              ory region, rather than a 0-based offset.

       FI_MR_BASIC (deprecated)
              Defined  for compatibility – library versions 1.4 and earlier.  Only basic memory registration op‐
              erations are requested or supported.  This mode is equivalent to the FI_MR_VIRT_ADDR,  FI_MR_ALLO‐
              CATED, and FI_MR_PROV_KEY flags being set in later library versions.  This flag may not be used in
              conjunction with other mr_mode bits.

       FI_MR_SCALABLE (deprecated)
              Defined  for  compatibility – library versions 1.4 and earlier.  Only scalable memory registration
              operations are requested or supported.  Scalable registration uses offset based  addressing,  with
              application selectable memory keys.  For library versions 1.5 and later, this is the default if no
              mr_mode bits are set.  This flag may not be used in conjunction with other mr_mode bits.

       Buffers  used  in  data transfer operations may require notifying the provider of their use before a data
       transfer can occur.  The mr_mode field indicates the type of memory registration that  is  required,  and
       when registration is necessary.  Applications that require the use of a specific registration mode should
       set  the domain attribute mr_mode to the necessary value when calling fi_getinfo.  The value FI_MR_UNSPEC
       may be used to indicate support for any registration mode.
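
       The effect of FI_MR_VIRT_ADDR on how a peer addresses a registered region can be shown with a short
       calculation.  In the sketch below, SKETCH_MR_VIRT_ADDR is a placeholder bit defined only for this
       example, and sketch_rma_target_addr is a hypothetical helper; real code would test the
       FI_MR_VIRT_ADDR bit from the libfabric headers against the mr_mode value returned by fi_getinfo.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit for illustration; real code would use FI_MR_VIRT_ADDR. */
#define SKETCH_MR_VIRT_ADDR (1 << 4)

/* Compute the address a peer passes to an RMA operation in order to reach
 * `offset` bytes into a region registered at virtual address `region_base`. */
uint64_t sketch_rma_target_addr(int mr_mode, uint64_t region_base,
                                uint64_t offset)
{
    if (mr_mode & SKETCH_MR_VIRT_ADDR) {
        /* Region is referenced by the target's virtual address. */
        assert(offset <= UINT64_MAX - region_base);
        return region_base + offset;
    }
    /* Otherwise the region is referenced by a 0-based offset. */
    return offset;
}
```

       In the virtual address case the target must communicate the region's base address to its peers,
       typically alongside the memory key; in the offset case only the key needs to be exchanged.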

   MR Key Size (mr_key_size)
       Size of the memory region remote access key, in bytes.  Applications that request their own MR  key  must
       select a value within the range specified by this value.  Key sizes larger than 8 bytes require using the
       FI_RAW_KEY mode bit.
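
       An application that selects its own keys may want to check a candidate key against mr_key_size before
       registering memory.  The following is a sketch under stated assumptions: mr_key_fits is a hypothetical
       helper, it only covers keys that fit in a uint64_t (up to 8 bytes), and treating a size of 0 as
       "no key fits" is a defensive choice made here, not documented behavior.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return nonzero if 'key' can be represented in
 * 'mr_key_size' bytes, where mr_key_size is the value reported in the
 * domain attributes.
 */
int mr_key_fits(uint64_t key, size_t mr_key_size)
{
    if (mr_key_size == 0)
        return 0;   /* assumption: treat a zero size as "nothing fits" */
    if (mr_key_size >= sizeof(uint64_t))
        return 1;   /* any 64-bit key fits in 8 or more bytes */

    /* The key must be below 2^(8 * mr_key_size). */
    return key < ((uint64_t)1 << (8 * mr_key_size));
}
```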

   CQ Data Size (cq_data_size)
       Applications  may include a small message with a data transfer that is placed directly into a remote com‐
       pletion queue as part of a completion event.  This is referred to as remote CQ data  (sometimes  referred
       to as immediate data).  This field indicates the number of bytes that the provider supports for remote CQ
       data.  If supported (a non-zero value is returned), the provider must support at least 4 bytes of remote
       CQ data.
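
       Because at least 4 bytes are available whenever remote CQ data is supported, an application can rely on
       a 32-bit layout without inspecting the exact size.  The sketch below packs two 16-bit fields into that
       space.  The field names (a message type and a sequence number) and all function names are hypothetical
       and chosen only for illustration.

```c
#include <stdint.h>

/* Pack a 16-bit message type and a 16-bit sequence number into the
 * low 32 bits of a remote CQ data value (hypothetical layout). */
uint64_t pack_cq_data(uint16_t msg_type, uint16_t seq)
{
    return ((uint64_t)msg_type << 16) | (uint64_t)seq;
}

/* Recover the message type from a received CQ data value. */
uint16_t cq_data_msg_type(uint64_t data)
{
    return (uint16_t)((data >> 16) & 0xFFFF);
}

/* Recover the sequence number from a received CQ data value. */
uint16_t cq_data_seq(uint64_t data)
{
    return (uint16_t)(data & 0xFFFF);
}
```

       Layouts wider than 32 bits should first be checked against the reported cq_data_size.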

   Completion Queue Count (cq_cnt)
       The optimal number of completion queues supported by the domain, relative to any specified or default  CQ
       attributes.  The cq_cnt value may be a fixed value of the maximum number of CQs supported by the underly‐
       ing  hardware, or may be a dynamic value, based on the default attributes of an allocated CQ, such as the
       CQ size and data format.

   Endpoint Count (ep_cnt)
       The total number of endpoints supported by the domain, relative to any specified or default endpoint  at‐
       tributes.   The ep_cnt value may be a fixed value of the maximum number of endpoints supported by the un‐
       derlying hardware, or may be a dynamic value, based on the default attributes of an  allocated  endpoint,
       such  as  the  endpoint capabilities and size.  The endpoint count is the number of addressable endpoints
       supported by the provider.  Providers return capability limits based on configured hardware maximum
       capabilities.  Providers cannot predict all system limitations that will further limit these hardware
       maximums (e.g. application memory consumption, FD usage, etc.); such limits are only known a posteriori,
       at runtime.

   Transmit Context Count (tx_ctx_cnt)
       The  number  of  outbound  command queues optimally supported by the provider.  For a low-level provider,
       this represents the number of command queues to the hardware and/or the number of parallel  transmit  en‐
       gines  effectively  supported by the hardware and caches.  Applications which allocate more transmit con‐
       texts than this value will end up sharing underlying resources.  By default, there is a  single  transmit
       context associated with each endpoint, but in an advanced usage model, an endpoint may be configured with
       multiple transmit contexts.

   Receive Context Count (rx_ctx_cnt)
       The  number  of inbound processing queues optimally supported by the provider.  For a low-level provider,
       this represents the number of hardware queues that can be effectively utilized for processing incoming
       packets.  Applications which allocate more receive contexts than this value will end up sharing underlying
       resources.  By default, a single receive context is associated with each endpoint, but in an advanced us‐
       age model, an endpoint may be configured with multiple receive contexts.

   Maximum Endpoint Transmit Context (max_ep_tx_ctx)
       The maximum number of transmit contexts that may be associated with an endpoint.

   Maximum Endpoint Receive Context (max_ep_rx_ctx)
       The maximum number of receive contexts that may be associated with an endpoint.
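
       The relationship between the optimal context count and the per-endpoint maximum can be sketched as
       follows for transmit contexts; the receive side is analogous.  The structure and function names are
       hypothetical, the inputs are assumed to have been read from the domain attributes by the caller, and
       in_use is the application's own count of transmit contexts it has already allocated in the domain.

```c
#include <stddef.h>

/* Result of planning transmit contexts for one endpoint (hypothetical). */
struct tx_ctx_plan {
    size_t count;            /* contexts to configure on the endpoint */
    int    shares_resources; /* nonzero if the domain-wide total would
                                exceed the optimal count, tx_ctx_cnt */
};

/*
 * wanted         contexts the application would like on this endpoint
 * in_use         contexts the application already allocated in the domain
 * tx_ctx_cnt     optimal count reported in the domain attributes
 * max_ep_tx_ctx  per-endpoint maximum reported in the domain attributes
 */
struct tx_ctx_plan plan_tx_contexts(size_t wanted, size_t in_use,
                                    size_t tx_ctx_cnt, size_t max_ep_tx_ctx)
{
    struct tx_ctx_plan plan;

    /* Never request more than a single endpoint may hold. */
    plan.count = wanted < max_ep_tx_ctx ? wanted : max_ep_tx_ctx;

    /* Exceeding the optimal count is allowed, but the contexts will
     * then share underlying resources. */
    plan.shares_resources = (in_use + plan.count) > tx_ctx_cnt;
    return plan;
}
```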

   Maximum Sharing of Transmit Context (max_ep_stx_ctx)
       The maximum number of endpoints that may be associated with a shared transmit context.

   Maximum Sharing of Receive Context (max_ep_srx_ctx)
       The maximum number of endpoints that may be associated with a shared receive context.

   Counter Count (cntr_cnt)
       The optimal number of completion counters supported by the domain.  The cntr_cnt value may be a fixed value
       of  the maximum number of counters supported by the underlying hardware, or may be a dynamic value, based
       on the default attributes of the domain.

   MR IOV Limit (mr_iov_limit)
       This is the maximum number of IO vectors (scatter-gather elements) that a single memory registration  op‐
       eration may reference.

   Capabilities (caps)
       Domain  level capabilities.  Domain capabilities indicate domain level features that are supported by the
       provider.

       The following are supported primary capabilities:

       FI_DIRECTED_RECV
              When the domain is configured with FI_DIRECTED_RECV and FI_AV_AUTH_KEY, memory regions can be
              limited to specific authorization keys.

       FI_AV_USER_ID
              Indicates  that  the  domain  supports  the ability to open address vectors with the FI_AV_USER_ID
              flag.  If this domain capability is not set, address vectors cannot be opened with  FI_AV_USER_ID.
              Note that FI_AV_USER_ID can still be supported through the AV insert calls without this domain ca‐
              pability set.  See fi_av(3).

       FI_PEER
              Specifies that the domain must support importing resources to be used in the peer API flow.
              The domain must support importing owner_ops when opening a CQ, counter, and shared receive queue.

       The following are supported secondary capabilities:

       FI_LOCAL_COMM
              At a conceptual level, this field indicates that the underlying device supports loopback  communi‐
              cation.   More specifically, this field indicates that an endpoint may communicate with other end‐
              points that are allocated from the same underlying named domain.  If this field is not set, an ap‐
              plication may need to use an alternate domain or mechanism  (e.g. shared  memory)  to  communicate
              with peers that execute on the same node.

       FI_REMOTE_COMM
              This  field  indicates  that  the  underlying  provider supports communication with nodes that are
              reachable over the network.  If this field is not set, then the provider only supports  communica‐
              tion between processes that execute on the same node – a shared memory provider, for example.

       FI_SHARED_AV
              Indicates  that  the domain supports the ability to share address vectors among multiple processes
              using the named address vector feature.

       See fi_getinfo(3) for a discussion on primary versus secondary capabilities.

   Default authorization key (auth_key)
       The default authorization key to associate with endpoint and memory registrations created within the  do‐
       main.  This field is ignored unless the fabric is opened with API version 1.5 or greater.

       If domain auth_key_size is set to the value FI_AV_AUTH_KEY, auth_key must be NULL.

   Default authorization key length (auth_key_size)
       The  length in bytes of the default authorization key for the domain.  If set to 0, then no authorization
       key will be associated with endpoints and memory registrations created within the domain unless specified
       in the endpoint or memory registration attributes.  This field is ignored unless  the  fabric  is  opened
       with API version 1.5 or greater.

       If  the  size  is set to the value FI_AV_AUTH_KEY, all endpoints and memory regions will be configured to
       use authorization keys associated with the AV.  Providers which support authorization keys and
       connectionless endpoints must support this option.

   Max Error Data Size (max_err_data)
       The  maximum  amount of error data, in bytes, that may be returned as part of a completion or event queue
       error.  This  value  corresponds  to  the  err_data_size  field  in  struct  fi_cq_err_entry  and  struct
       fi_eq_err_entry.

   Memory Regions Count (mr_cnt)
       The  optimal  number of memory regions supported by the domain, or endpoint if the mr_mode FI_MR_ENDPOINT
       bit has been set.  The mr_cnt value may be a fixed value of the maximum number of MRs  supported  by  the
       underlying  hardware,  or  may be a dynamic value, based on the default attributes of the domain, such as
       the supported memory registration modes.  Applications can set the mr_cnt on input to fi_getinfo, in  or‐
       der  to indicate their memory registration requirements.  Doing so may allow the provider to optimize any
       memory registration cache or lookup tables.

   Traffic Class (tclass)
       This specifies the default traffic class that will be associated with any endpoints created within the domain.
       See fi_endpoint(3) for additional information.

   Max Authorization Keys per Endpoint (max_ep_auth_key)
       The maximum number of authorization keys which can be supported per connectionless endpoint.

   Maximum Peer Group Id (max_group_id)
       The maximum value that a peer group may be assigned, inclusive.  Valid peer group IDs must be between 0
       and  max_group_id.   See fi_av(3) for additional information on peer groups and their use.  Users may re‐
       quest support for peer groups by setting this to a non-zero value.  Providers that cannot  meet  the  re‐
       quested  max_group_id  will  fail fi_getinfo().  On output, providers may return a value higher than that
       requested by the application.

RETURN VALUE

       Returns 0 on success.  On error, a negative value corresponding to fabric errno is returned.  Fabric  er‐
       rno values are defined in rdma/fi_errno.h.

NOTES

       Users should call fi_close to release all resources allocated to the fabric domain.

       The  following fabric resources are associated with domains: active endpoints, memory regions, completion
       event queues, and address vectors.

       Domain attributes reflect the limitations and capabilities of the  underlying  hardware  and/or  software
       provider.  They do not reflect system limitations, such as the number of physical pages that an
       application may pin or the number of file descriptors that the application may open.  As a result, the
       reported maximums may not be achievable, even on a lightly loaded system, without an administrator
       configuring system resources appropriately for the installed provider(s).
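
       Since domain attributes do not account for such system limits, an application may want to query them
       separately.  The sketch below reads two process resource limits with getrlimit(2).  It assumes a Linux
       system, where locked (pinned) memory is commonly bounded by RLIMIT_MEMLOCK; whether a given provider is
       actually subject to that limit is provider specific.  The structure and function names are hypothetical.

```c
#include <sys/resource.h>

/* Soft limits relevant to the system limitations described above
 * (hypothetical structure). */
struct sys_limits {
    rlim_t memlock; /* bytes of memory the process may lock */
    rlim_t nofile;  /* one more than the highest usable file descriptor */
};

/* Fill 'out' with the current soft limits.  Returns 0 on success and
 * -1 if either getrlimit() call fails (errno is set by getrlimit). */
int query_sys_limits(struct sys_limits *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    out->memlock = rl.rlim_cur;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    out->nofile = rl.rlim_cur;

    return 0;
}
```

       A value of RLIM_INFINITY in either field means that no limit is imposed through this mechanism.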

SEE ALSO

       fi_getinfo(3), fi_endpoint(3), fi_av(3), fi_eq(3), fi_mr(3), fi_peer(3)

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual                      2025-03-07                                       fi_domain(3)