Provided by: libfabric-dev_1.6.2-3ubuntu0.1_amd64

NAME

       fi_endpoint - Fabric endpoint operations

       fi_endpoint / fi_scalable_ep / fi_passive_ep / fi_close
              Allocate or close an endpoint.

       fi_ep_bind
              Associate  an  endpoint  with  hardware resources, such as event queues, completion
              queues, counters, address vectors, or shared transmit/receive contexts.

       fi_scalable_ep_bind
              Associate a scalable endpoint with an address vector

       fi_pep_bind
              Associate a passive endpoint with an event queue

       fi_enable
              Transitions an active endpoint into an enabled state.

       fi_cancel
              Cancel a pending asynchronous data transfer

       fi_ep_alias
              Create an alias to the endpoint

       fi_control
              Control endpoint operation.

       fi_getopt / fi_setopt
              Get or set endpoint options.

       fi_rx_context / fi_tx_context / fi_srx_context / fi_stx_context
              Open a transmit or receive context.

       fi_rx_size_left / fi_tx_size_left (DEPRECATED)
              Query the lower bound on how  many  RX/TX  operations  may  be  posted  without  an
              operation  returning  -FI_EAGAIN.   These functions have been deprecated and will be
              removed in a future version of the library.

SYNOPSIS

              #include <rdma/fabric.h>

              #include <rdma/fi_endpoint.h>

              int fi_endpoint(struct fid_domain *domain, struct fi_info *info,
                  struct fid_ep **ep, void *context);

              int fi_scalable_ep(struct fid_domain *domain, struct fi_info *info,
                  struct fid_ep **sep, void *context);

              int fi_passive_ep(struct fid_fabric *fabric, struct fi_info *info,
                  struct fid_pep **pep, void *context);

              int fi_tx_context(struct fid_ep *sep, int index,
                  struct fi_tx_attr *attr, struct fid_ep **tx_ep,
                  void *context);

              int fi_rx_context(struct fid_ep *sep, int index,
                  struct fi_rx_attr *attr, struct fid_ep **rx_ep,
                  void *context);

              int fi_stx_context(struct fid_domain *domain,
                  struct fi_tx_attr *attr, struct fid_stx **stx,
                  void *context);

              int fi_srx_context(struct fid_domain *domain,
                  struct fi_rx_attr *attr, struct fid_ep **rx_ep,
                  void *context);

              int fi_close(struct fid *ep);

              int fi_ep_bind(struct fid_ep *ep, struct fid *fid, uint64_t flags);

              int fi_scalable_ep_bind(struct fid_ep *sep, struct fid *fid, uint64_t flags);

              int fi_pep_bind(struct fid_pep *pep, struct fid *fid, uint64_t flags);

              int fi_enable(struct fid_ep *ep);

              int fi_cancel(struct fid_ep *ep, void *context);

              int fi_ep_alias(struct fid_ep *ep, struct fid_ep **alias_ep, uint64_t flags);

              int fi_control(struct fid *ep, int command, void *arg);

              int fi_getopt(struct fid *ep, int level, int optname,
                  void *optval, size_t *optlen);

              int fi_setopt(struct fid *ep, int level, int optname,
                  const void *optval, size_t optlen);

              DEPRECATED ssize_t fi_rx_size_left(struct fid_ep *ep);

              DEPRECATED ssize_t fi_tx_size_left(struct fid_ep *ep);

ARGUMENTS

       fid : On creation, specifies a fabric or access domain.  On  bind,  identifies  the  event
       queue,  completion  queue,  counter,  or address vector to bind to the endpoint.  In other
       cases, it's a fabric identifier of an associated resource.

       info : Details about the fabric interface endpoint to be opened, obtained from fi_getinfo.

       ep : A fabric endpoint.

       sep : A scalable fabric endpoint.

       pep : A passive fabric endpoint.

       context : Context associated with the endpoint or asynchronous operation.

       index : Index to retrieve a specific transmit/receive context.

       attr : Transmit or receive context attributes.

       flags : Additional flags to apply to the operation.

       command : Command of control operation to perform on endpoint.

       arg : Optional control argument.

       level : Protocol level at which the desired option resides.

       optname : The protocol option to read or set.

       optval : The option value that was read or that is to be set.

       optlen : The size of the optval buffer.

DESCRIPTION

       Endpoints are transport level communication portals.  There are two  types  of  endpoints:
       active  and  passive.  Passive endpoints belong to a fabric domain and are most often used
       to listen for incoming connection requests.  However, a passive endpoint may  be  used  to
       reserve  a  fabric  address  that  can be granted to an active endpoint.  Active endpoints
       belong to access domains and can perform data transfers.

       Active endpoints may be  connection-oriented  or  connectionless,  and  may  provide  data
       reliability.    The  data  transfer  interfaces  --  messages  (fi_msg),  tagged  messages
       (fi_tagged),  RMA  (fi_rma),  and  atomics  (fi_atomic)  --  are  associated  with  active
       endpoints.   In  basic configurations, an active endpoint has transmit and receive queues.
       In general, operations that generate traffic on the fabric  are  posted  to  the  transmit
       queue.   This  includes  all  RMA and atomic operations, along with sent messages and sent
       tagged messages.  Operations that post buffers for receiving incoming data  are  submitted
       to the receive queue.

       Active  endpoints are created in the disabled state.  They must transition into an enabled
       state before accepting data transfer operations, including  posting  of  receive  buffers.
       The  fi_enable  call  is used to transition an active endpoint into an enabled state.  The
       fi_connect and fi_accept calls will also transition an endpoint into the enabled state, if
       it is not already enabled.

       In  order to transition an endpoint into an enabled state, it must be bound to one or more
       fabric resources.  An endpoint that will generate asynchronous completions, either through
       data  transfer  operations  or  communication  establishment  events, must be bound to the
       appropriate completion  queues  or  event  queues,  respectively,  before  being  enabled.
       Additionally,  endpoints  that  use  manual  progress  must  be  associated  with relevant
       completion queues or event queues in order to drive progress.  For endpoints that are only
       used  as  the  target  of  RMA  or atomic operations, this means binding the endpoint to a
       completion queue associated with receive processing.  Unconnected endpoints must be  bound
       to an address vector.
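
       The enable rules above can be summarized with a small, self-contained model.  This is
       only a sketch: struct ep_model, its fields, and the -1 return value are hypothetical
       stand-ins and are not libfabric types or error codes.  It also assumes that every
       endpoint in the model performs data transfers and therefore needs a CQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an active endpoint; not a libfabric structure. */
struct ep_model {
    bool connectionless; /* unconnected endpoint type */
    bool has_cq;         /* bound to a completion queue */
    bool has_eq;         /* bound to an event queue */
    bool has_av;         /* bound to an address vector */
    bool enabled;        /* active endpoints start out disabled */
};

/* Returns 0 and marks the endpoint enabled when the bindings described in
 * the text are present, or -1 (a placeholder error value) otherwise. */
int ep_model_enable(struct ep_model *ep)
{
    assert(ep != NULL);
    if (!ep->has_cq)
        return -1; /* data transfer completions need a CQ */
    if (ep->connectionless && !ep->has_av)
        return -1; /* unconnected endpoints need an address vector */
    if (!ep->connectionless && !ep->has_eq)
        return -1; /* connection events are reported through an EQ */
    ep->enabled = true;
    return 0;
}

/* Data transfers, including posting receive buffers, need the enabled state. */
bool ep_model_can_post(const struct ep_model *ep)
{
    assert(ep != NULL);
    return ep->enabled;
}
```

       In a real application the corresponding steps would be calls to fi_ep_bind followed by
       fi_enable, with the return value of each call checked.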

       Once an endpoint has been activated, it may be associated with an address vector.  Receive
       buffers may be posted to it and calls may be made to  connection  establishment  routines.
       Connectionless endpoints may also perform data transfers.

       The  behavior  of  an  endpoint  may  be adjusted by setting its control data and protocol
       options.   This  allows  the  underlying  provider   to   redirect   function   calls   to
       implementations optimized to meet the desired application behavior.

       If  an  endpoint  experiences  a  critical  error, it will transition back into a disabled
       state.  Critical errors are reported through the event queue associated with the  EP.   In
       certain cases, a disabled endpoint may be re-enabled.  The ability to transition back into
       an enabled state is provider specific and depends on the type of error that  the  endpoint
       experienced.   When  an  endpoint is disabled as a result of a critical error, all pending
       operations are discarded.

   fi_endpoint / fi_passive_ep / fi_scalable_ep
       fi_endpoint allocates a new  active  endpoint.   fi_passive_ep  allocates  a  new  passive
       endpoint.   fi_scalable_ep  allocates a scalable endpoint.  The properties and behavior of
       the endpoint are defined based  on  the  provided  struct  fi_info.   See  fi_getinfo  for
       additional  details  on  fi_info.  fi_info flags that control the operation of an endpoint
       are defined below.  See section SCALABLE ENDPOINTS.

       If an active endpoint is allocated in order to accept a connection  request,  the  fi_info
       parameter  must  be the same as the fi_info structure provided with the connection request
       (FI_CONNREQ) event.

       An active endpoint may acquire the properties of a passive endpoint by setting the fi_info
       handle  field  to the passive endpoint fabric descriptor.  This is useful for applications
       that need to reserve the fabric address of an endpoint prior to knowing  if  the  endpoint
       will  be used on the active or passive side of a connection.  For example, this feature is
       useful for simulating socket semantics.  Once an active endpoint acquires  the  properties
       of a passive endpoint, the passive endpoint is no longer bound to any fabric resources and
       must no longer be used.  The user is expected to close the passive endpoint after  opening
       the active endpoint in order to free up any lingering resources that had been used.

   fi_close
       Closes an endpoint and releases all resources associated with it.

       When closing a scalable endpoint, there must be no open transmit or receive contexts
       associated with the scalable endpoint.  If resources are still associated with the
       scalable endpoint when attempting to close, the call will return -FI_EBUSY.

       Outstanding  operations  posted to the endpoint when fi_close is called will be discarded.
       Discarded  operations  will  silently  be   dropped,   with   no   completions   reported.
       Additionally,  a  provider may discard previously completed operations from the associated
       completion queue(s).  The behavior to discard completed operations is provider specific.
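
       The scalable endpoint rule above amounts to a simple check before release.  The sketch
       below is a model only: struct sep_model is hypothetical, and the C library's EBUSY is
       used as a stand-in for FI_EBUSY, which is an assumption of this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a scalable endpoint; not a libfabric type. */
struct sep_model {
    size_t open_tx; /* transmit contexts still open */
    size_t open_rx; /* receive contexts still open */
    int closed;     /* nonzero once the endpoint has been released */
};

/* Returns 0 on success, or -EBUSY while any context remains open. */
int sep_close_model(struct sep_model *sep)
{
    assert(sep != NULL);
    if (sep->open_tx != 0 || sep->open_rx != 0)
        return -EBUSY;
    sep->closed = 1;
    return 0;
}
```

       An application would therefore close every transmit and receive context first and only
       then close the scalable endpoint itself.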

   fi_ep_bind
       fi_ep_bind is used to associate an endpoint with hardware resources.  The  common  use  of
       fi_ep_bind  is  to  direct  asynchronous  operations  associated  with  an  endpoint  to a
       completion queue.  An endpoint must be bound with CQs capable of reporting completions for
       any  asynchronous  operation  initiated  on the endpoint.  This is true even for endpoints
       which are configured to suppress successful completions, in  order  that  operations  that
       complete  in  error  may  be  reported  to the user.  For passive endpoints, this requires
       binding the endpoint with an EQ that supports the communication management (CM) domain.

       An active endpoint may direct asynchronous completions to different CQs, based on the type
       of  operation.  This is specified using fi_ep_bind flags.  The following flags may be used
       separately or OR'ed together when binding an endpoint to a completion domain CQ.

       FI_TRANSMIT : Directs the completion of outbound data transfer requests to  the  specified
       completion queue.  This includes send message, RMA, and atomic operations.

       FI_RECV  :  Directs the notification of inbound data transfers to the specified completion
       queue.   This  includes  received   messages.    This   binding   automatically   includes
       FI_REMOTE_WRITE, if applicable to the endpoint.

       FI_SELECTIVE_COMPLETION : By default, data transfer operations generate completion entries
       into a completion queue after they have successfully completed.  Applications can use this
       bind flag to choose which operations generate completions.  If
       FI_SELECTIVE_COMPLETION is specified, data transfer operations will not  generate  entries
       for  successful  completions  unless  FI_COMPLETION  is set as an operational flag for the
       given operation.  FI_SELECTIVE_COMPLETION must be OR'ed with  FI_TRANSMIT  and/or  FI_RECV
       flags.

       When  FI_SELECTIVE_COMPLETION is set, the user must determine when a request that does NOT
       have FI_COMPLETION set has completed indirectly, usually based  on  the  completion  of  a
       subsequent  operation.   Use of this flag may improve performance by allowing the provider
       to avoid writing a completion entry for every operation.

       Example: An application can selectively generate send completions by using  the  following
       general approach:

                fi_tx_attr::op_flags = 0; // default - no completion
                fi_ep_bind(ep, cq, FI_TRANSMIT | FI_SELECTIVE_COMPLETION);
                fi_send(ep, ...);                   // no completion
                fi_sendv(ep, ...);                  // no completion
                fi_sendmsg(ep, ..., FI_COMPLETION); // completion!
                fi_inject(ep, ...);                 // no completion

       Example:  An  application  can  selectively  disable  send  completions  by  modifying the
       operational flags:

                fi_tx_attr::op_flags = FI_COMPLETION; // default - completion
                fi_ep_bind(ep, cq, FI_TRANSMIT | FI_SELECTIVE_COMPLETION);
                fi_send(ep, ...);       // completion
                fi_sendv(ep, ...);      // completion
                fi_sendmsg(ep, ..., 0); // no completion!
                fi_inject(ep, ...);     // no completion!

       Example: Omitting FI_SELECTIVE_COMPLETION when binding will generate completions  for  all
       non-fi_inject calls:

                fi_tx_attr::op_flags = 0;
                fi_ep_bind(ep, cq, FI_TRANSMIT);    // default - completion
                fi_send(ep, ...);                   // completion
                fi_sendv(ep, ...);                  // completion
                fi_sendmsg(ep, ..., 0);             // completion!
                fi_sendmsg(ep, ..., FI_COMPLETION); // completion
                fi_sendmsg(ep, ..., FI_INJECT|FI_COMPLETION); // completion!
                fi_inject(ep, ...);                 // no completion!
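
       The three examples above follow one rule, which can be written as a small predicate.
       This is a sketch of the rule as described in this section, not provider code.  The M_*
       flag values and the call_kind enumeration are hypothetical stand-ins; the real flag
       values come from <rdma/fabric.h>.  Only successful completions are modeled; operations
       that complete in error are still reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits for this sketch only. */
#define M_SELECTIVE_COMPLETION (1ULL << 0) /* models FI_SELECTIVE_COMPLETION */
#define M_COMPLETION           (1ULL << 1) /* models FI_COMPLETION */
#define M_INJECT               (1ULL << 2) /* models FI_INJECT */

enum call_kind {
    CALL_DEFAULT_FLAGS,  /* fi_send, fi_sendv: use the endpoint's default op_flags */
    CALL_EXPLICIT_FLAGS, /* fi_sendmsg: uses the flags passed to the call */
    CALL_INJECT          /* fi_inject: no successful completion is written */
};

/* Returns true if a successful operation writes a completion queue entry. */
bool writes_completion(uint64_t bind_flags, uint64_t default_op_flags,
                       enum call_kind kind, uint64_t call_flags)
{
    assert(kind == CALL_DEFAULT_FLAGS || kind == CALL_EXPLICIT_FLAGS ||
           kind == CALL_INJECT);
    if (kind == CALL_INJECT)
        return false;
    if (!(bind_flags & M_SELECTIVE_COMPLETION))
        return true; /* without selective completion, every operation reports */
    uint64_t effective =
        (kind == CALL_EXPLICIT_FLAGS) ? call_flags : default_op_flags;
    return (effective & M_COMPLETION) != 0;
}
```

       For instance, writes_completion(M_SELECTIVE_COMPLETION, 0, CALL_EXPLICIT_FLAGS,
       M_COMPLETION) corresponds to the fi_sendmsg line of the first example.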

       An endpoint may also be bound to a fabric counter.  When binding an endpoint to a counter,
       the following flags may be specified.

       FI_SEND : Increments the specified counter whenever a message transfer initiated over  the
       endpoint  has  completed  successfully or in error.  Sent messages include both tagged and
       normal message operations.

       FI_RECV : Increments the specified  counter  whenever  a  message  is  received  over  the
       endpoint.  Received messages include both tagged and normal message operations.

       FI_READ  : Increments the specified counter whenever an RMA read or atomic fetch operation
       initiated from the endpoint has completed successfully or in error.

       FI_WRITE : Increments the specified counter whenever an  RMA  write  or  atomic  operation
       initiated from the endpoint has completed successfully or in error.

       FI_REMOTE_READ  :  Increments  the  specified counter whenever an RMA read or atomic fetch
       operation is initiated from a remote endpoint that targets the  given  endpoint.   Use  of
       this flag requires that the endpoint be created using FI_RMA_EVENT.

       FI_REMOTE_WRITE  :  Increments  the  specified  counter  whenever  an  RMA write or atomic
       operation is initiated from a remote endpoint that targets the  given  endpoint.   Use  of
       this flag requires that the endpoint be created using FI_RMA_EVENT.

       An  endpoint  may  only  be bound to a single CQ or counter for a given type of operation.
       For example, an EP may  not  bind  to  two  counters  both  using  FI_WRITE.   Furthermore,
       providers may limit CQ and counter bindings to endpoints of the same endpoint type (DGRAM,
       MSG, RDM, etc.).
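
       The one-binding-per-operation-type rule can be pictured as a bit mask of operation types
       that already have a counter.  The sketch below is a model under stated assumptions: the
       M_* bits, struct cntr_bindings, and the -1 return value are hypothetical and do not
       reflect how any provider tracks bindings or which error it reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bits for the counter bind flags listed above. */
enum {
    M_SEND = 1 << 0,
    M_RECV = 1 << 1,
    M_READ = 1 << 2,
    M_WRITE = 1 << 3,
    M_REMOTE_READ = 1 << 4,
    M_REMOTE_WRITE = 1 << 5
};

struct cntr_bindings {
    uint64_t bound; /* operation types that already have a counter */
};

/* Records a counter binding.  Returns 0 on success, or -1 (placeholder) if
 * no type was requested or a requested type is already bound. */
int cntr_bind_model(struct cntr_bindings *b, uint64_t flags)
{
    assert(b != NULL);
    if (flags == 0 || (b->bound & flags) != 0)
        return -1;
    b->bound |= flags;
    return 0;
}
```

       A rejected request leaves the existing bindings untouched, so a second counter bound
       with FI_WRITE fails without disturbing the first.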

       Connectionless endpoints must be bound to a single address vector.

       If an endpoint is using a shared transmit and/or receive context, the shared contexts must
       be  bound  to  the  endpoint.   CQs,  counters,  AV,  and shared contexts must be bound to
       endpoints before they are enabled.

   fi_scalable_ep_bind
       fi_scalable_ep_bind is used to associate a scalable endpoint with an address vector.   See
       section  on  SCALABLE ENDPOINTS.  A scalable endpoint has a single transport level address
       and can support multiple transmit and receive contexts.  The transmit and receive contexts
       share  the  transport-level address.  Address vectors that are bound to scalable endpoints
       are implicitly bound to any transmit  or  receive  contexts  created  using  the  scalable
       endpoint.

   fi_enable
       This  call  transitions  the  endpoint into an enabled state.  An endpoint must be enabled
       before it may be used to perform data transfers.  Enabling an endpoint  typically  results
       in  hardware  resources  being assigned to it.  Endpoints making use of completion queues,
       counters, event queues, and/or address vectors must be bound to them before being enabled.

       Calling connect or accept on an endpoint will implicitly enable an endpoint if it has  not
       already been enabled.

       fi_enable  may also be used to re-enable an endpoint that has been disabled as a result of
       experiencing a critical error.  Applications should check the return value from  fi_enable
       to see if a disabled endpoint has successfully been re-enabled.

   fi_cancel
       fi_cancel  attempts  to  cancel  an  outstanding  asynchronous  operation.   Canceling  an
       operation causes the fabric provider to search for the  operation  and,  if  it  is  still
       pending,  complete  it as having been canceled.  An error queue entry will be available in
       the associated error queue with error code  FI_ECANCELED.   On  the  other  hand,  if  the
       operation  completed  before  the  call  to  fi_cancel, then the completion status of that
       operation will be available in the associated completion queue.  No specific entry related
       to fi_cancel itself will be posted.

       Cancel  uses the context parameter associated with an operation to identify the request to
       cancel.  Operations posted  without  a  valid  context  parameter  --  either  no  context
       parameter  is  specified  or  the  context  value was ignored by the provider -- cannot be
       canceled.  If multiple outstanding operations match the context parameter, only  one  will
       be  canceled.   In  this  case, the operation which is canceled is provider specific.  The
       cancel operation is asynchronous, but will complete within a bounded period of time.
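
       The matching behavior described above can be sketched as a search over pending
       operations.  This is a model only: struct op_model and cancel_model are hypothetical
       names, and choosing the first match is an arbitrary decision for the example, since the
       operation that gets canceled is provider specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_state { OP_PENDING, OP_COMPLETED, OP_CANCELED };

/* Hypothetical record of a posted operation; not a libfabric type. */
struct op_model {
    void *context; /* user context supplied when the operation was posted */
    enum op_state state;
};

/* Cancels at most one pending operation whose context matches.
 * Returns true if an operation was canceled. */
bool cancel_model(struct op_model *ops, size_t n, void *context)
{
    assert(ops != NULL || n == 0);
    if (context == NULL)
        return false; /* operations without a valid context cannot be canceled */
    for (size_t i = 0; i < n; i++) {
        if (ops[i].state == OP_PENDING && ops[i].context == context) {
            ops[i].state = OP_CANCELED; /* would be reported as FI_ECANCELED */
            return true;
        }
    }
    return false; /* nothing pending matched; completed operations keep their status */
}
```

       Giving each outstanding operation a distinct context avoids the ambiguity of multiple
       matches.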

   fi_ep_alias
       This call creates an alias to the specified endpoint.   Conceptually,  an  endpoint  alias
       provides  an  alternate  software  path  from  the  application to the underlying provider
       hardware.  An alias EP differs from its parent endpoint only by its default data  transfer
       flags.  For example, an alias EP may be configured to use a different completion mode.  By
       default, an alias EP inherits the same data transfer flags as  the  parent  endpoint.   An
       application can use fi_control to modify the alias EP operational flags.

       When  allocating  an  alias,  an  application may configure either the transmit or receive
       operational flags.  This avoids needing a separate call to fi_control to set those  flags.
       The  flags passed to fi_ep_alias must include FI_TRANSMIT or FI_RECV (not both) with other
       operational  flags  OR'ed  in.   This  will  override  the  transmit  or  receive   flags,
       respectively,  for  operations  posted  through the alias endpoint.  All allocated aliases
       must be closed for the underlying endpoint to be released.
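
       The flag requirement for fi_ep_alias (exactly one of FI_TRANSMIT or FI_RECV, with
       operational flags OR'ed in) can be checked as follows.  The M_TRANSMIT and M_RECV values
       and both helper functions are hypothetical and exist only for this sketch; real code
       would use the constants from <rdma/fabric.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define M_TRANSMIT (1ULL << 0) /* models FI_TRANSMIT */
#define M_RECV     (1ULL << 1) /* models FI_RECV */

/* True when exactly one of the two direction bits is present. */
bool alias_flags_valid(uint64_t flags)
{
    uint64_t dir = flags & (M_TRANSMIT | M_RECV);
    return dir == M_TRANSMIT || dir == M_RECV;
}

/* Returns the operational flags that remain after removing the direction
 * bit.  The caller must pass flags that satisfy alias_flags_valid(). */
uint64_t alias_op_flags(uint64_t flags)
{
    assert(alias_flags_valid(flags));
    return flags & ~(M_TRANSMIT | M_RECV);
}
```

       The same "one direction, not both" check applies to the flags argument of the
       FI_GETOPSFLAG and FI_SETOPSFLAG control commands.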

   fi_control
       The control operation is used to adjust the default behavior of an  endpoint.   It  allows
       the  underlying  provider  to redirect function calls to implementations optimized to meet
       the desired application behavior.  As a result, calls to fi_control must be serialized
       against all other calls to an endpoint.

       The  base  operation of an endpoint is selected during creation using struct fi_info.  The
       following control commands and arguments may be assigned to an endpoint.

       FI_GETOPSFLAG -- uint64_t *flags : Used to retrieve the current value of flags
       associated  with  the  data  transfer  operations  initiated on the endpoint.  The control
       argument must include FI_TRANSMIT or FI_RECV (not both) flags to indicate the type of data
       transfer flags to be returned.  See below for a list of control flags.

       FI_SETOPSFLAG -- uint64_t *flags : Used to change the data transfer operation flags
       associated with an endpoint.  The control argument must  include  FI_TRANSMIT  or  FI_RECV
       (not  both)  to  indicate  the  type of data transfer that the flags should apply to, with
       other flags OR'ed in.  The given flags will override the  previous  transmit  and  receive
       attributes  that  were set when the endpoint was created.  Valid control flags are defined
       below.

       FI_BACKLOG -- int *value : This option only applies to passive endpoints.  It is used
       to set the connection request backlog for listening endpoints.

       FI_GETWAIT  (void  **)  :  This  command  allows  the user to retrieve the file descriptor
       associated with a socket endpoint.  The fi_control arg  parameter  should  be  an  address
       where a pointer to the returned file descriptor will be written.  See fi_eq.3 for
       additional details on using fi_control with FI_GETWAIT.  The file descriptor may be used
       for notification that the endpoint is ready to send or receive data.

   fi_getopt / fi_setopt
       Endpoint protocol options may be retrieved using fi_getopt or set using fi_setopt.
       Applications specify the level at which a desired option exists, identify the option, and
       provide input/output buffers to get or set the option.  fi_setopt provides an application
       a way to adjust low-level protocol and implementation specific details of an endpoint.

       The following option levels and option names and parameters are defined.

       FI_OPT_ENDPOINT

       • FI_OPT_MIN_MULTI_RECV - size_t : Defines the minimum receive buffer space available when
         the  receive  buffer  is  released  by the provider (see FI_MULTI_RECV).  Modifying this
         value is only guaranteed to set the minimum buffer space needed on receives posted after
         the  value  has been changed.  It is recommended that applications that want to override
         the default MIN_MULTI_RECV value set  this  option  before  enabling  the  corresponding
         endpoint.

       • FI_OPT_CM_DATA_SIZE  -  size_t  : Defines the size of available space in CM messages for
         user-defined data.  This value limits the amount of data that applications can  exchange
         between  peer  endpoints using the fi_connect, fi_accept, and fi_reject operations.  The
         size returned is dependent upon the properties of the endpoint, except in  the  case  of
         passive  endpoints,  in which the size reflects the maximum size of the data that may be
         present as part of a connection request event.  This option is read only.

   fi_rx_size_left (DEPRECATED)
       This function has been deprecated and will be removed in a future version of the  library.
       It may not be supported by all providers.

       The  fi_rx_size_left  call  returns a lower bound on the number of receive operations that
       may be  posted  to  the  given  endpoint  without  that  operation  returning  -FI_EAGAIN.
       Depending  on  the  specific  details of the subsequently posted receive operations (e.g.,
       number of iov entries, which receive function is called, etc.), it may be possible to post
       more receive operations than originally indicated by fi_rx_size_left.

   fi_tx_size_left (DEPRECATED)
       This  function has been deprecated and will be removed in a future version of the library.
       It may not be supported by all providers.

       The fi_tx_size_left call returns a lower bound on the number of transmit  operations  that
       may  be  posted  to  the  given  endpoint  without  that  operation  returning -FI_EAGAIN.
       Depending on the specific details of the subsequently posted  transmit  operations  (e.g.,
       number  of  iov  entries,  which transmit function is called, etc.), it may be possible to
       post more transmit operations than originally indicated by fi_tx_size_left.

ENDPOINT ATTRIBUTES

       The fi_ep_attr structure defines the  set  of  attributes  associated  with  an  endpoint.
       Endpoint  attributes  may  be  further  refined  using  the  transmit  and receive context
       attributes as shown below.

              struct fi_ep_attr {
                  enum fi_ep_type type;
                  uint32_t        protocol;
                  uint32_t        protocol_version;
                  size_t          max_msg_size;
                  size_t          msg_prefix_size;
                  size_t          max_order_raw_size;
                  size_t          max_order_war_size;
                  size_t          max_order_waw_size;
                  uint64_t        mem_tag_format;
                  size_t          tx_ctx_cnt;
                  size_t          rx_ctx_cnt;
                  size_t          auth_key_size;
                  uint8_t         *auth_key;
              };

   type - Endpoint Type
       If specified, indicates the type of fabric  interface  communication  desired.   Supported
       types are:

       FI_EP_UNSPEC  : The type of endpoint is not specified.  This is usually provided as input,
       with other attributes of the endpoint or the provider selecting the type.

       FI_EP_MSG : Provides a reliable,  connection-oriented  data  transfer  service  with  flow
       control that maintains message boundaries.

       FI_EP_DGRAM  :  Supports  a  connectionless,  unreliable  datagram communication.  Message
       boundaries are maintained, but the maximum message size may be limited to the fabric  MTU.
       Flow control is not guaranteed.

       FI_EP_RDM  :  Reliable  datagram  message.  Provides a reliable, unconnected data transfer
       service with flow control that maintains message boundaries.

       FI_EP_SOCK_STREAM : Data streaming endpoint with TCP socket-like  semantics.   Provides  a
       reliable,  connection-oriented  data  transfer  service  that  does  not  maintain message
       boundaries.  FI_EP_SOCK_STREAM is most useful for applications designed around  using  TCP
       sockets.   See  the  SOCKET  ENDPOINT section for additional details and restrictions that
       apply to stream endpoints.

       FI_EP_SOCK_DGRAM : A connectionless, unreliable datagram  endpoint  with  UDP  socket-like
       semantics.   FI_EP_SOCK_DGRAM  is  most  useful for applications designed around using UDP
       sockets.  See the SOCKET ENDPOINT section for additional  details  and  restrictions  that
       apply to datagram socket endpoints.

   protocol - Protocol
       Specifies the low-level end to end protocol employed by the provider.  A matching protocol
       must be used  by  communicating  endpoints  to  ensure  interoperability.   The  following
       protocol  values  are  defined.   Provider  specific protocols are also allowed.  Provider
       specific protocols will be indicated by having the upper bit of the protocol value set  to
       one.

       FI_PROTO_UNSPEC  : The protocol is not specified.  This is usually provided as input, with
       other attributes of the socket or the provider selecting the actual protocol.

       FI_PROTO_RDMA_CM_IB_RC : The protocol runs over InfiniBand reliable-connected queue pairs,
       using the RDMA CM protocol for connection establishment.

       FI_PROTO_IWARP : The protocol runs over the Internet wide area RDMA protocol transport.

       FI_PROTO_IB_UD : The protocol runs over InfiniBand unreliable datagram queue pairs.

       FI_PROTO_PSMX  :  The  protocol  is  based  on an Intel proprietary protocol known as PSM,
       performance scaled messaging.  PSMX is an extended version of the PSM protocol to  support
       the libfabric interfaces.

       FI_PROTO_UDP  :  The  protocol sends and receives UDP datagrams.  For example, an endpoint
       using FI_PROTO_UDP will be able to communicate with a remote peer that is  using  Berkeley
       SOCK_DGRAM sockets using IPPROTO_UDP.

       FI_PROTO_SOCK_TCP : The protocol is layered over TCP packets.

       FI_PROTO_IWARP_RDM  : Reliable-datagram protocol implemented over iWarp reliable-connected
       queue pairs.

       FI_PROTO_IB_RDM    :    Reliable-datagram    protocol    implemented    over    InfiniBand
       reliable-connected queue pairs.

       FI_PROTO_GNI : Protocol runs over Cray GNI low-level interface.

       FI_PROTO_RXM  :  Reliable-datagram  protocol implemented over message endpoints.  RXM is a
       libfabric utility component that adds RDM endpoint semantics over MSG endpoint semantics.

       FI_PROTO_RXD : Reliable-datagram protocol implemented over datagram endpoints.  RXD  is  a
       libfabric  utility  component  that  adds  RDM  endpoint  semantics  over  DGRAM  endpoint
       semantics.

       FI_PROTO_NETWORKDIRECT : Protocol  runs  over  Microsoft  NetworkDirect  service  provider
       interface.   This  adds  reliable-datagram  semantics  over  the NetworkDirect connection-
       oriented endpoint semantics.

       FI_PROTO_PSMX2 : The protocol is based on an Intel proprietary  protocol  known  as  PSM2,
       performance scaled messaging version 2.  PSMX2 is an extended version of the PSM2 protocol
       to support the libfabric interfaces.
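
       As noted at the start of this section, provider specific protocols are marked by the
       upper bit of the protocol value.  A minimal sketch of that test follows, assuming the
       upper bit means bit 31 of the 32-bit protocol field in struct fi_ep_attr.  The macro and
       function names are hypothetical and are not part of the libfabric headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes the "upper bit" is bit 31 of the 32-bit protocol field. */
#define PROTO_PROVIDER_SPECIFIC_BIT (UINT32_C(1) << 31)

/* True if the protocol value is marked as provider specific. */
bool proto_is_provider_specific(uint32_t protocol)
{
    return (protocol & PROTO_PROVIDER_SPECIFIC_BIT) != 0;
}

/* Builds a provider specific protocol value from a provider-chosen id.
 * The id itself must fit in the lower 31 bits. */
uint32_t proto_make_provider_specific(uint32_t id)
{
    assert((id & PROTO_PROVIDER_SPECIFIC_BIT) == 0);
    return id | PROTO_PROVIDER_SPECIFIC_BIT;
}
```

       An application that only understands the protocols listed above could use such a check
       to skip fi_info entries that report a provider specific protocol.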

   protocol_version - Protocol Version
       Identifies which version of the protocol is employed by the provider.  The protocol
       version allows providers to extend an existing protocol in a backward-compatible manner,
       for example by adding support for additional features or functionality.  Providers that
       support different versions of the same protocol should interoperate, but only when using
       the capabilities defined for the lesser version.

   max_msg_size - Max Message Size
       Defines the maximum size for an application data transfer as a single operation.

   msg_prefix_size - Message Prefix Size
       Specifies the size of any required message prefix buffer space.  This field will be 0
       unless the FI_MSG_PREFIX mode is enabled.  If msg_prefix_size is greater than 0, the
       specified value will be a multiple of 8 bytes.

   Max RMA Ordered Size
       The maximum ordered size specifies the delivery order of transport data into target memory
       for RMA and atomic operations.  Data ordering is separate from, but dependent on, message
       ordering (defined below).  Data ordering is unspecified where message order is not
       defined.

       Data  ordering  refers to the access of target memory by subsequent operations.  When back
       to back RMA read or write operations access the  same  registered  memory  location,  data
       ordering  indicates  whether  the second operation reads or writes the target memory after
       the first operation has completed.  Because RMA ordering applies between  two  operations,
       and  not  within  a  single data transfer, ordering is defined per byte-addressable memory
       location.  That is, ordering specifies whether location X is accessed by the second operation
       after the first operation.  Nothing is implied about the completion of the first operation
       before the second operation is initiated.

       In order to support large data transfers being broken into multiple packets and sent using
       multiple paths through the fabric, data ordering may be limited to transfers of a specific
       size or less.  Providers specify when data ordering is maintained  through  the  following
       values.  Note that even if data ordering is not maintained, message ordering may be.

       max_order_raw_size  :  Read  after  write  size.   If set, an RMA or atomic read operation
       issued after an RMA or atomic write operation, both of which are smaller  than  the  size,
       will  be  ordered.   Where  the  target  memory  locations overlap, the RMA or atomic read
       operation will see the results of the previous RMA or atomic write.

       max_order_war_size : Write after read size.  If set, an  RMA  or  atomic  write  operation
       issued  after  an  RMA  or atomic read operation, both of which are smaller than the size,
       will be ordered.  The RMA or atomic read operation will  see  the  initial  value  of  the
       target memory location before a subsequent RMA or atomic write updates the value.

       max_order_waw_size  :  Write  after  write size.  If set, an RMA or atomic write operation
       issued after an RMA or atomic write operation, both of which are smaller  than  the  size,
       will be ordered.  The target memory location will reflect the results of the second RMA or
       atomic write.

       An order size value of 0 indicates that  ordering  is  not  guaranteed.   A  value  of  -1
       guarantees ordering for any data size.
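
       The rules above can be summarized in a small predicate.  This is only a sketch: the
       function name is hypothetical, and it reads "a specific size or less" as an inclusive
       limit.  A value of -1 stored in a size_t becomes the largest representable size, so
       the same comparison covers the "any data size" case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: max_order_size is one of max_order_raw_size,
 * max_order_war_size, or max_order_waw_size as reported by the provider. */
bool sk_data_order_guaranteed(size_t max_order_size, size_t first_len,
                              size_t second_len)
{
    /* 0 means the provider makes no data ordering guarantee. */
    if (max_order_size == 0)
        return false;
    /* Both transfers must fit within the ordered size. */
    return first_len <= max_order_size && second_len <= max_order_size;
}
```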

   mem_tag_format - Memory Tag Format
       The memory tag format is a bit array used to convey the number of tagged bits supported by
       a provider.  Additionally, it may be used to divide the bit array  into  separate  fields.
       The mem_tag_format optionally begins with a series of bits set to 0, to signify bits which
       are ignored by the provider.  Following the initial prefix of ignored bits, the array will
       consist  of  alternating  groups  of  bits  set to all 1's or all 0's.  Each group of bits
       corresponds to a tagged field.  The implication of defining a tagged field is that when  a
       mask  is applied to the tagged bit array, all bits belonging to a single field will either
       be set to 1 or 0, collectively.

       For example, a mem_tag_format of 0x30FF indicates support for 14  tagged  bits,  separated
       into 3 fields.  The first field consists of 2-bits, the second field 4-bits, and the final
       field 8-bits.  Valid masks for such a tagged field would be a bitwise OR'ing  of  zero  or
       more of the following values: 0x3000, 0x0F00, and 0x00FF.
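
       As an illustration of the encoding, the following sketch splits a mem_tag_format value
       into one mask per field, most significant field first.  The function name is
       hypothetical; only the bit layout described above is taken from this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes up to max_fields field masks into masks[] and returns how many
 * were written. */
size_t sk_tag_format_fields(uint64_t fmt, uint64_t *masks, size_t max_fields)
{
    size_t n = 0;
    int bit = 63;

    assert(masks != NULL);
    /* Skip the leading run of 0 bits, which the provider ignores. */
    while (bit >= 0 && !((fmt >> bit) & 1))
        bit--;

    while (bit >= 0 && n < max_fields) {
        uint64_t cur = (fmt >> bit) & 1;  /* value shared by this group */
        uint64_t mask = 0;

        /* Collect consecutive bits with the same value into one field. */
        while (bit >= 0 && ((fmt >> bit) & 1) == cur) {
            mask |= UINT64_C(1) << bit;
            bit--;
        }
        masks[n++] = mask;
    }
    return n;
}
```

       For the 0x30FF example, the function reports three fields with masks 0x3000, 0x0F00,
       and 0x00FF.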

       By  identifying  fields  within  a  tag,  a  provider may be able to optimize their search
       routines.  An application which requests tag fields must provide tag masks that either set
       all mask bits corresponding to a field to all 0 or all 1.  When negotiating tag fields, an
       application can request a specific number of fields of a  given  size.   A  provider  must
       return a tag format that supports the requested number of fields, with each field being at
       least the size requested, or fail the request.  A provider may increase the  size  of  the
       fields.   When  reporting completions (see FI_CQ_FORMAT_TAGGED), the provider must provide
       the exact value of the received tag, clearing out any unsupported tag bits.
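
       An application that uses tag fields could check its masks before posting a receive.
       The sketch below assumes the field masks are already known (for example 0x3000,
       0x0F00, and 0x00FF for a format of 0x30FF); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A mask is accepted when every field is either fully set or fully clear
 * and no bit falls outside the listed fields. */
bool sk_tag_mask_valid(const uint64_t *fields, size_t nfields, uint64_t mask)
{
    uint64_t all = 0;

    assert(fields != NULL || nfields == 0);
    for (size_t i = 0; i < nfields; i++) {
        uint64_t part = mask & fields[i];

        if (part != 0 && part != fields[i])
            return false;  /* field only partially covered */
        all |= fields[i];
    }
    return (mask & ~all) == 0;
}
```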

       It is recommended that field sizes be  ordered  from  smallest  to  largest.   A  generic,
       unstructured  tag  and  mask  can  be  achieved  by  requesting  a bit array consisting of
       alternating 1's and 0's.

   tx_ctx_cnt - Transmit Context Count
       Number of transmit contexts to associate with the  endpoint.   If  not  specified  (0),  1
       context  will  be assigned if the endpoint supports outbound transfers.  Transmit contexts
       are independent transmit queues that may be separately configured.  Each transmit  context
       may be bound to a separate CQ, and no ordering is defined between contexts.  Additionally,
       no synchronization is needed when accessing contexts in parallel.

       If the count is set to the value FI_SHARED_CONTEXT, the endpoint will be configured to use
       a  shared  transmit  context, if supported by the provider.  Providers that do not support
       shared transmit contexts will fail the request.

       See the scalable endpoint and shared contexts sections for additional details.

   rx_ctx_cnt - Receive Context Count
       Number of receive contexts to associate with the endpoint.  If not specified (0), 1 context
       will  be  assigned  if  the  endpoint  supports  inbound  transfers.  Receive contexts are
       independent processing queues that may be separately configured.  Each receive context may
       be  bound to a separate CQ, and no ordering is defined between contexts.  Additionally, no
       synchronization is needed when accessing contexts in parallel.

       If the count is set to the value FI_SHARED_CONTEXT, the endpoint will be configured to use
       a  shared  receive  context,  if supported by the provider.  Providers that do not support
       shared receive contexts will fail the request.

       See the scalable endpoint and shared contexts sections for additional details.

   auth_key_size - Authorization Key Length
       The length of the authorization key in bytes.  This field will be 0 if authorization  keys
       are  not  available  or  used.  This field is ignored unless the fabric is opened with API
       version 1.5 or greater.

   auth_key - Authorization Key
       If supported by the fabric, an authorization key (a.k.a.  job key) to associate  with  the
       endpoint.   An  authorization  key is used to limit communication between endpoints.  Only
       peer endpoints that are programmed to use the  same  authorization  key  may  communicate.
       Authorization  keys are often used to implement job keys, to ensure that processes running
       in different jobs do not accidentally cross traffic.  The domain authorization key will be
       used if auth_key_size is set to 0.  This field is ignored unless the fabric is opened with
       API version 1.5 or greater.

TRANSMIT CONTEXT ATTRIBUTES

       Attributes specific to the transmit capabilities of an endpoint are specified using struct
       fi_tx_attr.

              struct fi_tx_attr {
                  uint64_t  caps;
                  uint64_t  mode;
                  uint64_t  op_flags;
                  uint64_t  msg_order;
                  uint64_t  comp_order;
                  size_t    inject_size;
                  size_t    size;
                  size_t    iov_limit;
                  size_t    rma_iov_limit;
              };

   caps - Capabilities
       The  requested  capabilities  of  the context.  The capabilities must be a subset of those
       requested of the associated endpoint.  See the CAPABILITIES section of  fi_getinfo(3)  for
       capability details.  If the caps field is 0 on input to fi_getinfo(3), the caps value from
       the fi_info structure will be used.

   mode
       The operational mode bits of the context.  The  mode  bits  will  be  a  subset  of  those
       associated  with the endpoint.  See the MODE section of fi_getinfo(3) for details.  A mode
       value of 0 will be ignored on input to fi_getinfo(3), with the mode value of  the  fi_info
       structure  used instead.  On return from fi_getinfo(3), the mode will be set only to those
       constraints specific to transmit operations.

   op_flags - Default transmit operation flags
       Flags that control the behavior of operations submitted against the context.  Applicable
       flags are listed in the Operation Flags section.

   msg_order - Message Ordering
       Message  ordering  refers  to the order in which transport layer headers (as viewed by the
       application) are processed.  Relaxed message order enables data transfers to be  sent  and
       received  out  of order, which may improve performance by utilizing multiple paths through
       the fabric from the initiating endpoint to a target endpoint.  Message order applies  only
       between  a single source and destination endpoint pair.  Ordering between different target
       endpoints is not defined.

       Message order is determined using a set of ordering bits.  Each  set  bit  indicates  that
       ordering  is  maintained  between  data transfers of the specified type.  Message order is
       defined for [read | write | send] operations submitted by an  application  after  [read  |
       write | send] operations.

       Message ordering only applies to the end to end transmission of transport headers.
       Message ordering is necessary for, but does not guarantee, the order in which message data
       is sent or received by the transport layer.  Message ordering requires matching ordering
       semantics on the receiving side of a data transfer operation in order to guarantee that
       ordering is met.

       FI_ORDER_NONE  :  No  ordering  is specified.  This value may be used as input in order to
       obtain the default message order supported by the provider.  FI_ORDER_NONE is an alias for
       the value 0.

       FI_ORDER_RAR : Read after read.  If set, RMA and atomic read operations are transmitted in
       the order submitted relative to other RMA and atomic read operations.  If not set, RMA and
       atomic reads may be transmitted out of order from their submission.

       FI_ORDER_RAW  :  Read after write.  If set, RMA and atomic read operations are transmitted
       in the order submitted relative to RMA and atomic write operations.  If not set,  RMA  and
       atomic reads may be transmitted ahead of RMA and atomic writes.

       FI_ORDER_RAS : Read after send.  If set, RMA and atomic read operations are transmitted in
       the order submitted relative to message send operations, including tagged sends.   If  not
       set, RMA and atomic reads may be transmitted ahead of sends.

       FI_ORDER_WAR  : Write after read.  If set, RMA and atomic write operations are transmitted
       in the order submitted relative to RMA and atomic read operations.  If not  set,  RMA  and
       atomic writes may be transmitted ahead of RMA and atomic reads.

       FI_ORDER_WAW : Write after write.  If set, RMA and atomic write operations are transmitted
       in the order submitted relative to other RMA and atomic write operations.  If not set, RMA
       and atomic writes may be transmitted out of order from their submission.

       FI_ORDER_WAS  : Write after send.  If set, RMA and atomic write operations are transmitted
       in the order submitted relative to message send operations, including  tagged  sends.   If
       not set, RMA and atomic writes may be transmitted ahead of sends.

       FI_ORDER_SAR : Send after read.  If set, message send operations, including tagged sends,
       are transmitted in the order submitted relative to RMA and atomic read operations.  If not
       set, message sends may be transmitted ahead of RMA and atomic reads.

       FI_ORDER_SAW : Send after write.  If set, message send operations, including tagged sends,
       are transmitted in the order submitted relative to RMA and atomic write operations.  If
       not set, message sends may be transmitted ahead of RMA and atomic writes.

       FI_ORDER_SAS : Send after send.  If set, message send operations, including tagged sends,
       are transmitted in the order submitted relative to other message sends.  If not set,
       message sends may be transmitted out of order from their submission.
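
       The ordering bits form a table indexed by the later and the earlier operation type.
       The sketch below models that table with purely illustrative bit positions; the SK_
       names and values are hypothetical, and real code would test the FI_ORDER_* constants
       from the libfabric headers instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operation classes used by the ordering bits. */
enum sk_op { SK_READ = 0, SK_WRITE = 1, SK_SEND = 2 };

/* Illustrative layout only: bit (later * 3 + earlier).  These are NOT the
 * values of the FI_ORDER_* flags. */
#define SK_ORDER_BIT(later, earlier) \
    (UINT64_C(1) << ((later) * 3 + (earlier)))
#define SK_ORDER_RAW SK_ORDER_BIT(SK_READ, SK_WRITE)   /* read after write */
#define SK_ORDER_WAW SK_ORDER_BIT(SK_WRITE, SK_WRITE)  /* write after write */
#define SK_ORDER_SAS SK_ORDER_BIT(SK_SEND, SK_SEND)    /* send after send */

/* True if an operation of class 'later' is transmitted in submission order
 * relative to a previously submitted operation of class 'earlier'. */
bool sk_order_kept(uint64_t msg_order, enum sk_op later, enum sk_op earlier)
{
    return (msg_order & SK_ORDER_BIT(later, earlier)) != 0;
}
```

       With msg_order set to SK_ORDER_SAS | SK_ORDER_WAW, sends stay ordered against sends and
       writes against writes, while a read submitted after a write may be transmitted first.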

   comp_order - Completion Ordering
       Completion  ordering  refers to the order in which completed requests are written into the
       completion queue.  Completion ordering is similar to message  order.   Relaxed  completion
       order may enable faster reporting of completed transfers, allow acknowledgments to be sent
       over different fabric paths, and support more sophisticated retry  mechanisms.   This  can
       result  in  lower-latency  completions,  particularly  when  using  unconnected endpoints.
       Strict completion ordering may require that providers queue completed operations or  limit
       available optimizations.

       For  transmit  requests,  completion  ordering depends on the endpoint communication type.
       For unreliable communication, completion ordering applies to all  data  transfer  requests
       submitted to an endpoint.  For reliable communication, completion ordering only applies to
       requests that target a single destination endpoint.  Completion ordering of requests  that
       target different endpoints over a reliable transport is not defined.

       Applications should specify the completion ordering that they support or require.
       Providers should return the completion order that they actually provide, with the
       constraint that the returned ordering is at least as strict as that specified by the
       application.  Supported completion order values are:

       FI_ORDER_NONE : No ordering is defined for completed operations.   Requests  submitted  to
       the transmit context may complete in any order.

       FI_ORDER_STRICT  :  Requests  complete  in  the  order  in which they are submitted to the
       transmit context.

   inject_size
       The requested inject operation size  (see  the  FI_INJECT  flag)  that  the  context  will
       support.   This  is  the  maximum size data transfer that can be associated with an inject
       operation (such as fi_inject) or may be used with the FI_INJECT data transfer flag.

   size
       The size of the context.  The  size  is  specified  as  the  minimum  number  of  transmit
       operations that may be posted to the endpoint without the operation returning -FI_EAGAIN.
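
       The following sketch models the size attribute as a bounded queue to show the usual
       reaction to -FI_EAGAIN: make progress (for example by reading completions), then retry.
       All sk_ names are hypothetical stand-ins, not libfabric calls, and SK_EAGAIN is assumed
       to match the errno value EAGAIN.  Because size is a minimum, a real provider may accept
       more operations than this model does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_EAGAIN EAGAIN  /* assumption: stand-in for FI_EAGAIN */

/* Toy model of a transmit context that holds at most 'size' operations. */
struct sk_tx_queue {
    size_t size;       /* corresponds to the size attribute */
    size_t in_flight;  /* operations posted but not yet completed */
};

/* Post one operation; fails with -SK_EAGAIN when the queue is full. */
int sk_post(struct sk_tx_queue *q)
{
    assert(q != NULL);
    if (q->in_flight >= q->size)
        return -SK_EAGAIN;
    q->in_flight++;
    return 0;
}

/* Stand-in for reaping one completion, which frees one queue slot. */
void sk_reap(struct sk_tx_queue *q)
{
    assert(q != NULL);
    if (q->in_flight > 0)
        q->in_flight--;
}

/* Retry a post up to max_tries times, reaping a completion between tries. */
int sk_post_retry(struct sk_tx_queue *q, int max_tries)
{
    int ret = -SK_EAGAIN;

    for (int i = 0; i < max_tries && ret == -SK_EAGAIN; i++) {
        ret = sk_post(q);
        if (ret == -SK_EAGAIN)
            sk_reap(q);
    }
    return ret;
}
```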

   iov_limit
       This  is  the  maximum number of IO vectors (scatter-gather elements) that a single posted
       operation may reference.

   rma_iov_limit
       This is the maximum number of RMA IO vectors (scatter-gather  elements)  that  an  RMA  or
       atomic operation may reference.  The rma_iov_limit corresponds to the rma_iov_count values
       in RMA and atomic operations.  See struct fi_msg_rma and struct fi_msg_atomic in  fi_rma.3
       and  fi_atomic.3, for additional details.  This limit applies to both the number of RMA IO
       vectors that may be specified when initiating an operation from  the  local  endpoint,  as
       well  as  the  maximum number of IO vectors that may be carried in a single request from a
       remote endpoint.
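
       When an application holds more scatter-gather elements than a limit allows, one option
       is to issue several operations, each referencing at most the permitted number of
       vectors.  The sketch below computes the size of the next chunk; the function name is
       hypothetical.  Note that splitting is an application-level choice: it suits RMA writes
       to contiguous regions, but splitting a message send changes message boundaries.

```c
#include <stddef.h>

/* Returns how many IO vectors the next operation should reference, given
 * 'total' vectors overall and 'offset' vectors already submitted.
 * Returns 0 once everything has been submitted. */
size_t sk_next_chunk(size_t total, size_t offset, size_t iov_limit)
{
    size_t left = offset < total ? total - offset : 0;

    return left < iov_limit ? left : iov_limit;
}
```

       With 10 vectors and a limit of 4, successive calls yield chunks of 4, 4, and 2 vectors.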

RECEIVE CONTEXT ATTRIBUTES

       Attributes specific to the receive capabilities of an endpoint are specified using  struct
       fi_rx_attr.

              struct fi_rx_attr {
                  uint64_t  caps;
                  uint64_t  mode;
                  uint64_t  op_flags;
                  uint64_t  msg_order;
                  uint64_t  comp_order;
                  size_t    total_buffered_recv;
                  size_t    size;
                  size_t    iov_limit;
              };

   caps - Capabilities
       The  requested  capabilities  of  the context.  The capabilities must be a subset of those
       requested of the associated endpoint.  See the CAPABILITIES section of  fi_getinfo(3)  for
       capability details.  If the caps field is 0 on input to fi_getinfo(3), the caps value from
       the fi_info structure will be used.

   mode
       The operational mode bits of the context.  The  mode  bits  will  be  a  subset  of  those
       associated  with the endpoint.  See the MODE section of fi_getinfo(3) for details.  A mode
       value of 0 will be ignored on input to fi_getinfo(3), with the mode value of  the  fi_info
       structure  used instead.  On return from fi_getinfo(3), the mode will be set only to those
       constraints specific to receive operations.

   op_flags - Default receive operation flags
       Flags that control the behavior of operations submitted against the context.  Applicable
       flags are listed in the Operation Flags section.

   msg_order - Message Ordering
       For  a  description  of  message ordering, see the msg_order field in the Transmit Context
       Attribute section.  Receive context message ordering defines the order in  which  received
       transport message headers are processed when received by an endpoint.

       The  following  ordering  flags,  as  defined  for  transmit  ordering,  also apply to the
       processing   of   received   operations:   FI_ORDER_NONE,   FI_ORDER_RAR,    FI_ORDER_RAW,
       FI_ORDER_RAS,  FI_ORDER_WAR,  FI_ORDER_WAW,  FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, and
       FI_ORDER_SAS.

   comp_order - Completion Ordering
       For a description of completion ordering, see the comp_order field in the Transmit Context
       Attribute section.

       FI_ORDER_NONE  :  No ordering is defined for completed operations.  Receive operations may
       complete in any order, regardless of their submission order.

       FI_ORDER_STRICT : Receive operations complete in the order in which they are processed  by
       the receive context, based on the receive side msg_order attribute.

       FI_ORDER_DATA  : When set, this bit indicates that received data is written into memory in
       order.  Data ordering applies to memory accessed as part of a single operation and between
       operations if message ordering is guaranteed.

   total_buffered_recv
       This field is supported for backwards compatibility purposes.  It is a hint to the
       provider of the total available space that may be needed to buffer messages that are
       received for which there is no matching receive operation.  The provider may adjust or
       ignore this value.  The allocation of internal network buffering among received messages
       is provider specific.  For instance, a provider may limit the size of messages which can
       be buffered or the amount of buffering allocated to a single message.

       If receive side buffering is disabled (total_buffered_recv = 0) and a message is received
       by an endpoint, then the behavior is dependent on whether resource management has been
       enabled (FI_RM_ENABLED has been set or not).  See the Resource Management section of
       fi_domain.3 for further clarification.  It is recommended that applications enable
       resource management if they anticipate receiving unexpected messages, rather than
       modifying this value.

   size
       The  size  of  the  context.   The  size  is  specified  as  the minimum number of receive
       operations that may be posted to the endpoint without the operation returning -FI_EAGAIN.

   iov_limit
       This is the maximum number of IO vectors (scatter-gather elements) that a single posted
       operation may reference.

SCALABLE ENDPOINTS

       A  scalable endpoint is a communication portal that supports multiple transmit and receive
       contexts.  Scalable  endpoints  are  loosely  modeled  after  the  networking  concept  of
       transmit/receive  side scaling, also known as multi-queue.  Support for scalable endpoints
       is domain specific.  Scalable endpoints may improve the performance of multi-threaded  and
       parallel  applications,  by  allowing  threads  to access independent transmit and receive
       queues.  A scalable endpoint has a single transport level address, which  can  reduce  the
       memory  requirements  needed  to  store  remote  addressing  data,  versus  using standard
       endpoints.  Scalable endpoints cannot be used directly for communication  operations,  and
       require  the  application  to explicitly create transmit and receive contexts as described
       below.

   fi_tx_context
       Transmit contexts are independent transmit queues.  Ordering and  synchronization  between
       contexts are not defined.  Conceptually, a transmit context behaves similarly to a send-only
       endpoint.  A transmit context may be configured with  fewer  capabilities  than  the  base
       endpoint  and  with  different  attributes (such as ordering requirements and inject size)
       than other contexts associated with the same scalable endpoint.  Each transmit context has
       its  own completion queue.  The number of transmit contexts associated with an endpoint is
       specified during endpoint creation.

       The fi_tx_context call is used to retrieve a specific context, identified by an index (see
       above  for  details  on  transmit context attributes).  Providers may dynamically allocate
       contexts when fi_tx_context  is  called,  or  may  statically  create  all  contexts  when
       fi_endpoint  is  invoked.   By  default, a transmit context inherits the properties of its
       associated endpoint.   However,  applications  may  request  context  specific  attributes
       through  the  attr  parameter.   Support  for  per transmit context attributes is provider
       specific and not guaranteed.  Providers will return the actual attributes assigned to  the
       context through the attr parameter, if provided.

   fi_rx_context
       Receive contexts are independent receive queues for receiving incoming data.  Ordering and
       synchronization between contexts are not guaranteed.  Conceptually, a receive context
       behaves similarly to a receive-only endpoint.  A receive context may be configured with
       fewer capabilities than the base endpoint and with different attributes (such as  ordering
       requirements  and  inject  size)  than  other  contexts  associated with the same scalable
       endpoint.  Each receive context has its own  completion  queue.   The  number  of  receive
       contexts associated with an endpoint is specified during endpoint creation.

       Receive  contexts  are  often  associated with steering flows, that specify which incoming
       packets targeting a scalable endpoint  to  process.   However,  receive  contexts  may  be
       targeted  directly  by  the  initiator,  if  supported  by  the underlying protocol.  Such
       contexts are referred to as 'named'.  Support for named  contexts  must  be  indicated  by
       setting  the  caps  FI_NAMED_RX_CTX capability when the corresponding endpoint is created.
       Support for named receive contexts is coordinated with address vectors.  See fi_av(3)  and
       fi_rx_addr(3).

       The fi_rx_context call is used to retrieve a specific context, identified by an index (see
       above for details on receive context  attributes).   Providers  may  dynamically  allocate
       contexts  when  fi_rx_context  is  called,  or  may  statically  create  all contexts when
       fi_endpoint is invoked.  By default, a receive context  inherits  the  properties  of  its
       associated  endpoint.   However,  applications  may  request  context  specific attributes
       through the attr parameter.  Support  for  per  receive  context  attributes  is  provider
       specific  and not guaranteed.  Providers will return the actual attributes assigned to the
       context through the attr parameter, if provided.

SHARED CONTEXTS

       Shared contexts are transmit and receive contexts explicitly  shared  among  one  or  more
       endpoints.   A  shareable context allows an application to use a single dedicated provider
       resource among multiple transport addressable endpoints.   This  can  greatly  reduce  the
       resources  needed to manage communication over multiple endpoints by multiplexing transmit
       and/or receive processing, with the potential cost of serializing access  across  multiple
       endpoints.  Support for shareable contexts is domain specific.

       Conceptually, shareable transmit contexts are transmit queues that may be accessed by many
       endpoints.  The use of a shared transmit context  is  mostly  opaque  to  an  application.
       Applications  must allocate and bind shared transmit contexts to endpoints, but operations
       are posted directly to the endpoint.  Shared transmit contexts  are  not  associated  with
       completion  queues  or  counters.  Completed operations are posted to the CQs bound to the
       endpoint.  An endpoint may only be associated with a single shared transmit context.

       Unlike shared transmit  contexts,  applications  interact  directly  with  shared  receive
       contexts.   Users  post  receive  buffers  directly  to a shared receive context, with the
       buffers usable by any endpoint bound  to  the  shared  receive  context.   Shared  receive
       contexts  are  not  associated  with  completion  queues  or  counters.  Completed receive
       operations are posted to the CQs bound to the endpoint.  An endpoint may only be
       associated with a single shared receive context, and all connectionless endpoints
       associated with a shared receive context must also share the same address vector.

       Endpoints associated with a shared transmit context may use  dedicated  receive  contexts,
       and  vice-versa.   Or an endpoint may use shared transmit and receive contexts.  And there
       is no requirement that the same group of endpoints sharing a  context  of  one  type  also
       share the context of an alternate type.  Furthermore, an endpoint may use a shared context
       of one type, but a scalable set of contexts of the alternate type.

   fi_stx_context
       This call is used to open a shareable transmit context  (see  above  for  details  on  the
       transmit  context  attributes).   Endpoints associated with a shared transmit context must
       use a subset of the transmit context's attributes.  Note that this is the reverse  of  the
       requirement for transmit contexts for scalable endpoints.

   fi_srx_context
       This  allocates  a shareable receive context (see above for details on the receive context
       attributes).  Endpoints associated with a shared receive context must use a subset of  the
       receive  context's  attributes.   Note  that  this  is  the reverse of the requirement for
       receive contexts for scalable endpoints.

SOCKET ENDPOINTS

       The following feature and  description  should  be  considered  experimental.   Until  the
       experimental  tag  is  removed,  the interfaces, semantics, and data structures associated
       with socket endpoints may change between library versions.

       This section applies to endpoints of type FI_EP_SOCK_STREAM and FI_EP_SOCK_DGRAM, commonly
       referred to as socket endpoints.

       Socket  endpoints  are defined with semantics that allow them to more easily be adopted by
       developers familiar with the UNIX socket API, or by middleware  that  exposes  the  socket
       API, while still taking advantage of high-performance hardware features.

       The key difference between socket endpoints and other active endpoints is that socket
       endpoints use synchronous data transfers.  Buffers passed into send and receive operations
       revert  to  the  control  of  the application upon returning from the function call.  As a
       result, no data transfer completions are reported to the application, and socket endpoints
       are not associated with completion queues or counters.

       Socket endpoints support a subset of message operations: fi_send, fi_sendv, fi_sendmsg,
       fi_recv, fi_recvv, fi_recvmsg, and fi_inject.  Because data transfers are synchronous, the
       return value from send and receive operations indicates the number of bytes transferred on
       success, or a negative value on error, including -FI_EAGAIN if the endpoint cannot send or
       receive any data because of full or empty queues, respectively.

       Socket endpoints are associated with event queues and address vectors, and process
       connection management events asynchronously, similar to other endpoints.  Unlike UNIX
       sockets, socket endpoints must still be declared as either active or passive.

       Socket  endpoints  behave  like non-blocking sockets.  In order to support select and poll
       semantics, active socket endpoints are associated with a file descriptor that is  signaled
       whenever  the  endpoint  is ready to send and/or receive data.  The file descriptor may be
       retrieved using fi_control.
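
       Once the file descriptor has been obtained, it can be used with poll(2) like any other
       descriptor.  The sketch below shows only the waiting step and makes no assumption about
       how the descriptor was retrieved; the function name is hypothetical.

```c
#include <poll.h>

/* Wait until fd reports one of the requested events (POLLIN, POLLOUT).
 * Returns 1 when ready, 0 on timeout, and -1 on error or when only an
 * unrequested condition (such as POLLERR or POLLHUP) was reported. */
int sk_wait_ready(int fd, short events, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = events, .revents = 0 };
    int rc = poll(&p, 1, timeout_ms);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (p.revents & events) ? 1 : -1;
}
```

       An application would typically call sk_wait_ready(fd, POLLIN, timeout) before retrying a
       receive that returned -FI_EAGAIN, and use POLLOUT for the send direction.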

OPERATION FLAGS

       Operation flags are obtained by OR-ing the  following  flags  together.   Operation  flags
       define  the default flags applied to an endpoint's data transfer operations, where a flags
       parameter is not available.  Data transfer operations that take flags  as  input  override
       the op_flags value of transmit or receive context attributes of an endpoint.
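
       The override rule can be stated compactly.  In the sketch below, a NULL call_flags
       pointer models a data transfer call that has no flags parameter; the function name is
       hypothetical.  Note that flags passed to a call replace op_flags rather than being
       combined with them.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the flags that govern one data transfer operation. */
uint64_t sk_effective_flags(uint64_t op_flags, const uint64_t *call_flags)
{
    /* Calls without a flags parameter fall back to the context default. */
    return call_flags != NULL ? *call_flags : op_flags;
}
```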

       FI_INJECT  :  Indicates  that  all  outbound data buffers should be returned to the user's
       control immediately after a data transfer call returns, even if the operation  is  handled
       asynchronously.   This may require that the provider copy the data into a local buffer and
       transfer out of that buffer.  A provider can limit the total amount of send data that  may
       be  buffered  and/or  the  size  of  a  single send that can use this flag.  This limit is
       indicated using inject_size (see inject_size above).

       FI_MULTI_RECV : Applies to posted receive operations.  This flag allows the user to post a
       single  buffer  that  will  receive multiple incoming messages.  Received messages will be
       packed into the receive buffer until the buffer has been consumed.  Use of this  flag  may
       cause  a  single posted receive operation to generate multiple completions as messages are
       placed into the buffer.  The placement of received data into the buffer may be subject
       to provider specific alignment restrictions.  The buffer will be released by the provider
       when   the   available   buffer   space   falls   below   the   specified   minimum   (see
       FI_OPT_MIN_MULTI_RECV).

       FI_COMPLETION  :  Indicates  that a completion entry should be generated for data transfer
       operations.  This flag only applies to operations issued on endpoints that were bound to a
       CQ or counter with the FI_SELECTIVE_COMPLETION flag.  See the fi_ep_bind section above for
       more detail.

       FI_INJECT_COMPLETE : Indicates that a completion  should  be  generated  when  the  source
       buffer(s)  may  be reused.  A completion guarantees that the buffers will not be read from
       again and the application may reclaim them.  No other guarantees are made with respect  to
       the state of the operation.

       Note:  This  flag is used to control when a completion entry is inserted into a completion
       queue.  It does not apply to operations that do not generate  a  completion  queue  entry,
       such  as  the  fi_inject  operation,  and  is not subject to the inject_size message limit
       restriction.

       FI_TRANSMIT_COMPLETE : Indicates that a completion should be generated when  the  transmit
       operation  has  completed relative to the local provider.  The exact behavior is dependent
       on the endpoint type.

       For reliable endpoints:

       Indicates that a completion should be generated when the operation has been  delivered  to
       the  peer  endpoint.  A completion guarantees that the operation is no longer dependent on
       the fabric or local resources.  The state of the operation at the  peer  endpoint  is  not
       defined.

       For unreliable endpoints:

       Indicates  that  a completion should be generated when the operation has been delivered to
       the fabric.  A completion guarantees that the operation is no longer  dependent  on  local
       resources.  The state of the operation within the fabric is not defined.

       FI_DELIVERY_COMPLETE  :  Indicates  that  a  completion  should  not be generated until an
       operation has been processed by the destination endpoint(s).  A completion guarantees that
       the result of the operation is available.

       This  completion mode applies only to reliable endpoints.  For operations that return data
       to the initiator, such as RMA read or atomic-fetch, the source endpoint is also considered
       a destination endpoint.  This is the default completion mode for such operations.

       FI_COMMIT_COMPLETE : Indicates that a completion should not be generated (locally or at
       the peer) until the result of an operation has been made persistent.  A completion
       guarantees that the result is both available and durable in the event of a power failure.

       This completion mode applies only to operations that target persistent memory regions over
       reliable endpoints.  This completion mode is experimental.
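
       The applicability rules for the four completion levels can be summarized in code.  This
       is a sketch based only on the descriptions in this section: the enum values are
       hypothetical stand-ins, not the libfabric flag values, and completion_level_applies is an
       illustrative helper rather than a library function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the completion flags; not libfabric values. */
enum completion_level {
    LVL_INJECT_COMPLETE,    /* source buffers may be reused */
    LVL_TRANSMIT_COMPLETE,  /* delivered to the peer (reliable) or fabric (unreliable) */
    LVL_DELIVERY_COMPLETE,  /* processed by the destination endpoint(s) */
    LVL_COMMIT_COMPLETE     /* result has been made persistent */
};

/*
 * Returns true if the completion level is described as applicable to an
 * operation on the given kind of endpoint and target memory.
 */
static bool completion_level_applies(enum completion_level lvl,
                                     bool reliable_ep, bool targets_pmem)
{
    switch (lvl) {
    case LVL_INJECT_COMPLETE:
    case LVL_TRANSMIT_COMPLETE:
        return true;                        /* no restriction stated */
    case LVL_DELIVERY_COMPLETE:
        return reliable_ep;                 /* reliable endpoints only */
    case LVL_COMMIT_COMPLETE:
        return reliable_ep && targets_pmem; /* reliable and persistent memory */
    }
    assert(0 && "unknown completion level");
    return false;
}
```

       An application could run a check like this before choosing which completion flag to
       request on an operation.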

       FI_MULTICAST : Indicates that data transfers will target multicast addresses  by  default.
       Any  fi_addr_t  passed  into  a  data  transfer  operation  will be treated as a multicast
       address.

NOTES

       Users should call fi_close to release all resources allocated to the fabric endpoint.

       Endpoints allocated with the FI_CONTEXT mode set must typically provide struct  fi_context
       as  their  per operation context parameter.  (See fi_getinfo.3 for details.) However, when
       FI_SELECTIVE_COMPLETION is enabled to suppress completion entries, and an operation is
       initiated without the FI_COMPLETION flag set, the context parameter is ignored.  An
       application does not need to pass a valid struct fi_context into such data transfers.
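
       The rule for when a valid struct fi_context is required reduces to a small predicate.
       The helper below is a hypothetical illustration of that rule, with plain booleans
       standing in for the FI_CONTEXT mode bit, the FI_SELECTIVE_COMPLETION binding, and the
       per-operation FI_COMPLETION flag.

```c
#include <stdbool.h>

/*
 * Returns true if the operation must be given a valid struct fi_context
 * as its context parameter, following the rule described in NOTES.
 */
static bool needs_valid_fi_context(bool mode_fi_context,
                                   bool selective_completion,
                                   bool op_fi_completion)
{
    if (!mode_fi_context)
        return false;   /* FI_CONTEXT mode not set: no such requirement */
    if (selective_completion && !op_fi_completion)
        return false;   /* completion suppressed: context is ignored */
    return true;
}
```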

       Operations that complete in error and are not associated with a valid operation context
       will use the endpoint context in any error reporting structures.

       Although  applications  typically  associate individual completions with either completion
       queues or counters, an endpoint can be attached to both a counter  and  completion  queue.
       When combined with using selective completions, this allows an application to use counters
       to track successful completions, with  a  CQ  used  to  report  errors.   Operations  that
       complete  with  an error increment the error counter and generate a completion event.  The
       generation   of   entries   going   to   the   CQ   can   then   be    controlled    using
       FI_SELECTIVE_COMPLETION.
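
       The combined counter and CQ arrangement can be modeled as follows.  This sketch does not
       use libfabric: struct ep_model and complete_op are hypothetical, and the four fields
       stand in for the counter's success and error values and for the normal and error entries
       written to the CQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an endpoint bound to both a counter and a CQ. */
struct ep_model {
    uint64_t cntr_success;  /* counter: successful completions */
    uint64_t cntr_error;    /* counter: operations completed in error */
    uint64_t cq_entries;    /* CQ: normal completion entries */
    uint64_t cq_errors;     /* CQ: error entries */
    bool selective;         /* CQ bound with FI_SELECTIVE_COMPLETION */
};

/* Record the completion of one operation. */
static void complete_op(struct ep_model *ep, bool ok, bool op_fi_completion)
{
    assert(ep);
    if (!ok) {
        /* Errors bump the error counter and always reach the CQ. */
        ep->cntr_error++;
        ep->cq_errors++;
        return;
    }
    ep->cntr_success++;
    /* Successful operations reach the CQ only when not suppressed. */
    if (!ep->selective || op_fi_completion)
        ep->cq_entries++;
}
```

       With selective set and FI_COMPLETION left off most operations, the counter tracks
       progress while the CQ stays nearly empty and is read mainly to retrieve errors.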

       As  mentioned  in fi_getinfo(3), the ep_attr structure can be used to query providers that
       support various endpoint attributes.  fi_getinfo can return provider info structures  that
       can  support  the  minimal  set  of  requirements  (such  that  the  application maintains
       correctness).   However,  it  can  also  return  provider  info  structures  that   exceed
       application  requirements.  As an example, consider an application requesting msg_order as
       FI_ORDER_NONE.  The resulting output from fi_getinfo may have all the ordering  bits  set.
       The  application  can  reset  the  ordering  bits  it does not require before creating the
       endpoint.  The provider is free to implement a stricter ordering than is required  by  the
       application.
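
       Resetting unneeded ordering bits is a bitwise operation on the msg_order value returned
       by fi_getinfo.  The sketch below shows the idea on plain integers.  The MODEL_ORDER_*
       macros are hypothetical stand-ins (the real FI_ORDER_* values come from the libfabric
       headers), and order_satisfied and trim_msg_order are illustrative helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ordering bits; not the real FI_ORDER_* values. */
#define MODEL_ORDER_NONE UINT64_C(0)
#define MODEL_ORDER_A    (UINT64_C(1) << 0)
#define MODEL_ORDER_B    (UINT64_C(1) << 1)
#define MODEL_ORDER_C    (UINT64_C(1) << 2)

/* True if every ordering bit the application requires is provided. */
static bool order_satisfied(uint64_t provided, uint64_t required)
{
    return (provided & required) == required;
}

/* Keep only the ordering bits the application actually requires. */
static uint64_t trim_msg_order(uint64_t provided, uint64_t required)
{
    return provided & required;
}
```

       An application requesting no ordering would store the trimmed value, zero in this
       model, back into the attributes before creating the endpoint.  The provider may still
       deliver with stricter ordering than requested.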

RETURN VALUES

       Returns  0  on  success.   On  error,  a  negative  value corresponding to fabric errno is
       returned.  For fi_cancel, a return value of  0  indicates  that  the  cancel  request  was
       submitted for processing.

       Fabric errno values are defined in rdma/fi_errno.h.
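
       The return convention can be handled with a pair of small helpers.  This is a sketch
       that uses only the C standard library: fabric_errno_of and describe_ret are hypothetical
       names.  A real application would more likely pass the positive errno value to
       fi_strerror, declared in rdma/fi_errno.h, rather than print the raw number.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a return value into a positive fabric errno, or 0 on success. */
static int fabric_errno_of(int ret)
{
    return ret < 0 ? -ret : 0;
}

/*
 * Write a one-line status for the call named by what into buf.
 * Returns 0 if the call succeeded and -1 if it failed.
 */
static int describe_ret(const char *what, int ret, char *buf, size_t len)
{
    int err = fabric_errno_of(ret);

    assert(buf && len > 0);
    if (err == 0) {
        snprintf(buf, len, "%s: ok", what);
        return 0;
    }
    snprintf(buf, len, "%s: failed, fabric errno %d", what, err);
    return -1;
}
```

       For fi_cancel, a status of "ok" from such a helper means only that the cancel request
       was submitted, not that the operation has been canceled.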

ERRORS

       -FI_EDOMAIN  :  A  resource domain was not bound to the endpoint or an attempt was made to
       bind multiple domains.

       -FI_ENOCQ : The endpoint has not been configured with the necessary event queue.

       -FI_EOPBADSTATE : The endpoint's state does not permit the requested operation.

SEE ALSO

       fi_getinfo(3), fi_domain(3), fi_msg(3), fi_tagged(3), fi_rma(3)

AUTHORS

       OpenFabrics.