Provided by: libfabric-dev_1.6.2-3ubuntu0.1_amd64 

NAME
fi_eq - Event queue operations
fi_eq_open / fi_close : Open/close an event queue
fi_control : Control operation of EQ
fi_eq_read / fi_eq_readerr : Read an event from an event queue
fi_eq_write : Writes an event to an event queue
fi_eq_sread : A synchronous (blocking) read of an event queue
fi_eq_strerror : Converts provider specific error information into a printable string
SYNOPSIS
#include <rdma/fi_domain.h>
int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
struct fid_eq **eq, void *context);
int fi_close(struct fid *eq);
int fi_control(struct fid *eq, int command, void *arg);
ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
void *buf, size_t len, uint64_t flags);
ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
uint64_t flags);
ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
const void *buf, size_t len, uint64_t flags);
ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
void *buf, size_t len, int timeout, uint64_t flags);
const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
const void *err_data, char *buf, size_t len);
ARGUMENTS
fabric : Opened fabric descriptor
eq : Event queue
attr : Event queue attributes
context : User specified context associated with the event queue.
event : Reported event
buf : For read calls, the data buffer to write events into. For write calls, an event to insert into the
event queue. For fi_eq_strerror, an optional buffer that receives printable error information.
len : Length of data buffer
flags : Additional flags to apply to the operation
command : Command of control operation to perform on EQ.
arg : Optional control argument
prov_errno : Provider specific error value
err_data : Provider specific error data related to a completion
timeout : Timeout specified in milliseconds
DESCRIPTION
Event queues are used to report events associated with control operations. They are associated with
memory registration, address vectors, connection management, and fabric and domain level events.
Reported events are either associated with a requested operation or affiliated with a call that registers
for specific types of events, such as listening for connection requests.
fi_eq_open
fi_eq_open allocates a new event queue.
The properties and behavior of an event queue are defined by struct fi_eq_attr.
struct fi_eq_attr {
size_t size; /* # entries for EQ */
uint64_t flags; /* operation flags */
enum fi_wait_obj wait_obj; /* requested wait object */
int signaling_vector; /* interrupt affinity */
struct fid_wait *wait_set; /* optional wait set */
};
size : Specifies the minimum size of an event queue.
flags : Flags that control the configuration of the EQ.
• FI_WRITE : Indicates that the application requires support for inserting user events into the EQ. If
this flag is set, then the fi_eq_write operation must be supported by the provider. If the FI_WRITE
flag is not set, then the application may not invoke fi_eq_write.
• FI_AFFINITY : Indicates that the signaling_vector field (see below) is valid.
wait_obj : EQs may be associated with a specific wait object. Wait objects allow applications to block
until the wait object is signaled, indicating that an event is available to be read. Users may use
fi_control to retrieve the underlying wait object associated with an EQ, in order to use it in other
system calls. The following values may be used to specify the type of wait object associated with an EQ:
• FI_WAIT_NONE : Used to indicate that the user will not block (wait) for events on the EQ. When
FI_WAIT_NONE is specified, the application may not call fi_eq_sread. This is the default if no wait
object is specified.
• FI_WAIT_UNSPEC : Specifies that the user will only wait on the EQ using fabric interface calls, such as
fi_eq_sread. In this case, the underlying provider may select the most appropriate or highest
performing wait object available, including custom wait mechanisms. Applications that select
FI_WAIT_UNSPEC are not guaranteed to retrieve the underlying wait object.
• FI_WAIT_SET : Indicates that the event queue should use a wait set object to wait for events. If
specified, the wait_set field must reference an existing wait set object.
• FI_WAIT_FD : Indicates that the EQ should use a file descriptor as its wait mechanism. A file
descriptor wait object must be usable in select, poll, and epoll routines. However, a provider may
signal an FD wait object by marking it as readable or with an error.
• FI_WAIT_MUTEX_COND : Specifies that the EQ should use a pthread mutex and cond variable as a wait
object.
• FI_WAIT_CRITSEC_COND : Windows specific. Specifies that the EQ should use a critical section and
condition variable as a wait object.
signaling_vector : If the FI_AFFINITY flag is set, this indicates the logical CPU number (0..max CPU - 1)
that interrupts associated with the EQ should target. This field should be treated as a hint to the
provider and may be ignored if the provider does not support interrupt affinity.
wait_set : If wait_obj is FI_WAIT_SET, this field references a wait object to which the event queue
should attach. When an event is inserted into the event queue, the corresponding wait set will be
signaled if all necessary conditions are met. The use of a wait_set enables an optimized method of
waiting for events across multiple event queues. This field is ignored if wait_obj is not FI_WAIT_SET.
fi_close
The fi_close call releases all resources associated with an event queue. Any events which remain on the
EQ when it is closed are lost.
The EQ must not be bound to any other objects prior to being closed, otherwise the call will return
-FI_EBUSY.
fi_control
The fi_control call is used to access provider or implementation specific details of the event queue.
Access to the EQ should be serialized across all calls when fi_control is invoked, as it may redirect the
implementation of EQ operations. The following control commands are usable with an EQ.
FI_GETWAIT (void **) : This command allows the user to retrieve the low-level wait object associated with
the EQ. The format of the wait-object is specified during EQ creation, through the EQ attributes. The
fi_control arg parameter should be an address where a pointer to the returned wait object will be
written. This should be an 'int *' for FI_WAIT_FD, or 'struct fi_mutex_cond' for FI_WAIT_MUTEX_COND.
struct fi_mutex_cond {
pthread_mutex_t *mutex;
pthread_cond_t *cond;
};
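As an illustration of how a file descriptor wait object might be used once it has been retrieved, the following sketch waits on a descriptor with poll(). It assumes the descriptor was obtained with something like fi_control(&eq->fid, FI_GETWAIT, &fd) on an EQ opened with FI_WAIT_FD. The helper names wait_fd_signaled and demo_with_pipe are hypothetical, and the demo uses a pipe as a stand-in for the EQ descriptor so that it can run without a fabric.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait for a wait-object descriptor to be signaled.
 * Returns 1 if signaled, 0 on timeout, -1 if poll() failed (including
 * interruption by a signal) or the descriptor is invalid.
 * A negative timeout_ms blocks indefinitely, as with poll(). */
static int wait_fd_signaled(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0 || (pfd.revents & POLLNVAL))
        return -1;
    if (n == 0)
        return 0;
    /* Readable, error and hang-up conditions all count as "signaled",
     * since a provider may mark the descriptor readable or with an error. */
    return 1;
}

/* Stand-in demo: a pipe plays the role of the EQ's descriptor. */
static int demo_with_pipe(void)
{
    int p[2];
    int before, after;

    if (pipe(p) != 0)
        return -1;
    before = wait_fd_signaled(p[0], 0);   /* nothing pending yet */
    after = (write(p[1], "x", 1) == 1) ? wait_fd_signaled(p[0], 100) : -1;
    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

In a real application, a return of 1 would typically be followed by a call to fi_eq_read to retrieve the event; the application should not read from the descriptor itself.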
fi_eq_read
The fi_eq_read operation performs a non-blocking read of event data from the EQ. The format of the
event data is based on the type of event retrieved from the EQ, with all events starting with a struct
fi_eq_entry header. At most one event will be returned per EQ read operation. The number of bytes
successfully read from the EQ is returned from the read. The FI_PEEK flag may be used to indicate that
event data should be read from the EQ without being consumed. A subsequent read without the FI_PEEK flag
would then remove the event from the EQ.
The following types of events may be reported to an EQ, along with information regarding the format
associated with each event.
Asynchronous Control Operations : Asynchronous control operations are basic requests that simply need to
generate an event to indicate that they have completed. These include the following types of events:
memory registration, address vector resolution, and multicast joins.
Control requests report their completion by inserting a struct fi_eq_entry into the EQ. The format of
this structure is:
struct fi_eq_entry {
fid_t fid; /* fid associated with request */
void *context; /* operation context */
uint64_t data; /* completion-specific data */
};
For the completion of basic asynchronous control operations, the returned event will indicate the
operation that has completed, and the fid will reference the fabric descriptor associated with the event.
For memory registration, this will be an FI_MR_COMPLETE event and the fid_mr. Address resolution will
reference an FI_AV_COMPLETE event and fid_av. Multicast joins will report an FI_JOIN_COMPLETE and
fid_mc. The context field will be set to the context specified as part of the operation, if available,
otherwise the context will be associated with the fabric descriptor. The data field will be set as
described in the man page for the corresponding object type (e.g., see fi_av(3) for a description of how
asynchronous address vector insertions are completed).
Connection Notification : Connection notifications are connection management notifications used to set up
or tear down connections between endpoints. There are three connection notification events: FI_CONNREQ,
FI_CONNECTED, and FI_SHUTDOWN. Connection notifications are reported using struct fi_eq_cm_entry:
struct fi_eq_cm_entry {
fid_t fid; /* fid associated with request */
struct fi_info *info; /* endpoint information */
uint8_t data[]; /* app connection data */
};
A connection request (FI_CONNREQ) event indicates that a remote endpoint wishes to establish a new
connection to a listening, or passive, endpoint. The fid is the passive endpoint. Information regarding
the requested, active endpoint's capabilities and attributes is available from the info field. The
application is responsible for freeing this structure by calling fi_freeinfo when it is no longer needed.
The fi_info connreq field will reference the connection request associated with this event. To accept a
connection, an endpoint must first be created by passing an fi_info structure referencing this connreq
field to fi_endpoint(). This endpoint is then passed to fi_accept() to complete the acceptance of the
connection attempt. Creating the endpoint is most easily accomplished by passing the fi_info returned as
part of the CM event into fi_endpoint(). If the connection is to be rejected, the connreq is passed to
fi_reject().
Any application data exchanged as part of the connection request is placed beyond the fi_eq_cm_entry
structure. The amount of data available is application dependent and limited to the buffer space
provided by the application when fi_eq_read is called. The amount of returned data may be calculated
using the return value to fi_eq_read. Note that the amount of returned data is limited by the underlying
connection protocol, and the length of any data returned may include protocol padding. As a result, the
returned length may be larger than that specified by the connecting peer.
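The length calculation described above can be sketched as follows. The structure is repeated here as a stand-in, copied from the declaration above, only so that the sketch compiles without the libfabric headers; real code would take it from the header named in the SYNOPSIS. The helpers cm_data_len and cm_copy_data are hypothetical names, not part of libfabric.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Stand-in declarations mirroring the structure shown above. */
struct fid;
struct fi_info;
typedef struct fid *fid_t;

struct fi_eq_cm_entry {
    fid_t fid;              /* fid associated with request */
    struct fi_info *info;   /* endpoint information */
    uint8_t data[];         /* app connection data */
};

/* Bytes of connection data that follow the fixed part of the CM entry,
 * given the value returned by the read call. Errors and short reads
 * yield 0. */
static size_t cm_data_len(ssize_t rd)
{
    if (rd < 0 || (size_t)rd < sizeof(struct fi_eq_cm_entry))
        return 0;
    return (size_t)rd - sizeof(struct fi_eq_cm_entry);
}

/* Copy at most dst_len bytes of connection data out of the event buffer
 * and return the number of bytes copied. */
static size_t cm_copy_data(const void *buf, ssize_t rd, void *dst, size_t dst_len)
{
    const struct fi_eq_cm_entry *entry = buf;
    size_t n = cm_data_len(rd);

    assert(buf != NULL && dst != NULL);
    if (n > dst_len)
        n = dst_len;
    memcpy(dst, entry->data, n);
    return n;
}
```

Because the reported length may include protocol padding, an application that needs the exact size of the peer's data would have to carry that size inside the data itself.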
If a connection request has been accepted, an FI_CONNECTED event will be generated on both sides of the
connection. The active side -- one that called fi_connect() -- may receive user data as part of the
FI_CONNECTED event. The user data is passed to the connection manager on the passive side through the
fi_accept call. User data is not provided with an FI_CONNECTED event on the listening side of the
connection.
Notification that a remote peer has disconnected from an active endpoint is done through the FI_SHUTDOWN
event. Shutdown notification uses struct fi_eq_cm_entry as declared above. The fid field for a shutdown
notification refers to the active endpoint's fid_ep.
Asynchronous Error Notification : Asynchronous errors are used to report problems with fabric resources.
Reported errors may be fatal or transient, based on the error, and result in the resource becoming
disabled. Disabled resources will fail operations submitted against them until they are explicitly
re-enabled by the application.
Asynchronous errors may be reported for completion queues and endpoints of all types. CQ errors can
result when resource management has been disabled, and the provider has detected a queue overrun.
Endpoint errors may be the result of numerous actions, but are often associated with a failed operation.
Operations may fail because of buffer overruns, invalid permissions, incorrect memory access keys,
network routing failures, network reachability issues, etc.
Asynchronous errors are reported using struct fi_eq_err_entry, as defined below. The fabric descriptor
(fid) associated with the error is provided as part of the error data. An error code is also available
to determine the cause of the error.
fi_eq_sread
The fi_eq_sread call is the blocking (or synchronous) equivalent to fi_eq_read. It behaves similarly to
the non-blocking call, with the exception that the call will not return until either an event has been
read from the EQ or an error or timeout occurs. Specifying a negative timeout means an infinite timeout.
It is invalid for applications to call this function if the EQ has been configured with a wait object of
FI_WAIT_NONE or FI_WAIT_SET.
fi_eq_readerr
The read error function, fi_eq_readerr, retrieves information regarding any asynchronous operation which
has completed with an unexpected error. fi_eq_readerr is a non-blocking call, returning immediately
whether an error completion was found or not.
EQs are optimized to report operations which have completed successfully. Operations which fail are
reported 'out of band'. Such operations are retrieved using the fi_eq_readerr function. When an
operation that completes with an unexpected error is inserted into an EQ, it is placed into a temporary
error queue. Attempting to read from an EQ while an item is in the error queue results in an FI_EAVAIL
failure. Applications may use this return code to determine when to call fi_eq_readerr.
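One way an application might act on these return codes is sketched below. The function maps the value returned by a read call to the next step to take. The enum and the function name eq_next_action are hypothetical. The #ifndef fallbacks exist only so that the sketch compiles without the libfabric headers; real code takes both names from rdma/fi_errno.h, and the numeric value used here for FI_EAVAIL is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Fallbacks for standalone compilation only (assumed values). */
#ifndef FI_EAGAIN
#define FI_EAGAIN EAGAIN
#endif
#ifndef FI_EAVAIL
#define FI_EAVAIL 259
#endif

enum eq_action {
    EQ_HAVE_EVENT,      /* rd bytes of event data are in the buffer */
    EQ_EMPTY,           /* nothing to read right now; try again later */
    EQ_READ_ERR_ENTRY,  /* an error entry is pending; call fi_eq_readerr */
    EQ_FAILED           /* any other failure of the read call itself */
};

/* Decide what to do after a read returned rd into a buffer of buf_len
 * bytes. */
static enum eq_action eq_next_action(ssize_t rd, size_t buf_len)
{
    if (rd >= 0) {
        /* A read never reports more bytes than the buffer can hold. */
        assert((size_t)rd <= buf_len);
        return EQ_HAVE_EVENT;
    }
    if (rd == -FI_EAGAIN)
        return EQ_EMPTY;
    if (rd == -FI_EAVAIL)
        return EQ_READ_ERR_ENTRY;
    return EQ_FAILED;
}
```

A polling loop would call fi_eq_read, pass the result to such a function, and call fi_eq_readerr only in the EQ_READ_ERR_ENTRY case.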
Error information is reported to the user through struct fi_eq_err_entry. The format of this structure
is defined below.
struct fi_eq_err_entry {
fid_t fid; /* fid associated with error */
void *context; /* operation context */
uint64_t data; /* completion-specific data */
int err; /* positive error code */
int prov_errno; /* provider error code */
void *err_data; /* additional error data */
size_t err_data_size; /* size of err_data */
};
The fid will reference the fabric descriptor associated with the event. For memory registration, this
will be the fid_mr; address resolution will reference a fid_av; and CM events will refer to a fid_ep.
The context field will be set to the context specified as part of the operation.
The data field will be set as described in the man page for the corresponding object type (e.g., see
fi_av(3) for a description of how asynchronous address vector insertions are completed).
The general reason for the error is provided through the err field. Provider or operational specific
error information may also be available through the prov_errno and err_data fields. Users may call
fi_eq_strerror to convert provider specific error information into a printable string for debugging
purposes.
On input, err_data_size indicates the size of the err_data buffer in bytes. On output, err_data_size
will be set to the number of bytes copied to the err_data buffer. The err_data information is typically
used with fi_eq_strerror to provide details about the type of error that occurred.
For compatibility purposes, if err_data_size is 0 on input, or the fabric was opened with release < 1.5,
err_data will be set to a data buffer owned by the provider. The contents of the buffer will remain
valid until a subsequent read call against the EQ. Applications must serialize access to the EQ when
processing errors to ensure that the buffer referenced by err_data does not change.
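The input side of this contract can be illustrated with a small sketch that prepares an error entry before it is handed to fi_eq_readerr. The structure is repeated as a stand-in, copied from the declaration above, so that the sketch compiles without the libfabric headers. The helper name eq_err_entry_prepare is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in declarations mirroring the structure shown above. */
struct fid;
typedef struct fid *fid_t;

struct fi_eq_err_entry {
    fid_t fid;             /* fid associated with error */
    void *context;         /* operation context */
    uint64_t data;         /* completion-specific data */
    int err;               /* positive error code */
    int prov_errno;        /* provider error code */
    void *err_data;        /* additional error data */
    size_t err_data_size;  /* size of err_data */
};

/* Clear the entry and attach an application-owned buffer for err_data.
 * Passing buf == NULL leaves err_data_size at 0, which selects the
 * compatibility behaviour of a provider-owned buffer. */
static void eq_err_entry_prepare(struct fi_eq_err_entry *entry,
                                 void *buf, size_t len)
{
    assert(entry != NULL);
    memset(entry, 0, sizeof(*entry));
    entry->err_data = buf;
    entry->err_data_size = buf ? len : 0;
}
```

After fi_eq_readerr returns successfully, err_data_size would hold the number of bytes the provider copied into the buffer, and err_data can be passed on to fi_eq_strerror.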
EVENT FIELDS
The EQ entry data structures share many of the same fields. The meanings are the same or similar for all
EQ structure formats, with specific details described below.
fid : This corresponds to the fabric descriptor associated with the event. The type of fid depends on
the event being reported. For FI_CONNREQ this will be the fid of the passive endpoint. FI_CONNECTED and
FI_SHUTDOWN will reference the active endpoint. FI_MR_COMPLETE and FI_AV_COMPLETE will refer to the MR
or AV fabric descriptor, respectively. FI_JOIN_COMPLETE will point to the multicast descriptor returned
as part of the join operation. Applications can use the fid->context value to retrieve the context
associated with the fabric descriptor.
context : The context value is set to the context parameter specified with the operation that generated
the event. If no context parameter is associated with the operation, this field will be NULL.
data : Data is an operation specific value or set of bytes. For connection events, data is application
data exchanged as part of the connection protocol.
err : This err code is a positive fabric errno associated with an event. The err value indicates the
general reason for an error, if one occurred. See fi_errno.3 for a list of possible error codes.
prov_errno : On an error, prov_errno may contain a provider specific error code. The use of this field
and its meaning is provider specific. It is intended to be used as a debugging aid. See fi_eq_strerror
for additional details on converting this error value into a human readable string.
err_data : On an error, err_data may reference a provider specific amount of data associated with an
error. The use of this field and its meaning is provider specific. It is intended to be used as a
debugging aid. See fi_eq_strerror for additional details on converting this error data into a human
readable string.
err_data_size : On input, err_data_size indicates the size of the err_data buffer in bytes. On output,
err_data_size will be set to the number of bytes copied to the err_data buffer. The err_data information
is typically used with fi_eq_strerror to provide details about the type of error that occurred.
For compatibility purposes, if err_data_size is 0 on input, or the fabric was opened with release < 1.5,
err_data will be set to a data buffer owned by the provider. The contents of the buffer will remain
valid until a subsequent read call against the EQ. Applications must serialize access to the EQ when
processing errors to ensure that the buffer referenced by err_data does not change.
NOTES
If an event queue has been overrun, it will be placed into an 'overrun' state. Write operations against
an overrun EQ will fail with -FI_EOVERRUN. Read operations will continue to return any valid,
non-corrupted events, if available. After all valid events have been retrieved, any attempt to read the
EQ will result in it returning an FI_EOVERRUN error event. Overrun event queues are considered fatal and
may not be used to report additional events once the overrun occurs.
RETURN VALUES
fi_eq_open : Returns 0 on success. On error, a negative value corresponding to fabric errno is returned.
fi_eq_read / fi_eq_readerr / fi_eq_sread : On success, returns the number of bytes read from the event
queue. On error, a negative value corresponding to fabric errno is returned. If no data is available to
be read from the event queue, -FI_EAGAIN is returned.
fi_eq_write : On success, returns the number of bytes written to the event queue. On error, a negative
value corresponding to fabric errno is returned.
fi_eq_strerror : Returns a character string interpretation of the provider specific error returned with a
completion.
Fabric errno values are defined in rdma/fi_errno.h.
SEE ALSO
fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cntr(3), fi_poll(3)
AUTHORS
OpenFabrics.
Libfabric Programmer's Manual 2017-12-01 fi_eq(3)