Provided by: liburing-dev_2.4-1_amd64 bug

NAME

       io_uring_register - register files or user buffers for asynchronous I/O

SYNOPSIS

       #include <liburing.h>

       int io_uring_register(unsigned int fd, unsigned int opcode,
                             void *arg, unsigned int nr_args);

DESCRIPTION

       The  io_uring_register(2)  system  call  registers  resources  (e.g.  user buffers, files,
       eventfd, personality, restrictions) for use in an io_uring(7) instance referenced  by  fd.
       Registering  files  or  user  buffers  allows  the  kernel to take long term references to
       internal data structures or create long  term  mappings  of  application  memory,  greatly
       reducing per-I/O overhead.

       fd is the file descriptor returned by a call to io_uring_setup(2).  If opcode has the flag
       IORING_REGISTER_USE_REGISTERED_RING ored into it, fd is instead the index of a  registered
       ring fd.

       opcode can be one of:

       IORING_REGISTER_BUFFERS
              arg points to a struct iovec array of nr_args entries.  The buffers associated with
              the iovecs will be locked in memory and charged against the  user's  RLIMIT_MEMLOCK
              resource  limit.   See getrlimit(2) for more information.  Additionally, there is a
              size limit of 1GiB per buffer.  Currently, the buffers must be anonymous, non-file-
              backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS
              flag set.  It is expected that this limitation will be lifted in the  future.  Huge
              pages  are  supported as well. Note that the entire huge page will be pinned in the
              kernel, even if only a portion of it is used.

              After a successful call, the supplied  buffers  are  mapped  into  the  kernel  and
              eligible  for  I/O.   To  make  use  of  them,  the  application  must  specify the
              IORING_OP_READ_FIXED or IORING_OP_WRITE_FIXED opcodes in the submission queue entry
              (see  the  struct  io_uring_sqe  definition  in  io_uring_enter(2)),  and  set  the
              buf_index field to the desired buffer index.  The memory  range  described  by  the
              submission queue entry's addr and len fields must fall within the indexed buffer.

              It  is  perfectly valid to setup a large buffer and then only use part of it for an
              I/O, as long as the range is within the originally mapped region.

              An application can increase or decrease the size or number of registered buffers by
              first  unregistering  the  existing  buffers,  and  then  issuing  a  new  call  to
              io_uring_register(2) with the new buffers.

              Note that before 5.13 registering buffers would wait for the ring to idle.  If  the
              application  currently has requests in-flight, the registration will wait for those
              to finish before proceeding.

              An application need not unregister buffers  explicitly  before  shutting  down  the
              io_uring  instance.  Note, however, that shutdown processing may run asynchronously
              within the kernel. As a result, it is not guaranteed  that  pages  are  immediately
              unpinned in this case. Available since 5.1.

       IORING_REGISTER_BUFFERS2
              Register  buffers  for  I/O.  Similar to IORING_REGISTER_BUFFERS but aims to have a
              more extensible ABI.

              arg points to a struct io_uring_rsrc_register, and nr_args should  be  set  to  the
              number of bytes in the structure.

               struct io_uring_rsrc_register {
                   __u32 nr;
                   __u32 resv;
                   __u64 resv2;
                   __aligned_u64 data;
                   __aligned_u64 tags;
               };

               The data field contains a pointer to a struct iovec array of nr entries.  The tags
               field should either be 0, then tagging is disabled, or point to  an  array  of  nr
               "tags"  (unsigned  64  bit  integers).  If  a  tag  is zero, then tagging for this
               particular resource (a buffer in this case)  is  disabled.  Otherwise,  after  the
               resource  had  been  unregistered  and it's not used anymore, a CQE will be posted
               with user_data set to the specified tag and all other fields zeroed.

               Note  that   resource   updates,   e.g.    IORING_REGISTER_BUFFERS_UPDATE,   don't
               necessarily  deallocate  resources  by the time it returns, but they might be held
               alive until all requests using it complete.

               Available since 5.13.

       IORING_REGISTER_BUFFERS_UPDATE
              Updates registered buffers with new ones, either turning a sparse entry into a real
              one, or replacing an existing entry.

              arg  must  contain  a  pointer to a struct io_uring_rsrc_update2, which contains an
              offset on which to start the update, and an array of struct iovec.  tags points  to
              an  array  of  tags.   nr  must  contain the number of descriptors in the passed in
              arrays.  See IORING_REGISTER_BUFFERS2 for the resource tagging description.

               struct io_uring_rsrc_update2 {
                   __u32 offset;
                   __u32 resv;
                   __aligned_u64 data;
                   __aligned_u64 tags;
                   __u32 nr;
                   __u32 resv2;
               };

               Available since 5.13.

       IORING_UNREGISTER_BUFFERS
              This operation takes no argument, and arg must be passed as NULL.   All  previously
              registered   buffers  associated  with  the  io_uring  instance  will  be  released
              synchronously. Available since 5.1.

       IORING_REGISTER_FILES
              Register files for I/O.  arg contains  a  pointer  to  an  array  of  nr_args  file
              descriptors (signed 32 bit integers).

              To  make  use of the registered files, the IOSQE_FIXED_FILE flag must be set in the
              flags member of the struct io_uring_sqe, and the fd member is set to the  index  of
              the file in the file descriptor array.

              The  file  set  may be sparse, meaning that the fd field in the array may be set to
              -1.  See IORING_REGISTER_FILES_UPDATE for how to update files in place.

              Note that before 5.13 registering files would wait for the ring to  idle.   If  the
              application  currently has requests in-flight, the registration will wait for those
              to finish before proceeding. See IORING_REGISTER_FILES_UPDATE for how to update  an
              existing set without that limitation.

              Files  are  automatically  unregistered when the io_uring instance is torn down. An
              application needs only unregister if it wishes  to  register  a  new  set  of  fds.
              Available since 5.1.

       IORING_REGISTER_FILES2
              Register files for I/O. Similar to IORING_REGISTER_FILES.

              arg  points  to  a  struct io_uring_rsrc_register, and nr_args should be set to the
              number of bytes in the structure.

              The data field contains a pointer to an array of nr file descriptors (signed 32 bit
              integers).   tags  field  should  either  be 0 or or point to an array of nr "tags"
              (unsigned 64 bit integers). See IORING_REGISTER_BUFFERS2 for more info on  resource
              tagging.

              Note  that  resource updates, e.g.  IORING_REGISTER_FILES_UPDATE, don't necessarily
              deallocate resources, they might be held until all  requests  using  that  resource
              complete.

              Available since 5.13.

       IORING_REGISTER_FILES_UPDATE
              This  operation  replaces  existing files in the registered file set with new ones,
              either turning a sparse entry (one where fd is equal to  -1  )  into  a  real  one,
              removing  an existing entry (new one is set to -1 ), or replacing an existing entry
              with a new existing entry.

              arg must contain a pointer to a struct  io_uring_files_update,  which  contains  an
              offset  on  which  to start the update, and an array of file descriptors to use for
              the update.  nr_args must contain the number of descriptors in the passed in array.
              Available since 5.5.

              File  descriptors  can  be  skipped  if they are set to IORING_REGISTER_FILES_SKIP.
              Skipping an fd will not touch the file associated with  the  previous  fd  at  that
              index. Available since 5.12.

       IORING_REGISTER_FILES_UPDATE2
              Similar  to IORING_REGISTER_FILES_UPDATE, replaces existing files in the registered
              file set with new ones, either turning a sparse entry (one where fd is equal to  -1
              ) into a real one, removing an existing entry (new one is set to -1 ), or replacing
              an existing entry with a new existing entry.

              arg must contain a pointer to a struct  io_uring_rsrc_update2,  which  contains  an
              offset  on  which  to start the update, and an array of file descriptors to use for
              the update stored in data.  tags points to an array of tags.  nr must  contain  the
              number  of  descriptors  in the passed in arrays.  See IORING_REGISTER_BUFFERS2 for
              the resource tagging description.

              Available since 5.13.

       IORING_UNREGISTER_FILES
              This operation requires  no  argument,  and  arg  must  be  passed  as  NULL.   All
              previously   registered  files  associated  with  the  io_uring  instance  will  be
              unregistered. Available since 5.1.

       IORING_REGISTER_EVENTFD
              It's possible to use eventfd(2) to get notified of completion events on an io_uring
              instance.  If this is desired, an eventfd file descriptor can be registered through
              this operation.  arg must contain a pointer to the  eventfd  file  descriptor,  and
              nr_args  must be 1. Note that while io_uring generally takes care to avoid spurious
              events, they can occur. Similarly, batched completions of CQEs may only  trigger  a
              single  eventfd  notification  even  if  multiple  CQEs are posted. The application
              should make no assumptions on number of events  being  available  having  a  direct
              correlation to eventfd notifications posted. An eventfd notification must thus only
              be treated as a hint to check the CQ ring for completions. Available since 5.2.

              An application can temporarily disable notifications, coming through the registered
              eventfd, by setting the IORING_CQ_EVENTFD_DISABLED bit in the flags field of the CQ
              ring.  Available since 5.8.

       IORING_REGISTER_EVENTFD_ASYNC
              This works just like IORING_REGISTER_EVENTFD , except notifications are only posted
              for  events  that complete in an async manner. This means that events that complete
              inline while being submitted do not trigger a  notification  event.  The  arguments
              supplied are the same as for IORING_REGISTER_EVENTFD.  Available since 5.6.

       IORING_UNREGISTER_EVENTFD
              Unregister an eventfd file descriptor to stop notifications. Since only one eventfd
              descriptor is currently supported, this operation takes no argument, and  arg  must
              be passed as NULL and nr_args must be zero. Available since 5.2.

       IORING_REGISTER_PROBE
              This  operation  returns  a  structure,  io_uring_probe, which contains information
              about the opcodes supported by io_uring on the running kernel.  arg must contain  a
              pointer  to  a  struct io_uring_probe, and nr_args must contain the size of the ops
              array in that probe struct. The ops array is of the type  io_uring_probe_op,  which
              holds  the  value  of  the  opcode  and  a  flags  field.  If  the  flags field has
              IO_URING_OP_SUPPORTED set, then this opcode is supported  on  the  running  kernel.
              Available since 5.6.

       IORING_REGISTER_PERSONALITY
              This  operation registers credentials of the running application with io_uring, and
              returns an id associated with these credentials. Applications wishing  to  share  a
              ring  between  separate  users/processes  can pass in this credential id in the sqe
              personality  field.  If  set,  that  particular  sqe  will  be  issued  with  these
              credentials.  Must  be  invoked  with  arg  set  to  NULL  and nr_args set to zero.
              Available since 5.6.

       IORING_UNREGISTER_PERSONALITY
              This operation unregisters  a  previously  registered  personality  with  io_uring.
              nr_args  must  be set to the id in question, and arg must be set to NULL. Available
              since 5.6.

       IORING_REGISTER_ENABLE_RINGS
              This  operation  enables  an  io_uring   ring   started   in   a   disabled   state
              (IORING_SETUP_R_DISABLED  was  specified  in the call to io_uring_setup(2)).  While
              the io_uring ring is disabled, submissions are not allowed  and  registrations  are
              not restricted.

              After  the  execution  of this operation, the io_uring ring is enabled: submissions
              and registration are allowed, but they will be validated following  the  registered
              restrictions  (if any).  This operation takes no argument, must be invoked with arg
              set to NULL and nr_args set to zero. Available since 5.10.

       IORING_REGISTER_RESTRICTIONS
              arg points to a struct io_uring_restriction array of nr_args entries.

              With an entry it is possible to allow an io_uring_register(2)  opcode,  or  specify
              which  opcode  and  flags  of  the  submission  queue entry are allowed, or require
              certain flags to be specified (these flags must be set  on  each  submission  queue
              entry).

              All  the restrictions must be submitted with a single io_uring_register(2) call and
              they are handled as an  allowlist  (opcodes  and  flags  not  registered,  are  not
              allowed).

              Restrictions  can  be  registered  only  if the io_uring ring started in a disabled
              state (IORING_SETUP_R_DISABLED must be specified in the call to io_uring_setup(2)).

              Available since 5.10.

       IORING_REGISTER_IOWQ_AFF
              By default, async workers created by io_uring will inherit  the  CPU  mask  of  its
              parent.  This is usually all the CPUs in the system, unless the parent is being run
              with a limited set.  If  this  isn't  the  desired  outcome,  the  application  may
              explicitly tell io_uring what CPUs the async workers may run on.  arg must point to
              a cpu_set_t mask, and nr_args the byte size of that mask.

              Available since 5.14.

       IORING_UNREGISTER_IOWQ_AFF
              Undoes a CPU mask previously set with IORING_REGISTER_IOWQ_AFF.  Must not have  arg
              or nr_args set.

              Available since 5.14.

       IORING_REGISTER_IOWQ_MAX_WORKERS
              By  default, io_uring limits the unbounded workers created to the maximum processor
              count set by RLIMIT_NPROC and the bounded workers is a function of the SQ ring size
              and  the  number  of  CPUs  in  the system. Sometimes this can be excessive (or too
              little, for bounded), and this command provides a way to change the count per  ring
              (per NUMA node) instead.

              arg  must  be  set  to  an unsigned int pointer to an array of two values, with the
              values in the array being set to the maximum count of workers per NUMA node.  Index
              0  holds the bounded worker count, and index 1 holds the unbounded worker count. On
              successful return, the passed in array will contain the previous maximum valyes for
              each type. If the count being passed in is 0, then this command returns the current
              maximum values and doesn't modify the current setting.  nr_args must be set  to  2,
              as the command takes two values.

              Available since 5.15.

       IORING_REGISTER_RING_FDS
              Whenever io_uring_enter(2) is called to submit request or wait for completions, the
              kernel must grab a reference to the  file  descriptor.  If  the  application  using
              io_uring  is  threaded,  the file table is marked as shared, and the reference grab
              and put of the file descriptor count is more  expensive  than  it  is  for  a  non-
              threaded application.

              Similarly  to how io_uring allows registration of files, this allow registration of
              the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2)
              system call.

              arg  must  be  set  to  an  unsigned  int  pointer  to  an  array  of  type  struct
              io_uring_rsrc_register of nr_args number of entries. The data field of this  struct
              must point to an io_uring file descriptor, and the offset field can be either -1 or
              an explicit offset desired for the registered file descriptor value. If -1 is used,
              then  upon  successful return of this system call, the field will contain the value
              of the registered file descriptor to be used for  future  io_uring_enter(2)  system
              calls.

              On  successful  completion  of  this  request, the returned descriptors may be used
              instead  of  the  real  file  descriptor  for  io_uring_enter(2),   provided   that
              IORING_ENTER_REGISTERED_RING  is  set  in  the flags for the system call. This flag
              tells the kernel that a registered descriptor is  used  rather  than  a  real  file
              descriptor.

              Each  thread  or process using a ring must register the file descriptor directly by
              issuing this request.

              The maximum number of supported registered ring descriptors is currently limited to
              16.

              Available since 5.18.

       IORING_UNREGISTER_RING_FDS
              Unregister descriptors previously registered with IORING_REGISTER_RING_FDS.

              arg  must  be  set  to  an  unsigned  int  pointer  to  an  array  of  type  struct
              io_uring_rsrc_register of nr_args number of entries. Only the offset  field  should
              be  set  in  the  structure,  containing  the  registered  file  descriptor  offset
              previously returned from IORING_REGISTER_RING_FDS that the  application  wishes  to
              unregister.

              Note  that  this  isn't done automatically on ring exit, if the thread or task that
              previously registered a ring file descriptor isn't exiting. It  is  recommended  to
              manually  unregister  any  previously  registered  ring  descriptors if the ring is
              closed and the task persists. This will free up  a  registration  slot,  making  it
              available for future use.

              Available since 5.18.

       IORING_REGISTER_PBUF_RING
              Registers  a  shared  buffer ring to be used with provided buffers. This is a newer
              alternative to using IORING_OP_PROVIDE_BUFFERS which is more efficient, to be  used
              with request types that support the IOSQE_BUFFER_SELECT flag.

              The  arg  argument  must be filled in with the appropriate information. It looks as
              follows:

                   struct io_uring_buf_reg {
                       __u64 ring_addr;
                       __u32 ring_entries;
                       __u16 bgid;
                       __u16 pad;
                       __u64 resv[3];
                   };

               The ring_addr field must contain the address to the memory allocated to  fit  this
               ring.   The memory must be page aligned and hence allocated appropriately using eg
               posix_memalign(3) or similar. The size of the ring is the product of  ring_entries
               and  the  size  of  struct  io_uring_buf.  ring_entries is the desired size of the
               ring, and must be a power-of-2 in size. The maximum size allowed is 2^15  (32768).
               bgid  is  the buffer group ID associated with this ring. SQEs that select a buffer
               have a buffer group associated  with  them  in  their  buf_group  field,  and  the
               associated  CQEs  will  have  IORING_CQE_F_BUFFER set in their flags member, which
               will also contain the specific ID of the buffer selected. The rest of  the  fields
               are reserved and must be cleared to zero.

               nr_args must be set to 1.

               Also see io_uring_register_buf_ring(3) for more details. Available since 5.19.

       IORING_UNREGISTER_PBUF_RING
              Unregister  a  previously  registered provided buffer ring.  arg must be set to the
              address of a struct io_uring_buf_reg, with just the bgid field set  to  the  buffer
              group  ID  of the previously registered provided buffer group.  nr_args must be set
              to 1. Also see IORING_REGISTER_PBUF_RING .

              Available since 5.19.

       IORING_REGISTER_SYNC_CANCEL
              Performs a synchronous cancelation request, which works in  a  similar  fashion  to
              IORING_OP_ASYNC_CANCEL except it completes inline. This can be useful for scenarios
              where cancelations should happen synchronously, rather than needing to issue an SQE
              and wait for completion of that specific CQE.

              arg  must  be set to a pointer to a struct io_uring_sync_cancel_reg structure, with
              the  details  filled  in  for  what  request(s)  to  target  for  cancelation.  See
              io_uring_register_sync_cancel(3)  for  details  on  that. The return values are the
              same, except they are passed back synchronously rather than  through  the  CQE  res
              field.  nr_args must be set to 1.

              Available since 6.0.

       IORING_REGISTER_FILE_ALLOC_RANGE
              sets  the  allowable range for fixed file index allocations within the kernel. When
              requests   that   can   instantiate   a   new   fixed   file    are    used    with
              IORING_FILE_INDEX_ALLOC  ,  the  application is asking the kernel to allocate a new
              fixed file descriptor rather than pass in a specific value for one. By default, the
              kernel  will  pick  any available fixed file descriptor within the range available.
              This effectively allows the application to set  aside  a  range  just  for  dynamic
              allocations, with the remainder being used for specific values.

              nr_args  must  be  set  to  1  and  arg  must  be  set  to  a  pointer  to a struct
              io_uring_file_index_range:

                   struct io_uring_file_index_range {
                       __u32 off;
                       __u32 len;
                       __u64 resv;
                   };

               with off being set to the starting value for the range, and len being set  to  the
               number of descriptors. The reserved resv field must be cleared to zero.

               The application must have registered a file table first.

               Available since 6.0.

RETURN VALUE

       On  success,  io_uring_register(2)  returns either 0 or a positive value, depending on the
       opcode used.  On error, a negative error value is returned. The caller should not rely  on
       the errno variable.

ERRORS

       EACCES The opcode field is not allowed due to registered restrictions.

       EBADF  One or more fds in the fd array are invalid.

       EBADFD IORING_REGISTER_ENABLE_RINGS or IORING_REGISTER_RESTRICTIONS was specified, but the
              io_uring ring is not disabled.

       EBUSY  IORING_REGISTER_BUFFERS or  IORING_REGISTER_FILES  or  IORING_REGISTER_RESTRICTIONS
              was specified, but there were already buffers, files, or restrictions registered.

       EFAULT buffer  is  outside of the process' accessible address space, or iov_len is greater
              than 1GiB.

       EINVAL IORING_REGISTER_BUFFERS or IORING_REGISTER_FILES was specified, but nr_args is 0.

       EINVAL IORING_REGISTER_BUFFERS was specified, but nr_args exceeds UIO_MAXIOV

       EINVAL IORING_UNREGISTER_BUFFERS or IORING_UNREGISTER_FILES was specified, and nr_args  is
              non-zero or arg is non-NULL.

       EINVAL IORING_REGISTER_RESTRICTIONS was specified, but nr_args exceeds the maximum allowed
              number of restrictions or restriction opcode is invalid.

       EMFILE IORING_REGISTER_FILES was specified and nr_args exceeds the maximum allowed  number
              of files in a fixed file set.

       EMFILE IORING_REGISTER_FILES was specified and adding nr_args file references would exceed
              the maximum allowed number of files the user is allowed to have  according  to  the
              RLIMIT_NOFILE  resource  limit  and  the  caller  does  not  have  CAP_SYS_RESOURCE
              capability. Note that this is a per user limit, not per process.

       ENOMEM Insufficient  kernel  resources  are  available,  or  the  caller  had  a  non-zero
              RLIMIT_MEMLOCK  soft  resource  limit, but tried to lock more memory than the limit
              permitted.  This limit is not enforced if the process is privileged (CAP_IPC_LOCK).

       ENXIO  IORING_UNREGISTER_BUFFERS or IORING_UNREGISTER_FILES was specified, but there  were
              no buffers or files registered.

       ENXIO  Attempt  to  register  files  or  buffers  on  an io_uring instance that is already
              undergoing file or buffer registration, or is being torn down.

       EOPNOTSUPP
              User buffers point to file-backed memory.

       EFAULT User buffers point to file-backed memory (newer kernels).