Ubuntu Manpage: ioctl_xfs_exchange_range - exchange the contents of parts of two files

plucky (2) ioctl_xfs_exchange_range.2.gz

Provided by: xfslibs-dev_6.12.0-1ubuntu1_amd64

NAME

       ioctl_xfs_exchange_range - exchange the contents of parts of two files

SYNOPSIS

       #include <sys/ioctl.h>
       #include <xfs/xfs_fs.h>

       int ioctl(int file2_fd, XFS_IOC_EXCHANGE_RANGE, struct xfs_exchange_range *arg);

DESCRIPTION

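Given a range of bytes in a first file, file1_fd, and a second range of bytes in a second file, file2_fd,
this ioctl(2) exchanges the contents of the two ranges. file2_fd is the descriptor on which ioctl(2) is
invoked; file1_fd is passed in the argument structure.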

Exchanges are atomic with regard to concurrent file operations. Implementations must guarantee that
readers see either the old contents or the new contents in their entirety, even if the system fails.

The system call parameters are conveyed in structures of the following form:

    struct xfs_exchange_range {
        __s32    file1_fd;
        __u32    pad;
        __u64    file1_offset;
        __u64    file2_offset;
        __u64    length;
        __u64    flags;
    };

The field pad must be zero.

The fields file1_fd, file1_offset, and length define the first range of bytes to be exchanged.

The fields file2_fd, file2_offset, and length define the second range of bytes to be exchanged.
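The sketch below shows how to fill in the argument. It uses a C99 designated initializer, which sets pad, and any other field that is not named, to zero. So that it compiles without the XFS headers, it declares a hypothetical local mirror, exchange_range_mirror, of the structure shown above, along with a hypothetical helper, make_range(). Real programs should use struct xfs_exchange_range from <xfs/xfs_fs.h>. The 40-byte size follows from the field list, assuming no padding is inserted between fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of struct xfs_exchange_range, for
 * illustration only; real programs include <xfs/xfs_fs.h>. */
struct exchange_range_mirror {
	int32_t  file1_fd;
	uint32_t pad;          /* must be zero */
	uint64_t file1_offset;
	uint64_t file2_offset;
	uint64_t length;
	uint64_t flags;
};

/* 4 + 4 + 4 * 8 bytes, with no implicit padding between fields. */
static_assert(sizeof(struct exchange_range_mirror) == 40,
	      "unexpected structure size");

/* Describe both ranges.  file2_fd is not a field: it is the descriptor
 * passed to ioctl(2) itself.  Members that are not named in the compound
 * literal, including pad, are zero-initialized. */
struct exchange_range_mirror
make_range(int file1_fd, uint64_t file1_offset, uint64_t file2_offset,
	   uint64_t length, uint64_t flags)
{
	return (struct exchange_range_mirror){
		.file1_fd     = file1_fd,
		.file1_offset = file1_offset,
		.file2_offset = file2_offset,
		.length       = length,
		.flags        = flags,
	};
}
```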

Both files must be on the same filesystem mount. If the two file descriptors refer to the same file,
the byte ranges must not overlap. Most disk-based filesystems require the starts of both ranges to be
aligned to the file block size. If this is the case, the ends of the ranges must also be so aligned
unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set.
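A caller can check the overlap and alignment rules in user space before issuing the call. The following is a minimal sketch. ranges_overlap() and ranges_aligned() are hypothetical helpers, and blksz is assumed to be the file block size, obtained by the caller. The kernel still performs its own validation and reports EINVAL when the rules are violated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [a, a + len) and [b, b + len) share at least one byte.
 * Comparing the distance between the starts avoids overflow in a + len. */
bool ranges_overlap(uint64_t a, uint64_t b, uint64_t len)
{
	return a < b ? b - a < len : a - b < len;
}

/* Hypothetical pre-check of the alignment rule described above: both
 * starts must be multiples of blksz, and (since the starts are aligned)
 * the ends are aligned exactly when the length is, unless TO_EOF is set. */
bool ranges_aligned(uint64_t off1, uint64_t off2, uint64_t len,
		    uint64_t blksz, bool to_eof)
{
	assert(blksz > 0);
	if (off1 % blksz != 0 || off2 % blksz != 0)
		return false;
	return to_eof || len % blksz == 0;
}
```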

The flags field controls the behavior of the exchange operation.

XFS_EXCHANGE_RANGE_TO_EOF
       Ignore the length parameter. All bytes in file1_fd from file1_offset to EOF are moved to
       file2_fd, and file2's size is set to (file2_offset+(file1_length-file1_offset)). Meanwhile,
       all bytes in file2 from file2_offset to EOF are moved to file1, and file1's size is set to
       (file1_offset+(file2_length-file2_offset)). Here file1_length and file2_length are the
       sizes of the two files before the exchange.

XFS_EXCHANGE_RANGE_DSYNC
       Ensure that all modified in-core data in both file ranges and all metadata updates pertaining
       to the exchange operation are flushed to persistent storage before the call returns. Opening
       either file descriptor with O_SYNC or O_DSYNC has the same effect.

XFS_EXCHANGE_RANGE_FILE1_WRITTEN
       Only exchange sub-ranges of file1_fd that are known to contain data written by application
       software. Each sub-range may be expanded (both upwards and downwards) to align with the file
       allocation unit. For files on the data device, this is one filesystem block. For files on
       the realtime device, this is the realtime extent size. This facility can be used to implement
       fast atomic scatter-gather writes of any complexity for software-defined storage targets if
       all writes are aligned to the file allocation unit.

XFS_EXCHANGE_RANGE_DRY_RUN
       Check the parameters and the feasibility of the operation, but do not change anything.
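The size arithmetic for XFS_EXCHANGE_RANGE_TO_EOF can be written as a small helper. This sketch assumes that each offset lies at or before its file's EOF; otherwise the unsigned subtraction would wrap around. eof_sizes and to_eof_sizes() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* File sizes after an XFS_EXCHANGE_RANGE_TO_EOF exchange. */
struct eof_sizes {
	uint64_t new_file1_size;
	uint64_t new_file2_size;
};

/* file1_size and file2_size are the sizes before the exchange. */
struct eof_sizes to_eof_sizes(uint64_t file1_size, uint64_t file1_offset,
			      uint64_t file2_size, uint64_t file2_offset)
{
	/* Assumption of this sketch: each offset is at or before EOF. */
	assert(file1_offset <= file1_size && file2_offset <= file2_size);
	return (struct eof_sizes){
		/* file1 keeps [0, file1_offset) and gains file2's tail */
		.new_file1_size = file1_offset + (file2_size - file2_offset),
		/* file2 keeps [0, file2_offset) and gains file1's tail */
		.new_file2_size = file2_offset + (file1_size - file1_offset),
	};
}
```

For example, suppose a 10000-byte file1 is exchanged from offset 4096 with a 6000-byte file2 from offset 2048. Afterwards, file1 is 4096 + 3952 = 8048 bytes long and file2 is 2048 + 5904 = 7952 bytes long.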

RETURN VALUE

       On success, 0 is returned. On error, -1 is returned, and errno is set to indicate the error.

ERRORS

       Error codes can be one of, but are not limited to, the following:

       EBADF  file1_fd is not open for reading and writing or is open for append-only writes; or file2_fd is not
              open for reading and writing or is open for append-only writes.

       EINVAL The  parameters  are  not  correct  for  these  files.   This error can also appear if either file
              descriptor represents a device, FIFO, or socket.  Disk filesystems generally  require  the  offset
              and length arguments to be aligned to the fundamental block sizes of both files.

       EIO    An I/O error occurred.

       EISDIR One of the files is a directory.

       ENOMEM The kernel was unable to allocate sufficient memory to perform the operation.

       ENOSPC There is not enough free space in the filesystem to exchange the contents safely.

       EOPNOTSUPP
              The filesystem does not support exchanging bytes between the two files.

       EPERM  file1_fd or file2_fd refers to an immutable file.

       ETXTBSY
              One of the files is a swap file.

       EUCLEAN
              The filesystem is corrupt.

       EXDEV  file1_fd and file2_fd are not on the same mounted filesystem.

CONFORMING TO

       This API is XFS-specific.

USE CASES

       Several  use cases are imagined for this system call.  In all cases, application software must coordinate
       updates to the file because the exchange is performed unconditionally.

       The first is a data storage program that wants to commit non-contiguous updates to a file atomically and
       coordinates write access to that file.  This can be done by creating a temporary file, calling the FICLONE
       ioctl (see ioctl_ficlone(2)) to share the contents, and staging the updates into the temporary file.  The
       XFS_EXCHANGE_RANGE_TO_EOF flag is recommended for this purpose.  The temporary file can be deleted or
       punched out afterwards.

       An example program might look like this:

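           int fd = open("/some/file", O_RDWR);
           /* O_TMPFILE creates an unnamed file, so a mode is required */
           int temp_fd = open("/some", O_TMPFILE | O_RDWR, 0600);

           /* share the current contents with the temporary file */
           ioctl(temp_fd, FICLONE, fd);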

           /* append 1MB of records */
           lseek(temp_fd, 0, SEEK_END);
           write(temp_fd, data1, 1000000);

           /* update record index */
           pwrite(temp_fd, data1, 600, 98765);
           pwrite(temp_fd, data2, 320, 54321);
           pwrite(temp_fd, data2, 15, 0);

           /* commit the entire update */
           struct xfs_exchange_range args = {
               .file1_fd = temp_fd,
               .flags = XFS_EXCHANGE_RANGE_TO_EOF,
           };

           if (ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args) < 0)
               perror("XFS_IOC_EXCHANGE_RANGE");

       The  second  is a software-defined storage host (e.g. a disk jukebox) which implements an atomic scatter-
       gather write command.  Provided the exported disk's logical block size matches the file's allocation unit
       size,  this can be done by creating a temporary file and writing the data at the appropriate offsets.  It
       is recommended that the temporary file be truncated to the size of the regular file before any writes are
       staged  to  the temporary file to avoid issues with zeroing during EOF extension.  Use this call with the
       FILE1_WRITTEN flag to exchange only the file allocation units involved in  the  emulated  device's  write
       command.   The  temporary file should be truncated or punched out completely before being reused to stage
       another write.

       An example program might look like this:

           int fd = open("/some/file", O_RDWR);
           /* O_TMPFILE creates an unnamed file, so a mode is required */
           int temp_fd = open("/some", O_TMPFILE | O_RDWR, 0600);
           struct stat sb;
           int blksz;

           fstat(fd, &sb);
           blksz = sb.st_blksize;

           /* match the regular file's size before staging any writes */
           ftruncate(temp_fd, sb.st_size);

           /* land scatter-gather writes between blocks 100 and 500 */
           pwrite(temp_fd, data1, blksz * 2, blksz * 100);
           pwrite(temp_fd, data2, blksz * 20, blksz * 480);
           pwrite(temp_fd, data3, blksz * 7, blksz * 257);

           /* commit the entire update */
           struct xfs_exchange_range args = {
               .file1_fd     = temp_fd,
               .file1_offset = blksz * 100,
               .file2_offset = blksz * 100,
               .length       = blksz * 400,
               .flags        = XFS_EXCHANGE_RANGE_FILE1_WRITTEN |
                               XFS_EXCHANGE_RANGE_DSYNC,
           };

           if (ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args) < 0)
               perror("XFS_IOC_EXCHANGE_RANGE");
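       In the example above, the offsets and length were worked out by hand. The following is a minimal sketch
       of deriving them from the staged writes. It takes the lowest start and the highest end, then rounds the
       start down and the end up to the allocation unit. struct staged_write and covering_range() are
       hypothetical names, and the caller is assumed to supply the allocation unit (the example uses
       st_blksize).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One write staged into the temporary file (hypothetical type). */
struct staged_write {
	uint64_t offset;
	uint64_t length;
};

/* Smallest allocation-unit-aligned range covering every staged write;
 * the result can be used for file1_offset/file2_offset and length. */
void covering_range(const struct staged_write *w, size_t n, uint64_t unit,
		    uint64_t *start, uint64_t *length)
{
	uint64_t lo = UINT64_MAX, hi = 0;

	assert(n > 0 && unit > 0);
	for (size_t i = 0; i < n; i++) {
		if (w[i].offset < lo)
			lo = w[i].offset;
		if (w[i].offset + w[i].length > hi)
			hi = w[i].offset + w[i].length;
	}
	lo -= lo % unit;                      /* round the start down */
	hi = (hi + unit - 1) / unit * unit;   /* round the end up */
	*start = lo;
	*length = hi - lo;
}
```

       For the three writes in the example with a 4096-byte unit, this gives a start of 409600 (block 100) and a
       length of 1638400 (400 blocks), which matches the hand-written values. The covering range contains gaps,
       such as blocks 102 to 256, that were never written. Because XFS_EXCHANGE_RANGE_FILE1_WRITTEN restricts the
       exchange to written sub-ranges of file1_fd, those gaps are left alone.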

NOTES

       Some filesystems may limit the amount of data or the number of extents that can be exchanged in a  single
       call.