Provided by: nvidia-cuda-dev_10.1.243-3_amd64

NAME

       Stream memory operations - Stream memory operations of the low-level CUDA driver API
       (cuda.h)

   Functions
       CUresult cuStreamBatchMemOp (CUstream stream, unsigned int count, CUstreamBatchMemOpParams
           *paramArray, unsigned int flags)
           Batch operations to synchronize the stream via memory operations.
       CUresult cuStreamWaitValue32 (CUstream stream, CUdeviceptr addr, cuuint32_t value,
           unsigned int flags)
           Wait on a memory location.
       CUresult cuStreamWaitValue64 (CUstream stream, CUdeviceptr addr, cuuint64_t value,
           unsigned int flags)
           Wait on a memory location.
       CUresult cuStreamWriteValue32 (CUstream stream, CUdeviceptr addr, cuuint32_t value,
           unsigned int flags)
           Write a value to memory.
       CUresult cuStreamWriteValue64 (CUstream stream, CUdeviceptr addr, cuuint64_t value,
           unsigned int flags)
           Write a value to memory.

Detailed Description

       Stream memory operations of the low-level CUDA driver API (cuda.h)

       This section describes the stream memory operations of the low-level CUDA driver
       application programming interface.

       The whole set of operations is disabled by default. Users are required to explicitly
       enable them, e.g. on Linux by passing the kernel module parameter shown below:

           modprobe nvidia NVreg_EnableStreamMemOPs=1

       There is currently no way to enable these operations on other operating systems.

       Users can programmatically query whether the device supports these operations with
       cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS.

       Support for the CU_STREAM_WAIT_VALUE_NOR flag can be queried with
       CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_WAIT_VALUE_NOR.

       Support for the cuStreamWriteValue64() and cuStreamWaitValue64() functions, as well as for
       the CU_STREAM_MEM_OP_WAIT_VALUE_64 and CU_STREAM_MEM_OP_WRITE_VALUE_64 flags, can be
       queried with CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS.

       Support for both CU_STREAM_WAIT_VALUE_FLUSH and CU_STREAM_MEM_OP_FLUSH_REMOTE_WRITES
       requires dedicated platform hardware features and can be queried with
       cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_FLUSH_REMOTE_WRITES.

       Note that all memory pointers passed as parameters to these operations are device
       pointers. Where necessary a device pointer should be obtained, for example with
       cuMemHostGetDevicePointer().

       None of the operations accepts pointers to managed memory buffers (cuMemAllocManaged).

Function Documentation

   CUresult cuStreamBatchMemOp (CUstream stream, unsigned int count, CUstreamBatchMemOpParams *
       paramArray, unsigned int flags)
       This is a batch version of cuStreamWaitValue32() and cuStreamWriteValue32(). Batching
       operations may avoid some performance overhead in both the API call and the device
       execution versus adding them to the stream in separate API calls. The operations are
       enqueued in the order they appear in the array.
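
       The in-order behaviour described above can be pictured with a small host-only model.
       This is a sketch, not driver code: the toy_op structure and toy_stream_run() are
       hypothetical names invented for illustration, and only a 32-bit wait with the default
       condition and a 32-bit write are modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-only model; none of these names come from cuda.h. */
enum toy_op_kind { TOY_WAIT_GEQ32, TOY_WRITE32 };

struct toy_op {
    enum toy_op_kind kind;
    uint32_t *addr;   /* location waited on or written */
    uint32_t value;   /* comparison operand or value to store */
};

/* Runs ops in array order starting at index `start` and stops at the first
 * wait whose condition does not hold yet. Returns the index of the next
 * operation that still has to run (equal to `count` when all are done). */
size_t toy_stream_run(const struct toy_op *ops, size_t count, size_t start)
{
    size_t i = start;
    for (; i < count; ++i) {
        if (ops[i].kind == TOY_WAIT_GEQ32) {
            /* Default condition: cyclic greater-or-equal on 32 bits. */
            if ((int32_t)(uint32_t)(*ops[i].addr - ops[i].value) < 0)
                break;                    /* everything after the wait stays blocked */
        } else {
            *ops[i].addr = ops[i].value;  /* writes take effect in array order */
        }
    }
    return i;
}
```

       In this model, a batch of { wait until flag >= 1, write 42 to out } makes no progress
       while flag is 0, and completes both operations once flag has been set to 1.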

       See CUstreamBatchMemOpType for the full set of supported operations, and
       cuStreamWaitValue32(), cuStreamWaitValue64(), cuStreamWriteValue32(), and
       cuStreamWriteValue64() for details of specific operations.

       Basic support for this can be queried with cuDeviceGetAttribute() and
       CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS. See related APIs for details on querying
       support for specific operations.

       Parameters:
           stream The stream to enqueue the operations in.
           count The number of operations in the array. Must be less than 256.
           paramArray The types and parameters of the individual operations.
           flags Reserved for future expansion; must be 0.

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_SUPPORTED

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuStreamWaitValue32, cuStreamWaitValue64, cuStreamWriteValue32, cuStreamWriteValue64,
           cuMemHostRegister

   CUresult cuStreamWaitValue32 (CUstream stream, CUdeviceptr addr, cuuint32_t value, unsigned
       int flags)
       Enqueues a synchronization of the stream on the given memory location. Work ordered after
       the operation will block until the given condition on the memory is satisfied. By default,
       the condition is to wait for (int32_t)(*addr - value) >= 0, a cyclic greater-or-equal.
       Other condition types can be specified via flags.
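
       The cyclic comparison is easiest to see in plain host code. The helper below is a
       minimal sketch of the default condition only; cyclic_geq32() is a hypothetical name, and
       the cast assumes the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/* Host-side model of the default wait condition,
 * (int32_t)(*addr - value) >= 0.
 * `mem` stands for the current contents of *addr. The subtraction wraps
 * modulo 2^32, and the result is then reinterpreted as a signed value
 * (assumes two's-complement conversion). */
int cyclic_geq32(uint32_t mem, uint32_t value)
{
    return (int32_t)(uint32_t)(mem - value) >= 0;
}
```

       With this definition, a counter that has wrapped from 0xFFFFFFFE around to 2 still
       satisfies a wait on 0xFFFFFFFE, because the wrapped difference is 4. Conversely, a memory
       value that is 2^31 or more ahead of the operand compares as "behind", so the comparison is
       only meaningful when the two values stay within 2^31 - 1 of each other.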

       If the memory was registered via cuMemHostRegister(), the device pointer should be
       obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed
       memory (cuMemAllocManaged).

       Support for this can be queried with cuDeviceGetAttribute() and
       CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS.

       Support for CU_STREAM_WAIT_VALUE_NOR can be queried with cuDeviceGetAttribute() and
       CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_WAIT_VALUE_NOR.

       Parameters:
           stream The stream to synchronize on the memory location.
           addr The memory location to wait on.
           value The value to compare with the memory location.
           flags See CUstreamWaitValue_flags.

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_SUPPORTED

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuStreamWaitValue64, cuStreamWriteValue32, cuStreamWriteValue64, cuStreamBatchMemOp,
           cuMemHostRegister, cuStreamWaitEvent

   CUresult cuStreamWaitValue64 (CUstream stream, CUdeviceptr addr, cuuint64_t value, unsigned
       int flags)
       Enqueues a synchronization of the stream on the given memory location. Work ordered after
       the operation will block until the given condition on the memory is satisfied. By default,
       the condition is to wait for (int64_t)(*addr - value) >= 0, a cyclic greater-or-equal.
       Other condition types can be specified via flags.
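
       This page does not spell out the other condition types. The host-only sketch below
       models the four comparison kinds as they are commonly described in the comments on
       CUstreamWaitValue_flags in cuda.h (greater-or-equal, equal, AND, NOR); treat those
       semantics as an assumption and check the header of the toolkit in use. The enum and
       function names are hypothetical stand-ins, not driver identifiers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the comparison kinds in CUstreamWaitValue_flags. */
enum wait_kind { WAIT_GEQ, WAIT_EQ, WAIT_AND, WAIT_NOR };

/* Returns 1 when a 64-bit wait of the given kind would be satisfied,
 * where `mem` stands for the current contents of *addr. */
int wait_satisfied64(enum wait_kind kind, uint64_t mem, uint64_t value)
{
    switch (kind) {
    case WAIT_GEQ: return (int64_t)(mem - value) >= 0; /* cyclic greater-or-equal */
    case WAIT_EQ:  return mem == value;                /* exact match */
    case WAIT_AND: return (mem & value) != 0;          /* any selected bit is set */
    case WAIT_NOR: return ~(mem | value) != 0;         /* some bit is clear in both */
    }
    return 0; /* unknown kind: treated as not satisfied in this model */
}
```

       For example, under these assumed semantics a WAIT_AND on value 0x6 is released by a
       memory value of 0x4 but not by 0x1.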

       If the memory was registered via cuMemHostRegister(), the device pointer should be
       obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed
       memory (cuMemAllocManaged).

       Support for this can be queried with cuDeviceGetAttribute() and
       CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS.

       Parameters:
           stream The stream to synchronize on the memory location.
           addr The memory location to wait on.
           value The value to compare with the memory location.
           flags See CUstreamWaitValue_flags.

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_SUPPORTED

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuStreamWaitValue32, cuStreamWriteValue32, cuStreamWriteValue64, cuStreamBatchMemOp,
           cuMemHostRegister, cuStreamWaitEvent

   CUresult cuStreamWriteValue32 (CUstream stream, CUdeviceptr addr, cuuint32_t value, unsigned
       int flags)
       Write a value to memory. Unless the CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER flag is
       passed, the write is preceded by a system-wide memory fence, equivalent to a
       __threadfence_system() but scoped to the stream rather than a CUDA thread.

       If the memory was registered via cuMemHostRegister(), the device pointer should be
       obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed
       memory (cuMemAllocManaged).

       Support for this can be queried with cuDeviceGetAttribute() and
       CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS.

       Parameters:
           stream The stream to do the write in.
           addr The device address to write to.
           value The value to write.
           flags See CUstreamWriteValue_flags.

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_SUPPORTED

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuStreamWriteValue64, cuStreamWaitValue32, cuStreamWaitValue64, cuStreamBatchMemOp,
           cuMemHostRegister, cuEventRecord

   CUresult cuStreamWriteValue64 (CUstream stream, CUdeviceptr addr, cuuint64_t value, unsigned
       int flags)
       Write a value to memory. Unless the CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER flag is
       passed, the write is preceded by a system-wide memory fence, equivalent to a
       __threadfence_system() but scoped to the stream rather than a CUDA thread.

       If the memory was registered via cuMemHostRegister(), the device pointer should be
       obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed
       memory (cuMemAllocManaged).

       Support for this can be queried with cuDeviceGetAttribute() and
       CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS.

       Parameters:
           stream The stream to do the write in.
           addr The device address to write to.
           value The value to write.
           flags See CUstreamWriteValue_flags.

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_SUPPORTED

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuStreamWriteValue32, cuStreamWaitValue32, cuStreamWaitValue64, cuStreamBatchMemOp,
           cuMemHostRegister, cuEventRecord

Author

       Generated automatically by Doxygen from the source code.