Ubuntu Manpage: Stream Management -

Provided by: nvidia-cuda-dev_9.1.85-3ubuntu1_amd64

NAME

       Stream Management - stream management functions of the CUDA runtime API

   Typedefs
       typedef void (CUDART_CB *cudaStreamCallback_t)(cudaStream_t stream, cudaError_t status,
           void *userData)

   Functions
       cudaError_t cudaStreamAddCallback (cudaStream_t stream, cudaStreamCallback_t callback,
           void *userData, unsigned int flags)
           Add a callback to a compute stream.
       __cudart_builtin__ cudaError_t cudaStreamAttachMemAsync (cudaStream_t stream, void
           *devPtr, size_t length=0, unsigned int flags=0x04)
           Attach memory to a stream asynchronously.
       cudaError_t cudaStreamCreate (cudaStream_t *pStream)
           Create an asynchronous stream.
       __cudart_builtin__ cudaError_t cudaStreamCreateWithFlags (cudaStream_t *pStream, unsigned
           int flags)
           Create an asynchronous stream with the specified flags.
       __cudart_builtin__ cudaError_t cudaStreamCreateWithPriority (cudaStream_t *pStream,
           unsigned int flags, int priority)
           Create an asynchronous stream with the specified priority.
       __cudart_builtin__ cudaError_t cudaStreamDestroy (cudaStream_t stream)
           Destroys and cleans up an asynchronous stream.
       __cudart_builtin__ cudaError_t cudaStreamGetFlags (cudaStream_t hStream, unsigned int
           *flags)
           Query the flags of a stream.
       __cudart_builtin__ cudaError_t cudaStreamGetPriority (cudaStream_t hStream, int *priority)
           Query the priority of a stream.
       cudaError_t cudaStreamQuery (cudaStream_t stream)
           Queries an asynchronous stream for completion status.
       cudaError_t cudaStreamSynchronize (cudaStream_t stream)
           Waits for stream tasks to complete.
       __cudart_builtin__ cudaError_t cudaStreamWaitEvent (cudaStream_t stream, cudaEvent_t
           event, unsigned int flags)
           Make a compute stream wait on an event.

Detailed Description

       This section describes the stream management functions of the CUDA runtime application
       programming interface, declared in cuda_runtime_api.h.

Typedef Documentation

   typedef void (CUDART_CB *cudaStreamCallback_t)(cudaStream_t stream, cudaError_t status,
       void *userData)
       Type of stream callback functions.

       Parameters:
            stream - The stream as passed to cudaStreamAddCallback; may be NULL.
            status - cudaSuccess or any persistent error on the stream.
            userData - User parameter provided at registration.

Function Documentation

   cudaError_t cudaStreamAddCallback (cudaStream_t stream, cudaStreamCallback_t callback, void *
       userData, unsigned int flags)
       Adds a callback to be called on the host after all currently enqueued items in the stream
       have completed. For each cudaStreamAddCallback call, a callback will be executed exactly
       once. The callback will block later work in the stream until it is finished.

       The callback may be passed cudaSuccess or an error code. In the event of a device error,
       all subsequently executed callbacks will receive an appropriate cudaError_t.

       Callbacks must not make any CUDA API calls. Attempting to use CUDA APIs will result in
       cudaErrorNotPermitted. Callbacks must not perform any synchronization that may depend on
       outstanding device work or other callbacks that are not mandated to run earlier. Callbacks
       without a mandated order (in independent streams) execute in undefined order and may be
       serialized.

       For the purposes of Unified Memory, callback execution makes a number of guarantees:

       • The callback stream is considered idle for the duration of the callback. Thus, for
         example, a callback may always use memory attached to the callback stream.
       • The start of execution of a callback has the same effect as synchronizing an event
         recorded in the same stream immediately prior to the callback. It thus synchronizes
         streams which have been 'joined' prior to the callback.
       • Adding device work to any stream does not have the effect of making the stream active
         until all preceding callbacks have executed. Thus, for example, a callback might use
         global attached memory even if work has been added to another stream, if it has been
         properly ordered with an event.
       • Completion of a callback does not cause a stream to become active except as described
         above. The callback stream will remain idle if no device work follows the callback, and
         will remain idle across consecutive callbacks without device work in between. Thus, for
         example, stream synchronization can be done by signaling from a callback at the end of
         the stream.
       Parameters:
           stream - Stream to add callback to
           callback - The function to call once preceding stream operations are complete
           userData - User specified data to be passed to the callback function
           flags - Reserved for future use, must be 0
       Returns:
           cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorNotSupported
       Note:
            This function uses standard default stream semantics.
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamQuery, cudaStreamSynchronize,
           cudaStreamWaitEvent, cudaStreamDestroy, cudaMallocManaged, cudaStreamAttachMemAsync,
           cuStreamAddCallback
   __cudart_builtin__ cudaError_t cudaStreamAttachMemAsync (cudaStream_t stream, void * devPtr,
       size_t length = 0, unsigned int flags = 0x04)
       Enqueues an operation in stream to specify stream association of length bytes of memory
       starting from devPtr. This function is a stream-ordered operation, meaning that it is
       dependent on, and will only take effect when, previous work in stream has completed. Any
       previous association is automatically replaced.
       devPtr must point to an address within managed memory space declared using the __managed__
       keyword or allocated with cudaMallocManaged.
       length must be zero to indicate that the entire allocation's stream association is being
       changed. Currently, it's not possible to change stream association for a portion of an
       allocation. The default value for length is zero.
       The stream association is specified using flags which must be one of cudaMemAttachGlobal,
       cudaMemAttachHost or cudaMemAttachSingle. The default value for flags is
       cudaMemAttachSingle. If the cudaMemAttachGlobal flag is specified, the memory can be
       accessed by any stream on any device. If the cudaMemAttachHost flag is specified, the
       program makes a guarantee that it won't access the memory on the device from any stream on
       a device that has a zero value for the device attribute
       cudaDevAttrConcurrentManagedAccess. If the cudaMemAttachSingle flag is specified and
       stream is associated with a device that has a zero value for the device attribute
       cudaDevAttrConcurrentManagedAccess, the program makes a guarantee that it will only access
       the memory on the device from stream. It is illegal to attach singly to the NULL stream,
       because the NULL stream is a virtual global stream and not a specific stream. An error
       will be returned in this case.
       When memory is associated with a single stream, the Unified Memory system will allow CPU
       access to this memory region so long as all operations in stream have completed,
       regardless of whether other streams are active. In effect, this constrains exclusive
       ownership of the managed memory region by an active GPU to per-stream activity instead of
       whole-GPU activity.
       Accessing memory on the device from streams that are not associated with it will produce
       undefined results. No error checking is performed by the Unified Memory system to ensure
       that kernels launched into other streams do not access this region.
       It is a program's responsibility to order calls to cudaStreamAttachMemAsync via events,
       synchronization or other means to ensure legal access to memory at all times. Data
       visibility and coherency will be changed appropriately for all kernels which follow a
       stream-association change.
       If stream is destroyed while data is associated with it, the association is removed and
       reverts to the default visibility of the allocation as specified at
       cudaMallocManaged. For __managed__ variables, the default association is always
       cudaMemAttachGlobal. Note that destroying a stream is an asynchronous operation, and as a
       result, the change to default association won't happen until all work in the stream has
       completed.
       Parameters:
           stream - Stream in which to enqueue the attach operation
           devPtr - Pointer to memory (must be a pointer to managed memory)
           length - Length of memory (must be zero, defaults to zero)
           flags - Must be one of cudaMemAttachGlobal, cudaMemAttachHost or cudaMemAttachSingle
           (defaults to cudaMemAttachSingle)
       Returns:
           cudaSuccess, cudaErrorNotReady, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamWaitEvent,
           cudaStreamSynchronize, cudaStreamAddCallback, cudaStreamDestroy, cudaMallocManaged,
           cuStreamAttachMemAsync
       Note:
            The default flags value 0x04 is cudaMemAttachSingle: memory can only be accessed by
            a single stream on the associated device.
   cudaError_t cudaStreamCreate (cudaStream_t * pStream)
       Creates a new asynchronous stream.
       Parameters:
           pStream - Pointer to new stream identifier
       Returns:
           cudaSuccess, cudaErrorInvalidValue
       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreateWithPriority, cudaStreamCreateWithFlags, cudaStreamGetPriority,
           cudaStreamGetFlags, cudaStreamQuery, cudaStreamSynchronize, cudaStreamWaitEvent,
           cudaStreamAddCallback, cudaStreamDestroy, cuStreamCreate
   __cudart_builtin__ cudaError_t cudaStreamCreateWithFlags (cudaStream_t * pStream, unsigned int
       flags)
       Creates a new asynchronous stream. The flags argument determines the behaviors of the
       stream. Valid values for flags are
       • cudaStreamDefault: Default stream creation flag.
       • cudaStreamNonBlocking: Specifies that work running in the created stream may run
         concurrently with work in stream 0 (the NULL stream), and that the created stream should
         perform no implicit synchronization with stream 0.
       Parameters:
           pStream - Pointer to new stream identifier
           flags - Parameters for stream creation
       Returns:
           cudaSuccess, cudaErrorInvalidValue
       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreate, cudaStreamCreateWithPriority, cudaStreamGetFlags, cudaStreamQuery,
           cudaStreamSynchronize, cudaStreamWaitEvent, cudaStreamAddCallback, cudaStreamDestroy,
           cuStreamCreate
   __cudart_builtin__ cudaError_t cudaStreamCreateWithPriority (cudaStream_t * pStream, unsigned
       int flags, int priority)
       Creates a stream with the specified priority and returns a handle in pStream. This API
       alters the scheduler priority of work in the stream. Work in a higher priority stream may
       preempt work already executing in a low priority stream.
       priority follows a convention where lower numbers represent higher priorities. '0'
       represents default priority. The range of meaningful numerical priorities can be queried
       using cudaDeviceGetStreamPriorityRange. If the specified priority is outside the numerical
       range returned by cudaDeviceGetStreamPriorityRange, it will automatically be clamped to
       the lowest or the highest number in the range.
       Parameters:
           pStream - Pointer to new stream identifier
           flags - Flags for stream creation. See cudaStreamCreateWithFlags for a list of valid
           flags that can be passed
           priority - Priority of the stream. Lower numbers represent higher priorities. See
           cudaDeviceGetStreamPriorityRange for more information about the meaningful stream
           priorities that can be passed.
       Returns:
           cudaSuccess, cudaErrorInvalidValue
       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.
           Stream priorities are supported only on GPUs with compute capability 3.5 or higher.
           In the current implementation, only compute kernels launched in priority streams are
           affected by the stream's priority. Stream priorities have no effect on host-to-device
           and device-to-host memory operations.
       See also:
           cudaStreamCreate, cudaStreamCreateWithFlags, cudaDeviceGetStreamPriorityRange,
           cudaStreamGetPriority, cudaStreamQuery, cudaStreamWaitEvent, cudaStreamAddCallback,
           cudaStreamSynchronize, cudaStreamDestroy, cuStreamCreateWithPriority
   __cudart_builtin__ cudaError_t cudaStreamDestroy (cudaStream_t stream)
       Destroys and cleans up the asynchronous stream specified by stream.
       If the device is still doing work in stream when cudaStreamDestroy() is called, the
       function will return immediately and the resources associated with stream will be
       released automatically once the device has completed all work in stream.
       Parameters:
           stream - Stream identifier
       Returns:
           cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
       Note:
            This function uses standard default stream semantics.
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamQuery, cudaStreamWaitEvent,
           cudaStreamSynchronize, cudaStreamAddCallback, cuStreamDestroy
   __cudart_builtin__ cudaError_t cudaStreamGetFlags (cudaStream_t hStream, unsigned int * flags)
       Query the flags of a stream. The flags are returned in flags. See
       cudaStreamCreateWithFlags for a list of valid flags.
       Parameters:
           hStream - Handle to the stream to be queried
           flags - Pointer to an unsigned integer in which the stream's flags are returned
       Returns:
           cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
       Note:
            This function uses standard default stream semantics.
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreateWithPriority, cudaStreamCreateWithFlags, cudaStreamGetPriority,
           cuStreamGetFlags
   __cudart_builtin__ cudaError_t cudaStreamGetPriority (cudaStream_t hStream, int * priority)
       Query the priority of a stream. The priority is returned in priority. Note that if the
       stream was created with a priority outside the meaningful numerical range returned by
       cudaDeviceGetStreamPriorityRange, this function returns the clamped priority. See
       cudaStreamCreateWithPriority for details about priority clamping.
       Parameters:
           hStream - Handle to the stream to be queried
           priority - Pointer to a signed integer in which the stream's priority is returned
       Returns:
           cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreateWithPriority, cudaDeviceGetStreamPriorityRange, cudaStreamGetFlags,
           cuStreamGetPriority
   cudaError_t cudaStreamQuery (cudaStream_t stream)
       Returns cudaSuccess if all operations in stream have completed, or cudaErrorNotReady if
       not.
       For the purposes of Unified Memory, a return value of cudaSuccess is equivalent to having
       called cudaStreamSynchronize().
       Parameters:
           stream - Stream identifier
       Returns:
           cudaSuccess, cudaErrorNotReady, cudaErrorInvalidResourceHandle
       Note:
            This function uses standard default stream semantics.
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamWaitEvent,
           cudaStreamSynchronize, cudaStreamAddCallback, cudaStreamDestroy, cuStreamQuery
   cudaError_t cudaStreamSynchronize (cudaStream_t stream)
       Blocks until stream has completed all operations. If the cudaDeviceScheduleBlockingSync
       flag was set for this device, the host thread will block on a synchronization primitive,
       rather than spin-wait, until the stream has finished all of its tasks.
       Parameters:
           stream - Stream identifier
       Returns:
           cudaSuccess, cudaErrorInvalidResourceHandle
       Note:
            This function uses standard default stream semantics.
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamQuery, cudaStreamWaitEvent,
           cudaStreamAddCallback, cudaStreamDestroy, cuStreamSynchronize
   __cudart_builtin__ cudaError_t cudaStreamWaitEvent (cudaStream_t stream, cudaEvent_t event,
       unsigned int flags)
       Makes all future work submitted to stream wait for all work captured in event. See
       cudaEventRecord() for details on what is captured by an event. The synchronization will be
       performed efficiently on the device when applicable. event may be from a different device
       than stream.
       Parameters:
           stream - Stream to wait
           event - Event to wait on
           flags - Parameters for the operation (must be 0)
       Returns:
           cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle
       Note:
            This function uses standard default stream semantics.
           Note that this function may also return error codes from previous, asynchronous
           launches.
       See also:
           cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamQuery, cudaStreamSynchronize,
           cudaStreamAddCallback, cudaStreamDestroy, cuStreamWaitEvent

Author

       Generated automatically by Doxygen from the source code.