Provided by: nvidia-cuda-dev_9.1.85-3ubuntu1_amd64

NAME

       Execution Control - execution control functions of the CUDA runtime API

   Functions
       __cudart_builtin__ cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes *attr,
           const void *func)
           Find out attributes for a given function.
       __cudart_builtin__ cudaError_t cudaFuncSetAttribute (const void *func, enum
           cudaFuncAttribute attr, int value)
           Set attributes for a given function.
       cudaError_t cudaFuncSetCacheConfig (const void *func, enum cudaFuncCache cacheConfig)
           Sets the preferred cache configuration for a device function.
       cudaError_t cudaFuncSetSharedMemConfig (const void *func, enum cudaSharedMemConfig config)
           Sets the shared memory configuration for a device function.
       __device__ __cudart_builtin__ void * cudaGetParameterBuffer (size_t alignment, size_t
           size)
           Obtains a parameter buffer.
       __device__ __cudart_builtin__ void * cudaGetParameterBufferV2 (void *func, dim3
           gridDimension, dim3 blockDimension, unsigned int sharedMemSize)
            Obtains a parameter buffer for launching a specified kernel.
       cudaError_t cudaLaunchCooperativeKernel (const void *func, dim3 gridDim, dim3 blockDim,
           void **args, size_t sharedMem, cudaStream_t stream)
           Launches a device function where thread blocks can cooperate and synchronize as they
           execute.
       cudaError_t cudaLaunchCooperativeKernelMultiDevice (struct cudaLaunchParams
           *launchParamsList, unsigned int numDevices, unsigned int flags=0)
           Launches device functions on multiple devices where thread blocks can cooperate and
           synchronize as they execute.
       cudaError_t cudaLaunchKernel (const void *func, dim3 gridDim, dim3 blockDim, void **args,
           size_t sharedMem, cudaStream_t stream)
           Launches a device function.
       cudaError_t cudaSetDoubleForDevice (double *d)
           Converts a double argument to be executed on a device.
       cudaError_t cudaSetDoubleForHost (double *d)
           Converts a double argument after execution on a device.

Detailed Description

        Execution control functions of the CUDA runtime API (cuda_runtime_api.h).

       This section describes the execution control functions of the CUDA runtime application
       programming interface.

       Some functions have overloaded C++ API template versions documented separately in the C++
       API Routines module.

Function Documentation

   __cudart_builtin__ cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes * attr, const
       void * func)
       This function obtains the attributes of a function specified via func. func is a device
       function symbol and must be declared as a __global__ function. The fetched attributes are
       placed in attr. If the specified function does not exist, then
       cudaErrorInvalidDeviceFunction is returned. For templated functions, pass the function
       symbol as follows: func_name<template_arg_0,...,template_arg_N>

       Note that some function attributes such as maxThreadsPerBlock may vary based on the device
       that is currently being used.

       Parameters:
           attr - Return pointer to function's attributes
           func - Device function symbol

       Returns:
           cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

           Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and
           removed in CUDA 5.0.

       See also:
           cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C++ API),
           cudaLaunchKernel (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost,
           cudaSetupArgument (C API), cuFuncGetAttribute
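
        As an illustration, a host program can query a kernel's attributes before choosing a
        launch configuration. This is a minimal sketch; the kernel name myKernel is illustrative:

                #include <stdio.h>
                #include <cuda_runtime.h>

                __global__ void myKernel(float *data)
                {
                    data[threadIdx.x] *= 2.0f;
                }

                int main(void)
                {
                    struct cudaFuncAttributes attr;
                    cudaError_t err = cudaFuncGetAttributes(&attr, (const void *)myKernel);
                    if (err != cudaSuccess) {
                        printf("error: %s\n", cudaGetErrorString(err));
                        return 1;
                    }
                    /* e.g. cap the block size at what this kernel supports */
                    printf("maxThreadsPerBlock = %d, numRegs = %d\n",
                           attr.maxThreadsPerBlock, attr.numRegs);
                    return 0;
                }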

   __cudart_builtin__ cudaError_t cudaFuncSetAttribute (const void * func, enum cudaFuncAttribute
       attr, int value)
        This function sets the attributes of a function specified via func. The parameter func
        must be a pointer to a function that executes on the device and must be declared as a
        __global__ function. The attribute named by attr is set to the value given by value. If
        the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.
        If the specified attribute cannot be written, or if the value is incorrect, then
        cudaErrorInvalidValue is returned.

        Valid values for attr are:

        • cudaFuncAttributeMaxDynamicSharedMemorySize: maximum size of dynamic shared memory per
          block

        • cudaFuncAttributePreferredSharedMemoryCarveout: preferred shared memory - L1 cache
          split ratio

       Parameters:
            func - Function to set the attribute for
           attr - Attribute to set
           value - Value to set

       Returns:
           cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction,
           cudaErrorInvalidValue

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

        See also:
            cudaLaunchKernel (C++ API), cudaFuncSetCacheConfig (C++ API), cudaFuncGetAttributes (C
            API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C++ API)
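
        As an illustration, a kernel that needs more dynamic shared memory than the default
        per-block limit can opt in before launching. This is a minimal sketch; the kernel name
        and the 64 KB request are illustrative, and the call fails on devices that do not
        support the attribute:

                #include <cuda_runtime.h>

                __global__ void bigSharedKernel(void)
                {
                    extern __shared__ float smem[];
                    smem[threadIdx.x] = (float)threadIdx.x;
                }

                int main(void)
                {
                    int bytes = 64 * 1024;  /* above the default 48 KB limit */
                    if (cudaFuncSetAttribute((const void *)bigSharedKernel,
                                             cudaFuncAttributeMaxDynamicSharedMemorySize,
                                             bytes) != cudaSuccess)
                        return 1;
                    bigSharedKernel<<<1, 256, bytes>>>();
                    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
                }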

   cudaError_t cudaFuncSetCacheConfig (const void * func, enum cudaFuncCache cacheConfig)
       On devices where the L1 cache and shared memory use the same hardware resources, this sets
       through cacheConfig the preferred cache configuration for the function specified via func.
       This is only a preference. The runtime will use the requested configuration if possible,
       but it is free to choose a different configuration if required to execute func.

       func is a device function symbol and must be declared as a __global__ function. If the
       specified function does not exist, then cudaErrorInvalidDeviceFunction is returned. For
       templated functions, pass the function symbol as follows:
       func_name<template_arg_0,...,template_arg_N>

       This setting does nothing on devices where the size of the L1 cache and shared memory are
       fixed.

       Launching a kernel with a different preference than the most recent preference setting may
       insert a device-side synchronization point.

       The supported cache configurations are:

       • cudaFuncCachePreferNone: no preference for shared memory or L1 (default)

       • cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache

       • cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory

       • cudaFuncCachePreferEqual: prefer equal size L1 cache and shared memory

       Parameters:
           func - Device function symbol
           cacheConfig - Requested cache configuration

       Returns:
           cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

           Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and
           removed in CUDA 5.0.

       See also:
           cudaConfigureCall, cudaFuncSetCacheConfig (C++ API), cudaFuncGetAttributes (C API),
           cudaLaunchKernel (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost,
           cudaSetupArgument (C API), cudaThreadGetCacheConfig, cudaThreadSetCacheConfig,
           cuFuncSetCacheConfig
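
        As an illustration, a kernel that uses little shared memory but benefits from L1
        capacity can request a larger L1 before its first launch. This is a minimal sketch; the
        kernel name is illustrative, and the preference may be ignored on some devices:

                #include <cuda_runtime.h>

                __global__ void scaleKernel(float *data)
                {
                    data[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
                }

                int main(void)
                {
                    float *d;
                    cudaMalloc((void **)&d, 256 * sizeof(float));
                    /* A preference only: the runtime may pick another split. */
                    cudaFuncSetCacheConfig((const void *)scaleKernel,
                                           cudaFuncCachePreferL1);
                    scaleKernel<<<1, 256>>>(d);
                    cudaDeviceSynchronize();
                    cudaFree(d);
                    return 0;
                }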

   cudaError_t cudaFuncSetSharedMemConfig (const void * func, enum cudaSharedMemConfig config)
       On devices with configurable shared memory banks, this function will force all subsequent
       launches of the specified device function to have the given shared memory bank size
       configuration. On any given launch of the function, the shared memory configuration of the
       device will be temporarily changed if needed to suit the function's preferred
       configuration. Changes in shared memory configuration between subsequent launches of
        functions may introduce a device-side synchronization point.

       Any per-function setting of shared memory bank size set via cudaFuncSetSharedMemConfig
       will override the device wide setting set by cudaDeviceSetSharedMemConfig.

       Changing the shared memory bank size will not increase shared memory usage or affect
       occupancy of kernels, but may have major effects on performance. Larger bank sizes will
       allow for greater potential bandwidth to shared memory, but will change what kinds of
       accesses to shared memory will result in bank conflicts.

       This function will do nothing on devices with fixed shared memory bank size.

       For templated functions, pass the function symbol as follows:
       func_name<template_arg_0,...,template_arg_N>

       The supported bank configurations are:

       • cudaSharedMemBankSizeDefault: use the device's shared memory configuration when
         launching this function.

       • cudaSharedMemBankSizeFourByte: set shared memory bank width to be four bytes natively
         when launching this function.

       • cudaSharedMemBankSizeEightByte: set shared memory bank width to be eight bytes natively
         when launching this function.

       Parameters:
           func - Device function symbol
           config - Requested shared memory configuration

       Returns:
           cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction,
            cudaErrorInvalidValue

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

           Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and
           removed in CUDA 5.0.

       See also:
           cudaConfigureCall, cudaDeviceSetSharedMemConfig, cudaDeviceGetSharedMemConfig,
           cudaDeviceSetCacheConfig, cudaDeviceGetCacheConfig, cudaFuncSetCacheConfig,
           cuFuncSetSharedMemConfig
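
        As an illustration, a kernel whose shared memory traffic is dominated by
        double-precision values may prefer eight-byte banks. This is a minimal sketch; the
        kernel name is illustrative, and the setting is a no-op on devices with a fixed bank
        size:

                #include <cuda_runtime.h>

                __global__ void reverseKernel(double *out)
                {
                    __shared__ double tile[256];
                    tile[threadIdx.x] = (double)threadIdx.x;
                    __syncthreads();
                    out[threadIdx.x] = tile[255 - threadIdx.x];
                }

                int main(void)
                {
                    double *d;
                    cudaMalloc((void **)&d, 256 * sizeof(double));
                    cudaFuncSetSharedMemConfig((const void *)reverseKernel,
                                               cudaSharedMemBankSizeEightByte);
                    reverseKernel<<<1, 256>>>(d);
                    cudaDeviceSynchronize();
                    cudaFree(d);
                    return 0;
                }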

   __device__ __cudart_builtin__ void* cudaGetParameterBuffer (size_t alignment, size_t size)
       Obtains a parameter buffer which can be filled with parameters for a kernel launch.
       Parameters passed to cudaLaunchDevice must be allocated via this function.

       This is a low level API and can only be accessed from Parallel Thread Execution (PTX).
       CUDA user code should use <<< >>> to launch kernels.

       Parameters:
           alignment - Specifies alignment requirement of the parameter buffer
           size - Specifies size requirement in bytes

       Returns:
            Pointer to the allocated parameter buffer

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

       See also:
           cudaLaunchDevice

   __device__ __cudart_builtin__ void* cudaGetParameterBufferV2 (void * func, dim3 gridDimension,
       dim3 blockDimension, unsigned int sharedMemSize)
        Obtains a parameter buffer for launching the kernel func on a grid of gridDimension
        blocks of blockDimension threads, with sharedMemSize bytes of dynamic shared memory. The
        buffer can be filled with the kernel's parameters and passed to cudaLaunchDeviceV2 to
        perform the launch.

       This is a low level API and can only be accessed from Parallel Thread Execution (PTX).
       CUDA user code should use <<< >>> to launch the kernels.

        Parameters:
            func - Pointer to the kernel to be launched
            gridDimension - Specifies grid dimensions
            blockDimension - Specifies block dimensions
            sharedMemSize - Specifies size of dynamic shared memory in bytes

        Returns:
            Pointer to the allocated parameter buffer

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.
            Please refer to Execution Configuration and Parameter Buffer Layout from the CUDA
           Programming Guide for the detailed descriptions of launch configuration and parameter
           layout respectively.

       See also:
           cudaGetParameterBuffer
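
        As an illustration, the pair cudaGetParameterBufferV2/cudaLaunchDeviceV2 is roughly what
        the compiler generates for a device-side <<< >>> launch under dynamic parallelism. The
        sketch below assumes a single int parameter placed at offset 0 of the buffer and must be
        compiled with relocatable device code (-rdc=true) and linked against cudadevrt; user
        code should simply write child<<<1, 1>>>(42) instead:

                __global__ void child(int x)
                {
                    /* ... */
                }

                __global__ void parent(void)
                {
                    /* Roughly equivalent to: child<<<1, 1>>>(42); */
                    void *buf = cudaGetParameterBufferV2((void *)child,
                                                         dim3(1), dim3(1), 0);
                    if (buf != NULL) {
                        *(int *)buf = 42;            /* first parameter at offset 0 */
                        cudaLaunchDeviceV2(buf, 0);  /* launch on the NULL stream */
                    }
                }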

   cudaError_t cudaLaunchCooperativeKernel (const void * func, dim3 gridDim, dim3 blockDim, void
       ** args, size_t sharedMem, cudaStream_t stream)
        The function invokes the kernel func on a gridDim (gridDim.x × gridDim.y × gridDim.z)
        grid of blocks. Each block contains blockDim (blockDim.x × blockDim.y × blockDim.z)
        threads.

       The device on which this kernel is invoked must have a non-zero value for the device
       attribute cudaDevAttrCooperativeLaunch.

       The total number of blocks launched cannot exceed the maximum number of blocks per
       multiprocessor as returned by cudaOccupancyMaxActiveBlocksPerMultiprocessor (or
       cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags) times the number of
       multiprocessors as specified by the device attribute cudaDevAttrMultiProcessorCount.

       The kernel cannot make use of CUDA dynamic parallelism.

        If the kernel has N parameters, args should point to an array of N pointers. Each
        pointer, from args[0] to args[N - 1], points to the region of memory from which the
        actual parameter will be copied.

       For templated functions, pass the function symbol as follows:
       func_name<template_arg_0,...,template_arg_N>

       sharedMem sets the amount of dynamic shared memory that will be available to each thread
       block.

        stream specifies the stream the invocation is associated with.

       Parameters:
           func - Device function symbol
           gridDim - Grid dimensions
           blockDim - Block dimensions
           args - Arguments
           sharedMem - Shared memory
           stream - Stream identifier

       Returns:
           cudaSuccess, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration,
           cudaErrorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources,
           cudaErrorCooperativeLaunchTooLarge, cudaErrorSharedObjectInitFailed

       Note:
            This function uses standard default stream semantics.

           Note that this function may also return error codes from previous, asynchronous
           launches.

       See also:
           cudaLaunchCooperativeKernel (C++ API), cudaLaunchCooperativeKernelMultiDevice,
           cuLaunchCooperativeKernel
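
        As an illustration, the sketch below sizes the grid from the occupancy API so that every
        block is co-resident, then launches a kernel that performs a grid-wide synchronization
        through cooperative groups. Kernel and variable names are illustrative; grid
        synchronization requires a supported device and may additionally require compiling with
        relocatable device code:

                #include <cooperative_groups.h>
                #include <cuda_runtime.h>
                namespace cg = cooperative_groups;

                __global__ void coopKernel(int *data)
                {
                    cg::grid_group g = cg::this_grid();
                    data[g.thread_rank()] = 1;
                    g.sync();   /* every block in the grid synchronizes here */
                }

                int main(void)
                {
                    int supported = 0, blocksPerSm = 0, sms = 0, threads = 128;
                    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, 0);
                    if (!supported) return 0;
                    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm,
                                                                  coopKernel, threads, 0);
                    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, 0);

                    int *d;
                    cudaMalloc((void **)&d, blocksPerSm * sms * threads * sizeof(int));
                    void *args[] = { &d };
                    cudaLaunchCooperativeKernel((const void *)coopKernel,
                                                dim3(blocksPerSm * sms), dim3(threads),
                                                args, 0, 0);
                    cudaDeviceSynchronize();
                    cudaFree(d);
                    return 0;
                }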

   cudaError_t cudaLaunchCooperativeKernelMultiDevice (struct cudaLaunchParams *
       launchParamsList, unsigned int numDevices, unsigned int flags = 0)
       Invokes kernels as specified in the launchParamsList array where each element of the array
       specifies all the parameters required to perform a single kernel launch. These kernels can
       cooperate and synchronize as they execute. The size of the array is specified by
       numDevices.

       No two kernels can be launched on the same device. All the devices targeted by this multi-
       device launch must be identical. All devices must have a non-zero value for the device
       attribute cudaDevAttrCooperativeLaunch.

       The same kernel must be launched on all devices. Note that any __device__ or __constant__
       variables are independently instantiated on every device. It is the application's
       responsibility to ensure these variables are initialized and used appropriately.

       The size of the grids as specified in blocks, the size of the blocks themselves and the
       amount of shared memory used by each thread block must also match across all launched
       kernels.

        The streams used to launch these kernels must have been created via either
        cudaStreamCreate or cudaStreamCreateWithPriority or cudaStreamCreateWithFlags. The NULL
        stream, cudaStreamLegacy, and cudaStreamPerThread cannot be used.

       The total number of blocks launched per kernel cannot exceed the maximum number of blocks
       per multiprocessor as returned by cudaOccupancyMaxActiveBlocksPerMultiprocessor (or
       cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags) times the number of
       multiprocessors as specified by the device attribute cudaDevAttrMultiProcessorCount. Since
       the total number of blocks launched per device has to match across all devices, the
       maximum number of blocks that can be launched per device will be limited by the device
       with the least number of multiprocessors.

       The kernel cannot make use of CUDA dynamic parallelism.

       The cudaLaunchParams structure is defined as:

               struct cudaLaunchParams
               {
                   void *func;
                   dim3 gridDim;
                   dim3 blockDim;
                   void **args;
                   size_t sharedMem;
                   cudaStream_t stream;
               };

        where:

        • cudaLaunchParams::func specifies the kernel to be launched. The same function must be
          launched on all devices. For templated functions, pass the function symbol as follows:
         func_name<template_arg_0,...,template_arg_N>

       • cudaLaunchParams::gridDim specifies the width, height and depth of the grid in blocks.
         This must match across all kernels launched.

       • cudaLaunchParams::blockDim is the width, height and depth of each thread block. This
         must match across all kernels launched.

        • cudaLaunchParams::args specifies the arguments to the kernel. If the kernel has N
          parameters, then cudaLaunchParams::args should point to an array of N pointers. Each
          pointer, from cudaLaunchParams::args[0] to cudaLaunchParams::args[N - 1], points to
          the region of memory from which the actual parameter will be copied.

        • cudaLaunchParams::sharedMem is the dynamic shared-memory size per thread block in
          bytes.
         This must match across all kernels launched.

       • cudaLaunchParams::stream is the handle to the stream to perform the launch in. This
         cannot be the NULL stream or cudaStreamLegacy or cudaStreamPerThread.

       By default, the kernel won't begin execution on any GPU until all prior work in all the
       specified streams has completed. This behavior can be overridden by specifying the flag
       cudaCooperativeLaunchMultiDeviceNoPreSync. When this flag is specified, each kernel will
       only wait for prior work in the stream corresponding to that GPU to complete before it
       begins execution.

       Similarly, by default, any subsequent work pushed in any of the specified streams will not
       begin execution until the kernels on all GPUs have completed. This behavior can be
       overridden by specifying the flag cudaCooperativeLaunchMultiDeviceNoPostSync. When this
       flag is specified, any subsequent work pushed in any of the specified streams will only
       wait for the kernel launched on the GPU corresponding to that stream to complete before it
       begins execution.

       Parameters:
           launchParamsList - List of launch parameters, one per device
           numDevices - Size of the launchParamsList array
           flags - Flags to control launch behavior

       Returns:
           cudaSuccess, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration,
           cudaErrorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources,
           cudaErrorCooperativeLaunchTooLarge, cudaErrorSharedObjectInitFailed

       Note:
            This function uses standard default stream semantics.

           Note that this function may also return error codes from previous, asynchronous
           launches.

       See also:
           cudaLaunchCooperativeKernel (C++ API), cudaLaunchCooperativeKernel,
           cuLaunchCooperativeKernelMultiDevice
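
        As an illustration, the sketch below launches the same cooperative kernel on two
        identical GPUs. Names are illustrative and error checking is omitted; note that each
        launch uses an explicitly created stream, and that grid size, block size, and shared
        memory match across devices:

                #include <cuda_runtime.h>

                __global__ void coopKernel(int *data)   /* same kernel on every device */
                {
                    data[blockIdx.x * blockDim.x + threadIdx.x] = 1;
                }

                void launchOnTwoDevices(int *devBuf[2])
                {
                    struct cudaLaunchParams params[2];
                    cudaStream_t streams[2];
                    void *args[2][1];

                    for (int dev = 0; dev < 2; ++dev) {
                        cudaSetDevice(dev);
                        cudaStreamCreate(&streams[dev]);    /* NULL stream not allowed  */
                        args[dev][0] = &devBuf[dev];
                        params[dev].func      = (void *)coopKernel;
                        params[dev].gridDim   = dim3(64);   /* must match on all devices */
                        params[dev].blockDim  = dim3(128);  /* must match on all devices */
                        params[dev].args      = args[dev];
                        params[dev].sharedMem = 0;          /* must match on all devices */
                        params[dev].stream    = streams[dev];
                    }
                    cudaLaunchCooperativeKernelMultiDevice(params, 2, 0);
                }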

   cudaError_t cudaLaunchKernel (const void * func, dim3 gridDim, dim3 blockDim, void ** args,
       size_t sharedMem, cudaStream_t stream)
        The function invokes the kernel func on a gridDim (gridDim.x × gridDim.y × gridDim.z)
        grid of blocks. Each block contains blockDim (blockDim.x × blockDim.y × blockDim.z)
        threads.

        If the kernel has N parameters, args should point to an array of N pointers. Each
        pointer, from args[0] to args[N - 1], points to the region of memory from which the
        actual parameter will be copied.

       For templated functions, pass the function symbol as follows:
       func_name<template_arg_0,...,template_arg_N>

       sharedMem sets the amount of dynamic shared memory that will be available to each thread
       block.

        stream specifies the stream the invocation is associated with.

       Parameters:
           func - Device function symbol
           gridDim - Grid dimensions
           blockDim - Block dimensions
           args - Arguments
           sharedMem - Shared memory
           stream - Stream identifier

       Returns:
           cudaSuccess, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration,
           cudaErrorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources,
           cudaErrorSharedObjectInitFailed, cudaErrorInvalidPtx, cudaErrorNoKernelImageForDevice,
           cudaErrorJitCompilerNotFound

       Note:
            This function uses standard default stream semantics.

           Note that this function may also return error codes from previous, asynchronous
           launches.

       See also:
           cudaLaunchKernel (C++ API), cuLaunchKernel
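
        As an illustration, the call below is equivalent to the <<< >>> launch shown in the
        comment. This is a minimal sketch; the kernel and its parameters are illustrative:

                #include <cuda_runtime.h>

                __global__ void axpy(float a, float *x, float *y)
                {
                    int i = blockIdx.x * blockDim.x + threadIdx.x;
                    y[i] += a * x[i];
                }

                int main(void)
                {
                    int n = 1024;
                    float a = 2.0f, *x, *y;
                    cudaMalloc((void **)&x, n * sizeof(float));
                    cudaMalloc((void **)&y, n * sizeof(float));

                    /* One pointer per kernel parameter, each pointing at its value. */
                    void *args[] = { &a, &x, &y };
                    /* Equivalent to: axpy<<<n / 256, 256>>>(a, x, y); */
                    cudaError_t err = cudaLaunchKernel((const void *)axpy,
                                                       dim3(n / 256), dim3(256),
                                                       args, 0, 0);
                    cudaDeviceSynchronize();
                    cudaFree(x);
                    cudaFree(y);
                    return err == cudaSuccess ? 0 : 1;
                }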

   cudaError_t cudaSetDoubleForDevice (double * d)
        Deprecated
            This function is deprecated as of CUDA 7.5.

        Converts the double value of d to an internal float representation if the device does not
        support double arithmetic. If the device does natively support doubles, then this function
        does nothing.

        Parameters:
            d - Double to convert

       Returns:
           cudaSuccess

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

       See also:
           cudaLaunch (C API), cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API),
           cudaSetDoubleForHost, cudaSetupArgument (C API)

   cudaError_t cudaSetDoubleForHost (double * d)
       Deprecated
            This function is deprecated as of CUDA 7.5.

       Converts the double value of d from a potentially internal float representation if the
       device does not support double arithmetic. If the device does natively support doubles,
       then this function does nothing.

       Parameters:
           d - Double to convert

       Returns:
           cudaSuccess

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

       See also:
           cudaLaunch (C API), cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API),
           cudaSetDoubleForDevice, cudaSetupArgument (C API)

Author

       Generated automatically by Doxygen from the source code.