Ubuntu Manpage: Execution Control

Provided by: nvidia-cuda-dev_7.5.18-0ubuntu1_amd64

NAME

       Execution Control - execution control functions of the low-level CUDA driver API (cuda.h)

   Functions
       CUresult cuFuncGetAttribute (int *pi, CUfunction_attribute attrib, CUfunction hfunc)
           Returns information about a function.
       CUresult cuFuncSetCacheConfig (CUfunction hfunc, CUfunc_cache config)
           Sets the preferred cache configuration for a device function.
       CUresult cuFuncSetSharedMemConfig (CUfunction hfunc, CUsharedconfig config)
           Sets the shared memory configuration for a device function.
       CUresult cuLaunchKernel (CUfunction f, unsigned int gridDimX, unsigned int gridDimY, unsigned int
           gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int
           sharedMemBytes, CUstream hStream, void **kernelParams, void **extra)
           Launches a CUDA function.

Detailed Description

       Execution control functions of the low-level CUDA driver API (cuda.h).

       This section describes the execution control functions of the low-level CUDA driver application
       programming interface.

Function Documentation

   CUresult cuFuncGetAttribute (int * pi, CUfunction_attribute attrib, CUfunction hfunc)
       Returns in *pi the integer value of the attribute attrib on the kernel given by hfunc. The supported
       attributes are:

       • CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK: The maximum number of threads per block, beyond which a launch
         of  the  function  would  fail.  This  number  depends on both the function and the device on which the
         function is currently loaded.

       • CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES: The size in bytes of statically-allocated shared memory per  block
         required  by  this function. This does not include dynamically-allocated shared memory requested by the
         user at runtime.

       • CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES: The size in bytes of user-allocated  constant  memory  required  by
         this function.

       • CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES:  The  size  in  bytes  of  local memory used by each thread of this
         function.

       • CU_FUNC_ATTRIBUTE_NUM_REGS: The number of registers used by each thread of this function.

       • CU_FUNC_ATTRIBUTE_PTX_VERSION: The  PTX  virtual  architecture  version  for  which  the  function  was
         compiled.  This  value  is  the  major  PTX  version * 10 + the minor PTX version, so a PTX version 1.3
         function would return the value 13. Note that this may return the  undefined  value  of  0  for  cubins
         compiled prior to CUDA 3.0.

       • CU_FUNC_ATTRIBUTE_BINARY_VERSION:  The binary architecture version for which the function was compiled.
         This value is the major binary version * 10 + the  minor  binary  version,  so  a  binary  version  1.3
         function  would return the value 13. Note that this will return a value of 10 for legacy cubins that do
         not have a properly-encoded binary architecture version.

       • CU_FUNC_ATTRIBUTE_CACHE_MODE_CA: Indicates whether the function was compiled with the user-specified
         option '-Xptxas --dlcm=ca'.
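
       The major * 10 + minor encoding used by CU_FUNC_ATTRIBUTE_PTX_VERSION and
       CU_FUNC_ATTRIBUTE_BINARY_VERSION can be unpacked with integer division and remainder. The sketch below
       is illustrative only: arch_version, decode_arch_version and encode_arch_version are hypothetical names
       that are not part of cuda.h, and the code assumes the minor version is a single decimal digit.

```c
#include <assert.h>

/* Hypothetical holder for a decoded version; not a cuda.h type. */
typedef struct {
    int major;
    int minor;
} arch_version;

/* Splits an encoded value (major * 10 + minor) into its two parts.
 * Assumes the minor version is a single decimal digit. */
arch_version decode_arch_version(int encoded)
{
    arch_version v;
    assert(encoded >= 0);
    v.major = encoded / 10;
    v.minor = encoded % 10;
    return v;
}

/* Inverse of decode_arch_version under the same assumption. */
int encode_arch_version(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor <= 9);
    return major * 10 + minor;
}
```

       In a real program the encoded value would come from cuFuncGetAttribute, for example with attrib set
       to CU_FUNC_ATTRIBUTE_BINARY_VERSION.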

       Parameters:
           pi - Returned attribute value
           attrib - Attribute requested
           hfunc - Function to query attribute of

       Returns:
           CUDA_SUCCESS,   CUDA_ERROR_DEINITIALIZED,   CUDA_ERROR_NOT_INITIALIZED,   CUDA_ERROR_INVALID_CONTEXT,
           CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_INVALID_VALUE

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuCtxGetCacheConfig, cuCtxSetCacheConfig, cuFuncSetCacheConfig, cuLaunchKernel

   CUresult cuFuncSetCacheConfig (CUfunction hfunc, CUfunc_cache config)
       On devices where the L1 cache and shared memory use the same hardware resources, this sets through config
       the preferred cache configuration for the device function hfunc. This is only a  preference.  The  driver
       will  use  the requested configuration if possible, but it is free to choose a different configuration if
       required to execute hfunc. Any context-wide preference set via cuCtxSetCacheConfig() will  be  overridden
       by  this per-function setting unless the per-function setting is CU_FUNC_CACHE_PREFER_NONE. In that case,
       the current context-wide setting will be used.

       This setting does nothing on devices where the sizes of the L1 cache and shared memory are fixed.

       Launching a kernel with a different preference than the most  recent  preference  setting  may  insert  a
       device-side synchronization point.

       The supported cache configurations are:

       • CU_FUNC_CACHE_PREFER_NONE: no preference for shared memory or L1 (default)

       • CU_FUNC_CACHE_PREFER_SHARED: prefer larger shared memory and smaller L1 cache

       • CU_FUNC_CACHE_PREFER_L1: prefer larger L1 cache and smaller shared memory

       • CU_FUNC_CACHE_PREFER_EQUAL: prefer equal sized L1 cache and shared memory
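
       The precedence between the per-function and context-wide preferences described above can be modeled
       as a small pure function. This is a sketch of the documented rule, not of the driver's
       implementation: the cache_pref enum and requested_cache_pref are hypothetical stand-ins for
       CUfunc_cache and its CU_FUNC_CACHE_PREFER_* values, chosen so the example compiles without cuda.h.

```c
#include <assert.h>

/* Hypothetical stand-ins for the CU_FUNC_CACHE_PREFER_* values. */
typedef enum {
    PREF_NONE,
    PREF_SHARED,
    PREF_L1,
    PREF_EQUAL
} cache_pref;

/* Returns the preference that applies to a launch: the per-function
 * preference wins unless it is PREF_NONE, in which case the
 * context-wide preference is used. The driver remains free to pick a
 * different configuration if the kernel requires it. */
cache_pref requested_cache_pref(cache_pref func_pref, cache_pref ctx_pref)
{
    assert((int)func_pref >= 0 && (int)func_pref <= (int)PREF_EQUAL);
    assert((int)ctx_pref >= 0 && (int)ctx_pref <= (int)PREF_EQUAL);
    return (func_pref == PREF_NONE) ? ctx_pref : func_pref;
}
```

       For example, a function left at PREF_NONE in a context that prefers L1 resolves to PREF_L1, while a
       function set to PREF_SHARED keeps PREF_SHARED regardless of the context setting.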

       Parameters:
           hfunc - Kernel to configure cache for
           config - Requested cache configuration

       Returns:
           CUDA_SUCCESS,    CUDA_ERROR_INVALID_VALUE,    CUDA_ERROR_DEINITIALIZED,   CUDA_ERROR_NOT_INITIALIZED,
           CUDA_ERROR_INVALID_CONTEXT

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuCtxGetCacheConfig, cuCtxSetCacheConfig, cuFuncGetAttribute, cuLaunchKernel

   CUresult cuFuncSetSharedMemConfig (CUfunction hfunc, CUsharedconfig config)
       On devices with configurable shared memory banks, this function will force all subsequent launches of the
       specified device function to have the given shared memory bank size configuration. On any given launch of
       the function, the shared memory configuration of the device will be temporarily changed if needed to suit
       the function's preferred configuration. Changes in shared memory configuration between successive
       launches may introduce a device-side synchronization point.

       Any per-function shared memory bank size set via cuFuncSetSharedMemConfig overrides the context-wide
       setting made with cuCtxSetSharedMemConfig.

       Changing the shared memory bank size will not  increase  shared  memory  usage  or  affect  occupancy  of
       kernels,  but  may  have major effects on performance. Larger bank sizes will allow for greater potential
       bandwidth to shared memory, but will change what kinds of accesses to shared memory will result  in  bank
       conflicts.

       This function will do nothing on devices with fixed shared memory bank size.

       The supported bank configurations are:

       • CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE:  use  the  context's shared memory configuration when launching
         this function.

       • CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE: set shared memory bank width to be natively four  bytes  when
         launching this function.

       • CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE: set shared memory bank width to be natively eight bytes when
         launching this function.
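
       To see why the bank width changes which accesses collide, it can help to compute which bank a given
       byte offset falls into. The sketch below is a deliberately simplified model: it assumes 32 banks and
       a plain interleaved mapping, and the names NUM_BANKS and bank_index are hypothetical. The conflict
       rules of real hardware are more involved, so treat the result as an intuition aid only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 32 shared memory banks with a simple interleaved layout. */
enum { NUM_BANKS = 32 };

/* Maps a byte offset within shared memory to a bank index for a bank
 * width of 4 or 8 bytes. Successive bank-width-sized words are assigned
 * to successive banks, wrapping around after NUM_BANKS words. */
unsigned int bank_index(size_t byte_offset, size_t bank_width)
{
    assert(bank_width == 4 || bank_width == 8);
    return (unsigned int)((byte_offset / bank_width) % NUM_BANKS);
}
```

       Under this model, consecutive 8-byte elements at offsets 0, 8, 16, ... start in banks 0, 2, 4, ...
       with four-byte banks, but in banks 0, 1, 2, ... with eight-byte banks.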

       Parameters:
           hfunc - kernel to be given a shared memory config
           config - requested shared memory configuration

       Returns:
           CUDA_SUCCESS,    CUDA_ERROR_INVALID_VALUE,    CUDA_ERROR_DEINITIALIZED,   CUDA_ERROR_NOT_INITIALIZED,
           CUDA_ERROR_INVALID_CONTEXT

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuCtxGetCacheConfig,    cuCtxSetCacheConfig,    cuCtxGetSharedMemConfig,     cuCtxSetSharedMemConfig,
           cuFuncGetAttribute, cuLaunchKernel

   CUresult  cuLaunchKernel  (CUfunction f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ,
       unsigned int blockDimX, unsigned int blockDimY, unsigned  int  blockDimZ,  unsigned  int  sharedMemBytes,
       CUstream hStream, void ** kernelParams, void ** extra)
       Invokes  the kernel f on a gridDimX x gridDimY x gridDimZ grid of blocks. Each block contains blockDimX x
       blockDimY x blockDimZ threads.
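
       A common way to choose the grid dimensions is to round the problem size up to a whole number of
       blocks. The helpers below are a minimal sketch of that arithmetic; blocks_for and total_threads are
       hypothetical names, not driver API functions.

```c
#include <assert.h>

/* Number of blocks needed along one dimension so that
 * blocks * threads_per_block >= n_elements (ceiling division,
 * written so that no intermediate sum can overflow). */
unsigned int blocks_for(unsigned int n_elements, unsigned int threads_per_block)
{
    assert(threads_per_block > 0);
    return n_elements / threads_per_block
         + (n_elements % threads_per_block != 0);
}

/* Total number of threads in a launch, widened to 64 bits. */
unsigned long long total_threads(unsigned int gx, unsigned int gy, unsigned int gz,
                                 unsigned int bx, unsigned int by, unsigned int bz)
{
    return (unsigned long long)gx * gy * gz * bx * by * bz;
}
```

       For 1000 elements and 256 threads per block this gives 4 blocks and 1024 threads, so a kernel
       launched this way would normally need a bounds check to skip the 24 surplus threads.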

       sharedMemBytes sets the amount of dynamic shared memory that will be available to each thread block.

       Kernel parameters to f can be specified in one of two ways:

       1) Kernel parameters can be specified via kernelParams. If f has N parameters, then kernelParams needs to
       be an array of N pointers. Each of kernelParams[0] through kernelParams[N-1] must point to  a  region  of
       memory  from  which the actual kernel parameter will be copied. The number of kernel parameters and their
       offsets and sizes do not need to be specified as that information is retrieved directly from the kernel's
       image.
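
       As an illustration of the kernelParams layout, the sketch below builds the pointer array for a
       hypothetical kernel taking (float *data, float factor, int n). The scale_args struct and
       build_kernel_params are assumptions made for this example; in real code the data argument would be a
       device pointer (CUdeviceptr) rather than void *.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical arguments for a kernel with the parameter list
 * (float *data, float factor, int n). */
typedef struct {
    void *data;    /* stand-in for a device pointer */
    float factor;
    int   n;
} scale_args;

enum { SCALE_NUM_PARAMS = 3 };

/* Stores the address of each argument in params, in the order the
 * kernel declares its parameters. The entries point into *args, so
 * args must stay valid while the launch call copies the values. */
void build_kernel_params(scale_args *args, void *params[SCALE_NUM_PARAMS])
{
    assert(args != NULL && params != NULL);
    params[0] = &args->data;
    params[1] = &args->factor;
    params[2] = &args->n;
}
```

       The filled array would then be passed as kernelParams, with extra set to NULL, for example
       cuLaunchKernel(f, gx, gy, gz, bx, by, bz, sh, s, params, NULL).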

       2) Kernel parameters can also be packaged by the application into a single buffer that is passed  in  via
       the  extra  parameter.  This places the burden on the application of knowing each kernel parameter's size
       and alignment/padding within the buffer. Here is an example of using the extra parameter in this manner:

           size_t argBufferSize;
           char argBuffer[256];

           // populate argBuffer and argBufferSize

           void *config[] = {
               CU_LAUNCH_PARAM_BUFFER_POINTER, argBuffer,
               CU_LAUNCH_PARAM_BUFFER_SIZE,    &argBufferSize,
               CU_LAUNCH_PARAM_END
           };
           status = cuLaunchKernel(f, gx, gy, gz, bx, by, bz, sh, s, NULL, config);
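
       The "populate argBuffer and argBufferSize" step is where the application has to handle alignment
       and padding itself. One possible approach is sketched below. The helpers align_up and pack_arg are
       hypothetical, and the sketch assumes each argument must be aligned to a power-of-two boundary
       supplied by the caller (for scalar types, typically the size of the type).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rounds offset up to the next multiple of align.
 * align must be a non-zero power of two. */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Copies one argument into buf at the next suitably aligned offset and
 * returns the offset just past it. Any padding bytes are left untouched. */
size_t pack_arg(char *buf, size_t capacity, size_t offset,
                const void *arg, size_t size, size_t align)
{
    offset = align_up(offset, align);
    assert(offset + size <= capacity);
    memcpy(buf + offset, arg, size);
    return offset + size;
}
```

       Packing an int followed by a double this way places the int at offset 0 and the double at offset 8,
       and the value returned by the last pack_arg call is the size to store in argBufferSize.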

       The extra parameter exists to allow cuLaunchKernel to take additional, less commonly used arguments. extra
       specifies a list of names of extra settings and their corresponding values. Each extra  setting  name  is
       immediately  followed  by  the  corresponding  value.  The  list  must  be terminated with either NULL or
       CU_LAUNCH_PARAM_END.

       • CU_LAUNCH_PARAM_END, which indicates the end of the extra array;

       • CU_LAUNCH_PARAM_BUFFER_POINTER, which specifies that the next value in extra will be  a  pointer  to  a
         buffer containing all the kernel parameters for launching kernel f;

       • CU_LAUNCH_PARAM_BUFFER_SIZE, which specifies that the next value in extra will be a pointer to a size_t
         containing the size of the buffer specified with CU_LAUNCH_PARAM_BUFFER_POINTER.

       The  error  CUDA_ERROR_INVALID_VALUE  will  be  returned  if  kernel  parameters  are specified with both
       kernelParams and extra (i.e. both kernelParams and extra are non-NULL).

       Calling cuLaunchKernel() sets persistent function state that is the same as function  state  set  through
       the   following   deprecated   APIs:   cuFuncSetBlockShape(),   cuFuncSetSharedSize(),  cuParamSetSize(),
       cuParamSeti(), cuParamSetf(), cuParamSetv().

       When the kernel f is launched via cuLaunchKernel(), the previous block shape, shared size and parameter
       info associated with f are overwritten.

       Note that to use cuLaunchKernel(), the kernel f must either have been compiled with toolchain version 3.2
       or later so that it will contain kernel parameter information, or have no kernel parameters. If neither
       condition is met, then cuLaunchKernel() will return CUDA_ERROR_INVALID_IMAGE.

       Parameters:
           f - Kernel to launch
           gridDimX - Width of grid in blocks
           gridDimY - Height of grid in blocks
           gridDimZ - Depth of grid in blocks
           blockDimX - X dimension of each thread block
           blockDimY - Y dimension of each thread block
           blockDimZ - Z dimension of each thread block
           sharedMemBytes - Dynamic shared-memory size per thread block in bytes
           hStream - Stream identifier
           kernelParams - Array of pointers to kernel parameters
           extra - Extra options

       Returns:
           CUDA_SUCCESS,   CUDA_ERROR_DEINITIALIZED,   CUDA_ERROR_NOT_INITIALIZED,   CUDA_ERROR_INVALID_CONTEXT,
           CUDA_ERROR_INVALID_HANDLE,            CUDA_ERROR_INVALID_IMAGE,             CUDA_ERROR_INVALID_VALUE,
           CUDA_ERROR_LAUNCH_FAILED,        CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES,       CUDA_ERROR_LAUNCH_TIMEOUT,
           CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING, CUDA_ERROR_SHARED_OBJECT_INIT_FAILED

       Note:
           This function uses standard default stream semantics.

           This function may also return error codes from previous, asynchronous launches.

       See also:
           cuCtxGetCacheConfig, cuCtxSetCacheConfig, cuFuncSetCacheConfig, cuFuncGetAttribute

Author

       Generated automatically by Doxygen from the source code.

Version 6.0                                        15 Aug 2015                              Execution Control(3)