Ubuntu Manpage: Execution Control -

Provided by: nvidia-cuda-dev_7.5.18-0ubuntu1_amd64

NAME

       Execution Control - execution control functions of the CUDA runtime API (cuda_runtime_api.h)

   Functions
       __cudart_builtin__ cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes *attr, const void *func)
           Find out attributes for a given function.
       cudaError_t cudaFuncSetCacheConfig (const void *func, enum cudaFuncCache cacheConfig)
           Sets the preferred cache configuration for a device function.
       cudaError_t cudaFuncSetSharedMemConfig (const void *func, enum cudaSharedMemConfig config)
           Sets the shared memory configuration for a device function.
       __device__ __cudart_builtin__ void * cudaGetParameterBuffer (size_t alignment, size_t size)
           Obtains a parameter buffer.
       __device__ __cudart_builtin__ void * cudaGetParameterBufferV2 (void *func, dim3 gridDimension, dim3
           blockDimension, unsigned int sharedMemSize)
           Launches a specified kernel.
       cudaError_t cudaLaunchKernel (const void *func, dim3 gridDim, dim3 blockDim, void **args, size_t
           sharedMem, cudaStream_t stream)
           Launches a device function.
       cudaError_t cudaSetDoubleForDevice (double *d)
           Converts a double argument to be executed on a device.
       cudaError_t cudaSetDoubleForHost (double *d)
           Converts a double argument after execution on a device.

Detailed Description

       Execution control functions of the CUDA runtime API (cuda_runtime_api.h).

       This section describes the execution control functions of the CUDA runtime application programming
       interface.

       Some functions have overloaded C++ API template versions documented separately in the C++ API Routines
       module.

Function Documentation

   __cudart_builtin__ cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes * attr, const void * func)
       This function obtains the attributes of a function specified via func. func is a device function symbol
       and must be declared as a __global__ function. The fetched attributes are placed in attr. If the
       specified function does not exist, then cudaErrorInvalidDeviceFunction is returned. For templated
       functions, pass the function symbol as follows: func_name<template_arg_0,...,template_arg_N>

       Note that some function attributes such as maxThreadsPerBlock may vary based on the device that is
       currently being used.

       Parameters:
           attr - Return pointer to function's attributes
           func - Device function symbol

       Returns:
           cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction

       Note:
           Note that this function may also return error codes from previous, asynchronous launches.

           Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and removed in
           CUDA 5.0.

       See also:
           cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C++ API), cudaLaunchKernel
           (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API)
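
       The query described above can be sketched as follows. This is a minimal illustration, not part of the
       original reference: the kernels scale and fill and the helper printAttributes are hypothetical names,
       and the values printed depend on the device and compiler in use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only for illustration.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical templated kernel, to show the func_name<template_arg> form.
template <typename T>
__global__ void fill(T *data, T value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Queries the attributes of a kernel and prints a few of them.
static cudaError_t printAttributes(const char *label, const void *func)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, func);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
        return err;
    }
    std::printf("%s: maxThreadsPerBlock=%d numRegs=%d sharedSizeBytes=%zu\n",
                label, attr.maxThreadsPerBlock, attr.numRegs, attr.sharedSizeBytes);
    return cudaSuccess;
}

int main()
{
    cudaError_t err = printAttributes("scale", (const void *)scale);
    if (err == cudaSuccess)
        err = printAttributes("fill<int>", (const void *)fill<int>);
    return err == cudaSuccess ? 0 : 1;
}
```

       Because attributes such as maxThreadsPerBlock may differ between devices, a program that uses several
       devices would likely repeat the query after each cudaSetDevice call.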

   cudaError_t cudaFuncSetCacheConfig (const void * func, enum cudaFuncCache cacheConfig)
       On devices where the L1 cache and shared memory use the same hardware resources, this sets through
       cacheConfig the preferred cache configuration for the function specified via func. This is only a
       preference. The runtime will use the requested configuration if possible, but it is free to choose a
       different configuration if required to execute func.

       func is a device function symbol and must be declared as a __global__ function. If the specified function
       does not exist, then cudaErrorInvalidDeviceFunction is returned. For templated functions, pass the
       function symbol as follows: func_name<template_arg_0,...,template_arg_N>

       This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.

       Launching a kernel with a different preference than the most recent preference setting may insert a
       device-side synchronization point.

       The supported cache configurations are:

       • cudaFuncCachePreferNone: no preference for shared memory or L1 (default)

       • cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache

       • cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory

       • cudaFuncCachePreferEqual: prefer equal size L1 cache and shared memory

       Parameters:
           func - Device function symbol
           cacheConfig - Requested cache configuration

       Returns:
           cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction

       Note:
           Note that this function may also return error codes from previous, asynchronous launches.

           Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and removed in
           CUDA 5.0.

       See also:
           cudaConfigureCall, cudaFuncSetCacheConfig (C++ API), cudaFuncGetAttributes (C API), cudaLaunchKernel
           (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API),
           cudaThreadGetCacheConfig, cudaThreadSetCacheConfig
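
       As a rough sketch of how the preference might be set before a launch: the kernel reverseBlock, the
       helper runWithSharedPreference and the TILE constant below are hypothetical and chosen only for
       illustration. The runtime remains free to ignore the preference.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256  // hypothetical block size for this sketch

// Hypothetical kernel: reverses each block-sized chunk through shared memory.
__global__ void reverseBlock(int *data)
{
    __shared__ int tile[TILE];
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;
    tile[t] = data[base + t];
    __syncthreads();
    data[base + t] = tile[blockDim.x - 1 - t];
}

// Requests a larger shared memory partition for reverseBlock, then runs it
// once over n elements. n is assumed to be a multiple of TILE.
static cudaError_t runWithSharedPreference(int *host, int n)
{
    int *dev = NULL;
    cudaError_t err = cudaFuncSetCacheConfig((const void *)reverseBlock,
                                             cudaFuncCachePreferShared);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dev, n * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverseBlock<<<n / TILE, TILE>>>(dev);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err;
}

int main()
{
    int host[TILE];
    for (int i = 0; i < TILE; ++i) host[i] = i;
    cudaError_t err = runWithSharedPreference(host, TILE);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("host[0] = %d\n", host[0]);  // 255 after the reversal
    return 0;
}
```

       The preference is set once, before the first launch, so that later launches of the same kernel do not
       alternate between preferences and risk the synchronization point mentioned above.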

   cudaError_t cudaFuncSetSharedMemConfig (const void * func, enum cudaSharedMemConfig config)
       On devices with configurable shared memory banks, this function will force all subsequent launches of the
       specified device function to have the given shared memory bank size configuration. On any given launch of
       the function, the shared memory configuration of the device will be temporarily changed if needed to suit
       the function's preferred configuration. Changes in shared memory configuration between subsequent
       launches of functions may introduce a device-side synchronization point.

       Any per-function setting of shared memory bank size set via cudaFuncSetSharedMemConfig will override the
       device-wide setting set by cudaDeviceSetSharedMemConfig.

       Changing the shared memory bank size will not increase shared memory usage or affect occupancy of
       kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential
       bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank
       conflicts.

       This function will do nothing on devices with fixed shared memory bank size.

       For templated functions, pass the function symbol as follows:
       func_name<template_arg_0,...,template_arg_N>

       The supported bank configurations are:

       • cudaSharedMemBankSizeDefault: use the device's shared memory configuration when launching this
         function.

       • cudaSharedMemBankSizeFourByte: set shared memory bank width to be four bytes natively when launching
         this function.

       • cudaSharedMemBankSizeEightByte: set shared memory bank width to be eight bytes natively when launching
         this function.

       Parameters:
           func - Device function symbol
           config - Requested shared memory configuration

       Returns:
           cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction, cudaErrorInvalidValue

       Note:
           Note that this function may also return error codes from previous, asynchronous launches.

           Use of a string naming a function as the func parameter was deprecated in CUDA 4.1 and removed in
           CUDA 5.0.

       See also:
           cudaConfigureCall, cudaDeviceSetSharedMemConfig, cudaDeviceGetSharedMemConfig,
           cudaDeviceSetCacheConfig, cudaDeviceGetCacheConfig, cudaFuncSetCacheConfig
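
       A minimal sketch of a per-function bank size request, assuming a kernel that keeps doubles in shared
       memory and may therefore benefit from eight-byte banks. The kernel sumPairs and the helper
       preferEightByteBanks are hypothetical names. On devices with a fixed bank size the call has no effect.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that stages doubles in shared memory.
// It assumes a launch with at most 128 threads per block.
__global__ void sumPairs(const double *in, double *out)
{
    __shared__ double buf[128];
    int t = threadIdx.x;
    buf[t] = in[t];
    __syncthreads();
    out[t] = buf[t] + buf[(t + 1) % blockDim.x];
}

// Asks for eight-byte shared memory banks on every later launch of sumPairs.
static cudaError_t preferEightByteBanks()
{
    return cudaFuncSetSharedMemConfig((const void *)sumPairs,
                                      cudaSharedMemBankSizeEightByte);
}

int main()
{
    cudaError_t err = preferEightByteBanks();
    std::printf("cudaFuncSetSharedMemConfig: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

       The setting applies only to sumPairs. Other kernels keep the device-wide configuration chosen with
       cudaDeviceSetSharedMemConfig.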

   __device__ __cudart_builtin__ void* cudaGetParameterBuffer (size_t alignment, size_t size)
       Obtains a parameter buffer which can be filled with parameters for a kernel launch. Parameters passed to
       cudaLaunchDevice must be allocated via this function.

       This is a low level API and can only be accessed from Parallel Thread Execution (PTX). CUDA user code
       should use <<< >>> to launch kernels.

       Parameters:
           alignment - Specifies alignment requirement of the parameter buffer
           size - Specifies size requirement in bytes

       Returns:
           Returns pointer to the allocated parameterBuffer

       Note:
           Note that this function may also return error codes from previous, asynchronous launches.

       See also:
           cudaLaunchDevice

   __device__ __cudart_builtin__ void* cudaGetParameterBufferV2 (void * func, dim3 gridDimension, dim3
       blockDimension, unsigned int sharedMemSize)
       Launches a specified kernel with the specified parameter buffer. A parameter buffer can be obtained by
       calling cudaGetParameterBuffer().

       This is a low level API and can only be accessed from Parallel Thread Execution (PTX). CUDA user code
       should use <<< >>> to launch kernels.

       Parameters:
           func - Pointer to the kernel to be launched
           parameterBuffer - Holds the parameters to the launched kernel. parameterBuffer can be NULL.
           (Optional)
           gridDimension - Specifies grid dimensions
           blockDimension - Specifies block dimensions
           sharedMemSize - Specifies size of shared memory
           stream - Specifies the stream to be used

       Returns:
           cudaSuccess, cudaErrorInvalidDevice, cudaErrorLaunchMaxDepthExceeded, cudaErrorInvalidConfiguration,
           cudaErrorStartupFailure, cudaErrorLaunchPendingCountExceeded, cudaErrorLaunchOutOfResources

       Note:
           Note that this function may also return error codes from previous, asynchronous launches.

           Please refer to Execution Configuration and Parameter Buffer Layout from the CUDA Programming Guide
           for the detailed descriptions of launch configuration and parameter layout respectively.

       See also:
           cudaGetParameterBuffer

   cudaError_t cudaLaunchKernel (const void * func, dim3 gridDim, dim3 blockDim, void ** args, size_t sharedMem,
       cudaStream_t stream)
       The function invokes kernel func on a gridDim (gridDim.x × gridDim.y × gridDim.z) grid of blocks. Each
       block contains blockDim (blockDim.x × blockDim.y × blockDim.z) threads.

       If the kernel has N parameters, args should point to an array of N pointers. Each pointer, from args[0]
       to args[N - 1], points to the region of memory from which the actual parameter will be copied.

       For templated functions, pass the function symbol as follows:
       func_name<template_arg_0,...,template_arg_N>

       sharedMem sets the amount of dynamic shared memory that will be available to each thread block.

       stream specifies the stream the invocation is associated with.

       Parameters:
           func - Device function symbol
           gridDim - Grid dimensions
           blockDim - Block dimensions
           args - Array of pointers to the kernel arguments
           sharedMem - Dynamic shared memory per thread block, in bytes
           stream - Stream identifier

       Returns:
           cudaSuccess, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration, cudaErrorLaunchFailure,
           cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources, cudaErrorSharedObjectInitFailed

       Note:
           This function uses standard default stream semantics.

           Note that this function may also return error codes from previous, asynchronous launches.

       See also:
           cudaLaunchKernel (C++ API)
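
       The args layout described above can be sketched as follows, assuming a hypothetical kernel axpy with
       four parameters and a hypothetical helper runAxpy. Each entry of args holds the address of one argument,
       in the order the kernel declares its parameters.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: y[i] = a * x[i] + y[i].
__global__ void axpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Copies x and y to the device, launches axpy through cudaLaunchKernel,
// and copies y back. Returns the first error encountered.
static cudaError_t runAxpy(float a, const float *hx, float *hy, int n)
{
    float *dx = NULL, *dy = NULL;
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc(&dx, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dy, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // One pointer per kernel parameter: N = 4 here.
        void *args[] = { &a, &dx, &dy, &n };
        dim3 block(256);
        dim3 grid((n + block.x - 1) / block.x);
        // No dynamic shared memory (0 bytes), default stream (0).
        err = cudaLaunchKernel((const void *)axpy, grid, block, args, 0, 0);
    }
    if (err == cudaSuccess) err = cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return err;
}

int main()
{
    const int n = 1024;
    static float x[n], y[n];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    cudaError_t err = runAxpy(3.0f, x, y, n);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("y[0] = %.1f\n", y[0]);  // 3 * 1 + 2 = 5.0
    return 0;
}
```

       The launch is equivalent to axpy<<<grid, block, 0, 0>>>(a, dx, dy, n). The explicit form is mainly
       useful when the kernel or its argument list is only known through pointers.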

   cudaError_t cudaSetDoubleForDevice (double * d)
       Deprecated
           This function is deprecated as of CUDA 7.5

       Converts the double value of d to an internal float representation if the device does not support double
       arithmetic. If the device does natively support doubles, then this function does nothing.

       Parameters:
           d - Double to convert

       Returns:
           cudaSuccess

       Note:
           Note that this function may also return error codes from previous, asynchronous launches.

       See also:
           cudaLaunch (C API), cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API),
           cudaSetDoubleForHost, cudaSetupArgument (C API)

   cudaError_t cudaSetDoubleForHost (double * d)
       Deprecated
           This function is deprecated as of CUDA 7.5

       Converts the double value of d from a potentially internal float representation if the device does not
       support double arithmetic. If the device does natively support doubles, then this function does nothing.

       Parameters:
           d - Double to convert

       Returns:
           cudaSuccess

       Note:
           Note that this function may also return error codes from previous, asynchronous launches.

       See also:
           cudaLaunch (C API), cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API),
           cudaSetDoubleForDevice, cudaSetupArgument (C API)

Author

       Generated automatically by Doxygen from the source code.

Version 6.0                                        15 Aug 2015                              Execution Control(3)