Ubuntu Manpage: Occupancy -

Provided by: nvidia-cuda-dev_7.5.18-0ubuntu1_amd64

NAME

       Occupancy - occupancy calculation functions of the low-level CUDA driver API (cuda.h)

   Functions
       CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor (int *numBlocks, CUfunction func, int
           blockSize, size_t dynamicSMemSize)
           Returns occupancy of a function.
       CUresult cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags (int *numBlocks, CUfunction
           func, int blockSize, size_t dynamicSMemSize, unsigned int flags)
           Returns occupancy of a function.
       CUresult cuOccupancyMaxPotentialBlockSize (int *minGridSize, int *blockSize, CUfunction
           func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int
           blockSizeLimit)
           Suggest a launch configuration with reasonable occupancy.
       CUresult cuOccupancyMaxPotentialBlockSizeWithFlags (int *minGridSize, int *blockSize,
           CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t
           dynamicSMemSize, int blockSizeLimit, unsigned int flags)
           Suggest a launch configuration with reasonable occupancy.

Detailed Description

       Occupancy calculation functions of the low-level CUDA driver API (cuda.h).

       This section describes the occupancy calculation functions of the low-level CUDA driver
       application programming interface.

Function Documentation

   CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor (int * numBlocks, CUfunction func, int
       blockSize, size_t dynamicSMemSize)
       Returns in *numBlocks the maximum number of active blocks per streaming
       multiprocessor.

       Parameters:
           numBlocks - Returned occupancy
           func - Kernel for which occupancy is calculated
           blockSize - Block size the kernel is intended to be launched with
           dynamicSMemSize - Per-block dynamic shared memory usage intended, in bytes

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED,
           CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.
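
       The returned block count is commonly converted into an occupancy ratio by comparing
       the resulting active warps with the warp capacity of one multiprocessor. The following
       is a minimal sketch, not part of the driver API: the helper name occupancy_fraction is
       hypothetical, and it assumes warpSize and maxThreadsPerSM were obtained separately
       (for example with cuDeviceGetAttribute).

```c
/* Hypothetical helper, not declared in cuda.h.
 * numBlocks       - value returned in *numBlocks by the occupancy query
 * blockSize       - block size that was passed to the query
 * warpSize        - threads per warp on the device (assumed queried elsewhere)
 * maxThreadsPerSM - resident-thread limit of one multiprocessor (assumed queried elsewhere)
 * Returns active warps divided by the maximum resident warps, or 0.0 on bad input. */
double occupancy_fraction(int numBlocks, int blockSize,
                          int warpSize, int maxThreadsPerSM)
{
    if (numBlocks < 0 || blockSize <= 0 || warpSize <= 0 ||
        maxThreadsPerSM < warpSize)
        return 0.0;

    /* A partially filled warp still occupies a full warp slot, so round up. */
    int warpsPerBlock = (blockSize + warpSize - 1) / warpSize;
    int activeWarps = numBlocks * warpsPerBlock;
    int maxWarps = maxThreadsPerSM / warpSize;

    return (double)activeWarps / (double)maxWarps;
}
```

       In a real program the last two arguments would likely come from the device attributes
       CU_DEVICE_ATTRIBUTE_WARP_SIZE and CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR.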

   CUresult cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags (int * numBlocks, CUfunction
       func, int blockSize, size_t dynamicSMemSize, unsigned int flags)
       Returns in *numBlocks the maximum number of active blocks per streaming
       multiprocessor.

       The flags parameter controls how special cases are handled. The valid flags are:

       • CU_OCCUPANCY_DEFAULT, which keeps the same default behavior as
         cuOccupancyMaxActiveBlocksPerMultiprocessor;

       • CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms
         where global caching affects occupancy. On such platforms, if caching is enabled but
         per-block SM resource usage would result in zero occupancy, the occupancy calculator
         by default calculates the occupancy as if caching were disabled. Setting
         CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE makes the occupancy calculator return 0 in such
         cases. More information about this feature can be found in the 'Unified L1/Texture
         Cache' section of the Maxwell tuning guide.

       Parameters:
           numBlocks - Returned occupancy
           func - Kernel for which occupancy is calculated
           blockSize - Block size the kernel is intended to be launched with
           dynamicSMemSize - Per-block dynamic shared memory usage intended, in bytes
           flags - Requested behavior for the occupancy calculator

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED,
           CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

   CUresult cuOccupancyMaxPotentialBlockSize (int * minGridSize, int * blockSize, CUfunction
       func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int
       blockSizeLimit)
       Returns in *blockSize a reasonable block size that can achieve the maximum occupancy (that
       is, the maximum number of active warps with the fewest blocks per multiprocessor), and in
       *minGridSize the minimum grid size needed to achieve the maximum occupancy.

       If blockSizeLimit is 0, the configurator will use the maximum block size permitted by the
       device / function instead.

       If per-block dynamic shared memory allocation is not needed, the user should leave both
       blockSizeToDynamicSMemSize and dynamicSMemSize as 0.

       If per-block dynamic shared memory allocation is needed, then if the dynamic shared memory
       size is constant regardless of block size, the size should be passed through
       dynamicSMemSize, and blockSizeToDynamicSMemSize should be NULL.

       Otherwise, if the per-block dynamic shared memory size varies with the block size,
       the user needs to provide a unary function through blockSizeToDynamicSMemSize that
       computes the dynamic shared memory needed by func for any given block size. In that
       case dynamicSMemSize is ignored. An example signature is:

           // Takes a block size, returns the dynamic shared memory needed in bytes
           size_t blockToSmem(int blockSize);
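
       A callback of this shape might look as follows. This is only a sketch: it assumes a
       hypothetical kernel that needs one float of dynamic shared memory per thread, and both
       the name floatPerThreadSmem and the sizing rule are illustrative rather than taken
       from the API.

```c
#include <stddef.h>

/* Hypothetical sizing rule: one float of dynamic shared memory per thread.
 * Takes a candidate block size and returns the bytes of dynamic shared
 * memory a block of that size would request. */
size_t floatPerThreadSmem(int blockSize)
{
    if (blockSize <= 0)
        return 0;  /* defensive: no threads, no shared memory */
    return (size_t)blockSize * sizeof(float);
}
```

       Such a function would be passed as blockSizeToDynamicSMemSize. In cuda.h the
       CUoccupancyB2DSize typedef also carries the CUDA_CB calling-convention macro, which
       this sketch leaves out so that it compiles without the CUDA headers.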

       Parameters:
           minGridSize - Returned minimum grid size needed to achieve the maximum occupancy
           blockSize - Returned maximum block size that can achieve the maximum occupancy
           func - Kernel for which launch configuration is calculated
           blockSizeToDynamicSMemSize - A function that calculates how much per-block dynamic
           shared memory func uses based on the block size
           dynamicSMemSize - Dynamic shared memory usage intended, in bytes
           blockSizeLimit - The maximum block size func is designed to handle

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED,
           CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.
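
       *minGridSize is only the smallest grid that can reach the maximum occupancy; the grid
       that is actually launched is usually derived from the problem size and the suggested
       block size. A minimal sketch, assuming a one-dimensional problem with one thread per
       element (the helper name grid_size_for is hypothetical):

```c
/* Hypothetical helper, not declared in cuda.h.
 * n         - number of elements to process, one thread per element (assumption)
 * blockSize - block size suggested in *blockSize
 * Returns the number of blocks needed to cover n elements, rounding up,
 * or 0 when either argument is not positive. The result is assumed to fit in int. */
int grid_size_for(long long n, int blockSize)
{
    if (n <= 0 || blockSize <= 0)
        return 0;
    return (int)((n + blockSize - 1) / blockSize);
}
```

       Because the grid is rounded up, the kernel is expected to bounds-check its global
       index against n so that threads in the last block do not run past the data.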

   CUresult cuOccupancyMaxPotentialBlockSizeWithFlags (int * minGridSize, int * blockSize,
       CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize,
       int blockSizeLimit, unsigned int flags)
       An extended version of cuOccupancyMaxPotentialBlockSize. In addition to the arguments
       passed to cuOccupancyMaxPotentialBlockSize, cuOccupancyMaxPotentialBlockSizeWithFlags
       also takes a flags parameter.

       The flags parameter controls how special cases are handled. The valid flags are:

       • CU_OCCUPANCY_DEFAULT, which keeps the same default behavior as
         cuOccupancyMaxPotentialBlockSize;

       • CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms
         where global caching affects occupancy. On such platforms, the launch configuration
         that produces maximal occupancy might not support global caching. Setting
         CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE guarantees that the produced launch
         configuration is compatible with global caching, at a potential cost in occupancy. More
         information about this feature can be found in the 'Unified L1/Texture Cache' section of
         the Maxwell tuning guide.

       Parameters:
           minGridSize - Returned minimum grid size needed to achieve the maximum occupancy
           blockSize - Returned maximum block size that can achieve the maximum occupancy
           func - Kernel for which launch configuration is calculated
           blockSizeToDynamicSMemSize - A function that calculates how much per-block dynamic
           shared memory func uses based on the block size
           dynamicSMemSize - Dynamic shared memory usage intended, in bytes
           blockSizeLimit - The maximum block size func is designed to handle
           flags - Requested behavior for the occupancy calculator

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED,
           CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

       Note:
           Note that this function may also return error codes from previous, asynchronous
           launches.

Author

       Generated automatically by Doxygen from the source code.