Provided by: nvidia-cuda-dev_9.1.85-3ubuntu1_amd64

NAME

       Occupancy - occupancy calculation functions of the low-level CUDA driver API

   Functions
       CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor (int *numBlocks, CUfunction func, int blockSize,
           size_t dynamicSMemSize)
           Returns occupancy of a function.
       CUresult cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags (int *numBlocks, CUfunction func, int
           blockSize, size_t dynamicSMemSize, unsigned int flags)
           Returns occupancy of a function.
       CUresult cuOccupancyMaxPotentialBlockSize (int *minGridSize, int *blockSize, CUfunction func,
           CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit)
           Suggest a launch configuration with reasonable occupancy.
       CUresult cuOccupancyMaxPotentialBlockSizeWithFlags (int *minGridSize, int *blockSize, CUfunction func,
           CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit, unsigned
           int flags)
           Suggest a launch configuration with reasonable occupancy.

Detailed Description

       Occupancy calculation functions of the low-level CUDA driver API (cuda.h)

       This section describes the occupancy calculation functions of the low-level CUDA driver application
       programming interface.

Function Documentation

   CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor (int * numBlocks, CUfunction func, int blockSize, size_t
       dynamicSMemSize)
       Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor.

       Parameters:
           numBlocks - Returned occupancy
           func - Kernel for which occupancy is calculated
           blockSize - Block size the kernel is intended to be launched with
           dynamicSMemSize - Per-block dynamic shared memory usage intended, in bytes

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT,
           CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cudaOccupancyMaxActiveBlocksPerMultiprocessor
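
       The value returned in *numBlocks is a block count, not a percentage. The following is a minimal
       sketch of how it could be turned into an occupancy fraction. The helper occupancy_fraction is
       hypothetical and not part of the driver API. It assumes warpSize and maxThreadsPerSM were read
       beforehand with cuDeviceGetAttribute (CU_DEVICE_ATTRIBUTE_WARP_SIZE and
       CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR).

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the CUDA driver API.
 * Converts the block count returned in *numBlocks into the fraction of
 * warp slots on one multiprocessor that would be in use.
 * Assumption: warpSize and maxThreadsPerSM come from cuDeviceGetAttribute.
 */
double occupancy_fraction(int numBlocks, int blockSize,
                          int warpSize, int maxThreadsPerSM)
{
    assert(numBlocks >= 0 && blockSize > 0);
    assert(warpSize > 0 && maxThreadsPerSM >= warpSize);

    /* A partially filled warp still takes a whole warp slot. */
    int warpsPerBlock = (blockSize + warpSize - 1) / warpSize;
    int activeWarps   = numBlocks * warpsPerBlock;
    int maxWarps      = maxThreadsPerSM / warpSize;

    return (double)activeWarps / (double)maxWarps;
}
```

       For example, 2 active blocks of 256 threads on a device with 32-thread warps and 2048 threads per
       multiprocessor give 16 of 64 warp slots, an occupancy of 0.25.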

   CUresult cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags (int * numBlocks, CUfunction func, int
       blockSize, size_t dynamicSMemSize, unsigned int flags)
       Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor.

       The flags parameter controls how special cases are handled. The valid flags are:

       • CU_OCCUPANCY_DEFAULT, which maintains the default behavior of
         cuOccupancyMaxActiveBlocksPerMultiprocessor;

       • CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global
         caching affects occupancy. On such platforms, if caching is enabled but per-block SM resource usage
         would result in zero occupancy, the occupancy calculator by default calculates the occupancy as if
         caching were disabled. Setting CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE makes the occupancy calculator
         return 0 in such cases. More information about this feature can be found in the 'Unified L1/Texture
         Cache' section of the Maxwell tuning guide.

       Parameters:
           numBlocks - Returned occupancy
           func - Kernel for which occupancy is calculated
           blockSize - Block size the kernel is intended to be launched with
           dynamicSMemSize - Per-block dynamic shared memory usage intended, in bytes
           flags - Requested behavior for the occupancy calculator

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT,
           CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFlags

   CUresult cuOccupancyMaxPotentialBlockSize (int * minGridSize, int * blockSize, CUfunction func,
       CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit)
       Returns in *blockSize a reasonable block size that can achieve the maximum occupancy (that is, the
       maximum number of active warps with the fewest blocks per multiprocessor), and in *minGridSize the
       minimum grid size needed to achieve the maximum occupancy.

       If blockSizeLimit is 0, the configurator will use the maximum block size permitted by the device /
       function instead.

       If per-block dynamic shared memory allocation is not needed, the user should leave both
       blockSizeToDynamicSMemSize and dynamicSMemSize as 0.

       If per-block dynamic shared memory allocation is needed and the dynamic shared memory size is constant
       regardless of block size, the size should be passed through dynamicSMemSize, and
       blockSizeToDynamicSMemSize should be NULL.

       Otherwise, if the per-block dynamic shared memory size varies with the block size, the user needs to
       provide a unary function through blockSizeToDynamicSMemSize that computes the dynamic shared memory
       needed by func for any given block size. In that case dynamicSMemSize is ignored. An example
       signature is:

           // Take block size, returns dynamic shared memory needed
           size_t blockToSmem(int blockSize);
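
       As an illustration of such a unary function, the sketch below assumes a hypothetical kernel that keeps
       one float per thread plus a fixed 64-byte header in dynamic shared memory. HEADER_BYTES and the sizing
       rule are assumptions made for this example, not something the driver prescribes. In cuda.h the
       CUoccupancyB2DSize callback type may also carry a calling-convention macro (CUDA_CB), which this plain
       C sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed per-block header kept in dynamic shared memory. */
#define HEADER_BYTES 64u

/*
 * Takes a block size and returns the dynamic shared memory, in bytes,
 * that the (hypothetical) kernel needs at that block size:
 * one float per thread plus the fixed header.
 * Assumption of this sketch: only positive block sizes are passed in.
 */
size_t blockToSmem(int blockSize)
{
    assert(blockSize > 0);
    return HEADER_BYTES + (size_t)blockSize * sizeof(float);
}
```

       A function like this would be passed as blockSizeToDynamicSMemSize, with dynamicSMemSize left as 0
       since it is ignored when a callback is supplied.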

       Parameters:
           minGridSize - Returned minimum grid size needed to achieve the maximum occupancy
           blockSize - Returned maximum block size that can achieve the maximum occupancy
           func - Kernel for which launch configuration is calculated
           blockSizeToDynamicSMemSize - A function that calculates how much per-block dynamic shared memory func
           uses based on the block size
           dynamicSMemSize - Dynamic shared memory usage intended, in bytes
           blockSizeLimit - The maximum block size func is designed to handle

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT,
           CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cudaOccupancyMaxPotentialBlockSize
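
       The returned *blockSize and *minGridSize still have to be turned into an actual grid dimension for a
       given problem size. The sketch below shows two common choices. Both helpers are hypothetical and not
       part of the driver API, and they assume a one-dimensional kernel in which each thread handles one
       element (blocks_to_cover) or loops over elements in a grid-stride fashion (grid_stride_blocks).

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: number of blocks needed so that
 * blocks * blockSize >= n, for a kernel with one element per thread.
 */
int blocks_to_cover(long long n, int blockSize)
{
    assert(n >= 0 && blockSize > 0);
    long long blocks = (n + blockSize - 1) / blockSize;  /* round up */
    assert(blocks <= INT_MAX);
    return (int)blocks;
}

/*
 * Hypothetical helper for a grid-stride kernel: launch no more blocks
 * than minGridSize (the value returned in *minGridSize), and no more
 * than the problem actually needs.
 */
int grid_stride_blocks(long long n, int blockSize, int minGridSize)
{
    assert(minGridSize > 0);
    int needed = blocks_to_cover(n, blockSize);
    return needed < minGridSize ? needed : minGridSize;
}
```

       Which of the two fits depends on how the kernel indexes its data. Neither choice is mandated by the
       occupancy functions themselves.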

   CUresult cuOccupancyMaxPotentialBlockSizeWithFlags (int * minGridSize, int * blockSize, CUfunction func,
       CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit, unsigned int
       flags)
       An extended version of cuOccupancyMaxPotentialBlockSize. In addition to the arguments passed to
       cuOccupancyMaxPotentialBlockSize, cuOccupancyMaxPotentialBlockSizeWithFlags also takes a flags parameter.

       The flags parameter controls how special cases are handled. The valid flags are:

       • CU_OCCUPANCY_DEFAULT, which maintains the default behavior of cuOccupancyMaxPotentialBlockSize;

       • CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global
         caching affects occupancy. On such platforms, the launch configurations that produce maximal occupancy
         might not support global caching. Setting CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE guarantees that the
         produced launch configuration is compatible with global caching, at a potential cost of occupancy.
         More information about this feature can be found in the 'Unified L1/Texture Cache' section of the
         Maxwell tuning guide.

       Parameters:
           minGridSize - Returned minimum grid size needed to achieve the maximum occupancy
           blockSize - Returned maximum block size that can achieve the maximum occupancy
           func - Kernel for which launch configuration is calculated
           blockSizeToDynamicSMemSize - A function that calculates how much per-block dynamic shared memory func
           uses based on the block size
           dynamicSMemSize - Dynamic shared memory usage intended, in bytes
           blockSizeLimit - The maximum block size func is designed to handle
           flags - Requested behavior for the occupancy calculator

       Returns:
           CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT,
           CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

       Note:
           This function may also return error codes from previous, asynchronous launches.

       See also:
           cudaOccupancyMaxPotentialBlockSizeWithFlags

Author

       Generated automatically by Doxygen from the source code.

Version 6.0                                        3 Nov 2017                                       Occupancy(3)