7.4 Occupancy
Occupancy is a ratio that compares the number of threads per SM that will run in a given kernel launch to the maximum number of threads that could potentially run on that SM.
$$\text{Occupancy} = \frac{\text{Warps per SM}}{\text{Max. warps per SM}}$$
The denominator (maximum warps per SM) is a constant that depends only on the compute capability of the device. The numerator, which determines the occupancy, is a function of the following:
Compute capability (1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 3.0, 3.5)
Threads per block
Registers per thread
Shared memory configuration
Shared memory per block
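Because the denominator depends only on compute capability, it can be tabulated. The following is a minimal C++ sketch for the compute capabilities listed above. The function names `maxWarpsPerSM` and `occupancy` are hypothetical. Each warp limit is the maximum number of resident threads per SM for that version (768, 1024, 1536 or 2048) divided by the 32-thread warp size.

```cpp
#include <cassert>
#include <stdexcept>

// Maximum resident warps per SM (the occupancy denominator) for the
// compute capabilities listed above; 32 threads per warp.
int maxWarpsPerSM(int major, int minor) {
    if (major == 1) {
        // 1.0 and 1.1: 768 threads/SM; 1.2 and 1.3: 1024 threads/SM
        return (minor <= 1) ? 24 : 32;
    }
    if (major == 2) return 48;  // 2.0, 2.1: 1536 threads/SM
    if (major == 3) return 64;  // 3.0, 3.5: 2048 threads/SM
    throw std::invalid_argument("compute capability not in this table");
}

// Occupancy = active warps per SM / maximum warps per SM.
double occupancy(int activeWarpsPerSM, int major, int minor) {
    const int maxWarps = maxWarpsPerSM(major, minor);
    assert(activeWarpsPerSM >= 0 && activeWarpsPerSM <= maxWarps);
    return static_cast<double>(activeWarpsPerSM) / maxWarps;
}
```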
To help developers assess the tradeoffs between these parameters, the CUDA Toolkit includes an occupancy calculator in the form of an Excel spreadsheet.[10] Given the inputs above, the spreadsheet computes the following results:
Active thread count
Active warp count
Active block count
Occupancy (active warp count divided by the hardware's maximum number of active warps)
The spreadsheet also identifies which of the following resources is limiting occupancy:
Registers per multiprocessor
Maximum number of warps or blocks per multiprocessor
Shared memory per multiprocessor
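The spreadsheet's core calculation can be approximated in a few lines. This is a simplified sketch, not the spreadsheet's actual algorithm: it ignores the register and shared memory allocation granularity that the spreadsheet models, and the structs and the `computeOccupancy` function are hypothetical names. Each resource caps the number of resident blocks; the smallest cap becomes the active block count, and the resource that produced it is reported using the three categories listed above.

```cpp
#include <cassert>
#include <string>

// Per-SM hardware limits. Example values used later assume a compute
// capability 2.0 device: 48 warps, 8 blocks, 32768 registers, 48KB shared.
struct SmLimits {
    int maxWarps;
    int maxBlocks;
    int registers;
    int sharedBytes;
};

// Per-kernel inputs (registers and shared memory as reported by the compiler).
struct KernelConfig {
    int threadsPerBlock;
    int registersPerThread;
    int sharedBytesPerBlock;
};

struct OccupancyResult {
    int activeBlocks;
    int activeWarps;
    int activeThreads;
    double occupancy;
    std::string limitedBy;
};

OccupancyResult computeOccupancy(const SmLimits& sm, const KernelConfig& k) {
    assert(k.threadsPerBlock > 0);
    const int kWarpSize = 32;
    const int warpsPerBlock = (k.threadsPerBlock + kWarpSize - 1) / kWarpSize;

    // Blocks that fit under each limit (no allocation granularity modeled).
    const int byWarps = sm.maxWarps / warpsPerBlock;
    const int regsPerBlock = k.registersPerThread * warpsPerBlock * kWarpSize;
    const int byRegs = regsPerBlock > 0 ? sm.registers / regsPerBlock : sm.maxBlocks;
    const int bySmem = k.sharedBytesPerBlock > 0
                           ? sm.sharedBytes / k.sharedBytesPerBlock
                           : sm.maxBlocks;

    // Start from the block limit and keep whichever cap is strictly smaller.
    OccupancyResult r;
    r.activeBlocks = sm.maxBlocks;
    r.limitedBy = "warps or blocks per multiprocessor";
    if (byWarps < r.activeBlocks) {
        r.activeBlocks = byWarps;
        r.limitedBy = "warps or blocks per multiprocessor";
    }
    if (byRegs < r.activeBlocks) {
        r.activeBlocks = byRegs;
        r.limitedBy = "registers per multiprocessor";
    }
    if (bySmem < r.activeBlocks) {
        r.activeBlocks = bySmem;
        r.limitedBy = "shared memory per multiprocessor";
    }

    r.activeWarps = r.activeBlocks * warpsPerBlock;
    r.activeThreads = r.activeBlocks * k.threadsPerBlock;
    r.occupancy = static_cast<double>(r.activeWarps) / sm.maxWarps;
    return r;
}
```

For example, with the assumed compute capability 2.0 limits, a 256-thread block that uses 32 registers per thread and 4,096 bytes of shared memory is register-limited to 4 resident blocks: 32 of 48 warps, or about 67% occupancy.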
Note that occupancy is not the be-all and end-all of CUDA performance;[1] often it is better to use more registers per thread and rely on instruction-level parallelism (ILP) to deliver performance. NVIDIA has a good presentation on warps and occupancy that discusses the tradeoffs.[12]
An example of a low-occupancy kernel that can achieve near-maximum global memory bandwidth is given in Section 5.2.10 (Listing 5.5). The inner loop of the GlobalReads kernel can be unrolled according to a template parameter; as the number of unrolled iterations increases, the number of registers needed increases and the occupancy goes down. For the Tesla M2050s in the cg1.4xlarge instance type, for example, the peak read bandwidth reported (with ECC disabled) is 124 GiB/s, with occupancy of . Volkov reports achieving near-peak memory bandwidth when running kernels whose occupancy is in the single-digit percentages.
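One way to see why low occupancy need not cost bandwidth is Little's law: to sustain bandwidth B against memory latency L, roughly B × L bytes must be in flight, and those bytes can come either from many threads with one outstanding load each or from fewer threads that each keep several independent loads outstanding (ILP). The sketch below estimates the resident threads per SM needed under stated assumptions. The 500 ns latency and 4-byte loads are illustrative assumptions, not measurements; 124 GiB/s is the figure quoted above, and 14 is the number of SMs on a Tesla M2050. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Little's law: bytes in flight = bandwidth * latency.
// Returns the bytes each SM must keep in flight to sustain the target
// device-wide bandwidth, assuming traffic is spread evenly across SMs.
double bytesInFlightPerSM(double bandwidthBytesPerSec, double latencySec, int numSMs) {
    assert(numSMs > 0);
    return bandwidthBytesPerSec * latencySec / numSMs;
}

// Resident threads per SM needed when each thread keeps
// loadsInFlightPerThread independent loads of bytesPerLoad outstanding.
int threadsNeededPerSM(double bandwidthBytesPerSec, double latencySec, int numSMs,
                       int bytesPerLoad, int loadsInFlightPerThread) {
    assert(bytesPerLoad > 0 && loadsInFlightPerThread > 0);
    const double perThread = static_cast<double>(bytesPerLoad) * loadsInFlightPerThread;
    const double bytes = bytesInFlightPerSM(bandwidthBytesPerSec, latencySec, numSMs);
    return static_cast<int>(std::ceil(bytes / perThread));
}
```

Under those assumptions, one outstanding 4-byte load per thread needs about 1,189 threads per SM, while four independent loads per thread need only 298, about 10 of the 48 warps a compute capability 2.0 SM can hold.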