
5.3 Constant Memory

Constant memory is optimized for read-only broadcast to multiple threads. As the name implies, the compiler uses constant memory to hold constants that couldn't be easily computed or otherwise compiled directly into the machine code. Constant memory resides in device memory but is accessed using different instructions that cause the GPU to access it using a special "constant cache."

The compiler has 64K of constant memory available to use at its discretion. The developer has another 64K of constant memory available that can be declared with the __constant__ keyword. These limits are per-module (for driver API applications) or per-file (for CUDA runtime applications).

Naïvely, one might expect constant memory to be analogous to the const keyword in C/C++, where it cannot be changed after initialization. But constant memory can be changed, either by memory copies or by querying the pointer to constant memory and writing to it with a kernel. CUDA kernels must not write to constant memory ranges that they may be accessing because the constant cache is not kept coherent with respect to the rest of the memory hierarchy during kernel execution.
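The constraint above can be sketched as follows (the names g_coeff and Scale are hypothetical, not from the text): the host may change a __constant__ symbol between kernel launches, but must wait for any kernel reading it to finish first.

```cuda
__constant__ float g_coeff;   // hypothetical constant-memory scalar

__global__ void Scale( float *out, const float *in, size_t n )
{
    size_t i = blockIdx.x*blockDim.x + threadIdx.x;
    if ( i < n )
        out[i] = g_coeff * in[i];   // broadcast read through the constant cache
}

void ScaleTwice( float *dOut, const float *dIn, size_t n )
{
    float k = 2.0f;
    cudaMemcpyToSymbol( g_coeff, &k, sizeof(k) );  // set before launch
    Scale<<<(n+255)/256, 256>>>( dOut, dIn, n );
    cudaDeviceSynchronize();                       // wait for readers to drain
    k = 0.5f;
    cudaMemcpyToSymbol( g_coeff, &k, sizeof(k) );  // now safe to change
    Scale<<<(n+255)/256, 256>>>( dOut, dOut, n );
}
```

The cudaDeviceSynchronize() call is what makes the second copy safe; without it, the copy could race with the first kernel's constant-cache reads.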

5.3.1 HOST AND DEVICE __constant__ MEMORY

Mark Harris describes the following idiom that uses the predefined macro __CUDA_ARCH__ to maintain host and device copies of constant memory that are conveniently accessed by both the CPU and GPU.

__constant__ double dc_vals[2] = { 0.0, 1000.0 };
const double hc_vals[2] = { 0.0, 1000.0 };

__device__ __host__ double f(size_t i)
{
#ifdef __CUDA_ARCH__
    return dc_vals[i];
#else
    return hc_vals[i];
#endif
}
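As a hypothetical usage sketch of the idiom (assuming the definitions of f(), dc_vals, and hc_vals above are in scope), the same function can be called from a kernel and from host code; the preprocessor selects the matching array in each compilation path.

```cuda
__global__ void ReadVal( double *out, size_t i )
{
    *out = f( i );        // device path: reads dc_vals[i] from constant memory
}

int main()
{
    double h = f( 1 );    // host path: reads hc_vals[1], i.e., 1000.0
    double *dOut, result;
    cudaMalloc( &dOut, sizeof(double) );
    ReadVal<<<1,1>>>( dOut, 1 );
    cudaMemcpy( &result, dOut, sizeof(double), cudaMemcpyDeviceToHost );
    // h and result now hold the same value
    cudaFree( dOut );
    return 0;
}
```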

5.3.2 ACCESSING CONSTANT MEMORY

Besides the accesses to constant memory implicitly caused by C/C++ operators, developers can copy to and from constant memory, and even query the pointer to a constant memory allocation.

CUDA Runtime

CUDA runtime applications can copy to and from constant memory using cudaMemcpyToSymbol() and cudaMemcpyFromSymbol(), respectively. The pointer to constant memory can be queried with cudaGetSymbolAddress().

cudaError_t cudaGetSymbolAddress(void **devPtr, char *symbol);

This pointer may be used to write to constant memory with a kernel, though developers must take care not to write to the constant memory while another kernel is reading it.
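A minimal sketch of these three runtime calls together (the symbol g_table is hypothetical):

```cuda
__constant__ int g_table[4];          // hypothetical constant-memory array

void UpdateAndInspect( const int *host4 )
{
    // Copy four ints from host memory into the constant symbol
    cudaMemcpyToSymbol( g_table, host4, 4*sizeof(int) );

    // Read the symbol back into host memory
    int check[4];
    cudaMemcpyFromSymbol( check, g_table, 4*sizeof(int) );

    // Query the device pointer; a kernel could write through it, but only
    // while no kernel that reads g_table is executing
    void *devPtr;
    cudaGetSymbolAddress( &devPtr, g_table );
}
```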

Driver API

Driver API applications can query the device pointer of constant memory using cuModuleGetGlobal(). The driver API does not include a special memory copy function like cudaMemcpyToSymbol(), since it does not have the language integration of the CUDA runtime. Applications must query the address with cuModuleGetGlobal() and then call cuMemcpyHtoD() or cuMemcpyDtoH().
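The driver API sequence can be sketched as follows (hModule is assumed to come from cuModuleLoad(), and "g_table" is a hypothetical __constant__ symbol declared in that module):

```cuda
void CopyConstant( CUmodule hModule, const int *hostData, int *hostCopy )
{
    CUdeviceptr dptr;
    size_t bytes;

    // Look up the device address and size of the constant symbol
    cuModuleGetGlobal( &dptr, &bytes, hModule, "g_table" );

    // Write the constant memory, then read it back
    cuMemcpyHtoD( dptr, hostData, bytes );
    cuMemcpyDtoH( hostCopy, dptr, bytes );
}
```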

The amount of constant memory used by a kernel may be queried with cuFuncGetAttribute( CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES ).
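A brief sketch of that query (hKernel is assumed to come from cuModuleGetFunction()):

```cuda
void PrintConstantUsage( CUfunction hKernel )
{
    int constBytes;
    // Reports the kernel's statically allocated constant memory, in bytes
    cuFuncGetAttribute( &constBytes, CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES,
                        hKernel );
    printf( "Constant memory used: %d bytes\n", constBytes );
}
```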

5.3_Constant_Memory - The CUDA Handbook | OpenTech