
3.6 Device Memory

Device memory (or linear device memory) resides in the CUDA address space and may be accessed by CUDA kernels via normal C/C++ pointer and array dereferencing operations. Most GPUs have a dedicated pool of device memory that is directly attached to the GPU and accessed by an integrated memory controller.

CUDA hardware does not support demand paging, so all memory allocations are backed by actual physical memory. Unlike CPU applications, which can allocate more virtual memory than there is physical memory in the system, CUDA's memory allocation facilities fail when the physical memory is exhausted. The details of how to allocate, free, and access device memory are given in Section 5.2.
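Because allocations are backed by physical memory rather than demand-paged, an oversized allocation fails immediately instead of overcommitting. A minimal sketch (assuming no attached GPU has a full terabyte of device memory; error handling elsewhere omitted for brevity):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    void *p = nullptr;
    // Deliberately request 1 TB, far more device memory than is physically present.
    cudaError_t status = cudaMalloc(&p, 1ULL << 40);
    if (status != cudaSuccess) {
        // Expect cudaErrorMemoryAllocation ("out of memory") rather than
        // a successful virtual allocation that faults later.
        printf("Allocation failed: %s\n", cudaGetErrorString(status));
    } else {
        cudaFree(p);
    }
    return 0;
}
```

The failure is reported synchronously by the allocation call itself, so robust applications must check the return value of every allocation.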

CUDA Runtime

CUDA runtime applications may query the total amount of device memory available on a given device by calling cudaGetDeviceProperties() and examining cudaDeviceProp::totalGlobalMem. cudaMalloc() and cudaFree() allocate and free device memory, respectively. cudaMallocPitch() allocates pitched memory; cudaFree() may be used to free it. cudaMalloc3D() performs a 3D allocation of pitched memory.
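The runtime calls above can be exercised as follows. This is a minimal sketch, assuming device 0 exists and with error checking omitted for brevity:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    // Query total device memory on device 0.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Total global memory: %zu bytes\n", (size_t) prop.totalGlobalMem);

    // Linear allocation and free.
    void *dptr = nullptr;
    cudaMalloc(&dptr, 1024 * 1024);
    cudaFree(dptr);

    // Pitched 2D allocation: pitch receives the padded row width in bytes.
    void *d2d = nullptr;
    size_t pitch = 0;
    cudaMallocPitch(&d2d, &pitch, 640 * sizeof(float), 480);
    cudaFree(d2d);   // pitched memory is freed with cudaFree(), too

    // 3D pitched allocation.
    cudaPitchedPtr p3d;
    cudaMalloc3D(&p3d, make_cudaExtent(64 * sizeof(float), 64, 64));
    cudaFree(p3d.ptr);
    return 0;
}
```

Note that kernels reading a pitched allocation must compute row addresses from the returned pitch, not from the requested width, because the driver may pad each row for alignment.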

Driver API

Driver API applications may query the total amount of device memory available on a given device by calling cuDeviceTotalMem(). Alternatively, cuMemGetInfo() may be used to query the amount of free device memory as well as the total; cuMemGetInfo() can only be called when a CUDA context is current to the CPU thread. cuMemAlloc() and cuMemFree() allocate and free device memory, respectively. cuMemAllocPitch() allocates pitched memory; cuMemFree() may be used to free it.
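A driver API sketch of the same operations, assuming device 0 exists; the context creation step illustrates the requirement that a context be current before cuMemGetInfo() is called (error checking omitted for brevity):

```cpp
#include <cuda.h>
#include <cstdio>

int main()
{
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // cuDeviceTotalMem() needs only a device handle, not a context.
    size_t total = 0;
    cuDeviceTotalMem(&total, dev);
    printf("Total device memory: %zu bytes\n", total);

    // cuMemGetInfo() requires a context current to this CPU thread.
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);
    size_t freeBytes = 0;
    cuMemGetInfo(&freeBytes, &total);
    printf("Free: %zu of %zu bytes\n", freeBytes, total);

    // Linear allocation and free.
    CUdeviceptr dptr;
    cuMemAlloc(&dptr, 1024 * 1024);
    cuMemFree(dptr);

    // Pitched allocation; the element size (4, 8, or 16 bytes) guides
    // the driver's alignment choice. Freed with cuMemFree().
    CUdeviceptr d2d;
    size_t pitch = 0;
    cuMemAllocPitch(&d2d, &pitch, 640 * sizeof(float), 480, sizeof(float));
    cuMemFree(d2d);

    cuCtxDestroy(ctx);
    return 0;
}
```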