6.3 CUDA Events: CPU/GPU Synchronization
One of the key features of CUDA events is that they enable "partial" CPU/GPU synchronization. Instead of full CPU/GPU synchronization, where the CPU waits until the GPU is idle and thereby introduces a bubble into the GPU's work pipeline, CUDA events may be recorded into the asynchronous stream of GPU commands. The CPU can then wait until all of the work preceding the event has been done, while the GPU continues executing whatever work was submitted after the cuEventRecord()/cudaEventRecord() call.
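The pattern can be sketched as follows. This is a minimal illustration, not the book's listing; the kernel names (Step1, Step2) are hypothetical placeholders for two pieces of GPU work submitted to the same stream.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernels standing in for two batches of GPU work.
__global__ void Step1( int *p ) { p[0] = 1; }
__global__ void Step2( int *p ) { p[1] = 2; }

int main()
{
    int *d;
    cudaEvent_t ev;
    cudaMalloc( &d, 2*sizeof(int) );
    cudaEventCreate( &ev );

    Step1<<<1,1>>>( d );
    cudaEventRecord( ev, 0 );    // mark a point in the stream after Step1
    Step2<<<1,1>>>( d );         // the GPU may keep working past the event

    cudaEventSynchronize( ev );  // CPU blocks only until Step1 has finished;
                                 // Step2 may still be in flight on the GPU
    printf( "Work preceding the event is complete.\n" );

    cudaDeviceSynchronize();     // full synchronization before cleanup
    cudaEventDestroy( ev );
    cudaFree( d );
    return 0;
}
```

Because cudaEventSynchronize() waits only for commands recorded before the event, the CPU resumes as soon as Step1 completes, overlapping its own work with Step2's execution on the GPU.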
As an example of CPU/GPU concurrency, Listing 6.2 gives a memcpy routine for pageable memory. The code for this program implements the algorithm described in Figure 6.3 and is located in pageableMemcpyHtoD.cu. It uses two pinned memory buffers, stored in global variables declared as follows

void *g_hostBuffers[2];

and two CUDA events declared as

cudaEvent_t g_events[2];

Listing 6.2 chMemcpyHtoD() -- pageable memcpy.
void
chMemcpyHtoD(void *device, const void *host, size_t N)
{
cudaError_t status;
char *dst = (char *) device;
const char *src = (const char *) host;