3.10 Graphics Interoperability

The graphics interoperability (or "graphics interop") family of functions enables CUDA to read and write memory belonging to the OpenGL or Direct3D APIs. If applications could attain acceptable performance by sharing data via host memory, there would be no need for these APIs. But with local memory bandwidth that can exceed 140 GB/s and PCI Express bandwidth that rarely exceeds 6 GB/s in practice, it is important to give applications the opportunity to keep data on the GPU when possible. Using the graphics interop APIs, CUDA kernels can write data into images and textures that are then incorporated into graphical output rendered by OpenGL or Direct3D.
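To put the two bandwidth figures in perspective, the following sketch estimates how long one frame of data takes to move at each rate. The helper names frameBytes and transferMillis and the frame dimensions used below are assumptions chosen for illustration; only the 6 GB/s and 140 GB/s figures come from the text, and 1 GB is taken as 1e9 bytes.

```cpp
// Hypothetical helpers for illustration; not part of any CUDA API.

// Size in bytes of one uncompressed frame.
double frameBytes(int width, int height, int bytesPerPixel) {
    return static_cast<double>(width) * height * bytesPerPixel;
}

// Milliseconds needed to move `bytes` at `gbPerSecond` (1 GB taken as 1e9 bytes).
double transferMillis(double bytes, double gbPerSecond) {
    return bytes / (gbPerSecond * 1e9) * 1e3;
}
```

For an assumed 1920x1080 frame at 4 bytes per pixel (8,294,400 bytes), transferMillis gives about 1.38 ms at 6 GB/s and about 0.06 ms at 140 GB/s, a ratio of roughly 23 to 1. This is peak-rate arithmetic only and ignores latency and driver overhead.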

Because the graphics and CUDA drivers must coordinate under the hood to enable interoperability, applications must signal their intention to perform graphics interop early. In particular, the CUDA context must be notified that it will be interoperating with a given API by calling special context creation APIs such as cuD3D10CtxCreate() or cudaGLSetGLDevice().

This need for coordination between drivers is also why resource sharing between graphics APIs and CUDA occurs in two steps.

Registration: a potentially expensive operation that signals the developer's intent to share the resources to the underlying drivers, possibly prompting them to move and/or lock down the resources in question
Mapping: a lightweight operation that is expected to occur at high frequency
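The two steps can be pictured as a small state machine: register once, then map and unmap around each burst of CUDA work. The sketch below is a toy model of that call order only. It does not call CUDA; the SharedResource class, its members, and renderFrames are hypothetical, and the ordering rules it enforces are simplifying assumptions rather than documented driver behavior.

```cpp
#include <stdexcept>

// Toy model of the two-step protocol. It does NOT call CUDA; the class and
// its members are hypothetical and only track a plausible legal call order.
class SharedResource {
public:
    enum class State { Unregistered, Registered, Mapped };

    // Step 1: registration. Potentially expensive, so done once up front.
    void registerResource() {
        require(state_ == State::Unregistered, "already registered");
        state_ = State::Registered;
        ++registrations_;
    }

    // Step 2: mapping. Lightweight, done each time CUDA needs access.
    void map() {
        require(state_ == State::Registered, "map requires a registered, unmapped resource");
        state_ = State::Mapped;
        ++maps_;
    }

    // Hand the resource back to the graphics API.
    void unmap() {
        require(state_ == State::Mapped, "unmap requires a mapped resource");
        state_ = State::Registered;
    }

    // Assumption of this model: a resource is unmapped before it is unregistered.
    void unregisterResource() {
        require(state_ == State::Registered, "unregister requires an unmapped resource");
        state_ = State::Unregistered;
    }

    State state() const { return state_; }
    int registrations() const { return registrations_; }
    int maps() const { return maps_; }

private:
    static void require(bool ok, const char* message) {
        if (!ok) throw std::logic_error(message);
    }

    State state_ = State::Unregistered;
    int registrations_ = 0;
    int maps_ = 0;
};

// Render `frames` frames: register once, then map and unmap around each
// frame's CUDA work (represented here by comments only).
void renderFrames(SharedResource& resource, int frames) {
    resource.registerResource();
    for (int i = 0; i < frames; ++i) {
        resource.map();
        // ... CUDA kernels would read or write the resource here ...
        resource.unmap();
        // ... the graphics API would draw with the resource here ...
    }
    resource.unregisterResource();
}
```

Calling renderFrames on a fresh SharedResource with 60 frames performs one registration and 60 map/unmap pairs, which is the cost profile the two-step design is meant to encourage.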

In early versions of CUDA, the APIs for graphics interoperability with all four graphics APIs (OpenGL, DirectX 9, DirectX 10, and DirectX 11) were strictly separate. For example, for DirectX 9 interoperability, the following functions would be used in conjunction with one another.

cuD3D9RegisterResource()/cudaD3D9RegisterResource()
cuD3D9MapResources()/cudaD3D9MapResources()
cuD3D9UnmapResources()/cudaD3D9UnmapResources()
cuD3D9UnregisterResource()/cudaD3D9UnregisterResource()

Because the underlying hardware capabilities are the same, regardless of the API used to access them, many of these functions were merged in CUDA 3.2. The registration functions remain API-specific, since they require API-specific bindings, but the functions to map, unmap, and unregister resources were made common. The CUDA 3.2 APIs corresponding to the above are as follows.

cuGraphicsD3D9RegisterResource()/cudaGraphicsD3D9RegisterResource()
cuGraphicsMapResources()/cudaGraphicsMapResources()
cuGraphicsUnmapResources()/cudaGraphicsUnmapResources()
cuGraphicsUnregisterResource()/cudaGraphicsUnregisterResource()

The interoperability APIs for Direct3D 10 are the same, except the developer must use cuGraphicsD3D10RegisterResource()/cudaGraphicsD3D10RegisterResource() instead of the cuGraphicsD3D9* variants.
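The shape of this design, where only registration depends on the graphics API and everything afterward works on a common handle, can be sketched in a few lines. This is a toy model and does not call CUDA or Direct3D: GraphicsResource, the Fake* types, and every function below are hypothetical stand-ins chosen for illustration.

```cpp
// Toy model only: none of these names are CUDA or Direct3D identifiers.
enum class GraphicsApi { Direct3D9, Direct3D10 };

// Common handle, standing in for an API-neutral graphics resource handle.
struct GraphicsResource {
    GraphicsApi source;
    bool mapped;
};

// Stand-ins for each API's own resource type.
struct FakeD3D9Resource {};
struct FakeD3D10Resource {};

// API-specific front ends: each accepts only its own API's resource pointer,
// but both produce the same kind of handle.
GraphicsResource registerD3D9(FakeD3D9Resource*) {
    return GraphicsResource{GraphicsApi::Direct3D9, false};
}

GraphicsResource registerD3D10(FakeD3D10Resource*) {
    return GraphicsResource{GraphicsApi::Direct3D10, false};
}

// Common back end: identical regardless of which API registered the handle.
bool mapResource(GraphicsResource& r) {
    if (r.mapped) return false;  // already mapped
    r.mapped = true;
    return true;
}

bool unmapResource(GraphicsResource& r) {
    if (!r.mapped) return false;  // not currently mapped
    r.mapped = false;
    return true;
}

// Per-frame code written once, independent of the originating API.
bool processFrame(GraphicsResource& r) {
    if (!mapResource(r)) return false;
    // ... CUDA work on the mapped resource would go here ...
    return unmapResource(r);
}
```

In this model, switching from Direct3D 9 to Direct3D 10 changes only the registration call; processFrame is untouched, which mirrors the one-line difference between the two API lists.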

CUDA 3.2 also added the ability to access textures from graphics APIs in the form of CUDA arrays. In Direct3D, textures are just a different type of "resource" and may be referenced by IDirect3DResource9 * (or ID3D10Resource *, etc.). In OpenGL, a separate function cuGraphicsGLRegisterImage() is provided.
