3.4 Modules and Functions

Modules are collections of code and related data that are loaded together, analogous to DLLs on Windows or DSOs on Linux. As with CUDA contexts, the CUDA runtime does not explicitly support modules; they are available only in the CUDA driver API.7

CUDA does not have an intermediate structure analogous to object files that can be synthesized into a CUDA module. Instead, nvcc directly emits files that can be loaded as CUDA modules:

.cubin files that target specific SM versions
.ptx files that can be compiled onto the hardware by the driver

This data needn't be sent to end users in the form of these files. CUDA includes APIs to load modules as NULL-terminated strings that can be embedded in executable resources or elsewhere.8

Once a CUDA module is loaded, the application can query for the resources contained in it:

Globals
Functions (kernels)
Texture references

One important note: All of these resources are created when the module is loaded, so the query functions cannot fail due to a lack of resources.
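As a rough illustration of loading and querying a module through the driver API, the sketch below loads a small PTX module embedded as a NULL-terminated string, then looks up one kernel and one global. The PTX text and the names `kPtx`, `noop`, `g_counter`, and `check` are made up for this example. It assumes a CUDA-capable device at ordinal 0, uses the device's primary context for brevity, and must be linked against the driver library (for example with `-lcuda`). The `.version` and `.target` lines may need adjusting for a given driver.

```cpp
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Abort with a message if a driver API call fails.
static void check(CUresult r, const char* what) {
    if (r != CUDA_SUCCESS) {
        const char* msg = nullptr;
        cuGetErrorString(r, &msg);
        std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical PTX module, embedded as a NULL-terminated string.
// It defines one global (g_counter) and one empty kernel (noop).
static const char kPtx[] =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .global .align 4 .u32 g_counter;\n"
    ".visible .entry noop()\n"
    "{\n"
    "    ret;\n"
    "}\n";

int main() {
    check(cuInit(0), "cuInit");

    CUdevice dev;
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");

    // Modules are loaded into the current context.
    CUcontext ctx;
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    // Load the module from memory; the driver compiles the PTX here.
    CUmodule mod;
    check(cuModuleLoadData(&mod, kPtx), "cuModuleLoadData");

    // Query resources that were created when the module was loaded.
    CUfunction fn;
    check(cuModuleGetFunction(&fn, mod, "noop"), "cuModuleGetFunction");

    CUdeviceptr ptr;
    size_t bytes = 0;
    check(cuModuleGetGlobal(&ptr, &bytes, mod, "g_counter"), "cuModuleGetGlobal");
    std::printf("g_counter occupies %zu bytes\n", bytes);

    // Explicit unloading is what lets an application manage residency.
    check(cuModuleUnload(mod), "cuModuleUnload");
    check(cuDevicePrimaryCtxRelease(dev), "cuDevicePrimaryCtxRelease");
    return 0;
}
```

A lookup can still fail if the name is not present in the module, so checking the return value remains worthwhile even though resource exhaustion is not a concern at query time.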

As with contexts, the CUDA runtime hides the existence and management of modules. All modules are loaded at the same time CUDA is initialized. For applications with large amounts of GPU code, the ability to explicitly manage residency by loading and unloading modules is one of the principal reasons to use the driver API instead of the CUDA runtime.

Modules are built by invoking nvcc, which can emit different types of modules, depending on the command line parameters, as summarized in Table 3.3. Since cubins have been compiled for a specific GPU architecture, they do not have to be compiled "just in time" and are faster to load. But they are neither backward compatible (e.g., cubins compiled for SM 2.x cannot run on SM 1.x architectures) nor forward compatible (e.g., cubins compiled for SM 2.x architectures will not run on SM 3.x architectures). As a result, only applications with a priori knowledge of their target GPU architectures (and thus cubin versions) can use cubins without also embedding PTX versions of the same modules to use as backup.
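The compatibility rule can be written as a small host-side predicate. This is only a sketch: `SmVersion` and `cubinRunsOn` are hypothetical names, not part of any CUDA API. The rule it encodes (same major version, device minor version at least the cubin's) follows NVIDIA's documented binary-compatibility guarantee, which is slightly finer-grained than the major-version examples above.

```cpp
// Hypothetical helper type: an SM version (compute capability) such as 2.0 or 3.5.
struct SmVersion {
    int smMajor;
    int smMinor;
};

// Returns true if a cubin built for `cubin` can be loaded on a device of
// version `device`. Assumed rule: the major versions match and the device's
// minor version is at least the cubin's.
bool cubinRunsOn(SmVersion cubin, SmVersion device) {
    return cubin.smMajor == device.smMajor && cubin.smMinor <= device.smMinor;
}
```

Under this rule, a cubin built for SM 2.0 is accepted on an SM 2.1 device but rejected on both SM 1.3 and SM 3.0 devices.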

PTX is the intermediate language used as a source for the driver's just-in-time compilation. Because this compilation can take a significant amount of time, the driver saves compiled modules and reuses them for a given PTX module, provided the hardware and driver have not changed. If the driver or hardware changes, all PTX modules must be recompiled.
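The reuse condition can be illustrated with a toy model. The class below is not how the driver's cache is implemented, which is internal to the driver; it only mirrors the behavior described here, namely that a compiled module is reused when the PTX, the hardware, and the driver are all unchanged. `JitCacheModel` and the string identifiers passed to it are hypothetical.

```cpp
#include <map>
#include <string>
#include <tuple>

// Toy model only: mirrors the reuse rule, not the driver's actual cache.
class JitCacheModel {
    // A compiled module is identified by the PTX text, the GPU, and the driver.
    using Key = std::tuple<std::string, std::string, std::string>;

public:
    // Returns true on a cache hit, false when a (simulated) compile was needed.
    bool load(const std::string& ptx, const std::string& gpu,
              const std::string& driver) {
        Key key{ptx, gpu, driver};
        if (cache_.count(key) != 0) {
            return true;  // same PTX, hardware, and driver: reuse
        }
        cache_[key] = "compiled(" + gpu + ")";  // stand-in for the compiled binary
        ++compiles_;
        return false;
    }

    // Number of simulated compilations performed so far.
    int compiles() const { return compiles_; }

private:
    std::map<Key, std::string> cache_;
    int compiles_ = 0;
};
```

In this model, loading the same PTX twice on the same GPU and driver compiles once, while changing either the driver or the GPU identifier forces a new compilation.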

Table 3.3 nvcc Module Types

Table 3.4 Module Query Functions

With fatbins, the CUDA runtime automates the process of using a suitable cubin, if available, and compiling PTX otherwise. The different versions are embedded as strings in the host C++ code emitted by nvcc. Applications using the driver API have the advantage of finer-grained control over modules. For example, they can be embedded as resources in the executable, encrypted, or generated at runtime, but the process of using cubins if available and compiling PTX otherwise must be implemented explicitly.
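The explicit "cubin if available, PTX otherwise" policy might be sketched as follows. `ImageKind`, `ModuleImage`, and `selectImage` are hypothetical names invented for this example, and the two compatibility rules in the comments are assumptions based on NVIDIA's documented guarantees. A real application would pass the chosen image to a driver API load call such as `cuModuleLoadData`.

```cpp
#include <vector>

enum class ImageKind { Cubin, Ptx };

// Hypothetical descriptor for one module image embedded in the application.
struct ModuleImage {
    ImageKind kind;
    int smMajor;        // SM version the image was built for
    int smMinor;
    const char* label;  // stands in for the image bytes in this sketch
};

// Pick the image to load on a device of version devMajor.devMinor:
// a usable cubin if one exists, otherwise PTX for the driver to compile.
// Returns nullptr if no embedded image is usable.
const ModuleImage* selectImage(const std::vector<ModuleImage>& images,
                               int devMajor, int devMinor) {
    const ModuleImage* bestCubin = nullptr;
    const ModuleImage* bestPtx = nullptr;
    for (const ModuleImage& img : images) {
        if (img.kind == ImageKind::Cubin) {
            // Assumed cubin rule: same major version, minor not newer than the device.
            bool usable = img.smMajor == devMajor && img.smMinor <= devMinor;
            if (usable && (!bestCubin || img.smMinor > bestCubin->smMinor)) {
                bestCubin = &img;
            }
        } else {
            // Assumed PTX rule: target version not newer than the device.
            bool usable = img.smMajor < devMajor ||
                          (img.smMajor == devMajor && img.smMinor <= devMinor);
            bool newer = !bestPtx || img.smMajor > bestPtx->smMajor ||
                         (img.smMajor == bestPtx->smMajor &&
                          img.smMinor > bestPtx->smMinor);
            if (usable && newer) {
                bestPtx = &img;
            }
        }
    }
    // Prefer a cubin because it needs no just-in-time compilation.
    return bestCubin ? bestCubin : bestPtx;
}
```

With cubins for SM 2.0 and SM 3.0 plus PTX targeting 2.0 embedded, this selects the SM 2.0 cubin on an SM 2.1 device, falls back to the PTX on an SM 5.0 device, and returns nullptr on an SM 1.3 device.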

Once a module is loaded, the application can query for the resources contained in it: globals, functions (kernels), and texture references. One important note: All of these resources are created when the module is loaded, so the query functions (summarized in Table 3.4) cannot fail due to a lack of resources.
