4.1 nvcc: CUDA Compiler Driver

nvcc is the compiler driver CUDA developers use to translate source code into functional CUDA applications. Its uses range from a targeted compilation of a GPU-only .cu file to compiling, linking, and executing a sample program in a single command (a usage encouraged by many of the sample programs in this book).

As a compiler driver, nvcc does nothing more than set up a build environment and spawn a combination of native tools (such as the C compiler installed on the system) and CUDA-specific command-line tools (such as ptxas) to build the CUDA code. It implements many sensible default behaviors that can be overridden by command-line options; its exact behavior depends on which "compile trajectory" is requested by the main command-line option, one of the compilation stage options listed in Table 4.2.

Table 4.1 lists the file extensions understood by nvcc and the default behavior implemented for them. (Note: Some intermediate file types, like the .i/.ii files that contain host code generated by CUDA's front end, are omitted here.) Table 4.2 lists the compilation stage options and the corresponding compile trajectories. Table 4.3 lists nvcc options that affect the environment, such as paths to include directories. Table 4.4 lists nvcc options that affect the output, such as whether to include debugging information. Table 4.5 lists "passthrough" options that enable nvcc to pass options to the tools it invokes, such as ptxas. Table 4.6 lists nvcc options that aren't easily categorized, such as the -keep option, which instructs nvcc not to delete the temporary files it creates.

Table 4.1 Extensions for nvcc Input Files

Table 4.2 Compilation Trajectories

These command-line options discard any host code in the input file.

Table 4.3 nvcc Options (Environment)

Table 4.4 Options for Specifying Behavior of Compiler/Linker

Table 4.5 nvcc Options for Passthrough

Table 4.6 Miscellaneous nvcc Options

Table 4.7 lists nvcc options related to code generation. The --gpu-architecture and --gpu-code options are especially confusing. The former controls which virtual GPU architecture to compile for (i.e., which version of PTX to emit), while the latter controls which actual GPU architecture to compile for (i.e., which version of SM microcode to emit). The --gpu-code option must specify SM versions that are at least as high as the versions specified to --gpu-architecture.
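
The version rule in the last sentence can be made concrete with a small sketch. The C helpers below are hypothetical (they are not part of nvcc or any CUDA library); they assume architecture names of the form compute_XY for virtual architectures and sm_XY for actual ones, and they check that every sm_XY entry in a comma-separated --gpu-code list is at least as high as the compute_XY version given to --gpu-architecture. Under that rule, --gpu-architecture=compute_20 with --gpu-code=sm_20,sm_21 is acceptable, while pairing compute_20 with sm_13 is not.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "<prefix>XY" (e.g., "sm_21") into the integer XY.
 * Returns -1 if the prefix does not match or the suffix is not all digits. */
static int arch_version(const char *name, const char *prefix)
{
    size_t n = strlen(prefix);
    char *end;
    long v;

    if (strncmp(name, prefix, n) != 0 || name[n] < '0' || name[n] > '9')
        return -1;
    v = strtol(name + n, &end, 10);
    return (*end == '\0') ? (int)v : -1;
}

/* Hypothetical check: returns 1 if every sm_XY entry in gpu_code is >= the
 * compute_XY version in gpu_architecture, 0 otherwise. compute_XY entries in
 * gpu_code (embedded PTX) are accepted but not version-checked by this sketch. */
int gpu_code_is_valid(const char *gpu_architecture, const char *gpu_code)
{
    char buf[256];
    char *tok;
    int virt = arch_version(gpu_architecture, "compute_");

    if (virt < 0 || gpu_code[0] == '\0' || strlen(gpu_code) >= sizeof buf)
        return 0;
    strcpy(buf, gpu_code);  /* strtok modifies its input, so work on a copy */
    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        int real = arch_version(tok, "sm_");
        if (real >= 0) {
            if (real < virt)
                return 0;   /* SM version lower than the virtual architecture */
        } else if (arch_version(tok, "compute_") < 0) {
            return 0;       /* neither sm_XY nor compute_XY */
        }
    }
    return 1;
}
```

For instance, gpu_code_is_valid("compute_20", "sm_20,sm_21") returns 1, and gpu_code_is_valid("compute_20", "sm_13") returns 0.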

Table 4.7 nvcc Options for Code Generation

The --export-dir option specifies a directory to which all device code images will be copied. It is intended to serve as a device code repository that the CUDA driver can inspect while the application is running (in which case the directory should be listed in the CUDA_DEVCODE_PATH environment variable). The repository can be either a directory or a ZIP file. In either case, CUDA maintains a directory structure to facilitate code lookup by the CUDA driver. If a filename is specified but does not exist, a directory structure (not a ZIP file) will be created at that location.

4.2 ptxas: the PTX Assembler

ptxas, the tool that compiles PTX into GPU-specific microcode, occupies a unique place in the CUDA ecosystem: NVIDIA makes it available both in the offline tools (which developers use to build applications) and as part of the driver, enabling so-called "online" or "just-in-time" (JIT) compilation, which occurs at runtime.

When compiling offline, ptxas generally is invoked by nvcc if any actual GPU architectures are specified with the --gpu-code command-line option. In that case, command-line options (summarized in Table 4.8) can be passed to ptxas via the -Xptxas command-line option to nvcc.

Table 4.8 Command-Line Options for ptxas

Developers can also load PTX code dynamically, at runtime, by calling cuModuleLoadDataEx(), whose prototype is as follows.

CUresult cuModuleLoadDataEx(CUmodule *module, const void *image,
    unsigned int numOptions, CUjit_option *options, void **optionValues);

cuModuleLoadDataEx() takes a pointer, image, and loads the corresponding module into the current context. The pointer may be obtained by mapping a cubin, PTX, or fatbin file; by passing a cubin, PTX, or fatbin file as a NUL-terminated text string; or by incorporating a cubin or fatbin object into the executable's resources and using operating system calls such as Windows FindResource() to obtain the pointer. Options are passed as an array via options, and any corresponding parameters are passed in optionValues. The total number of options is specified by numOptions. Any outputs are returned via optionValues. Supported options are given in Table 4.9.
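
Of the mechanisms just described, passing the file contents as a NUL-terminated string is the easiest to sketch portably. The helper below is a hypothetical illustration (read_text_file is not a CUDA API): it reads a whole file in binary mode into a heap buffer and appends a NUL byte, which PTX text requires. The same buffer can also hold a cubin or fatbin image, for which the extra byte is simply unused.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a heap buffer and append a
 * terminating NUL byte so PTX text can be used as the image pointer.
 * Stores the file size (excluding the NUL) in *length if length is non-NULL.
 * Returns NULL on failure; the caller frees the buffer. */
char *read_text_file(const char *path, size_t *length)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    long size;

    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0 ||
        fseek(fp, 0, SEEK_SET) != 0)
        goto done;
    buf = malloc((size_t)size + 1);
    if (buf == NULL)
        goto done;
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        buf = NULL;
        goto done;
    }
    buf[size] = '\0';  /* PTX must be passed as a NUL-terminated string */
    if (length != NULL)
        *length = (size_t)size;
done:
    fclose(fp);
    return buf;
}
```

A caller would pass the returned buffer as the image argument of cuModuleLoadDataEx() and free() it once that call has returned.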

Table 4.9 Options for cuModuleLoadDataEx()
