
10.10 3D Texturing

Reading from 3D textures is similar to reading from 2D textures, but there are more limitations.

  • 3D textures have smaller size limits (2048x2048x2048 instead of 65536x32768 for 2D textures).

  • There are no copy avoidance strategies: CUDA does not support 3D texturing from device memory or surface load/store on 3D CUDA arrays.

Other than that, the differences are straightforward: Kernels can read from 3D textures using a tex3D() intrinsic that takes 3 floating-point parameters, and the underlying 3D CUDA arrays must be populated by 3D memcpy. Trilinear filtering is supported; 8 texture elements are read and interpolated according to the texture coordinates, with the same 9-bit precision limit as 1D and 2D texturing.
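As a rough illustration of the tex3D() intrinsic, the kernel below samples one slice of a 3D texture into a linear output buffer. It is a sketch under assumptions that are not part of the text above: it uses the texture object API (cudaTextureObject_t), the names sampleSlice, texObj, and out are hypothetical, and the texture is assumed to have been created over a 3D CUDA array of float with unnormalized coordinates and cudaFilterModeLinear.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: samples the plane at depth z of a 3D texture and
// writes a width x height image to out.
// Assumes texObj was created over a 3D CUDA array of float, with
// unnormalized coordinates and cudaFilterModeLinear (trilinear filtering).
__global__ void sampleSlice(cudaTextureObject_t texObj,
                            float *out, int width, int height, float z)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // tex3D() takes three floating-point coordinates. Adding 0.5 to x and y
    // addresses texel centers, so only z contributes to the interpolation.
    out[y * width + x] = tex3D<float>(texObj, x + 0.5f, y + 0.5f, z);
}
```

With linear filtering enabled, passing a z that lies halfway between two slice centers (for example z = 2.0f, between the centers 1.5 and 2.5) should return roughly the average of the two neighboring slices, subject to the 9-bit interpolation precision.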

The 3D texture size limits may be queried by calling cuDeviceGetAttribute() with CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH, CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT, and CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH, or by calling cudaGetDeviceProperties() and examining cudaDeviceProp.maxTexture3D. Because many more parameters are needed, 3D CUDA arrays must be created and manipulated using a different set of APIs than 1D or 2D CUDA arrays.
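The runtime-API query can be sketched as follows. The helper name query3DTextureLimits is hypothetical; the sketch only wraps cudaGetDeviceProperties() and copies out the three entries of maxTexture3D.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: fills dims[0..2] with the maximum 3D texture
// width, height, and depth supported by the given device.
cudaError_t query3DTextureLimits(int device, int dims[3])
{
    cudaDeviceProp prop;
    cudaError_t status = cudaGetDeviceProperties(&prop, device);
    if (status != cudaSuccess) return status;

    dims[0] = prop.maxTexture3D[0];  // maximum width
    dims[1] = prop.maxTexture3D[1];  // maximum height
    dims[2] = prop.maxTexture3D[2];  // maximum depth
    return cudaSuccess;
}
```

An application would typically call this once at startup and compare the result against the volume it intends to allocate.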

To create a 3D CUDA array, the cudaMalloc3DArray() function takes a cudaExtent structure instead of width and height parameters.

cudaError_t cudaMalloc3DArray(struct cudaArray** array, const struct cudaChannelFormatDesc* desc, struct cudaExtent extent, unsigned int flags __dv(0));

cudaExtent is defined as follows.

struct cudaExtent {
    size_t width;
    size_t height;
    size_t depth;
};
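A minimal sketch of allocating a 3D CUDA array follows. The helper name allocFloatVolume is hypothetical, and the element type (single-channel float) is an assumption chosen for illustration; make_cudaExtent() and cudaCreateChannelDesc<float>() are the runtime's own helpers.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: allocates a width x height x depth CUDA array whose
// elements are single floats. Release the array with cudaFreeArray().
cudaError_t allocFloatVolume(cudaArray_t *array,
                             size_t width, size_t height, size_t depth)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    // For CUDA arrays, the extent is expressed in elements, not bytes.
    cudaExtent extent = make_cudaExtent(width, height, depth);

    return cudaMalloc3DArray(array, &desc, extent);
}
```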

Describing 3D memcpy operations is sufficiently complicated that both the CUDA runtime and the driver API use structures to specify the parameters. The runtime API uses the cudaMemcpy3DParms structure, which is declared as follows.

struct cudaMemcpy3DParms {
    struct cudaArray *srcArray;
    struct cudaPos srcPos;
    struct cudaPitchedPtr srcPtr;
    struct cudaArray *dstArray;
    struct cudaPos dstPos;
    struct cudaPitchedPtr dstPtr;
    struct cudaExtent extent;
    enum cudaMemcpyKind kind;
};

Most of these structure members are themselves structures: extent gives the width, height, and depth of the copy. The srcPos and dstPos members are cudaPos structures that specify the start points for the source and destination of the copy.

struct cudaPos {
    size_t x;
    size_t y;
    size_t z;
};

cudaPitchedPtr is a structure that was added with 3D memcpy to contain a pointer/pitch tuple.

struct cudaPitchedPtr
{
    void *ptr; /*< Pointer to allocated memory */
    size_t pitch; /*< Pitch of allocated memory in bytes */
    size_t xsize; /*< Logical width of allocation in elements */
    size_t ysize; /*< Logical height of allocation in elements */
};

A cudaPitchedPtr structure may be created with the function make_cudaPitchedPtr, which takes the base pointer, pitch, and logical width and height of the allocation. make_cudaPitchedPtr simply copies its parameters into the output struct.

struct cudaPitchedPtr
make_cudaPitchedPtr(void *d, size_t p, size_t xsz, size_t ysz) {
    struct cudaPitchedPtr s;
    s.ptr = d;
    s.pitch = p;
    s.xsize = xsz;
    s.ysize = ysz;
    return s;
}
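Putting these structures together, a host-to-array copy can be sketched as below. The helper name uploadFloatVolume is hypothetical, and the sketch assumes a tightly packed host volume of float whose dimensions match the destination array. Members left at zero (srcPos, dstPos, srcArray, dstPtr) select a copy that starts at the origin and uses srcPtr as the source.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copies a tightly packed host volume of float into a
// 3D CUDA array with the same width, height, and depth.
cudaError_t uploadFloatVolume(cudaArray_t dst, const float *src,
                              size_t width, size_t height, size_t depth)
{
    cudaMemcpy3DParms p = {};  // unused members must be zero

    // Source: host pointer, pitch in bytes, logical width and height.
    p.srcPtr = make_cudaPitchedPtr(const_cast<float *>(src),
                                   width * sizeof(float), width, height);
    p.dstArray = dst;

    // Because a CUDA array participates, the extent is given in elements.
    p.extent = make_cudaExtent(width, height, depth);
    p.kind = cudaMemcpyHostToDevice;

    return cudaMemcpy3D(&p);
}
```

Copying a sub-volume would follow the same pattern, with srcPos and dstPos set to the desired start points and extent reduced accordingly.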

The simpleTexture3D sample in the SDK illustrates how to do 3D texturing with CUDA.
