Figure 10.5 Texturing with unnormalized coordinates (with linear filtering).

tex1D(16.0) = 15.0

Microdemo: tex1d_unnormalized.cu

The microdemo tex1d_unnormalized.cu is like a microscope to closely examine texturing behavior by printing the coordinate and the value returned by the tex1D() intrinsic together. Unlike the tex1dfetch_int2float.cu microdemo, this program uses a 1D CUDA array to hold the texture data. A certain number of texture fetches is performed, along a range of floating-point values specified by a base and increment; the interpolated values and the value returned by tex1D() are written together into an output array of float2. The CUDA kernel is as follows.

texture<float, 1> tex;
extern "C" __global__void
TexReadout(float2 *out, size_t N, float base, float increment)
{
    for (size_t i = blockIdx.x*blockDim.x + threadIdx.x; i < N; i += gridDim.x*blockDim.x)
        {
            float x = base + (float) i * increment;
            out[i].x = x;
            out[i].y = tex1D.tex, x);
        }
}

A host function CreateAndPrintTex(), given in Listing 10.4, takes the size of the texture to create, the number of texture fetches to perform, the base and increment of the floating-point range to pass to tex1D(), and optionally the filter and addressing modes to use on the texture. This function creates the CUDA array to hold the texture data, optionally initializes it with the caller-provided data (or identity elements if the caller passes NULL), binds the texture to the CUDA array, and prints the float2 output.

Listing 10.4 CreateAndPrintTex().

template<class T>  
void  
CreateAndPrintTex( T *initTex, size_t texN, size_t outN, float base, float increment,udaTextureFilterMode filterMode =udaFilterModePoint,udaTextureAddressMode addressMode =udaAddressModeClamp)  
{  
    T *texContents = 0;  
   udaArray *texArray = 0;  
    float2 *outHost = 0, *outDevice = 0;  
   udaError_t status;  
   udaChannelFormatDesc channelDesc =udaCreateChannelDesc<T>();

// use caller-provided array, if any, to initialize texture
if ( initTex ) {
    texContents = initTex;
}
else {
    // default is to initialize with identity elements
    texContents = (T *) malloc( texN* sizeof(T) );
    if (! texContents)
        goto Error;
    for ( int i = 0; i < texN; i++) {
        texContents[i] = (T) i;
    }
} CUDART_CHECK(cudaMemcpyArray(&texArray, &channelDesc, texN));
CUDART_CHECK(cudaHostAlloc((void **) &outHost,
                          outN*sizeof(float2),
                         =outHostAllocMapped));
CUDART_CHECK(cudaHostGetDevicePointer((void **)
                          &outDevice,
                          outHost, 0));
CUDART_CHECK(cudaMemcpyArray([texArray,
                          0, 0,
                          texContents,
                          texN*sizeof(T),
                         =outDevice, outN, base, increment));
CUDART_CHECK(cudaBindTextureArray([tex, texArray]);
tex.filterMode = filterMode;
tex-addressMode[0] = addressMode;
CUDART_CHECK(cudaHostGetDevicePointer(&outDevice, outHost, 0));
TexReadout<<2,384>>(outDevice, outN, base, increment);
CUDART_CHECK(cudaThreadSynchronize());
for ( int i = 0; i < outN; i++) {
    printf("%.2f,%.2f)\n", outHost[i].x, outHost[i].y);
}
printf("\n");
Error:
if (! initTex) free( texContents );
if ( texArray )udaFreeArray( texArray );
if ( outHost )udaFreeHost( outHost );
}

The main() function for this program is intended to be modified to study texturing behavior. This version creates an 8-element texture and writes the output of tex1D() from 0.0 .. 7.0.

int   
main(int argc, char \*argv[])   
{udaError_t status; CUDA_CHECK(cudaSetDeviceFlags(cudaDeviceMapHost)); CreateAndPrintTex(float>(NULL,8,8,0.0f,1.0f); CreateAndPrintTex(float>(NULL,8,8,0.0f,1.0f,udaFilterModeLinear); return 0;

The output from this program is as follows.

(0.00, 0.00) <- output from the first CreateAndPrintTex()  
(1.00, 1.00)  
(2.00, 2.00)  
(3.00, 3.00)  
(4.00, 4.00)  
(5.00, 5.00)  
(6.00, 6.00)  
(7.00, 7.00)  
(0.00, 0.00) <- output from the second CreateAndPrintTex()  
(1.00, 0.50)  
(2.00, 1.50)  
(3.00, 2.50)  
(4.00, 3.50)  
(5.00, 4.50)  
(6.00, 5.50)  
(7.00, 6.50)

If we change main() to invoke CreateAndPrintTex() as follows.

CreateAndPrintTex(NULL, 8, 20, 0.9f, 0.01f, CUDAFilterModePoint);

The resulting output highlights that when point filtering, 1.0 is the dividing line between texture elements 0 and 1.

(0.90, 0.00)  
(0.91, 0.00)  
(0.92, 0.00)  
(0.93, 0.00)  
(0.94, 0.00)  
(0.95, 0.00)  
(0.96, 0.00)  
(0.97, 0.00)  
(0.98, 0.00)  
(0.99, 0.00)  
(1.00, 1.00)  
< transition point  
(1.01, 1.00)  
(1.02, 1.00)  
(1.03, 1.00)

(1.04, 1.00)  
(1.05, 1.00)  
(1.06, 1.00)  
(1.07, 1.00)  
(1.08, 1.00)  
(1.09, 1.00)

One limitation of linear filtering is that it is performed with 9-bit weighting factors. It is important to realize that the precision of the interpolation depends not on that of the texture elements but on the weights. As an example, let's take a look at a 10-element texture initialized with normalized identity elements—that is, $\{0.0, 0.1, 0.2, 0.3, \ldots, 0.9\}$ instead of $\{0, 1, 2, \ldots, 9\}$ . CreateAndPrintTex() lets us specify the texture contents, so we can do so as follows.

float texData[10];   
for ( int i = 0; i < 10; i++) { texData[i] = (float) i / 10.0f; } CreateAndPrintTex(float> (texData, 10, 10, 0.0f, 1.0f);

The output from an unmodified CreateAndPrintTex() looks innocuous enough.

(0.00, 0.00)  
(1.00, 0.10)  
(2.00, 0.20)  
(3.00, 0.30)  
(4.00, 0.40)  
(5.00, 0.50)  
(6.00, 0.60)  
(7.00, 0.70)  
(8.00, 0.80)  
(9.00, 0.90)

Or if we invoke CreateAndPrintTex() with linear interpolation between the first two texture elements (values 0.1 and 0.2), we get the following.

CreateAndPrintTex(tex,10,10,1.5f,0.1f,cudaFilterModeLinear);

The resulting output is as follows.

(1.50, 0.10)  
(1.60, 0.11)  
(1.70, 0.12)  
(1.80, 0.13)  
(1.90, 0.14)  
(2.00, 0.15)  
(2.10, 0.16)  
(2.20, 0.17)  
(2.30, 0.18)  
(2.40, 0.19)

Rounded to 2 decimal places, this data looks very well behaved. But if we modify CreateAndPrintTex() to output hexadecimal instead, the output becomes

(1.50, 0x3dcccccd)  
(1.60, 0x3de1999a)  
(1.70, 0x3df5999a)  
(1.80, 0x3e053333)  
(1.90, 0x3e0f3333)  
(2.00, 0x3e19999a)  
(2.10, 0x3e240000)  
(2.20, 0x3e2e0000)  
(2.30, 0x3e386667)  
(2.40, 0x3e426667)

It is clear that most fractions of 10 are not exactly representable in floating point. Nevertheless, when performing interpolation that does not require high precision, these values are interpolated at full precision.

Microdemo: tex1d_9bit.cu

To explore this question of precision, we developed another microdemo, tex1d_9bit.cu. Here, we've populated a texture with 32-bit floating-point values that require full precision to represent. In addition to passing the base/ increment pair for the texture coordinates, another base/increment pair specifies the "expected" interpolation value, assuming full-precision interpolation.

In tex1d_9bit, the CreateAndPrintTex() function is modified to write its output as shown in Listing 10.5.

Listing 10.5 Tex1d_9bit.cu (excerpt).

printf("X\tY\tActual Value\tExpected Value\tDiff\n");

10.5_Texturing_with_Unnormalized_Coordinates

Microdemo: tex1d_9bit.cu