15.6 Source Code

When working on optimized normalized cross-correlation code, it does not take long to realize that it is surprisingly difficult and error-prone. Converting the sums to correlation coefficients, as described in Section 15.1, must be done carefully because of the differing precision characteristics of float and int (float has a greater dynamic range, but only 24 bits of precision). It is good practice to develop separate subroutines that report the computed sums, so that when an implementation reports incorrect coefficients, the root cause can be traced either to incorrect sums or to an incorrect coefficient computation. Also, the sums can be compared bitwise with CPU results, while the float-valued coefficients must be compared fuzzily, to within an epsilon value.
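The two kinds of comparison described above can be sketched on the host side as follows. This is a minimal illustration, not the book's code: the `CorrSums` struct, the function names, and the default epsilon of `1e-4f` are all assumptions made for the example, and the coefficient is computed with the standard Pearson form from the sums, which may differ in detail from the formulation in Section 15.1.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical container for the per-window sums (names are illustrative).
struct CorrSums {
    int64_t sumI;    // sum of image pixels under the template
    int64_t sumISq;  // sum of squared image pixels
    int64_t sumIT;   // sum of image * template products
};

// Integer sums carry no rounding error, so GPU and CPU results
// are expected to agree exactly.
bool sumsMatch(const CorrSums& a, const CorrSums& b) {
    return a.sumI == b.sumI && a.sumISq == b.sumISq && a.sumIT == b.sumIT;
}

// Float coefficients can differ in the low bits between implementations,
// so they are compared within a tolerance (the epsilon is an assumed value).
bool coefficientsMatch(float a, float b, float epsilon = 1e-4f) {
    return std::fabs(a - b) <= epsilon;
}

// Reference conversion from sums to a correlation coefficient, using the
// standard Pearson form. Double precision is used here only to keep the
// host reference away from float's 24-bit limit.
float coefficientFromSums(const CorrSums& s, int64_t sumT, int64_t sumTSq,
                          int64_t n) {
    assert(n > 0);
    double num  = double(n) * double(s.sumIT) - double(s.sumI) * double(sumT);
    double varI = double(n) * double(s.sumISq) - double(s.sumI) * double(s.sumI);
    double varT = double(n) * double(sumTSq) - double(sumT) * double(sumT);
    double denom = std::sqrt(varI * varT);
    // A flat window or flat template has zero variance; report 0 in that case.
    return denom > 0.0 ? float(num / denom) : 0.0f;
}
```

With this split, a test harness can first check `sumsMatch` on the reported sums and only then check `coefficientsMatch`, which narrows a failure to one stage or the other.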

The different implementations of correlation are broken out into separate header (.cuh) files, and the kernels that emit sums as well as correlation coefficients are separate.

The normalizedCrossCorrelation.cu program tests both the functionality and the performance of the kernels. By default, it loads coins.pgm and detects the dime in the lower right corner. The dime is located at (210,148) and is 52×52 pixels in size. The program also writes the performance measurements to stdout, for example:

$ normalizedCrossCorrelation --padWidth 1024 --padHeight 1024
-wTemplate 16 -hTemplate 16
corrTexTex2D: 54.86 Mpix/s 14.05Gtpix/s
corrTemplate2D: 72.87 Mpix/s 18.65Gtpix/s
corrShared: 69.66 Mpix/s 17.83Gtpix/s
corrSharedSM: 78.66 Mpix/s 20.14Gtpix/s
corrShared4: 97.02 Mpix/s 24.84Gtpix/s
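The two columns in this output appear to be related by the template area: with a 16×16 template, each output pixel involves 256 template pixels, and 54.86 Mpix/s × 256 ≈ 14.04 Gtpix/s, which agrees with the reported figure to within rounding. The sketch below encodes that relationship. It is inferred from the numbers shown, not taken from the program's source, and the function name is hypothetical.

```cpp
#include <cassert>

// Assumption inferred from the sample output: template-pixel throughput is
// output-pixel throughput multiplied by the number of pixels in the template.
// Input is in Mpix/s; the result is in Gtpix/s.
double gtpixPerSecond(double mpixPerSecond, int wTemplate, int hTemplate) {
    assert(wTemplate > 0 && hTemplate > 0);
    return mpixPerSecond * wTemplate * hTemplate / 1000.0;
}
```

The other rows in the listing are consistent with the same relationship, for example 97.02 Mpix/s × 256 ≈ 24.84 Gtpix/s for corrShared4.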

The program supports the following command line options.

--input : specify the input filename (default: coins.pgm).
--output : optionally specify the output filename. If specified, the program will write a PGM file containing an intensity map (like Figure 15.3) to this filename.
--padWidth : pad the width of the image.
--padHeight : pad the height of the image.
--xTemplate : specify the X coordinate of the upper left corner of the template.
--yTemplate : specify the Y coordinate of the upper left corner of the template.
--wTemplate : specify the width of the template.
--hTemplate : specify the height of the template.
