13.1_Definition_and_Variations
13.1 Definition and Variations
Inclusive scan takes a binary associative operator and an array of length
and returns the array
Each element of the output depends on the preceding elements in the input.
Exclusive scan is defined similarly but shifts the output and uses an identity element that has no effect on a value when is performed with it (for example, 0 for integer addition, 1 for multiplication, etc.).
Inclusive and exclusive scans can be transformed between each other by adding or subtracting the input array element by element, as shown in Figure 13.1.
Stream compaction is an operation that separates elements in an array according to a criterion. If a predicate (0 or 1) is computed for each element of the input array to determine whether it should be included in the output stream, then an exclusive scan on the predicates computes the indices of the output elements. A variation of stream compaction, known as stream splitting, writes the compact output separately for each value of the predicate. Segmented scan is a variation that takes a set of input flags (one per array element) in addition to the array and performs scans on the subarrays delineated by the flags.
Due to the importance of the Scan primitive, an enormous amount of effort has been put into developing optimized scan implementations for CUDA. A list of references is given at the end of this chapter. Both the CUDPP and Thrust libraries include families of optimized Scan primitives that use templates for the best tradeoff between generality and performance. All that said, however, applications that use Scan as a primitive usually can benefit from custom implementations that take advantage of specific knowledge about the problem.

Figure 13.1 Inclusive and exclusive scan.