13.7 References (Parallel Scan Algorithms)

The recursive scan-then-fan algorithm is described in NVIDIA Technical Report NVR-2008-003 by Sengupta et al. The recursive reduce-then-scan algorithm is described by Dotsenko et al. The two-level reduce-then-scan algorithm is due to Merrill and Grimshaw. Their paper is extremely valuable reading, both for background and for its overview of negative results, such as an attempted formulation of Scan modeled on Sklansky's minimum-depth circuit whose performance was disappointing.

Blelloch, Guy E. Prefix sums and their applications. Technical Report CMU-CS-90-190.

Dotsenko, Yuri, Naga K. Govindaraju, Peter-Pike Sloan, Charles Boyd, and John Manferdelli. Fast scan algorithms in graphics processors. In Proceedings of the 22nd Annual International Conference on Supercomputing, ACM, 2008, pp. 205-213.

Harris, Mark, and Michael Garland. Optimizing parallel prefix operations for the Fermi architecture. In GPU Computing Gems, Jade Edition, Wen-Mei Hwu, ed. Morgan Kaufmann, Waltham, MA, 2012, pp. 29-38.

Harris, Mark, Shubhabrata Sengupta, and John Owens. Parallel prefix sum (scan) with CUDA. In GPU Gems 3, H. Nguyen, ed. Addison-Wesley, Boston, MA, Aug. 2007.

Merrill, Duane, and Andrew Grimshaw. Parallel scan for stream architectures. Technical Report CS2009-14. Department of Computer Science, University of Virginia.

Sengupta, Shubhabrata, Mark Harris, and Michael Garland. Efficient parallel scan algorithms for GPUs. NVIDIA Technical Report NVR-2008-003. December 2008. http://research.nvidia.com/publication/efficient-parallel-scan-algorithms-gpus

Sengupta, Shubhabrata, Mark Harris, Yao Zhang, and John D. Owens. Scan primitives for GPU computing. In Proceedings of the 22nd ACM SIGGRAPH/Eurographics Symposium on Graphics Hardware. San Diego, CA, August 4-5, 2007. D. Fellner and S. Spencer, eds. Eurographics Association, Aire-la-Ville, Switzerland, pp. 97-106.
