
13.7 References (Parallel Scan Algorithms)

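The recursive scan-then-fan algorithm is described by Sengupta et al. in NVIDIA Technical Report NVR-2008-003. The recursive reduce-then-scan algorithm is described by Dotsenko et al. The two-level reduce-then-scan algorithm is due to Merrill. Merrill's paper is well worth reading, both for its background material and for its account of approaches that did not work. For example, scan formulations modeled on Sklansky's minimum-depth circuit performed disappointingly.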
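To make the two-level reduce-then-scan structure concrete, here is a minimal serial Python simulation. It is not GPU code and not taken from any of the cited papers. A reduce pass computes one partial sum per block. The small array of block sums is then scanned to give each block an exclusive offset. A second pass scans each block and adds that offset. For the per-block scan, the sketch uses a Sklansky-style minimum-depth pattern. This pattern finishes in ceil(log2 n) rounds but can perform up to n/2 additions per round, whereas a serial scan needs only n - 1 additions in total. The function names `sklansky_inclusive_scan` and `reduce_then_scan` and the `block_size` parameter are hypothetical, chosen for illustration. The inner loops stand in for threads that would run concurrently on a GPU.

```python
from typing import List


def sklansky_inclusive_scan(xs: List[int]) -> List[int]:
    """Inclusive prefix sum using a Sklansky-style minimum-depth pattern.

    Round d uses span = 2**d: every element in the upper half of an aligned
    group of 2*span elements adds the last element of that group's lower half.
    There are ceil(log2(n)) rounds; the updates within one round are
    independent, so a GPU could perform each round in parallel.
    """
    out = list(xs)
    n = len(out)
    span = 1
    while span < n:
        for i in range(n):  # each i plays the role of one thread
            if (i // span) % 2 == 1:  # i lies in the upper half of its group
                src = (i // span) * span - 1  # last element of the lower half
                out[i] += out[src]  # src is never modified in this round
        span *= 2
    return out


def reduce_then_scan(xs: List[int], block_size: int) -> List[int]:
    """Two-level reduce-then-scan, simulated serially (hypothetical helper)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = [xs[i:i + block_size] for i in range(0, len(xs), block_size)]

    # Pass 1 (reduce): one partial sum per block.
    block_sums = [sum(b) for b in blocks]

    # Scan the small array of block sums, then shift it right by one
    # to obtain each block's exclusive offset.
    inclusive = sklansky_inclusive_scan(block_sums)
    offsets = [0] + inclusive[:-1]

    # Pass 2 (scan): each block scans its own elements and adds its offset.
    out: List[int] = []
    for block, offset in zip(blocks, offsets):
        out.extend(v + offset for v in sklansky_inclusive_scan(block))
    return out


if __name__ == "__main__":
    print(reduce_then_scan([3, 1, 4, 1, 5, 9, 2, 6], 3))
```

Running the file prints `[3, 4, 8, 9, 14, 23, 25, 31]`. In each Sklansky round, every update reads only an element in a lower half, and the round never modifies lower-half elements. As a result, the serial loop produces the same result that a round-by-round parallel execution would.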

Blelloch, Guy E. Prefix sums and their applications. Technical Report CMU-CS-90-190. School of Computer Science, Carnegie Mellon University, 1990.
Dotsenko, Yuri, Naga K. Govindaraju, Peter-Pike Sloan, Charles Boyd, and John Manferdelli. Fast scan algorithms on graphics processors. In Proceedings of the 22nd Annual International Conference on Supercomputing, ACM, 2008, pp. 205-213.
Harris, Mark, and Michael Garland. Optimizing parallel prefix operations for the Fermi architecture. In GPU Computing Gems, Jade Edition, Wen-mei W. Hwu, ed. Morgan Kaufmann, Waltham, MA, 2012, pp. 29-38.
Harris, Mark, Shubhabrata Sengupta, and John Owens. Parallel prefix sum (scan) with CUDA. In GPU Gems 3, H. Nguyen, ed. Addison-Wesley, Boston, MA, Aug. 2007.
Merrill, Duane, and Andrew Grimshaw. Parallel scan for stream architectures. Technical Report CS2009-14. Department of Computer Science, University of Virginia, 2009.
Sengupta, Shubhabrata, Mark Harris, and Michael Garland. Efficient parallel scan algorithms for GPUs. NVIDIA Technical Report NVR-2008-003. December 2008. http://research.nvidia.com/publication/efficient-parallel-scan-algorithms-gpus
Sengupta, Shubhabrata, Mark Harris, Yao Zhang, and John D. Owens. Scan primitives for GPU computing. In Proceedings of the 22nd ACM SIGGRAPH/Eurographics Symposium on Graphics Hardware (San Diego, CA, August 4-5, 2007), D. Fellner and S. Spencer, eds. Eurographics Association, Aire-la-Ville, Switzerland, 2007, pp. 97-106.

13.7 References (Parallel Scan Algorithms) - The CUDA Handbook | OpenTech