|
| 1 | +.. meta:: |
| 2 | + :description: GPU programming patterns and tutorials |
| 3 | + :keywords: AMD, ROCm, HIP, GPU, programming patterns, parallel computing, tutorial |
| 4 | + |
| 5 | +.. _gpu_programming-patterns: |
| 6 | + |
| 7 | +******************************************************************************** |
| 8 | +GPU programming patterns |
| 9 | +******************************************************************************** |
| 10 | + |
| 11 | +GPU programming patterns are fundamental algorithmic structures that enable |
| 12 | +efficient parallel computation on GPUs. Understanding these patterns is |
| 13 | +essential for developers who want to harness the massively parallel |
| 14 | +processing capabilities of modern GPUs for scientific computing, machine learning, |
| 15 | +image processing, and other computationally intensive applications. |
| 16 | + |
| 17 | +These tutorials describe core programming patterns and demonstrate how to |
| 18 | +implement common parallel algorithms efficiently using the HIP runtime API and |
| 19 | +kernel extensions. Each pattern addresses a specific computational challenge and |
| 20 | +provides practical implementations with detailed explanations. |
| 21 | + |
| 22 | +Common GPU programming challenges |
| 23 | +================================== |
| 24 | + |
| 25 | +GPU programming introduces unique challenges not present in traditional CPU |
| 26 | +programming: |
| 27 | + |
| 28 | +* **Memory coherence**: GPUs provide weaker cache coherence guarantees than CPUs, |
| 29 | + so threads that access the same memory locations must be explicitly coordinated. |
| 30 | + |
| 31 | +* **Race conditions**: Concurrent writes to the same memory location require |
| 32 | + atomic operations or careful algorithm design. |
| 33 | + |
| 34 | +* **Irregular parallelism**: Real-world algorithms often have varying amounts of |
| 35 | + parallel work across iterations. |
| 36 | + |
| 37 | +* **CPU-GPU communication**: Data transfer overhead between host and device must |
| 38 | + be minimized. |
| 39 | + |
| 40 | +Tutorial overview |
| 41 | +================= |
| 42 | + |
| 43 | +This collection provides comprehensive tutorials on essential GPU programming |
| 44 | +patterns: |
| 45 | + |
| 46 | +* :doc:`Two-dimensional kernels <./programming-patterns/matrix_multiplication>`: |
| 47 | + Processing grid-structured data such as matrices and images. |
| 48 | + |
| 49 | +* :doc:`Stencil operations <./programming-patterns/stencil_operations>`: |
| 50 | + Updating array elements based on neighboring values. |
| 51 | + |
| 52 | +* :doc:`Atomic operations <./programming-patterns/atomic_operations_histogram>`: |
| 53 | + Ensuring data integrity during concurrent memory access. |
| 54 | + |
| 55 | +* :doc:`Multi-kernel applications <./programming-patterns/multikernel_bfs>`: |
| 56 | + Coordinating multiple GPU kernels to solve complex problems. |
| 57 | + |
| 58 | +* :doc:`CPU-GPU cooperation <./programming-patterns/cpu_gpu_kmeans>`: |
| 59 | + Distributing work strategically between the CPU and GPU. |
| 60 | + |
| 61 | +Prerequisites |
| 62 | +------------- |
| 63 | + |
| 64 | +To get the most from these tutorials, you should have: |
| 65 | + |
| 66 | +* Basic understanding of C/C++ programming. |
| 67 | + |
| 68 | +* Familiarity with parallel programming concepts. |
| 69 | + |
| 70 | +* An installed HIP runtime environment (see :doc:`../install/install`). |
| 71 | + |
| 72 | +* Basic knowledge of GPU architecture (recommended). |
| 73 | + |
| 74 | +Getting started |
| 75 | +--------------- |
| 76 | + |
| 77 | +Each tutorial is self-contained and can be studied independently, though we |
| 78 | +recommend following the order presented for a comprehensive understanding: |
| 79 | + |
| 80 | +1. **Start with two-dimensional kernels** to understand basic GPU thread |
| 81 | + organization and memory access patterns. |
| 82 | +2. **Progress to stencil operations** to learn about neighborhood dependencies. |
| 83 | +3. **Study atomic operations** to understand concurrent memory access. |
| 84 | +4. **Explore multi-kernel programming** for complex algorithmic patterns. |
| 85 | +5. **Finish with CPU-GPU cooperation** to handle mixed-parallelism workloads. |