
Commit 7c936f9

neon60 and jujiang-del authored and committed
Add GPU programming patterns tutorials
Update projects/hip/docs/tutorial/programming-patterns/atomic_operations_histogram.rst

WIP

Co-authored-by: Julia Jiang <[email protected]>

[rocm-systems] ROCm/rocm-systems#1918 (commit 2f6fb89)
1 parent 8f13f89 commit 7c936f9

File tree

9 files changed: +2461 −0 lines changed


docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ The HIP documentation is organized into the following categories:
  * [HIP basic examples](https://github.com/ROCm/rocm-examples/tree/develop/HIP-Basic)
  * [HIP examples](https://github.com/ROCm/rocm-examples)
  * [SAXPY tutorial](./tutorial/saxpy)
+ * [GPU programming patterns](./tutorial/programming-patterns)
  * [Reduction tutorial](./tutorial/reduction)
  * [Cooperative groups tutorial](./tutorial/cooperative_groups_tutorial)
  * [HIP Graph API tutorial](./tutorial/graph_api)

docs/sphinx/_toc.yml.in

Lines changed: 8 additions & 0 deletions
@@ -124,6 +124,14 @@ subtrees:
      - url: https://github.com/ROCm/rocm-examples
        title: HIP examples
      - file: tutorial/saxpy
+     - file: tutorial/programming-patterns
+       subtrees:
+         - entries:
+           - file: tutorial/programming-patterns/matrix_multiplication
+           - file: tutorial/programming-patterns/atomic_operations_histogram
+           - file: tutorial/programming-patterns/cpu_gpu_kmeans
+           - file: tutorial/programming-patterns/stencil_operations
+           - file: tutorial/programming-patterns/multikernel_bfs
      - file: tutorial/reduction
      - file: tutorial/cooperative_groups_tutorial
      - file: tutorial/graph_api

docs/tutorial/prerequisites.rst

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+ Prerequisites
+ =============
+
+ To follow this tutorial, you'll need installed drivers and a HIP compiler
+ toolchain to compile your code. HIP supports compiling and running on Linux
+ and Windows with AMD GPUs; covering every combination of install instructions
+ is beyond the scope of this tutorial. For more information about installing
+ HIP development packages, see :doc:`/install/install`.
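
The new page above defers install details to the install guide. As a quick sanity check of the driver-plus-toolchain setup it asks for, a minimal HIP program (illustrative only, not part of this commit) can be built with hipcc and run to confirm the runtime sees a GPU:

```cpp
// Hedged sketch: verify the HIP toolchain and driver the new page requires.
// Build with `hipcc check_hip.cpp -o check_hip` (file name is illustrative).
#include <hip/hip_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    // hipGetDeviceCount reports how many GPUs the runtime can see.
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::printf("No usable HIP device found\n");
        return 1;
    }
    std::printf("Found %d HIP device(s)\n", count);
    return 0;
}
```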
docs/tutorial/programming-patterns.rst

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+ .. meta::
+    :description: GPU programming patterns and tutorials
+    :keywords: AMD, ROCm, HIP, GPU, programming patterns, parallel computing, tutorial
+
+ .. _gpu_programming-patterns:
+
+ ********************************************************************************
+ GPU programming patterns
+ ********************************************************************************
+
+ GPU programming patterns are fundamental algorithmic structures that enable
+ efficient parallel computation on GPUs. Understanding these patterns is
+ essential for developers looking to harness the massive parallel processing
+ capabilities of modern GPUs for scientific computing, machine learning, image
+ processing, and other computationally intensive applications.
+
+ These tutorials describe core programming patterns, demonstrating how to
+ efficiently implement common parallel algorithms using the HIP runtime API and
+ kernel extensions. Each pattern addresses a specific computational challenge
+ and provides practical implementations with detailed explanations.
+ Common GPU programming challenges
+ =================================
+
+ GPU programming introduces unique challenges not present in traditional CPU
+ programming:
+
+ * **Memory coherence**: GPUs lack robust cache coherence mechanisms, requiring
+   careful coordination when multiple threads access shared memory.
+
+ * **Race conditions**: Concurrent memory access requires atomic operations or
+   careful algorithm design.
+
+ * **Irregular parallelism**: Real-world algorithms often have varying amounts
+   of parallel work across iterations.
+
+ * **CPU-GPU communication**: Data transfer overhead between host and device
+   must be minimized.
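
The race-conditions bullet above is the crux of the atomic-operations tutorial this commit adds. As a minimal sketch of that pattern, assume a 256-bin byte histogram; the kernel name and sizes are illustrative placeholders, not code from the commit:

```cpp
// Hedged sketch of the atomic-operations pattern: many threads increment
// shared bins, so each update must be atomic to stay correct.
#include <hip/hip_runtime.h>

__global__ void histogramKernel(const unsigned char* data, unsigned int* bins,
                                size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Without the atomic, two threads hitting the same bin would race
        // and lose counts; atomicAdd serializes the read-modify-write.
        atomicAdd(&bins[data[i]], 1u);
    }
}
// Host side (not shown): allocate 256 zeroed bins on the device, launch with
// enough blocks to cover n, then copy the bins back with hipMemcpy.
```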
+ Tutorial overview
+ =================
+
+ This collection provides comprehensive tutorials on essential GPU programming
+ patterns:
+
+ * :doc:`Two-dimensional kernels <./programming-patterns/matrix_multiplication>`:
+   Processing grid-structured data such as matrices and images.
+
+ * :doc:`Stencil operations <./programming-patterns/stencil_operations>`:
+   Updating array elements based on neighboring values.
+
+ * :doc:`Atomic operations <./programming-patterns/atomic_operations_histogram>`:
+   Ensuring data integrity during concurrent memory access.
+
+ * :doc:`Multi-kernel applications <./programming-patterns/multikernel_bfs>`:
+   Coordinating multiple GPU kernels to solve complex problems.
+
+ * :doc:`CPU-GPU cooperation <./programming-patterns/cpu_gpu_kmeans>`: Strategic
+   work distribution between CPU and GPU.
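
To make the first item in the list above concrete, here is a hedged sketch of the two-dimensional kernel pattern: one thread per matrix element, indexed by a 2D grid. scaleKernel and the launch geometry are illustrative, not taken from the commit:

```cpp
#include <hip/hip_runtime.h>

// One thread per element: x indexes the column, y the row.
__global__ void scaleKernel(float* m, int width, int height, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        m[y * width + x] *= factor;  // row-major element update
    }
}

// Typical launch: a 16x16 block, with enough blocks to cover the matrix.
//   dim3 block(16, 16);
//   dim3 grid((width + block.x - 1) / block.x,
//             (height + block.y - 1) / block.y);
//   hipLaunchKernelGGL(scaleKernel, grid, block, 0, 0,
//                      d_m, width, height, 2.0f);
```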
+ Prerequisites
+ -------------
+
+ To get the most from these tutorials, you should have:
+
+ * Basic understanding of C/C++ programming.
+
+ * Familiarity with parallel programming concepts.
+
+ * HIP runtime environment installed (see :doc:`../install/install`).
+
+ * Basic knowledge of GPU architecture (recommended).
+ Getting started
+ ---------------
+
+ Each tutorial is self-contained and can be studied independently, though we
+ recommend following the order presented for a comprehensive understanding:
+
+ 1. **Start with two-dimensional kernels** to understand basic GPU thread
+    organization and memory access patterns.
+ 2. **Progress to stencil operations** to learn about neighborhood dependencies.
+ 3. **Study atomic operations** to understand concurrent memory access.
+ 4. **Explore multi-kernel programming** for complex algorithmic patterns.
+ 5. **Check CPU-GPU cooperation** to handle mixed-parallelism workloads.
