AMD GPU Programming Training Course

ROCm is an open-source platform for GPU programming on AMD GPUs. It supports OpenCL and, through HIP, provides a porting path for CUDA code. ROCm gives developers low-level access to the hardware and fine-grained control over parallelization, but using it well requires a solid understanding of device architecture, memory models, execution models, and optimization techniques.

HIP is a C++ runtime API and kernel language for writing portable code that runs on both AMD and NVIDIA GPUs. It is a thin abstraction layer over the native GPU platforms (ROCm on AMD, CUDA on NVIDIA), so developers can continue to use the existing GPU libraries and tools on each platform.

This instructor-led, live training (available online or onsite) is designed for beginner to intermediate-level developers who want to leverage ROCm and HIP to program AMD GPUs and harness their parallel processing capabilities.

By the end of this training, participants will be able to:

Configure a development environment that includes the ROCm platform, an AMD GPU, and Visual Studio Code.
Develop a basic ROCm program that executes vector addition on the GPU and retrieves results from GPU memory.
Utilize the ROCm API to query device information, manage device memory allocation and deallocation, transfer data between host and device, launch kernels, and synchronize threads.
Employ the HIP language to write kernels that execute on the GPU and manipulate data.
Apply HIP built-in functions, variables, and libraries to perform common tasks and operations.
Optimize data transfers and memory accesses by leveraging ROCm and HIP memory spaces, including global, shared, constant, and local memory.
Control the threads, blocks, and grids that define parallelism using ROCm and HIP execution models.
Debug and test ROCm and HIP programs using tools such as the ROCm Debugger (ROCgdb) and the ROCm Profiler (rocprof).
Enhance the performance of ROCm and HIP programs through techniques such as coalescing, caching, prefetching, and profiling.

Course Format

Interactive lectures and discussions.
Extensive exercises and practical sessions.
Hands-on implementation within a live laboratory environment.

Course Customization Options

To request customized training for this course, please contact us to make arrangements.

This course is available as onsite live training in Kenya or online live training.


Course Outline

Introduction

What is ROCm?
What is HIP?
Comparison: ROCm vs CUDA vs OpenCL
Overview of ROCm and HIP features and architecture
Setting up the Development Environment

Getting Started

Creating a new ROCm project using Visual Studio Code
Exploring the project structure and files
Compiling and running the program
Displaying the output using printf and fprintf

ROCm API

Understanding the role of the ROCm API in the host program
Using the ROCm API to query device information and capabilities
Using the ROCm API to allocate and deallocate device memory
Using the ROCm API to copy data between host and device
Using the ROCm API to launch kernels and synchronize threads
Using the ROCm API to handle errors and exceptions
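
The host-side steps listed above (allocate device memory, copy inputs, launch a kernel, check for errors, synchronize, copy results back) might look like the following minimal sketch. It assumes the HIP runtime API is the host interface and that the file is built with hipcc; the names vector_add, vector_add_kernel, and check are hypothetical, and the block size of 256 is an arbitrary choice. When compiled with an ordinary C++ compiler, the sketch falls back to a plain CPU loop so the surrounding logic can still be tried without a GPU.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

#ifdef __HIPCC__
#include <hip/hip_runtime.h>

// Hypothetical helper: turn any HIP runtime error into a C++ exception.
static void check(hipError_t err, const char* what) {
    if (err != hipSuccess) {
        throw std::runtime_error(std::string(what) + ": " + hipGetErrorString(err));
    }
}

// One thread per element; the guard covers a partially filled last block.
__global__ void vector_add_kernel(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
#endif

// Adds two equally sized vectors. Uses the GPU when built with hipcc,
// otherwise a CPU loop (assumption: the fallback exists only for illustration).
std::vector<float> vector_add(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("vector_add: size mismatch");
    }
    const std::size_t n = a.size();
    std::vector<float> c(n);
    if (n == 0) {
        return c;
    }
#ifdef __HIPCC__
    const std::size_t bytes = n * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_c = nullptr;
    // Note: for brevity, device buffers leak if a later call throws;
    // a production version would wrap them in RAII objects.
    check(hipMalloc(reinterpret_cast<void**>(&d_a), bytes), "hipMalloc a");
    check(hipMalloc(reinterpret_cast<void**>(&d_b), bytes), "hipMalloc b");
    check(hipMalloc(reinterpret_cast<void**>(&d_c), bytes), "hipMalloc c");
    check(hipMemcpy(d_a, a.data(), bytes, hipMemcpyHostToDevice), "copy a to device");
    check(hipMemcpy(d_b, b.data(), bytes, hipMemcpyHostToDevice), "copy b to device");

    const unsigned block = 256;  // arbitrary block size chosen for this sketch
    const unsigned grid = static_cast<unsigned>((n + block - 1) / block);
    vector_add_kernel<<<grid, block>>>(d_a, d_b, d_c, n);
    check(hipGetLastError(), "kernel launch");
    check(hipDeviceSynchronize(), "device synchronize");

    check(hipMemcpy(c.data(), d_c, bytes, hipMemcpyDeviceToHost), "copy c to host");
    (void)hipFree(d_a);
    (void)hipFree(d_b);
    (void)hipFree(d_c);
#else
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
#endif
    return c;
}
```

A caller would use it as `std::vector<float> c = vector_add(a, b);`. Checking the return value of every runtime call, as the hypothetical check helper does, is one common way to surface launch and memory errors early.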

HIP Language

Understanding the role of the HIP language in the device program
Using the HIP language to write kernels that execute on the GPU and manipulate data
Using HIP data types, qualifiers, operators, and expressions
Using HIP built-in functions, variables, and libraries to perform common tasks and operations

ROCm and HIP Memory Model

Understanding the difference between host and device memory models
Using ROCm and HIP memory spaces, such as global, shared, constant, and local
Using ROCm and HIP memory objects, such as pointers, arrays, textures, and surfaces
Using ROCm and HIP memory access modes, such as read-only, write-only, read-write, etc.
Using ROCm and HIP memory consistency models and synchronization mechanisms

ROCm and HIP Execution Model

Understanding the difference between host and device execution models
Using ROCm and HIP threads, blocks, and grids to define parallelism
Using ROCm and HIP built-in thread coordinates, such as threadIdx.x, blockIdx.x, and blockDim.x (older code may use the hipThreadIdx_x, hipBlockIdx_x, and hipBlockDim_x macros)
Using ROCm and HIP block functions, such as __syncthreads, __threadfence_block, etc.
Using ROCm and HIP grid-level features, such as gridDim.x and grid-wide synchronization through cooperative groups
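
The mapping from threads, blocks, and grids to data elements can be sketched on the host without a GPU. The following plain C++ emulation is an illustration only: blocks_for and emulate_launch are hypothetical names, and the nested loops stand in for what the hardware runs in parallel. It shows the usual one-dimensional index formula (block index times block size plus thread index), the ceiling division used to choose the grid size, and the bounds guard needed when the element count is not a multiple of the block size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed so that grid * block_size >= n (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Host-side emulation of a 1D kernel launch. Returns, for each of the n
// elements, how many emulated threads touched it; a correct mapping touches
// every element exactly once.
std::vector<int> emulate_launch(std::size_t n, std::size_t block_size) {
    std::vector<int> visits(n, 0);
    const std::size_t grid = blocks_for(n, block_size);
    for (std::size_t block_idx = 0; block_idx < grid; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_size; ++thread_idx) {
            // Same arithmetic a kernel would write as
            // blockIdx.x * blockDim.x + threadIdx.x.
            const std::size_t i = block_idx * block_size + thread_idx;
            if (i < n) {  // threads past the end of the data do nothing
                ++visits[i];
            }
        }
    }
    return visits;
}
```

For example, 1000 elements with a block size of 256 need 4 blocks, so 1024 threads are launched and the last 24 are masked off by the guard.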

Debugging

Understanding common errors and bugs in ROCm and HIP programs
Using the Visual Studio Code debugger to set breakpoints and inspect variables, call stacks, etc.
Using the ROCm Debugger to debug ROCm and HIP programs on AMD devices
Using the ROCm Profiler to analyze ROCm and HIP programs on AMD devices

Optimization

Understanding factors that affect the performance of ROCm and HIP programs
Using ROCm and HIP coalescing techniques to improve memory throughput
Using ROCm and HIP caching and prefetching techniques to reduce memory latency
Using ROCm and HIP shared memory and local memory techniques to optimize memory accesses and bandwidth
Using ROCm and HIP profiling tools to measure and improve execution time and resource utilization
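
Why coalescing matters can be illustrated with a simplified host-side model. The sketch below counts how many distinct memory segments one group of threads touches when thread k reads the element at index k * stride. The function name segments_touched is hypothetical, and the values used in the example (64 threads per group, 4-byte elements, 64-byte segments) are assumptions made for illustration; actual wavefront and cache-line sizes depend on the GPU architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified model: each of `lanes` threads reads one element of `elem_bytes`
// bytes at element index lane * stride. Returns the number of distinct
// segments of `segment_bytes` bytes that the group touches. Fewer segments
// means the accesses can be combined into fewer memory transactions.
// Assumes segment_bytes is a multiple of elem_bytes, so no element
// straddles two segments.
std::size_t segments_touched(std::size_t lanes, std::size_t stride,
                             std::size_t elem_bytes, std::size_t segment_bytes) {
    assert(segment_bytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        const std::size_t byte_offset = lane * stride * elem_bytes;
        segments.insert(byte_offset / segment_bytes);
    }
    return segments.size();
}
```

Under these assumed sizes, `segments_touched(64, 1, 4, 64)` returns 4 for unit-stride access, while `segments_touched(64, 16, 4, 64)` returns 64 because every thread lands in its own segment. The model ignores caching and alignment effects, so it is a rough guide rather than a performance prediction.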

Summary and Next Steps

Requirements

A solid understanding of C/C++ programming and parallel computing concepts.
Basic knowledge of computer architecture and memory hierarchy.
Experience using command-line tools and code editors.

Target Audience

Developers looking to learn how to use ROCm and HIP to program AMD GPUs and exploit their parallelism.
Developers aiming to write high-performance, scalable code that can run across various AMD devices.
Programmers interested in exploring the low-level aspects of GPU programming and optimizing code performance.

28 Hours

