Detailed Course Outline
Introduction
- Meet the instructor.
- Create an account at courses.nvidia.com/join
CUDA Made Easy: Accelerating Applications with Parallel Algorithms
To make your first steps in GPU programming as smooth as possible, this lab teaches you how to use powerful parallel algorithms that let you accelerate your code on the GPU by changing only a few lines. Along the way, you’ll learn fundamental concepts such as execution space and memory space, parallelism, heterogeneous computing, and kernel fusion. These concepts will serve as a foundation for your advancement in accelerated computing. By the time you complete this lab, you will be able to:
- Write, compile, and run GPU code
- Refactor standard algorithms to execute on the GPU
- Extend standard algorithms to fit your unique use cases
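As a rough illustration of the kind of code this lab works toward, the sketch below uses Thrust to compute a sum of squares on the GPU. It assumes Thrust is the parallel algorithms library in use (the outline does not name one); `Square` and `sum_of_squares` are hypothetical names chosen for this example. `thrust::transform_reduce` applies the transform and the reduction together, so no intermediate vector of squares is materialized, which is one simple form of kernel fusion.

```cuda
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>

#include <cstdio>
#include <vector>

// Hypothetical functor for this sketch: callable in both execution spaces.
struct Square {
  __host__ __device__ float operator()(float x) const { return x * x; }
};

// Hypothetical helper: sums the squares of `host` on the GPU.
float sum_of_squares(const std::vector<float>& host) {
  // Constructing a device_vector from host iterators copies the data
  // from host memory to device memory.
  thrust::device_vector<float> device(host.begin(), host.end());

  // Device iterators make Thrust dispatch this algorithm to the GPU.
  // The transform (Square) is fused into the reduction (plus).
  return thrust::transform_reduce(device.begin(), device.end(),
                                  Square{}, 0.0f, thrust::plus<float>());
}

int main() {
  std::vector<float> values{1.0f, 2.0f, 3.0f, 4.0f};
  std::printf("sum of squares = %f\n", sum_of_squares(values));
  return 0;
}
```

Compiled with `nvcc`, the only GPU-specific parts are the container type and the `__host__ __device__` annotation on the functor; the algorithm call itself reads like its standard C++ counterpart.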
Break (60 mins)
Unlocking the GPU’s Full Potential: Harnessing Asynchrony with CUDA Streams
In the previous lab, you learned how to use parallel algorithms. However, parallelism alone is not sufficient for accelerating your applications. To fully utilize GPUs, this lab teaches you another fundamental concept: asynchrony. You’ll learn how and when to leverage asynchrony, and you’ll use Nsight Systems to distinguish synchronous from asynchronous algorithms and to identify performance bottlenecks. By the time you complete this lab, you will be able to:
- Use CUDA streams to overlap execution and memory transfers
- Use CUDA events for asynchronous dependency management
- Profile CUDA code with NVIDIA Nsight Systems
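The objectives above can be sketched with the CUDA runtime API as follows. This is a minimal example under stated assumptions, not the lab's own code: the `scale` kernel, the `overlap_roundtrip` helper, the chunk sizes, and the choice of two streams are all hypothetical, and error checking is omitted for brevity. Work is split into chunks that alternate between two streams, so the copy for one chunk can overlap with the kernel for another, and an event expresses the dependency between the streams.

```cuda
#include <cuda_runtime.h>

#include <cstdio>

// Hypothetical kernel for this sketch: multiplies each element by `factor`.
__global__ void scale(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Hypothetical helper: doubles n_chunks * chunk floats on the GPU, chunk by
// chunk, and returns true if every result is correct. Error checks omitted.
bool overlap_roundtrip(int n_chunks, int chunk) {
  const int n = n_chunks * chunk;
  const size_t chunk_bytes = chunk * sizeof(float);

  float* host = nullptr;
  float* device = nullptr;
  cudaMallocHost(&host, n * sizeof(float));  // pinned memory, needed for truly async copies
  cudaMalloc(&device, n * sizeof(float));
  for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

  cudaStream_t streams[2];
  for (auto& s : streams) cudaStreamCreate(&s);
  cudaEvent_t stream1_done;
  cudaEventCreate(&stream1_done);

  // Operations within one stream run in order; operations in different
  // streams may overlap (e.g. a copy in one with a kernel in the other).
  for (int c = 0; c < n_chunks; ++c) {
    cudaStream_t s = streams[c % 2];
    float* h = host + c * chunk;
    float* d = device + c * chunk;
    cudaMemcpyAsync(d, h, chunk_bytes, cudaMemcpyHostToDevice, s);
    scale<<<(chunk + 255) / 256, 256, 0, s>>>(d, chunk, 2.0f);
    cudaMemcpyAsync(h, d, chunk_bytes, cudaMemcpyDeviceToHost, s);
  }

  // Make stream 0 wait for everything queued so far in stream 1, then
  // block the host on stream 0 only.
  cudaEventRecord(stream1_done, streams[1]);
  cudaStreamWaitEvent(streams[0], stream1_done, 0);
  cudaStreamSynchronize(streams[0]);

  bool ok = true;
  for (int i = 0; i < n; ++i) ok = ok && (host[i] == 2.0f * i);

  cudaEventDestroy(stream1_done);
  for (auto& s : streams) cudaStreamDestroy(s);
  cudaFree(device);
  cudaFreeHost(host);
  return ok;
}

int main() {
  std::printf("%s\n", overlap_roundtrip(4, 1 << 20) ? "ok" : "mismatch");
  return 0;
}
```

Whether the copies and kernels actually overlap depends on the hardware and on the sizes involved; running the binary under Nsight Systems (for example with `nsys profile`) shows the per-stream timeline.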
Break (15 mins)
Implementing New Algorithms with CUDA Kernels
The previous labs showed you how standard parallel algorithms can provide both convenient and speed-of-light GPU acceleration. However, sometimes your unique use cases are not covered by accelerated libraries. In this lab, you’ll learn the CUDA SIMT programming model so that you can program the GPU directly with CUDA kernels. The lab also covers utilities provided by the CUDA ecosystem that facilitate the development of custom CUDA kernels. By the time you complete this lab, you will be able to:
- Write and launch custom CUDA kernels
- Control thread hierarchy
- Leverage shared memory
- Use cooperative algorithms
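To make these objectives concrete, here is a minimal sketch of a custom kernel that sums an array. The names `block_sum`, `gpu_sum`, and `kBlock` are hypothetical, error checking is omitted, and the hand-written tree reduction is only one possible approach (library-provided cooperative algorithms can replace it). Each block stages its elements in shared memory, the threads of the block cooperate to reduce them, and one thread per block adds the partial sum to the global result.

```cuda
#include <cuda_runtime.h>

#include <cstdio>
#include <vector>

constexpr int kBlock = 256;  // threads per block; must be a power of two here

// Hypothetical kernel for this sketch: adds the sum of in[0..n) to *out.
__global__ void block_sum(const int* in, int n, int* out) {
  __shared__ int tile[kBlock];  // visible to all threads of one block

  int tid = threadIdx.x;                   // index within the block
  int i = blockIdx.x * blockDim.x + tid;   // index within the whole grid
  tile[tid] = (i < n) ? in[i] : 0;
  __syncthreads();  // all loads must finish before any thread reads the tile

  // Tree reduction: halve the number of active threads at every step.
  for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
    if (tid < stride) tile[tid] += tile[tid + stride];
    __syncthreads();
  }

  // One thread per block publishes the block's partial sum.
  if (tid == 0) atomicAdd(out, tile[0]);
}

// Hypothetical host wrapper: copies data over, launches the kernel, returns the sum.
int gpu_sum(const std::vector<int>& host) {
  const int n = static_cast<int>(host.size());
  if (n == 0) return 0;  // a grid with zero blocks is not a valid launch

  int* d_in = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, sizeof(int));
  cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(d_out, 0, sizeof(int));

  const int blocks = (n + kBlock - 1) / kBlock;  // enough blocks to cover n
  block_sum<<<blocks, kBlock>>>(d_in, n, d_out);

  int result = 0;
  // This blocking copy also waits for the kernel to finish.
  cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}

int main() {
  std::vector<int> ones(1000, 1);
  std::printf("sum = %d\n", gpu_sum(ones));
  return 0;
}
```

The launch configuration `<<<blocks, kBlock>>>` is where the thread hierarchy is controlled: the first argument sets the number of blocks in the grid and the second sets the number of threads in each block.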
Final Review
- Review key learnings and wrap up questions.
- Complete the assessment to earn a certificate.
- Take the workshop survey.