Fundamentals of Accelerated Computing with Modern CUDA C++

Fundamentals of Accelerated Computing with Modern CUDA C++ (FACCC) – Outline

Detailed Course Outline

Introduction

Meet the instructor.
Create an account at courses.nvidia.com/join

CUDA Made Easy: Accelerating Applications with Parallel Algorithms

To make your first steps in GPU programming as easy as possible, this lab teaches you how to use powerful parallel algorithms that let you accelerate your code on the GPU by changing only a few lines. Along the way, you’ll learn fundamental concepts such as execution space and memory space, parallelism, heterogeneous computing, and kernel fusion. These concepts will serve as a foundation for your advancement in accelerated computing. By the time you complete this lab, you will be able to:

Write, compile, and run GPU code
Refactor standard algorithms to execute on GPU
Extend standard algorithms to fit your unique use cases
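
As a rough illustration of what refactoring a standard algorithm for the GPU can look like, here is a minimal sketch. It assumes the Thrust library that ships with the CUDA Toolkit, which this outline does not name; the functor `saxpy_op` and the helper `saxpy_on_gpu` are hypothetical names chosen for this example, and error handling is omitted.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/transform.h>
#include <vector>

// Hypothetical functor: computes a * x + y for one pair of elements.
// __host__ __device__ lets the same code run in either execution space.
struct saxpy_op {
  float a;
  __host__ __device__ float operator()(float x, float y) const {
    return a * x + y;
  }
};

// Hypothetical helper: copies the inputs to GPU memory, runs the
// transform on the device, and copies the result back to the host.
std::vector<float> saxpy_on_gpu(float a,
                                const std::vector<float>& x,
                                const std::vector<float>& y) {
  // Constructing a device_vector from host iterators copies host -> device.
  thrust::device_vector<float> d_x(x.begin(), x.end());
  thrust::device_vector<float> d_y(y.begin(), y.end());

  // Same shape as a binary std::transform call, but the thrust::device
  // policy asks for execution on the GPU. The result overwrites d_y.
  thrust::transform(thrust::device, d_x.begin(), d_x.end(), d_y.begin(),
                    d_y.begin(), saxpy_op{a});

  // Copy the result from device memory back to host memory.
  std::vector<float> result(y.size());
  thrust::copy(d_y.begin(), d_y.end(), result.begin());
  return result;
}
```

Calling `saxpy_on_gpu(2.0f, x, y)` with two equally sized vectors returns `2 * x[i] + y[i]` for each element. Because the functor performs the multiply and the add in a single pass over the data, it is also a small example of fusing two operations into one algorithm call.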

Break (60 mins)

Unlocking the GPU’s Full Potential: Harnessing Asynchrony with CUDA Streams

In the previous lab, you learned how to use parallel algorithms. However, parallelism alone is not sufficient for accelerating your applications. To fully utilize GPUs, this lab will teach you another fundamental concept: asynchrony. You’ll learn how and when to leverage asynchrony, and you’ll use Nsight Systems to distinguish synchronous and asynchronous algorithms and identify performance bottlenecks. By the time you complete this lab, you will be able to:

Use CUDA streams to overlap execution and memory transfers
Use CUDA events for asynchronous dependency management
Profile CUDA code with NVIDIA Nsight Systems
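
The sketch below shows one way streams and events from the CUDA runtime API can be combined so that copies and computation from different chunks of data may overlap. It is an illustration under stated assumptions rather than the lab's own code: the kernel `scale_kernel` and the helper `scale_with_streams` are hypothetical names, the data is split into just two chunks, and return codes are not checked.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Hypothetical kernel: multiplies each element by a constant factor.
__global__ void scale_kernel(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Hypothetical helper: processes the input in two chunks, one per stream.
std::vector<float> scale_with_streams(const std::vector<float>& input,
                                      float factor) {
  const int n = static_cast<int>(input.size());
  if (n == 0) return {};
  const int offsets[2] = {0, n / 2};
  const int counts[2] = {n / 2, n - n / 2};

  // Pinned (page-locked) host memory is needed for truly asynchronous copies.
  float* h_pinned = nullptr;
  float* d_data = nullptr;
  cudaMallocHost(&h_pinned, n * sizeof(float));
  cudaMalloc(&d_data, n * sizeof(float));
  std::memcpy(h_pinned, input.data(), n * sizeof(float));

  cudaStream_t streams[2];
  cudaEvent_t done[2];
  for (int s = 0; s < 2; ++s) {
    cudaStreamCreate(&streams[s]);
    cudaEventCreate(&done[s]);
  }

  // Work within one stream runs in order; work in different streams may overlap.
  for (int s = 0; s < 2; ++s) {
    float* h = h_pinned + offsets[s];
    float* d = d_data + offsets[s];
    const size_t bytes = counts[s] * sizeof(float);
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, streams[s]);
    if (counts[s] > 0) {
      const int threads = 256;
      const int blocks = (counts[s] + threads - 1) / threads;
      scale_kernel<<<blocks, threads, 0, streams[s]>>>(d, counts[s], factor);
    }
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, streams[s]);
    cudaEventRecord(done[s], streams[s]);  // marks the end of this chunk
  }

  // Block the host only until each chunk's event has completed.
  for (int s = 0; s < 2; ++s) cudaEventSynchronize(done[s]);
  std::vector<float> result(h_pinned, h_pinned + n);

  for (int s = 0; s < 2; ++s) {
    cudaEventDestroy(done[s]);
    cudaStreamDestroy(streams[s]);
  }
  cudaFree(d_data);
  cudaFreeHost(h_pinned);
  return result;
}
```

Whether the two chunks actually overlap depends on the hardware and the data size. A timeline from a profiler such as Nsight Systems is the usual way to confirm it.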

Break (15 mins)

Implementing New Algorithms with CUDA Kernels

Previous labs equipped you with the necessary understanding of how standard parallel algorithms can provide both convenient and speed-of-light GPU acceleration. However, sometimes your unique use cases are not covered by accelerated libraries. In this lab, you’ll learn the CUDA SIMT programming model to program the GPU directly using CUDA kernels. In addition, this lab will cover utilities provided by the CUDA ecosystem to facilitate development of custom CUDA kernels. By the time you complete this lab, you will be able to:

Write and launch custom CUDA kernels
Control thread hierarchy
Leverage shared memory
Use cooperative algorithms
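
To give a sense of how these pieces fit together, the following sketch sums an array with a custom kernel. It uses the thread hierarchy to assign one element per thread and shared memory to combine values within a block. The hand-written tree reduction stands in for a cooperative block-level algorithm; it is not taken from the lab, and libraries in the CUDA ecosystem provide ready-made alternatives. The names `block_sum_kernel`, `sum_on_gpu`, and `kBlockSize` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 256;  // threads per block; must be a power of two

// Each block sums its slice of the input in shared memory, then one
// thread per block adds the block's partial sum to the global total.
__global__ void block_sum_kernel(const int* in, int* out, int n) {
  __shared__ int partial[kBlockSize];
  const int tid = threadIdx.x;
  const int i = blockIdx.x * blockDim.x + tid;

  // Threads past the end of the input contribute zero.
  partial[tid] = (i < n) ? in[i] : 0;
  __syncthreads();

  // Tree reduction: halve the number of active threads each step.
  for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
    if (tid < stride) partial[tid] += partial[tid + stride];
    __syncthreads();  // all threads must reach this barrier
  }

  if (tid == 0) atomicAdd(out, partial[0]);
}

// Hypothetical host wrapper: copies data to the GPU, launches the kernel,
// and returns the total.
int sum_on_gpu(const std::vector<int>& values) {
  const int n = static_cast<int>(values.size());
  if (n == 0) return 0;

  int* d_in = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, sizeof(int));
  cudaMemcpy(d_in, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(d_out, 0, sizeof(int));

  const int blocks = (n + kBlockSize - 1) / kBlockSize;
  block_sum_kernel<<<blocks, kBlockSize>>>(d_in, d_out, n);

  // This blocking copy on the default stream waits for the kernel to finish.
  int result = 0;
  cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}
```

Integer addition is used here so that the result does not depend on the order in which blocks finish. With floating-point values, the atomic accumulation order could change the rounding slightly from run to run.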

Final Review

Review key learnings and answer any remaining questions.
Complete the assessment to earn a certificate.
Take the workshop survey.
