CUDA and OpenACC Training
There are three basic approaches to adding GPU acceleration to your applications.
But first: many applications are already GPU-accelerated, so check whether the application you are interested in already uses GPUs.
If you are developing your own application, you can add GPU acceleration by:
1. Dropping in pre-optimized GPU-accelerated libraries as an alternative to MKL, IPP, FFTW and other widely used libraries
2. Adding OpenACC directives (or compiler “hints”) in your source code to automatically parallelize loops
3. Using the programming languages you already know to implement your own parallel algorithms
OpenACC directives are extremely easy to use: altering as little as 5% of your code can yield a 10x or greater speedup! On the other hand, if you are developing your application from scratch, CUDA gives you maximum flexibility in coding.
We at NOVATTE are here to help you easily adopt the GPU programming techniques that will accelerate your application straight away!
For your convenience we run the courses below online, so you can study from wherever suits you. You work with the trainer at your monitor in a fully interactive mode and do the labs on the NODE. Studying online, you not only learn from a location of your choosing, but also get to know new friends and classmates from overseas and enjoy attractive online education rates.
OpenACC (2-day)
Day 1
Morning (9am-12pm) – CUDA Basics • Introduction to GPU computing • CUDA architecture and programming model • CUDA API • CUDA debugging
Afternoon (1pm-5pm) – OpenACC (1/2) • OpenACC Overview & compilers • OpenACC Programming Model • Managing data with OpenACC
Day 2
Morning (9am-12pm) – OpenACC (2/2) • OpenACC loop constructs • Asynchronism with OpenACC • Loop transformations • OpenACC 2 new features
CUDA & NVIDIA ARCHITECTURES (2-day)
This course will expose you to the main features of CUDA, enabling you to fully exploit the GPU and achieve high performance using optimization tricks in your CUDA kernels.
Audience: Developers, Project Leaders
OS: Linux, Windows
Day 1
Morning (9am-12pm) – CUDA Basics • Introduction to GPU computing • CUDA architecture and programming model • CUDA API • CUDA debugging
Afternoon (1pm-6pm) – CUDA Kernel Performance (1/2) • Using 2D CUDA grid for large computations • CUDA warps • Data alignment & coalescing
Day 2
Morning (9am-12pm) – CUDA Kernel Performance (2/2) • Texture memory & constant memory • Shared memory
Afternoon (1pm-5pm) – CUDA Grids Optimization • Maximizing occupancy • Interpreting profiler counters • CUDA performance tools: debuggers • Libraries
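The Day 1 topics above (2D grids, warps, coalescing) can be sketched in a short kernel. This is an illustrative example, not course material; the kernel name and block sizes are assumptions, and the matrix is assumed row-major:

```cuda
// A kernel launched on a 2D grid, indexed so that threads of a warp
// read consecutive addresses (coalesced access). Compile with nvcc.
__global__ void add2d(const float *a, const float *b, float *c,
                      int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height) {
        int i = y * width + x;   // adjacent threads -> adjacent addresses
        c[i] = a[i] + b[i];
    }
}

// Host-side launch: round the grid up so it covers the whole matrix.
// dim3 block(32, 8);
// dim3 grid((width + block.x - 1) / block.x,
//           (height + block.y - 1) / block.y);
// add2d<<<grid, block>>>(d_a, d_b, d_c, width, height);
```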
CUDA ADVANCED (1-day)
Objectives: This course will expose you to the trickiest features of CUDA, enabling you to overlap GPU and CPU computations and achieve high performance using CUDA streams and multiple GPUs.
Prerequisites: CUDA & NVIDIA Architectures training session
Audience: Developers
OS: Linux, Windows
Day 1
Morning (9am-12pm) – CUDA Streaming • CUDA Data Transfer Optimizations • Asynchronism and overlapping • CUDA streams • Transfer performance
Afternoon (1pm-6pm) – CUDA with MPI/OpenMP • Parallel CUDA • MPI introduction • Mixing CUDA and MPI • Multi-GPU computing with CUDA and MPI
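The streams-and-overlap topic can be sketched as follows. This is an illustrative fragment, not course material: `scale` is a hypothetical kernel, `CHUNK`, `blocks`, and `threads` are assumed defined, and `h_in`/`h_out` are assumed to be pinned host buffers (allocated with `cudaMallocHost`), which is required for copies to actually overlap with kernels:

```cuda
// Two streams, each working on its own chunk, overlap H2D copy,
// kernel execution, and D2H copy across streams.
cudaStream_t s[2];
for (int i = 0; i < 2; ++i) cudaStreamCreate(&s[i]);

for (int i = 0; i < 2; ++i) {
    size_t off = i * CHUNK;
    cudaMemcpyAsync(d_in + off, h_in + off, CHUNK * sizeof(float),
                    cudaMemcpyHostToDevice, s[i]);
    scale<<<blocks, threads, 0, s[i]>>>(d_in + off, d_out + off, CHUNK);
    cudaMemcpyAsync(h_out + off, d_out + off, CHUNK * sizeof(float),
                    cudaMemcpyDeviceToHost, s[i]);
}
cudaDeviceSynchronize();  // wait for both streams to finish
for (int i = 0; i < 2; ++i) cudaStreamDestroy(s[i]);
```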
CAPS Compiler Advanced Directives (1-day)
Objectives: This course is dedicated to the new features of HMPP, enabling you to easily integrate specialized library calls, use HMPP API routines in C++ applications, and achieve high performance using multiple GPUs at the same time, at a high level.
Prerequisites: HMPP training session
OS: Linux
Day 1
(9am-6pm) • CAPS Compiler API for C, C++ and Fortran Applications • Calling specialized library primitives with HMPP • Calling hand-written tuned codes or external functions in HMPP Codelets • Computing on multiple GPUs with HMPP • Loop Transformations using CAPS Compiler directives
OpenCL (2-day)
Objectives: This course will enable you to develop portable parallel applications. First it covers the basics of OpenCL, enabling you to easily write your own hybrid applications. It then introduces device-specific OpenCL optimizations, enabling you to fully exploit the hardware and achieve high performance.
Prerequisites: Knowledge of C
OS: Linux
Day 1
Morning (9am-12pm) – OpenCL Basics • Introduction to GPU computing • GPU architecture • OpenCL programming model • OpenCL API
Afternoon (1pm-6pm) – OpenCL Kernel Performance (1/3) • OpenCL Tools for compiling and debugging • Performance measure of OpenCL applications
Day 2
Morning (9am-12pm) – OpenCL Kernel Performance (2/3) • Memory: Image Objects • NDRange optimizations • Local Memory
Afternoon (1pm-5pm) – OpenCL Kernel Performance (3/3) • Code optimizations & transformations • Asynchronism & queue ordering • Hardware-dependent optimizations
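For a flavor of what the course covers, here is a minimal OpenCL C kernel (device code only; the host-side setup with `clCreateBuffer` and `clEnqueueNDRangeKernel` is omitted). The kernel name is illustrative, not course material:

```c
// Each work-item handles one element; get_global_id(0) maps the
// NDRange index to the array index, with a bounds guard for when the
// global size is rounded up past n.
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      const int n)
{
    int i = get_global_id(0);
    if (i < n)
        c[i] = a[i] + b[i];
}
```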
General Parallel Programming (2-day)
This course will introduce you to the basics of parallelism, enabling you to fully exploit the various kinds of parallelism in modern computer architectures and preparing you to write your own MPI, OpenMP, or MPI/OpenMP applications.
Prerequisites: Knowledge of C
Day 1
Morning (9am-12pm) – Basis of Parallelism, Cluster execution • Parallel architectures • Levels of parallelism • Evaluating performance • Parallel programming models • Presentation of the application to optimize
Afternoon (1pm-6pm) – OpenMP Basics • Introduction • Threads with OpenMP • Main directives • The OpenMP environment • Reductions • Data sharing between threads • OpenMP Tasks • OpenMP Sections
Day 2
Morning (9am-12pm) – MPI Basics • Introduction • Point-to-point communications • The MPI Environment • Asynchronous communications • Overlapping communications with computations • Collective communications
Afternoon (1pm-5pm) – Advanced • OpenMP vs. MPI vs. OpenMP+MPI • Performance comparison • Data Decomposition
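The MPI point-to-point topic from the morning session can be sketched as follows. This is an illustrative fragment, not course material; it requires an MPI installation (build with `mpicc`, run with e.g. `mpirun -np 2 ./a.out`):

```c
#include <mpi.h>
#include <stdio.h>

/* Rank 0 sends one integer to rank 1, which receives and prints it. */
int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```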
Porting Methodology (coming soon) (2-day)
Objectives: In this course, you analyze, set up, profile, and develop projects porting code to GPUs. You gain hands-on experience performing each stage of development, using all the core concepts and tools necessary to engineer a successful port.
Prerequisites: Knowledge of C
Day 1
Morning (9am-12pm) – Migration process • Hardware/software goals • Case study: Cost effectiveness of GPUs • Parallel and GPU computing background • Migration process
Afternoon (1pm-6pm) – Profiling • Introduction to profilers • Determining speedup and objectives • Parallelism discovery • Go/No Go analysis • Case study 1 • Case study 2
Day 2
Morning (9am-12pm) – Portability goals • Directives
Afternoon (1pm-5pm) – Optimizing GPGPU codes • Advanced GPU performance