
Saturday, 27 May 2017

CUDA and OpenACC Training



GPU Acceleration

 

There are three basic approaches to adding GPU acceleration to your applications.

But first, note that many applications are already GPU-accelerated, so check whether the application you are interested in already uses GPUs.

If you are developing your own application, you can add GPU acceleration by:
1. Dropping in pre-optimized GPU-accelerated libraries as alternatives to MKL, IPP, FFTW, and other widely used libraries
2. Adding OpenACC directives (compiler “hints”) to your source code to automatically parallelize loops
3. Using the programming languages you already know to implement your own parallel algorithms

OpenACC directives are easy to adopt: altering roughly 5% of your code can deliver a 10x or greater speedup. If, on the other hand, you are developing your application from scratch, CUDA gives you maximum flexibility in coding.

We at NOVATTE are here to help you adopt the GPU programming techniques that will accelerate your application straight away!

For your convenience, we run the courses below online, so you can study anywhere that suits you. You work with the trainer at your monitor in a fully interactive mode and do the labs on the NODE. By training online you not only study wherever is convenient, but also get to know classmates from overseas and enjoy attractive online rates.

 

OpenACC (2-day)


Objectives:

This course presents accelerator directive-based programming using OpenACC, plus additional directives for performance.

 


Prerequisites: Knowledge of C
Audience: Developers, Project Leaders
OS: Linux




Day 1

Morning (9am-12pm) – CUDA Basics

• Introduction to GPU computing

• CUDA architecture and programming model

• CUDA API

• CUDA debugging

Afternoon (1pm-5pm) – OpenACC (1/2)

• OpenACC Overview & compilers

• OpenACC Programming Model

• Managing data with OpenACC 

 

 

 

Day 2

Morning (9am-12pm) – OpenACC (2/2)

• OpenACC loop constructs

• Asynchronism with OpenACC
• OpenACC runtime API

 

 


Afternoon (1pm-5pm) – Advanced Techniques

• Loop transformations
• Library integration

• OpenACC 2 new features
• Integrate handwritten kernels
    

 


CUDA & NVIDIA ARCHITECTURES (2-day)



Objectives:

This course will expose you to the main features of CUDA, enabling you to fully exploit the GPU and achieve high performance by applying optimization techniques to your CUDA kernels.


Prerequisites: Knowledge of C

Audience: Developers, Project Leaders

OS: Linux, Windows

 

 

 

Day 1

 

Morning (9am-12pm) – CUDA Basics

• Introduction to GPU computing

• CUDA architecture and programming model

• CUDA API

• CUDA debugging

 

Afternoon (1pm-6pm) – CUDA Kernel Performance (1/2)

• Using 2D CUDA grid for large computations

• CUDA warps

• Data alignment & coalescing 

 

 

Day 2

 

Morning (9am-12pm) – CUDA Kernel Performance (2/2)

• Texture memory & constant memory

• Shared memory

 

 

Afternoon (1pm-5pm) – CUDA Grids Optimization

• Maximizing occupancy

• Interpreting profiler counters

• CUDA performance tools: debuggers

• Libraries 


CUDA ADVANCED (1-day)

 

Objectives:

This course will expose you to the most advanced features of CUDA, enabling you to overlap GPU and CPU computations and achieve high performance using CUDA streams and multiple GPUs.

Prerequisites: CUDA & Nvidia Architectures training session

Audience: Developers

OS: Linux, Windows  

Day 1

 

Morning (9am-12pm) – CUDA Streaming

• CUDA Data Transfer Optimizations

• Asynchronism and overlapping

• CUDA streams

• Transfer performance





Afternoon (1pm-6pm) – CUDA with MPI/OpenMP

• Parallel CUDA

• MPI introduction

• Mixing CUDA and MPI

• Multi-GPU computing with CUDA and MPI


CAPS Compiler Advanced Directives (1 day)

Objectives:
This course is dedicated to the new features of HMPP, enabling you to easily integrate specialized library calls, use the HMPP API routines from C++ applications, and achieve high performance by driving multiple GPUs at the same time from high-level code.

Prerequisites: HMPP training session
Audience: Developers

OS: Linux

Day 1

 

 (9am-6pm) 

• CAPS Compiler API for C, C++, and Fortran applications

• Calling specialized library primitives with HMPP

• Calling hand-written tuned code or external functions in HMPP codelets

• Computing on multiple GPUs with HMPP

• Loop transformations using CAPS Compiler directives


 

 

OpenCL (2-day)

 


Objectives:

This course will enable you to develop portable parallel applications. First it covers the basics of OpenCL, so you can easily write your own hybrid applications. It then introduces device-specific OpenCL optimizations, so you can fully exploit each device and achieve high performance.

 

Prerequisites: Knowledge of C
Audience: Developers

OS: Linux

 

Day 1

 

Morning (9am-12pm) – OpenCL Basics

• Introduction to GPU computing

• GPU architecture

• OpenCL programming model

• OpenCL API

 

Afternoon (1pm-6pm) – OpenCL Kernel Performance (1/3)

• OpenCL Tools for compiling and debugging

• Performance measurement of OpenCL applications

Day 2

 

Morning (9am-12pm) – OpenCL Kernel Performance (2/3)

• Memory: Image Objects

• NDRange optimizations

• Local Memory

 

Afternoon (1pm-5pm) – OpenCL Kernel Performance (3/3)

• Code optimizations & transformations

• Asynchronism & queue ordering

• Hardware-dependent optimizations

General Parallel Programming (2-day)


Objectives:

This course will introduce you to the basics of parallelism, enabling you to fully exploit the various forms of parallelism in modern computer architectures and to write your own MPI, OpenMP, or hybrid MPI/OpenMP applications.

Prerequisites: Knowledge of C
Audience: Developers, Project Leaders
OS: Linux



 

 

 

 

Day 1

 

Morning (9am-12pm) – Basis of Parallelism, Cluster execution

• Parallel architectures

• Levels of parallelism

• Evaluating performance

• Parallel programming models

• Presentation of the application to optimize

 

 

Afternoon (1pm-6pm) – OpenMP Basics

• Introduction

• Threads with OpenMP

• Main directives

• The OpenMP environment

• Reductions

• Data sharing between threads

• OpenMP Tasks

• OpenMP Sections

 

Day 2

 

Morning (9am-12pm) – MPI Basics

• Introduction

• Point-to-point communications

• The MPI Environment

• Asynchronous communications

• Overlapping communications with computations

• Collective communications

 

Afternoon (1pm-5pm) – Advanced

• OpenMP vs. MPI vs. OpenMP+MPI

• Performance comparison

• Data Decomposition 




 

 

Porting Methodology (coming soon) (2-day)

Objectives:

In this course you analyze, set up, profile, and develop projects that port applications to GPUs. You gain hands-on experience with each stage of development, using all the core concepts and tools necessary to engineer a successful port.

 

Prerequisites: Knowledge of C
Audience: Developers, Project Leaders
OS: Linux
 

 

 

 

 

 

Day 1

 

Morning (9am-12pm) – Migration process

• Hardware/software goals

• Case study: Cost effectiveness of GPUs

• Parallel and GPU computing background

• Migration process 

 

Afternoon (1pm-6pm) – Profiling

• Introduction to profilers

• Determining speedup and objectives

• Parallelism discovery

• Go/No Go analysis

• Case study 1

• Case study 2

 

 

 

Day 2

 

Morning (9am-12pm) – Portability goals

• Directives

 

 

 

 

 

 

Afternoon (1pm-5pm) – Optimizing GPGPU codes

• Advanced GPU performance

 

 

 





 
