CAPS Compilers for CUDA and OpenCL

Based on the directive-based OpenACC and OpenHMPP standards, CAPS compilers enable developers to incrementally build portable applications for various many-core systems such as NVIDIA and AMD GPUs, and Intel Xeon Phi.
[Figure: CAPS compiler structure]

The source-to-source CAPS compilers integrate powerful data parallel code generators that produce CUDA or OpenCL code. CAPS compilers rely on your original CPU compiler to produce the host application binary and the hardware vendor compilers to produce the binary of the accelerated parts of the application.
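
As an illustration of the directive-based approach, here is a minimal sketch using a standard OpenACC directive (plain OpenACC, not CAPS-specific syntax): the annotated loop is the part the source-to-source compiler turns into CUDA or OpenCL kernel code, while the surrounding host code is still built by your usual CPU compiler.

    /* saxpy.c - minimal OpenACC sketch (illustrative only). */
    #include <stdio.h>
    #define N 1000000

    void saxpy(int n, float a, const float *x, float *y)
    {
        /* The directive asks the compiler to generate a data-parallel
         * accelerator kernel for this loop and to manage the
         * host-to-device transfers of x and y. */
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        static float x[N], y[N];
        for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
        saxpy(N, 3.0f, x, y);
        printf("y[0] = %f\n", y[0]);   /* expected: 5.000000 */
        return 0;
    }

Removing the pragma leaves an ordinary, portable C program, which is what makes the incremental, low-risk porting approach possible.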



7 Reasons for CAPS Compilers

- Enjoy the power of GPUs
- Preserve your software assets
- Reduce risk and cost using an incremental approach
- Ensure interoperability with an open standard
- Benefit from an affordable solution
- Intelligible CUDA or OpenCL code
- Keep your CPU compiler 



Supported platforms and compilers

Linux System
  Supported compilers: Intel 11.1+, GNU gcc 4.1+, GNU gfortran 4.3+, Open64 4.2+, Absoft Pro Fortran 11.0+
  Supported OSes: Debian 5.0+, Red Hat Enterprise Linux 5.3+, OpenSUSE 11.0+, SLES 11.0+, Ubuntu 9.10

Windows System
  Supported compilers: Microsoft Visual Studio 2010-2012, Intel Composer XE 2011-2013
  Supported OSes: Windows Server 2008 R2, Windows Server 2012, Windows 7



BAS - Backup and Archive Storage



NOVATTE BAS is a low-cost HDD-based backup and archiving storage alternative to traditional Tape Library storage systems.

BAS Benefits
- Price: up to 46% lower than traditional Tape Library storage
- Performance: instant data access, faster backup and recovery times
- Density: up to 1.6PB of usable space in a 42U rack
- Data protection: RAID 60 with optional configurations, spare drives for immediate rebuild

BAS Architecture
[Figure: BAS rack and BAS 536 architecture]

BAS XFS Front Server

XFS is a scalable, high-performance journaling file system that is particularly good at handling large files.
All volumes from the storage array and its JBODs are visible to the BAS XFS Front Server, and each is formatted as an XFS file system. Each server exposes 9 or more XFS volumes of approximately 60TB each (other options are possible, for instance combining all volumes into a single XFS file system with LVM).


BAS 715TB / 536TB
BAS 715TB consists of 4 modules; BAS 536TB consists of 3 single-controller modules.
Each module consists of 3 x RAID 60 blocks.
Each RAID 60 block consists of two striped RAID6 groups.
Two hot spare disks are added to each BAS module to provide instant HDD rebuild in case of a disk failure.
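
As a rough capacity sketch of that layout (the group size and drive capacity below are illustrative assumptions, not the actual BAS drive configuration): each RAID6 group gives up two drives' worth of space to parity, and a RAID 60 block simply stripes the remaining capacity of its groups.

    /* raid60_capacity.c - illustrative only; drive counts and sizes are assumed. */
    #include <stdio.h>

    /* Usable space of a RAID 60 block built from striped RAID6 groups;
     * hot spares are not counted. */
    static double raid60_usable_tb(int groups, int disks_per_group, double disk_tb)
    {
        return groups * (disks_per_group - 2) * disk_tb;
    }

    int main(void)
    {
        double block_tb  = raid60_usable_tb(2, 10, 4.0);  /* assumed: 2 groups of 10 x 4TB drives */
        double module_tb = 3 * block_tb;                  /* 3 RAID 60 blocks per BAS module */
        printf("per RAID 60 block: %.0f TB usable, per module: %.0f TB usable\n",
               block_tb, module_tb);
        return 0;
    }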


BAS has two Storage module types

BAS 715
  Rack units: 16U without Front Server, 18U with Front Server
  Usable capacity: 715TB

BAS 536
  Rack units: 12U without Front Server, 14U with Front Server
  Usable capacity: 536TB

Both models
  Controllers: Standard: one controller with 4GB cache; Optional: redundant controllers with 4GB cache per controller
  Network uplinks: Option 1: 2 x 6Gb SAS; Option 2: 4 x 8Gb Fibre Channel per controller
  Warranty: Standard: 1 year, next business day; Optional: upgrade to 3 years, next business day

 

 

NLA - NOVATTE Lustre Storage Appliance

Developed by NOVATTE


100% certified and supported by Intel and Whamcloud

In an HPC environment, compute nodes typically require a common file system for applications and tools. 
Lustre is an open, distributed parallel file system purpose-built for HPC clusters and Big Data storage systems running on Linux, and it provides exactly such a common file system for applications and tools.

 

Lustre is extremely scalable: it can easily accommodate thousands of users and hundreds of storage servers, and it offers a high degree of performance, flexibility, stability and network efficiency. It is also open source, which means no vendor lock-in.

 

NLA is an end-to-end Lustre storage appliance purpose-built by NOVATTE and Intel to ease Lustre adoption among Big Data and HPC users. It is a high-performance, scalable, cost-effective appliance that is easy to deploy, configure and manage, and it comes with a GUI, comprehensive Level 1 / Level 2 / Level 3 Lustre support and an optional 4-day Lustre Storage Administration training course.



NLA Performance results:
- File I/O as a percentage of raw bandwidth: ~90%
- Achieved single storage module I/O: >6 GB/sec
- Achieved single client I/O: >3 GB/sec
- Achieved aggregate I/O: 1024 GB/sec
- Metadata transaction rate: 24,000 ops/sec
- Maximum file size: 2 PB
- Maximum file system size: >512 PB
- Maximum number of clients: >50,000
- File counts: >10 million files in one directory, 4 billion files in the file system




NLA Benefits:

- Open standard: no vendor lock-in and its associated problems.
- Intuitive GUI: easy to use for any system administrator.
- Fully tested and certified by Whamcloud and Intel.
- Very small file support: SSD-based X-series NLA modules for higher IOPS and for files <64K.
- Sequential I/O: a continuous write-write-write-write pattern versus the seek-write-seek-write pattern common to NFS.
- Linear performance and capacity scaling: each additional NLA Storage module scales performance and capacity linearly, without any interruption.
- High Availability (HA): High Availability managers allow automated failover and no single point of failure (NSPF).
- Native 10GbE and InfiniBand support: no additional gateways required.
- Native integration with NOVATTE Xeon Phi-based, GPU-based and CPU-based HPC clusters: same hardware architecture, interconnect, warranty and support.
- Best-in-class Lustre technical support: NLA support is guaranteed by NOVATTE and Intel® Corporation. Be assured that the original developers of the Lustre file system will give you full support in the unlikely event of a technical issue.

 

NLA Architecture
NLA has two major module types: the Base module and Storage modules.
[Diagram: NLA architecture]


Base Module
The NLA Base module is the heart and foundation of the Lustre file system. It contains the metadata server pair (MDS) and the initial Storage module (an OSS server pair with storage). The metadata server pair orchestrates file system activity outside of the data path and is not involved in any file I/O operations, avoiding I/O scalability bottlenecks and allowing reads and writes to occur in parallel directly between compute clients and Storage modules. This design makes it possible to scale performance and capacity linearly simply by adding Storage modules to the system.
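
To make the client-to-storage data path concrete, here is a hedged sketch using the standard Lustre user-space library (liblustreapi); the mount point and stripe settings are illustrative assumptions, not NLA defaults. The MDS only records the requested layout; the file data itself then flows directly between the client and the OSS servers in the Storage modules.

    /* stripe_example.c - illustrative only; build on a Lustre client and
     * link with -llustreapi. Path and stripe values are assumptions. */
    #include <stdio.h>
    #include <lustre/lustreapi.h>

    int main(void)
    {
        /* Create a file whose objects are striped across 4 OSTs with a
         * 1 MiB stripe size. */
        int rc = llapi_file_create("/lustre/scratch/example.dat",
                                   1 << 20,  /* stripe_size: 1 MiB            */
                                   -1,       /* stripe_offset: any start OST  */
                                   4,        /* stripe_count: 4 OSTs          */
                                   0);       /* stripe_pattern: default RAID0 */
        if (rc < 0)
            fprintf(stderr, "llapi_file_create failed: %d\n", rc);
        return rc < 0 ? 1 : 0;
    }

The same layout can also be set from the command line with lfs setstripe; either way, subsequent reads and writes bypass the metadata servers entirely.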

 

Storage Modules
NLA Storage modules hold all application and user data, storing files as objects for optimal performance and reliability. Each Storage module contains an OSS server pair with storage and provides up to 2.5GB/s of read and write throughput. You can increase NLA bandwidth and capacity linearly simply by adding Storage modules.

NLA has three different Storage module types:

X-series
  Ideal for: high IOPS and small files
  Technology: enterprise SSD
  Usable capacity: Option 1: 12TB; Option 2: 6TB
  Max throughput, read/write: 3.0 / 3.0 GB/sec
  Memory cache: 128GB
  Rack units: 4U

L-series
  Ideal for: large to extra-large deployments
  Technology: HDD
  Usable capacity: Option 1: 178TB; Option 2: 134TB
  Max throughput, read/write: 2.5 / 2.0 GB/sec
  Memory cache: 128GB
  Rack units: 6U

M-series
  Ideal for: small to medium deployments
  Technology: HDD
  Usable capacity: Option 1: 119TB; Option 2: 89TB
  Max throughput, read/write: 2.0 / 1.8 GB/sec
  Memory cache: 128GB
  Rack units: 6U

All series
  Network uplinks: native FDR InfiniBand / QDR InfiniBand / 10GbE SFP+
  Lustre support and warranty: Standard: 1 year of L1/L2/L3 Lustre support; Optional: upgrade to 3 years of L1/L2/L3 Lustre support

[Figure: NLA storage module positioning]

NLA Storage Modules Upgrades:

NLA Storage module upgrades are available, so you can start with a small and affordable Lustre deployment and scale it up later without worry.

 

High Availability (HA)
NLA has no single point of failure (NSPF). 
- Base module and Storage module servers work in redundant pairs in High Availability (HA) mode with Shoot The Other Node In The Head (STONITH) protection:

     * Lustre's built-in multiple mount protection (MMP) prevents two servers from writing to the same targets.

     * The HA configuration eliminates the possibility of nodes “shooting each other”.
- All modules have redundant controllers and power supplies.
- Each Storage module consists of multiple small data blocks protected by RAID6 to maximize data availability.
- The modular structure of NLA allows easy and fast servicing in the unlikely event of a component malfunction.
- All NLA modules were thoroughly checked and tested by Whamcloud and Intel engineering teams to guarantee high availability (HA) and the stated performance figures.


Easy Management
NLA has a Lustre Central Management System (LUSIS) that is deeply integrated with Lustre. It brings together information from multiple sources to provide a unified view of what is going on in the storage system, while vastly simplifying installation, configuration, maintenance, monitoring, and fault diagnosis. The NLA central management system was jointly developed by NOVATTE and Intel.

LUSIS detects all of the NLA's shared devices and presents them in live, intuitive graphs for real-time monitoring. Lustre's command line interface is still available for power users, while the scriptable API enables enterprise-scale automation.

Real-time system monitoring lets you:
- Monitor storage health and KPIs in real-time
- View high-level NLA performance or individual modules details
- Generate historical and real-time reports

Advanced troubleshooting tools provide:
- A centralized view of storage-related logs
- Intelligent log scanning for efficient debugging and analysis
- Repeatable self-test performance metrics
- Configurable event notifications

Browser-based administration provides:
- Point-and-click simplicity for NLA configuration and management – You can configure your Lustre system in minutes.
- Intuitive, rich interface – Lets you monitor and manage the whole NLA in a single view. 
- Centralized definition and management of common tasks – Your company spends less on training and skilled resources to manage the NLA parallel file system storage.

Automated updates provide: 
- Automated patches and security fixes 
- Protection against external threats 
- Reduced need for manual intervention and downtime

 

NLA Lustre Support
NLA comes with comprehensive high-quality Level 1 / Level 2 / Level 3 Lustre support from NOVATTE fully backed by engineers from Whamcloud/Intel.

If the work of your organization is important and time-critical, you will want the foremost Lustre experts standing behind you to minimize the impact of any disruption on your operations, whether it is due to a hardware failure, an administrator error, or a software bug.

Although Lustre has a vibrant user community, for sites with stringent operational requirements nothing beats support from expert Lustre engineers. You can find that expertise in the joint NOVATTE/Whamcloud/Intel team, made up of the engineers who create and implement most of the new features and bug fixes landed in Lustre.


NLA Support Program Comparison Table

Warranty (base)
- H/W warranty and Lustre support: 1 year
- On-site installation
- Telephone and email support during business hours (8x5)
- Free depot repair of returned components

Silver: everything in Warranty, plus
- H/W warranty and Lustre support extended to 3 years
- Advance shipment of replacement parts within 2 business days of problem determination
- Email technical support 24x7x365

Gold: everything in Silver, plus
- Telephone technical support 24x7x365
- Automated system monitoring on request
- Lustre maintenance release upgrade installation
- On-site parts delivery by next business day
- On-site engineer by next business day

Platinum: everything in Gold, plus
- Event priority based queuing
- On-site Lustre feature release upgrade installation
- On-site parts delivery within 4 hrs, 24x7x365
- On-site engineer within 4 hrs, 24x7x365

Black Hole option: +2% of net with any support program (see below).

 

Black Hole  
Black Hole covers secure installation sites where data recording components (such as HDDs, SSDs, etc.) go in and do not come out, because they have to be destroyed on site. Normally such customers must purchase spare disk drives at additional expense to replace faulty ones, but NLA Black Hole customers are entitled to warranty replacement of data recording components without having to return the faulty components to NOVATTE.

 

G10 - NVIDIA Tesla GPU Cluster Appliance

 


Accelerate your research
To publish papers faster
And position yourself for the next round of Government funding


G10-Base Performance Benchmarks
We used publicly available models to benchmark the G10 HPC cluster. You can use the same models to benchmark your current system against the NOVATTE G10:

Application   Performance    Benchmark model
GROMACS       43.55 ns/day   ADH Cubic (download from gromacs.org)
LAMMPS        209.7 sec      data.rhodo and in.gpu.rhodo with steps changed to 5000 (inside <LAMMPS>/examples/gpu/)
NAMD          485 sec        STMV with numsteps set to 5000 (download from University of Illinois at Urbana-Champaign)

 

Ideal for: 

Molecular Dynamics: AMBER, GROMACS, NAMD, LAMMPS, CHARMM, DL_POLY, etc.
Quantum Mechanics: GAMESS, NWChem, CP2K, Quantum Espresso, VASP, Gaussian, etc.
Bioinformatics: Velvet, ABySS, SOAP, Allpaths-LG, MIRA3, TopHat, BLAST, GATK, Bowtie, BWA, SHRiMP, etc.


Get started:
Register for the NVIDIA Tesla GPU Cluster Test Drive


 

G10 Tech specs: Base Quarter Half Full
G10 Cluster Performance (TFlops) 5.6 11 21.7 43.1
G10 Initial Cluster storage space (TB) 50 50  50  50 
G10 Power consumption (kW) 2.6 4.6  8.7  16.6 
G10 Height (RU) 8  10  14  22 

G10 – Scientific HPC Appliance


 

Accelerate your research!

→ To publish more papers
→ And win the next round of Government funding

 



- Faster science: hours vs weeks
- Preinstalled accelerated apps: simply power on and start working
- HPC administration support by NOVATTE: focus purely on science

 

 

 

G10 is a scientific HPC appliance that provides maximum performance for MD, QM and Bioinformatics applications. Modular in nature, G10 can be configured to efficiently run any mix of applications required by a particular Department or School: your workflow dictates your application mix, and G10 guarantees your applications run at their respective maximum speeds.


 

Molecular Dynamics:

Application   Block     Performance          Benchmark
AMBER         Gamma-A   23.89 ns/day/block   FactorIX NVE (from AMBER.org)
NAMD          Gamma-A   485 sec/block        STMV with numsteps set to 5000 (from University of Illinois at Urbana-Champaign)
GROMACS       Gamma-A   42.22 ns/day/block   ADH Cubic (from gromacs.org)
CHARMM        Gamma-A   Coming soon          Coming soon
LAMMPS        Gamma-A   209.7 sec/block      data.rhodo and in.gpu.rhodo, steps changed to 5000 (inside <LAMMPS>/examples/gpu/)
DL_POLY       Gamma-A   Coming soon          Coming soon

 

Quantum Mechanics:

Application        Block     Performance   Benchmark
GAMESS             Gamma-A   Coming soon   Coming soon
NWChem             Gamma-A   Coming soon   Coming soon
CP2K               Gamma-A   Coming soon   Coming soon
Quantum Espresso   Gamma-A   Coming soon   Coming soon

  

Bioinformatics:

Application   Block     Performance   Benchmark
Velvet        Delta     Coming soon   Coming soon
BWA           Delta     Coming soon   Coming soon
Bowtie        Delta     Coming soon   Coming soon
SOAP3-dp      Gamma-A   Coming soon   Coming soon
ABySS         Delta     Coming soon   Coming soon

 

Cluster Management:
G10 comes with an intuitive, easy-to-use Cluster Management System with a GUI: it offers a single, consistent interface to all available cluster management functionality and is straightforward to use even for people with limited Linux administration experience.


 

G10 Compute Blocks:

Gamma-A
  Ideal for: Molecular Dynamics (moderate GPGPU workload)
  Architecture: CPUs + GPGPUs + InfiniBand
  Theoretical peak per block: 35.9 TFlops (SP) / 8.4 TFlops (DP)
  Memory per block: 256 GB

Gamma-B
  Ideal for: Molecular Dynamics (intensive GPGPU workload)
  Architecture: CPUs + GPGPUs + InfiniBand
  Theoretical peak per block: 70.8 TFlops (SP) / 15.9 TFlops (DP)
  Memory per block: 256 GB

Delta
  Ideal for: Quantum Mechanics, Bioinformatics
  Architecture: CPUs + InfiniBand
  Theoretical peak per block: 3.84 TFlops (SP) / 1.92 TFlops (DP)
  Memory per block: 512 GB

Omega
  Ideal for: large-memory tasks, non-MPI tasks
  Architecture: CPUs + InfiniBand / 10GbE
  Theoretical peak per block: 1.33 TFlops (SP) / 0.66 TFlops (DP)
  Memory per block: 1 TB

 

 

G10 Storage:

Scratch drive
  Performance: up to 3 GB/s
  Usable capacity: 3.5TB or 7TB

Archive space
  Usable capacity: varies

 

Software Stack:
G10 supports the following software stack:
[Diagram: G10 software stack]

 

 

Architecture:

Architecture
  • Separate data, monitoring and management networks
  • Diskless nodes
Compute field
  • Intel® Xeon® processor E5 family
  • NVIDIA TESLA GPGPU accelerators
Memory
  • DDR4-2133 ECC Registered memory
Networking/
Interconnect
  • 1GbE management network
  • 56Gb/s FDR or 100Gb/s EDR InfiniBand interconnect network
  • NVIDIA GPUDirect
File System
  • NFS
  • Lustre (optional)
Storage Systems
System Administration
  • Graphical or shell-based system administration and provisioning for both the HPC cluster and Lustre storage
  • Automatic detection and monitoring of interconnect, nodes and storage
  • Cluster partitioning into smaller logical clusters, each running its own software stack and jobs
  • Automatic system software updates with rollback capabilities
  • Visual tools for monitoring a comprehensive set of software and hardware metrics
  • Redundant head nodes with automated failover and load balancing
  • Integrated job schedulers
  • GPU and MIC management
  • ScaleMP management
  • Graphical monitoring and provisioning of the parallel file system and storage
