Why Standards-Based Parallel Programming Should Be in Your HPC Toolbox

HPC application developers have long relied on programming abstractions that were developed and used almost exclusively within the realm of traditional HPC. OpenMP was created more than 25 years ago to simplify shared-memory parallel computing because the programming languages of the day had few or no such features, and vendors were developing their own, incompatible abstractions for symmetric multiprocessing.

CUDA C was designed and released by NVIDIA in 2007 as a set of extensions to the C language to support programming massively parallel GPUs, again because the C language lacked the features necessary to express parallelism directly. Both of these programming models have been highly successful because they provide, in a user-friendly way, the abstractions needed to overcome the shortcomings of the languages they extended.

The landscape has changed a great deal, however, in the years since these models were introduced, and it is time to reevaluate where they should fit in a programmer's toolbox. In this post, I discuss why you should be programming in parallel natively with ISO C++ and ISO Fortran.

Parallel Programming Is Becoming the Standard

Parallel programming was once a niche field reserved for government labs, research universities, and certain forward-looking industries, but today it is a requirement across all industries. Because of this, mainstream programming languages now support parallel programming natively, and a growing number of developer tools support these features. It is now possible to develop applications that support parallelism from the start, with no need for a serial baseline code.
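To make that native support concrete: since C++17, the standard library's algorithms accept an execution policy that allows the implementation to run them in parallel, with no vendor extensions in the source. The small function below is a minimal sketch of the idea (the name `sum_of_squares` is mine, not from the post):

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// Square every element and sum the results, using only ISO C++17 parallel
// algorithms. std::execution::par_unseq permits the implementation to
// parallelize and vectorize; compiled with nvc++ -stdpar, the same source
// can be offloaded to a GPU without modification.
double sum_of_squares(const std::vector<double>& x) {
    std::vector<double> y(x.size());
    std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(),
                   [](double v) { return v * v; });
    return std::reduce(std::execution::par_unseq, y.begin(), y.end());
}
```

The same source compiles for serial, multicore, or GPU execution; only the compiler flags change.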

Such parallel-first codes can be taken to any computer system, whether it is based on multicore CPUs, GPUs, FPGAs, or some novel processor we have not thought of yet, and be expected to run on day one. This frees developers from the need to port applications to new systems and lets them focus instead on productively optimizing their application or expanding its capabilities.

NVIDIA provides three composable approaches to parallel programming: accelerated standard languages, portable directives-based solutions, and platform-specific solutions. These give developers choices for optimizing their efforts according to their productivity, portability, and performance goals.

NVIDIA provides three approaches to programming for our platform, all layered on the foundation of our decades-long investment in accelerated libraries and compilers. All of these approaches are fully composable, giving programmers the choice of how best to balance their productivity, portability, and performance goals.

ISO Languages Achieve Performance and Portability

New application development should be done in ISO standard programming languages, using the parallel features they provide. There is no better example of a portable programming model than the ISO languages, so developers should expect that applications written to these standards will run anywhere. Many of the developers we have worked with have found that the performance gains from refactoring their applications with standards-based parallelism in C++ or Fortran are already as good as or better than their existing code.
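As an illustration of what such a refactor looks like, here is a serial dot product alongside a standards-parallel equivalent (a sketch under my own naming, not code from the post; with `nvc++ -stdpar=gpu` the parallel version can run on a GPU unchanged):

```cpp
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Serial baseline: an explicit loop over the elements.
double dot_serial(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Standards-based refactor: the same reduction expressed as a C++17
// parallel algorithm. The implementation may spread the work across CPU
// cores or, with nvc++ -stdpar=gpu, execute it on a GPU.
double dot_parallel(const std::vector<double>& a, const std::vector<double>& b) {
    return std::transform_reduce(std::execution::par_unseq,
                                 a.begin(), a.end(), b.begin(), 0.0);
}
```

The refactored version is shorter than the loop it replaces and contains nothing vendor-specific.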

Some developers have elected to perform further optimizations by introducing portable directives, through OpenACC or OpenMP, to improve data movement or asynchrony and obtain even higher performance. This results in application code that remains fully portable and high performance. Developers who want the highest possible performance in key parts of their applications may choose to take the additional step of optimizing portions of the application with a lower-level approach, such as CUDA, to take advantage of everything the hardware has to offer. And, of course, all of these approaches interoperate cleanly with our expert-tuned accelerated libraries.
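To make the directives step concrete, the sketch below annotates a SAXPY loop with OpenACC data clauses to control data movement explicitly (the function and the clause choices are illustrative, not taken from the post). A compiler without OpenACC support simply ignores the pragma and runs the loop serially, so the code stays fully portable:

```cpp
#include <cstddef>
#include <vector>

// SAXPY (x = a*x + y) annotated with OpenACC. The data clauses make
// movement explicit: copyin moves y to the device once, and copy moves
// x to the device before the loop and back afterward, rather than paying
// a transfer per access. Without an OpenACC compiler, the pragma is
// ignored and the loop runs serially with identical results.
void saxpy(float a, std::vector<float>& x, const std::vector<float>& y) {
    const std::size_t n = x.size();
    float* xp = x.data();
    const float* yp = y.data();
    #pragma acc parallel loop copy(xp[0:n]) copyin(yp[0:n])
    for (std::size_t i = 0; i < n; ++i)
        xp[i] = a * xp[i] + yp[i];
}
```

This is the appeal of the directive approach: the optimization hints layer on top of standard code instead of replacing it.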

Expanding the Standards to Leverage Innovations

There is a misconception in the industry that CUDA is the language NVIDIA uses to lock in users, but in fact it is our language for innovating and exposing the features of our hardware most directly. CUDA C++ and CUDA Fortran are in many ways co-design languages, in which we can expose hardware innovations and iterate on the programming model quickly. As best practices develop within the CUDA programming model, we believe they can and should be codified in standards.

For instance, owing to the success our customers have had with mixed-precision arithmetic, we worked with the C++ committee to standardize extended floating-point types in C++23. Thanks in large part to the work of our math libraries team, we have worked with the community to propose a C++ extension for a standardized linear algebra interface that maps well not only to our libraries but also to community-based and proprietary libraries from other vendors. We strive to improve parallel programming and asynchrony in the ISO standard languages because it is the best thing for our customers and for the community at large.
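The mixed-precision pattern those extended types serve can be sketched in plain C++. (C++23's `std::float16_t` from `<stdfloat>` requires a very recent compiler, so `float` and `double` stand in here for the narrow/wide pair, and the function name is mine:)

```cpp
#include <cstddef>
#include <vector>

// Mixed-precision dot product: inputs are stored and multiplied in a
// narrow type, while accumulation happens in a wider one to limit the
// growth of rounding error. This is the pattern that hardware-accelerated
// reduced-precision types are designed to serve.
double dot_mixed(const std::vector<float>& a, const std::vector<float>& b) {
    double acc = 0.0;  // wide accumulator
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    return acc;
}
```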

What Do Developers Think?

Professor Jonas Latt of the University of Geneva, who uses nvc++ and the C++ parallel algorithms in the Palabos library, said, “The result produces state-of-the-art performance, is highly didactical, and introduces a paradigm shift in cross-platform CPU/GPU programming in the community.”

Dr. Ron Caplan of Predictive Science Inc. said of his experience using nvfortran and Fortran do concurrent, “I can now write far fewer directives and still expect high performance from my Fortran applications.”

And Simon McIntosh-Smith of the University of Bristol, presenting his team's results using nvc++ and the parallel algorithms, said, “The ISO C++ versions of the code were simpler, shorter, easier to write, and should be easier to maintain.”

These are just a few of the developers already reaping the rewards of using standards-based parallelism in their development.

Standards-Based Parallel Programming Resources

NVIDIA offers a range of resources to help you fall in love with standards-based parallelism.

Our HPC Software Development Kit (SDK) is a free software package that includes:

  • The NVIDIA HPC compilers for C, C++, and Fortran
  • The CUDA NVCC compiler
  • A complete set of accelerated math libraries, communication libraries, and core libraries for data structures and algorithms
  • Debuggers and profilers

The HPC SDK is freely available on x86, Arm, and OpenPOWER platforms, regardless of whether you own an NVIDIA GPU, and it even serves as Amazon's HPC software stack for Graviton3.

NVIDIA On-Demand also has several relevant recordings to get you started (try “No More Porting: Coding for GPUs with Standard C++, Fortran, and Python”), as well as our posts on the NVIDIA Developer Blog.

Finally, I encourage you to register for GTC Fall 2022, where you will find even more talks about our software and hardware offerings, including more information about standards-based parallel programming.

Jeff Larkin, Principal HPC Application Architect at NVIDIA

About Jeff Larkin

Jeff is a Principal HPC Application Architect in NVIDIA's HPC Software group. He is passionate about the advancement and adoption of parallel programming models for high-performance computing. He was previously a member of NVIDIA's Developer Technology group, specializing in performance analysis and optimization of high-performance computing applications. Jeff is also the chair of the OpenACC technical committee and has worked in both the OpenACC and OpenMP standards bodies. Before joining NVIDIA, Jeff worked in the Cray Supercomputing Center of Excellence, located at Oak Ridge National Laboratory.
