Qlustar MPI

To achieve scalable compute performance on an HPC cluster, most applications use MPI parallelization. Qlustar MPI provides optimized OS packages of OpenMPI, the most widely used open-source MPI implementation.

Our packages are built so that different major OpenMPI versions, compiled with different compilers (gcc, icc, etc.), can be installed side by side. Users can therefore choose which MPI version to use for their programs and have a fallback in case an updated version doesn’t work for them at first. Because of the way the packages are built, applications compiled with Qlustar OpenMPI automatically and correctly resolve all system and MPI library dependencies without any environment variables having to be set. This ensures that old binaries continue to run even after system updates, as long as the packages used to build them remain installed.
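One way to check this property for a given binary is to run `ldd` with `LD_LIBRARY_PATH` removed from the environment and look for entries reported as `not found`. The following Python sketch does that. The helper names (`parse_ldd`, `check_binary`) and the library paths in `SAMPLE` are illustrative assumptions, not part of Qlustar, and the sketch presumes a Linux system where `ldd` is available.

```python
import os
import subprocess

# Hypothetical ldd output, used only to illustrate the format being parsed.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffd2a5f2000)
\tlibmpi.so.40 => /usr/lib/openmpi/lib/libmpi.so.40 (0x00007f3c1a200000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3c19e00000)
\tlibexample.so.1 => not found
\t/lib64/ld-linux-x86-64.so.2 (0x00007f3c1a600000)
"""


def parse_ldd(output):
    """Split ldd output into ({library: resolved path}, [unresolved libraries])."""
    resolved, missing = {}, []
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            # The vDSO and the dynamic loader are listed without "=>".
            continue
        name, target = (part.strip() for part in line.split("=>", 1))
        if target.startswith("not found"):
            missing.append(name)
        elif not target.startswith("("):
            # Drop the trailing load address, e.g. "(0x00007f...)".
            resolved[name] = target.split(" (", 1)[0]
    return resolved, missing


def check_binary(path):
    """Run ldd on `path` with LD_LIBRARY_PATH removed from the environment."""
    env = {k: v for k, v in os.environ.items() if k != "LD_LIBRARY_PATH"}
    result = subprocess.run(
        ["ldd", path], env=env, capture_output=True, text=True, check=True
    )
    return parse_ldd(result.stdout)
```

Calling `check_binary` on an MPI executable returns the resolved libraries and a list of unresolved ones. An empty second list indicates that the binary finds all of its libraries without help from `LD_LIBRARY_PATH`.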

On CentOS nodes, rather than providing our own packages, we make the wide range of MPI packages provided by OpenHPC readily available.

Qlustar MPI Features

The MPI variants provided by Qlustar are tightly integrated with other HPC components such as workload managers, GPU toolkits and specific communication libraries. Among other things, they feature:

  • Integration with the Slurm workload manager.
  • Integration with Nvidia CUDA to support GPU computing.
  • Full-featured support for InfiniBand networks.
  • Full-featured support for Omni-Path networks.

Qlustar MPI Performance

  • With ready-to-use support for modern InfiniBand (IB) and Omni-Path (OPA) network technology, Qlustar MPI achieves the highest throughput and the lowest message latency that the cluster's IB/OPA hardware supports.
  • The compiled-in OpenMPI CUDA-aware feature allows CUDA GPU buffers to be sent and received directly, without staging them through host memory. This results in significant performance improvements for applications that combine GPU and distributed MPI computing.
  • GPUDirect support is enabled for nodes with Mellanox IB cards. This allows the IB adapter to read and write CUDA host and device memory directly, which eliminates unnecessary memory copies, dramatically lowers CPU overhead, and reduces latency.
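
Whether a particular OpenMPI installation was built with CUDA-aware support can be checked by running `ompi_info --parsable --all` and looking for the `mpi_built_with_cuda_support` parameter. The Python sketch below wraps that check. The function names are illustrative, and it assumes that the `ompi_info` binary of the installation in question is on the `PATH`. It reports build-time support only. Whether GPUDirect is actually used at run time also depends on the drivers and hardware of the node.

```python
import subprocess

# Parameter reported by `ompi_info --parsable --all` for CUDA-aware builds.
CUDA_KEY = "mpi_built_with_cuda_support:value:"


def cuda_support_from_ompi_info(output):
    """Return True or False from ompi_info text, or None if the key is absent."""
    for line in output.splitlines():
        if CUDA_KEY in line:
            # The value is the last colon-separated field of the line.
            return line.rsplit(":", 1)[1].strip().lower() == "true"
    return None


def query_cuda_support(ompi_info="ompi_info"):
    """Run ompi_info and report whether the build is CUDA-aware."""
    result = subprocess.run(
        [ompi_info, "--parsable", "--all"],
        capture_output=True,
        text=True,
        check=True,
    )
    return cuda_support_from_ompi_info(result.stdout)
```

On a node where several OpenMPI versions are installed side by side, the path of a specific `ompi_info` binary can be passed to `query_cuda_support` to inspect that version.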

CUDA and GPUDirect are registered trademarks of NVIDIA Corporation in the U.S. and/or other countries.