The annual NHR Conference aims to promote scientific exchange within the HPC user community. Each year focuses on a different set of scientific topics.
The NHR Conference ’24 will take place in Darmstadt. During the scientific part, users of the NHR centers will have the opportunity to present their projects in contributed talks or poster sessions and to exchange ideas with the consulting and operational teams of the NHR centers.
Have a look at our contributions to the Conference:
Ludovico Nista
is a research assistant at the Institute for Combustion Technology at RWTH Aachen University. He has been a research fellow at the von Karman Institute for Fluid Dynamics and a visiting scholar at Technion – Israel Institute of Technology, after receiving his Master of Science in Mathematical Engineering and a Master of Research in Fluid Mechanics at the von Karman Institute for Fluid Dynamics.
His main research interests are in the fields of machine learning theory, turbulent combustion, and high-order numerical methods with application to applied energy systems.
Since 2020, Ludovico has been a member of the SDL Energy Conversion at the National High Performance Computing Center for Computational Engineering Sciences (NHR4CES).
Ludovico Nista’s talk: “Scalability and performance analysis of training and inference-coupled simulations of super-resolution generative adversarial networks for turbulence closure modeling” – September 09, 2024, 02:05 pm to 02:20 pm
Super-resolution (SR) generative adversarial networks (GANs) are promising for turbulence closure modeling in large-eddy simulation (LES) due to their ability to accurately reconstruct high-resolution (HR) data from low-resolution (LR) fields. Current model training and inference strategies are not sufficiently mature for large-scale, distributed calculations due to the computational demands and often unstable training of SR-GANs, which limits the exploration of various model structures, training strategies, and loss-function definitions. Integrating SR-GANs into LES solvers for inference-coupled simulations is also necessary to assess their a posteriori accuracy, stability, and cost.
We investigate parallelization strategies for SR-GAN training and inference-coupled LES, focusing on computational performance and reconstruction accuracy. We examine distributed data-parallel training strategies for hybrid CPU–GPU node architectures and the associated influence of low-/high-resolution subbox size, global batch size, and discriminator accuracy. Accurate predictions require training subboxes to encompass at least one integral length scale. Care must be taken with the coupled effect of training batch size, learning rate, number of training subboxes, and the discriminator’s learning capabilities.
We introduce a data-parallel SR-GAN training and inference library for heterogeneous architectures that enables on-the-fly data exchange between the LES solver and SR-GAN inference at runtime. We investigate the predictive accuracy and computational performance of this arrangement with particular focus on the overlap (halo) size required for accurate SR reconstruction. Similarly, a posteriori parallel scaling is constrained by the SR subdomain size, GPU utilization, and reconstruction accuracy, which limit the computational resources for efficient inference-coupled LES. Based on these findings, we establish guidelines and best practices to optimize resource utilization and parallel acceleration of SR-GAN turbulence model training and inference-coupled LES calculations while maintaining predictive accuracy.
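The role of the overlap (halo) cells mentioned in the abstract can be illustrated with a minimal NumPy sketch (the subbox and halo sizes here are illustrative, and this is not the library’s actual API): each subdomain handed to the SR model is padded with a ring of neighbouring cells so that the reconstruction sees context across subdomain boundaries.

```python
import numpy as np

def split_with_halo(field, subbox, halo):
    """Split a periodic 2D field into subboxes, each padded with a halo
    of neighbouring cells so an SR model sees context across subdomain
    boundaries."""
    n = field.shape[0]
    assert n % subbox == 0
    padded = np.pad(field, halo, mode="wrap")  # periodic halo exchange
    boxes = []
    for i in range(0, n, subbox):
        for j in range(0, n, subbox):
            boxes.append(padded[i:i + subbox + 2 * halo,
                                j:j + subbox + 2 * halo])
    return boxes

field = np.arange(64.0).reshape(8, 8)          # toy LES field
boxes = split_with_halo(field, subbox=4, halo=1)
print(len(boxes), boxes[0].shape)              # 4 subboxes, each 4x4 interior + 1-cell halo
```

A larger halo improves reconstruction near subdomain edges but increases redundant computation and communication, which is the trade-off the abstract refers to.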
Sandra Wienke
is a PI in our CSG Parallelism and Performance.
Sandra Wienke presents the NHR strategic project “Benchmarks and TCO for NHR Procurements” – September 10, 2024, 2:30 pm to 4:30 pm
The NHR strategic project “Benchmarks and TCO for NHR Procurements” ran from January 2023 to June 2024 and was joint work of the NHR centers NHR@Göttingen, NHR@TUD, NHR4CES, PC2 and NHR@KIT. This 10-minute talk by project leader Sandra Wienke gives a critical assessment of the project.
This project focuses on benchmarking and TCO modelling for HPC procurements and incorporates the experiences of the six project partners. It presents experiences, best practices, and challenges in integrating individual job mixes into HPC procurements. As part of this approach, the extraction of mini-apps from real HPC applications is eased by the tool MiniApex.
Furthermore, the partners provide a schema definition for an NHR benchmark collection. In addition, benchmark performance and energy consumption serve as parameters for informed TCO models used in HPC procurements. The project gives insight into the use of TCO and productivity models at various HPC sites and thus fosters their inclusion in new HPC procurements. Here, the energy component of TCO is of particular interest, as it faces a number of distinct challenges.
Gustavo de Morais
is a researcher in our CSG Parallelism and Performance.
Gustavo de Morais’ talk: “Performance Modeling for CFD Applications” – September 09, 2024, 10:30 am to 10:45 am – “Computational Engineering Session II”
Understanding performance at scale and identifying potential bottlenecks are crucial for developing and optimizing efficient HPC applications. While computation- and communication-intensive kernels/functions are typically well understood, implicit performance bottlenecks, such as those arising from caching or synchronization effects, can be easily overlooked. Performance models facilitate the identification of scalability bottlenecks. However, designing these models analytically for an entire large code base is often impractical due to the manual effort required. Empirical performance modeling tools, such as Extra-P, allow the automatic creation of performance models for CFD applications and other large software suites, although challenges regarding profiling time and model accuracy arise from their size and characteristics. Based on an exemplary OpenFOAM CFD application, this presentation introduces the concept of strong scaling and provides an overview of common challenges and mitigations of empirical performance modeling. Focusing on large software suites like CFD applications, we demonstrate how to generate and interpret empirical performance models in order to identify potential scalability bottlenecks. For this purpose, we employ the Score-P measurement infrastructure to measure the applications’ performance and Extra-P to generate strong-scaling performance models and identify scalability bottlenecks in the code.
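The flavour of empirical performance modeling can be conveyed with a toy fit (a plain least-squares fit in NumPy with made-up runtimes; Extra-P’s actual model search is considerably more general): measure runtimes at several core counts, fit a simple strong-scaling ansatz t(p) = c + a/p, and read the constant term c as the serial/communication share that caps speedup.

```python
import numpy as np

# Hypothetical runtimes (seconds) measured at increasing core counts p
p = np.array([1, 2, 4, 8, 16], dtype=float)
t = np.array([10.5, 5.8, 3.4, 2.2, 1.6])

# Fit t(p) = c + a/p by linear least squares in the basis [1, 1/p]
A = np.column_stack([np.ones_like(p), 1.0 / p])
(c, a), *_ = np.linalg.lstsq(A, t, rcond=None)

print(f"t(p) = {c:.2f} + {a:.2f}/p")
# A sizeable constant c bounds the achievable runtime: speedup
# saturates near t(1)/c regardless of core count (Amdahl-type behaviour).
```

In practice, tools like Extra-P select among many candidate terms (polynomial and logarithmic in p) per call path, which is what makes automatic bottleneck identification feasible for large code bases.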
Fabian Orland
received his Bachelor’s and Master’s degrees in Computer Science from RWTH Aachen University. In August 2019 he joined the Chair for High-Performance Computing at the IT Center of RWTH Aachen University as a research assistant and PhD student.
From 2019 until 2022 he was a member of the EU Center of Excellence Performance Optimisation and Productivity (POP2) providing performance assessment services for academic and industrial users from many different scientific disciplines.
Since 2021, Fabian has been a member of the Cross-Sectional Group Parallelism and Performance at the National High Performance Computing Center for Computational Engineering Sciences (NHR4CES).
Fabian Orland’s talk: “Accelerating Deep Learning Inference in Turbulent Reactive Flow Simulations on Heterogeneous Architectures” – September 10, 2024, 10:00 am to 10:15 am in the track “Simulation & AI”
Data-driven modeling is becoming an increasingly important tool to complement traditional numerical simulations across domain sciences. In large-eddy simulations of turbulent reactive flow, for example, deep learning (DL) models have been successfully applied for turbulence closure, as an alternative to current tabulated chemistry closure, and to predict sub-filter-scale reaction rates.
In all cases the trained DL model needs to be coupled with a highly parallel simulation code to enable a posteriori evaluation. This coupling constitutes a computational challenge, as heterogeneous architectures need to be exploited efficiently. While traditional numerical simulation codes have been highly optimized to run efficiently on CPUs, the inference of a deep learning model can be significantly accelerated using GPUs or other specialized hardware.
In this talk, we present the AIxeleratorService, an open-source software library developed by us to facilitate the deployment of DL models into existing HPC simulation codes on modern heterogeneous computer architectures. Our library provides users with a modular software architecture abstracting from concrete machine learning framework APIs. Moreover, it integrates seamlessly into the MPI parallelization of a given simulation code and hides the necessary data communication between CPUs and GPUs to enable acceleration of DL model inference in a heterogeneous job.
The AIxeleratorService has been successfully applied to the above use cases and coupled with popular simulation codes such as OpenFOAM and CIAO. We will present a selection of results regarding scalability and speedup on GPUs.
Paul Wilhelm
holds Bachelor’s and Master’s degrees in Mathematics with a minor in Physics from RWTH Aachen University. In his Master’s thesis he researched the extension of the Bogoliubov scalar product to manifolds of non-quadratic matrices and the existence of gradient flows in the induced geometry. After his Master’s, Paul worked at keenlogics GmbH in Aachen as a software engineer developing process-optimisation software.
Since 2020, Paul has been a PhD student at the Institute of Applied and Computational Mathematics (AcoM) at RWTH Aachen and, since 2022, also part of the NHR graduate school. His research focuses on developing new methods for the Vlasov–Poisson equation and, in particular, methods that avoid explicit meshing of the phase-space.
Paul Wilhelm’s talk: “Towards using the Numerical Flow Iteration to simulate kinetic plasma physics in the full six-dimensional phase-space” – September 10, 2024, 9:30 am to 9:45 am – “Computational Engineering Session II”
“High-temperature plasmas can be accurately modelled by the Vlasov equation

𝜕ₜ𝑓 + 𝑣 · ∇ₓ𝑓 + 𝑞(𝐸 + 𝑣 × 𝐵) · ∇ᵥ𝑓 = 0,  (1)

where 𝑓 is a time-dependent probability distribution in the six-dimensional phase-space. [1]
The Vlasov equation is non-linearly coupled to Maxwell’s equations to compute the electromagnetic forces induced by the quasi-freely moving charged particles in a plasma. In addition to the high dimensionality of the problem, the non-linear coupling to Maxwell’s equations introduces turbulence as well as fine structures called filaments. The challenge is thus to resolve complicated dynamics with fine but physically relevant structures while working in a high-dimensional setting.
Most schemes for solving the Vlasov equation rely on a direct discretization of the phase-space, using either particles or a grid-based approach. This comes with the drawback of extensive memory usage, making these approaches heavily memory-bound. In particular, in the six-dimensional case only low-resolution simulations can be run, with significant overhead in terms of communication, leading to sub-optimal scaling results. [2], [3] The Vlasov equation is strongly transport-dominated, so it is possible to use an iterative-in-time approach to discretize the phase flow and evaluate 𝑓 indirectly. This algorithm, the Numerical Flow Iteration (NuFI), essentially shifts complexity from memory access to on-the-fly computation. [4] Only the lower-dimensional electromagnetic potentials have to be stored, so the approach has a low memory footprint even in the full six-dimensional case. Additionally, it is embarrassingly parallel, as can be demonstrated using the POP metrics. However, the low memory footprint comes at the cost of a computational complexity that is quadratic in the total number of time-steps instead of linear.
Due to its high degree of structure preservation, NuFI is an interesting tool for theoretical investigations of complicated phases in kinetic plasma dynamics. In this work we investigate its suitability for more complicated, realistic settings, keeping the aforementioned performance-relevant aspects in mind.”
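The memory/compute trade-off behind NuFI can be sketched in a toy 1D-1V setting (a hedged illustration, not the actual scheme: the backward step below is a simple inverse of a symplectic-Euler update, and the field history is a list of frozen functions): the distribution 𝑓 is never stored on a phase-space grid; each point evaluation instead traces its characteristic backwards through the stored field history to the initial condition, so evaluating at step n costs O(n) and the total cost over n steps is quadratic.

```python
import numpy as np

def f0(x, v):
    # Maxwellian initial condition with a spatial perturbation
    return (1 + 0.1 * np.cos(x)) * np.exp(-0.5 * v**2)

def evaluate_f(x, v, E_history, dt, q=-1.0):
    """Evaluate f(x, v, t_n) by tracing the characteristic backwards
    through all stored time steps.  Only the field history is stored,
    not f itself; the cost per evaluation grows with len(E_history)."""
    for E in reversed(E_history):
        # invert one symplectic-Euler step: x += v*dt; v += q*E(x)*dt
        v = v - q * E(x) * dt
        x = x - v * dt
    return f0(x, v)

# With zero fields the backward flow is free streaming: f(x, v, t) = f0(x - v*t, v)
E_hist = [lambda x: 0.0] * 10
val = evaluate_f(1.0, 0.5, E_hist, dt=0.1)
print(np.isclose(val, f0(1.0 - 0.5 * 1.0, 0.5)))  # True
```

Because each evaluation is independent, the loop over phase-space query points parallelizes trivially, which is the source of the embarrassingly parallel behaviour noted in the abstract.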
Tim Gerrits
holds Bachelor’s and Master’s degrees in Computational Visualistics from the University of Magdeburg, Germany, where he also received his PhD in Visualization, working on the visualization of second-order tensor data and vector field ensembles.
From 2019 until 2021, he worked as a postdoctoral researcher at the University of Münster, Germany, with a focus on Visual Analytics approaches for ensemble and uncertain data.
Since 2021, Tim has led the Cross-Sectional Group Visualization at the National High Performance Computing Center for Computational Engineering Sciences (NHR4CES) as well as the Visualization Group at RWTH Aachen University, Germany.
Tim Gerrits’s talk: “DaVE”
DaVE serves as a centralized repository where users can find and discover visualization examples tailored to their specific needs through a simple search. Our database is designed to be user-friendly, offering seamless integration into existing workflows using adaptable containers. Whether you’re exploring cutting-edge visualizations for data or seeking practical solutions to enhance your simulations, DaVE seeks to find helpful resources for you.
Janis Sälker
is a researcher in our SDL Materials Design.
Janis Sälker presents a poster: “Computer vision-based analysis of atom probe tomography data”
Atom probe tomography (APT) is a powerful technique to analyze materials at the nanometer-scale, offering 3D spatially-resolved compositional characterization.
Each measurement can capture up to hundreds of millions of atoms, making data interpretation both time-consuming and operator-dependent. To address these challenges, two deep learning-based analysis methods are being explored. The first involves supervised image segmentation on 2D representations of APT data to investigate phase changes during thermal decomposition of (V,Al)N thin films. The second employs an unsupervised contrastive strategy to group phase regions with less supervision and input from users.
Sandra Wienke
is a PI in our CSG Parallelism and Performance.
Sandra Wienke – together with other project partners – presents the panel “HPC Procurements in NHR – Experiences & Challenges in Benchmarking and TCO Modelling” – September 12, 11:00 am – 12:00 pm
HPC procurement processes play a major role in making informed decisions on the economic efficiency and suitability of HPC clusters in NHR. The goal of this panel is to foster the exchange and discussion of experiences, best practices, and challenges
on HPC procurements in NHR, and in particular, on benchmarking and Total-Cost-of-Ownership (TCO) modelling as part of requests for proposals (RFPs) and acceptance tests. Furthermore, the panel aims to disseminate this knowledge to further NHR and non-NHR centers, and as such help to ease and enhance current HPC processes.
To this end, the panel targets any staff involved in HPC procurements as participants. The panelists come from the NHR centers and are mainly involved in the NHR Strategic Project 2023 “Benchmarks and TCO for NHR Procurements” (whose members also organize this panel). The panel will run for 60 minutes, of which 15 minutes will be used for an overview of the findings of the NHR project “Benchmarks and TCO for NHR Procurements”, 20 minutes for prepared questions, and 25 minutes for questions from the audience and further discussion.
Presenters will be Sandra Wienke (NHR4CES) and Robert Schade (PC2), with Robert Schade also moderating the panel. The panelists will be Christian Terboven (NHR4CES), Christian Boehme (NHR@Göttingen), Andreas Wolf (NHR4CES) and Robert Schade (PC2).
Marco Vivenzo
is a researcher in our SDL Energy Conversion.
Marco Vivenzo presents a poster: “Development and Assessment of an External GPU-based Library to Accelerate Chemical Kinetics Evaluation of Reactive CPU-based CFD Solvers”
To facilitate the transition to carbon-free energy conversion systems, high-performance computational fluid dynamics (CFD) codes that leverage current heterogeneous Tier-0 cluster architectures are crucial for redesigning combustion systems. While the path to exascale performance lies in GPU utilization, many established reactive CFD codes remain CPU-only, because porting an existing code to a GPU-capable programming language is not straightforward: it may require redesigning numerical algorithms and extensive recoding. A drop-in solution for exploiting new GPU-accelerated systems without significant changes to the original code is therefore to execute the most time-consuming tasks on GPUs via easily linkable external libraries.
In this regard, the evaluation of chemical source terms emerges as an optimal candidate for GPU porting.
When operator-splitting schemes are used for the solution of the reactive Navier-Stokes equations, the integration of the stiff ODE system containing the source terms associated with chemical kinetics proves to be the most computationally expensive part. Although the potential of GPUs to accelerate reactive CFD simulations is widely acknowledged, a readily usable library for chemical kinetics capable of harnessing the computational power offered by GPUs is currently absent.
The focus of this work is to develop and assess a C++/CUDA-based library, capable of efficiently integrating chemical terms on GPUs. A comprehensive analysis of the performance and the scaling of the proposed approach over multiple computing nodes will be presented, demonstrating how to accelerate reactive CFD simulations through the integration of external GPU-based libraries.
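Why chemical source terms are a natural offloading target can be sketched in a few lines (an illustrative toy, not the actual C++/CUDA library: a single linear species with decay rate k stands in for the stiff kinetics ODE system): during the chemistry substep of an operator-splitting scheme, each cell’s ODE is independent of every other cell, so the integration is embarrassingly parallel and maps naturally to one GPU thread per cell.

```python
import numpy as np

def chemistry_substep(Y, k, dt, n_sub=1000):
    """Advance the mass fraction Y in every cell by dt using implicit
    (backward) Euler substeps -- stable even for stiff rates k*dt >> 1.
    The update is vectorized over cells, mirroring a one-thread-per-cell
    GPU kernel mapping."""
    h = dt / n_sub
    for _ in range(n_sub):
        Y = Y / (1.0 + k * h)  # backward Euler for dY/dt = -k*Y
    return Y

Y = np.full(1000, 1.0)   # one species, 1000 independent cells
k = np.full(1000, 1e4)   # stiff decay rate per cell (k*dt = 10)
Y_new = chemistry_substep(Y, k, dt=1e-3)
print(Y_new[0])          # converges to exp(-k*dt) as n_sub grows
```

Since no cell reads another cell’s state during the substep, the only data movement an external GPU library needs is the transfer of cell states before and after integration, which is what makes a drop-in coupling with a CPU-based solver feasible.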
Driss Kaddar
is a research assistant at the Simulation of reactive Thermo-Fluid Systems department at TU Darmstadt. He received his Master of Science in Chemical Engineering from the Karlsruhe Institute of Technology.
His main research interests are in the fields of high-performance computing and large-scale simulations of turbulent reactive flows with application to sustainable energy systems.
Since 2021, Driss has been a member of the SDL Energy Conversion at the National High Performance Computing Center for Computational Engineering Sciences (NHR4CES).
Driss Kaddar presents a poster: “Ammonia-hydrogen combustion modelling enabled by high-performance GPU computing” (with Hendrik Nicolai, Mathis Bode and Christian Hasse)
Ammonia-hydrogen blends will play a pivotal role in future carbon-free combustion systems. To minimize remaining emissions in ammonia combustion, staged-combustion systems, such as rich-quench-lean technologies, are proposed. However, the combustion behavior of turbulent rich ammonia-hydrogen mixtures is not yet comprehensively understood. In particular, the quantification of complex phenomena such as partial cracking, hydrogen slip, and post-flame stratification, and their interaction with flame structures and pollutant formation, remains insufficient.
This is a major scientific barrier hindering the realization of NH3/H2 blends for carbon-free combustion. Recent HPC advancements, particularly in GPU-based systems, enable combustion DNS beyond academic configurations. Utilizing nekCRF, a new GPU-based spectral element solver based on nekRS, we perform finite-rate chemistry DNS of a rich, turbulent premixed jet flame configuration at atmospheric pressure.
This unique data set provides fundamental insights into the intricate interaction of reactions and turbulence that are crucial for developing future models. The analysis focuses on NH3/H2 interaction, revealing residual H2, minimized NH3 slip, and enhanced heat release through turbulent mixing.
We demonstrate the scalability of the spectral element solver on European pre-exascale HPC systems and showcase the implications of a highly scalable GPU-code on the design of sustainable energy solutions.