NHR4CES Community Workshop:

Performance Engineering for Numerical Methods in Computational Fluid Dynamics

Computational Fluid Dynamics (CFD) simulations are a crucial yet costly driving force behind many endeavors in scientific computing, research, and industrial design. As such, they consume a large portion of the available computing time on High-Performance Computing (HPC) systems and are a worthwhile target for performance optimizations and studies.

Short abstract:

Numerical experiments via CFD applications enable, for example, research towards low-emission combustion engines or green fuels. Due to their relevance to research institutions and industry alike, and the ever-changing requirements of both, these applications are constantly subjected to new developments and optimizations targeted at simulating new or increasingly large problems with high accuracy. Given the complexity of the software, the amount of computational resources required, and the complexity of the modern HPC systems used for the simulations, the importance and benefits of applying performance engineering techniques are evident.

 

Date: June 13 (1.00 pm – 5.00 pm) and June 14 (1.00 pm – 5.00 pm), 2024

Format: Online

 

This workshop is designed to highlight recent activities, developments, and new concepts for analyses and improvements centered around numerical methods used in CFD applications. Particular emphasis will be placed on optimizations, applicable performance analysis and engineering techniques, and the presentation of new and innovative computational methods for specific problems. The workshop aims to connect research groups from different domains with the goal of increasing the performance, efficiency, and capabilities of modern CFD applications by sparking discussions, potential collaborations, and an active exchange of ideas.

 

Language: English

Capacity: 300

Registration


All about our speakers


Dr. Marta Garcia Gasulla

has been a researcher in the Computer Science Department of the Barcelona Supercomputing Center (BSC) since 2006. At BSC she leads the Best Practices for Performance and Programmability (BePPP) group, which aims to bridge the gap between scientific domain researchers and computer science researchers while promoting best practices and gathering co-design insights through performance analysis. She obtained her PhD in Computer Architecture from the Universitat Politecnica de Catalunya in 2017. Her research topics are load balancing, application performance, hybrid parallel programming, and parallel programming models. She has been involved in several European projects (HBP, Mont-blanc3, POP2, POP3, exaFOAM, EuPILOT, DEEP-SEA, and Plasma-PEPSC) and in collaboration projects with companies (e.g., Intel, IBM, and Huawei). She was an associate professor at the Universitat Politecnica de Catalunya (UPC) between 2008 and 2013, lecturing on Operating Systems and Parallel Programming courses.

Dr. Marta Garcia Gasulla will give the talk: Lessons learned from a performance analysis of an OpenFOAM HPC Grand Challenge

Computational Fluid Dynamics (CFD) simulations are among the most resource-consuming HPC applications, and OpenFOAM is one of the most widely known and used CFD solvers. In this talk, we will present a performance analysis of an OpenFOAM HPC Grand Challenge. In particular, we analyze a DLR confined jet high-pressure compressor (DLR CJH) case running on up to six thousand cores.



We will show how, by combining different tools with different levels of detail, we are able to analyze a large-scale simulation. Based on this analysis, we spotted several inefficiencies that the developers addressed, resulting in a speedup of more than 10x when running on 32K cores.


Gustavo de Morais

is a research associate within the Parallel Programming Group at the Technical University of Darmstadt. Currently pursuing his Ph.D., he focuses on advancing performance modeling in parallel programs. His research interests span instrumentation and measurement methods, performance modeling techniques, and strategies for mitigating noise effects in modeling.

 

Alexander Geiß

works as a research associate in the Parallel Programming Group at the Technical University of Darmstadt. He is currently undertaking a Ph.D., concentrating on enhancing the performance modeling of heterogeneous applications. His research covers various aspects including profiling heterogeneous applications, performance modeling, noise reduction, and cross-platform application performance prediction. Notably, he led the tools work package in the recently completed DEEP-SEA project, which focused on developing a software stack tailored for the upcoming European exascale systems.



Gustavo de Morais and Alexander Geiß will give the talk: Performance Modeling for CFD Applications

Due to the complex numerical simulations associated with CFD, using high-performance computing (HPC) systems is often necessary. In this context, understanding the intricate scaling behavior of these applications is crucial for developers seeking to optimize their performance on HPC systems. This involves comprehending how the application behaves as the number of processes or the input size increases and identifying potential bottlenecks.

In our talk, we aim to demonstrate techniques for instrumenting, measuring, and understanding the performance of HPC applications, using CFD case studies based on OpenFOAM as examples. We will delve into interpreting the scaling behavior, encompassing both computation and communication aspects, using our performance modeling tool, Extra-P.
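As a rough illustration of the idea behind such empirical performance modeling (a hand-written sketch, not Extra-P itself, which generates models from families of measurements automatically), the following Python snippet fits a simple scaling model of the form t(p) = c + k * p^a to invented strong-scaling timings and extrapolates it to larger process counts:

    # Minimal sketch of empirical performance modeling (not Extra-P itself):
    # fit a simple model t(p) = c + k * p**a to timings from small-scale runs
    # and extrapolate to larger process counts. The measurement data is invented.
    import numpy as np
    from scipy.optimize import curve_fit

    def model(p, c, k, a):
        """Runtime model: constant part plus a power-law term in the process count p."""
        return c + k * p**a

    # Hypothetical strong-scaling measurements: process counts and runtimes in seconds.
    procs    = np.array([32, 64, 128, 256, 512], dtype=float)
    runtimes = np.array([118.0, 64.0, 38.0, 26.0, 21.0])

    params, _ = curve_fit(model, procs, runtimes, p0=(1.0, 1000.0, -1.0), maxfev=10000)
    c, k, a = params
    print(f"fitted model: t(p) = {c:.2f} + {k:.2f} * p^{a:.2f}")

    # Extrapolate to larger scales to see where the non-scaling part starts to dominate.
    for p in (1024, 4096, 16384):
        print(f"predicted runtime at {p:5d} processes: {model(p, *params):7.2f} s")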

 


Paul Wilhelm

studied mathematics at RWTH Aachen University, graduating in 2019. Since 2020, he has been a PhD student at ACoM under Manuel Torrilhon. He is a fellow of the NHR Graduate School and works on grid-free, structure-preserving numerical methods for kinetic plasma physics.

 

Fabian Orland

received his Bachelor’s and Master’s degrees in Computer Science from RWTH Aachen University. In August 2019 he joined the chair for high-performance computing at the IT Center of RWTH Aachen University as a research assistant and PhD student. From 2019 until 2022 he was a member of the EU Center of Excellence Performance Optimisation and Productivity (POP2) providing performance assessment services for academic and industrial users from many different scientific disciplines. Since 2021, Fabian has been a member of the Cross-Sectional Group Parallelism and Performance at the National High-Performance Computing Center for Computational Engineering Sciences (NHR4CES). Since 2022, he has also been a member of the EU Center of Excellence RAISE. 



Paul Wilhelm and Fabian Orland will give the talk: Assessing the performance of solvers for kinetic plasma dynamics in a six-dimensional phase-space

A plasma is an ionized state of matter that occurs not only in stars and galactic formations but is also highly relevant for modern high-tech applications such as nuclear fusion reactors and microchip production. In particular, when the involved plasmas are in a high-temperature state, as is the case, for example, in fusion reactions, they exhibit kinetic effects that can no longer be captured by fluid-dynamical models. Therefore, one has to resort to the Vlasov equation arising from kinetic theory.
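For reference, in one common non-dimensional form (electrons evolving against a neutralizing ion background; the normalization and sign conventions used by the speakers may differ), the Vlasov-Poisson system reads:

    \partial_t f + v \cdot \nabla_x f - E(x,t) \cdot \nabla_v f = 0,
    \qquad E = -\nabla_x \phi,
    \qquad -\Delta_x \phi = 1 - \int f \,\mathrm{d}v,

with x, v in R^3, so that the distribution function f(x, v, t) lives in a six-dimensional phase space.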

While understanding kinetic effects is crucial, solving the high-dimensional and turbulent Vlasov equation poses significant challenges for classical numerical approaches. Without simplifications, one has to solve a time-dependent, six-dimensional partial differential equation (PDE), for which a resolution that captures the dynamics accurately over long times is infeasible even on modern high-performance computing hardware. Common discretization approaches for the Vlasov equation are the so-called semi-Lagrangian schemes, which combine the grid-based Eulerian and particle-based Lagrangian perspectives on convection-dominated PDEs to improve the stability and accuracy of the respective methods.

 

 



One such semi-Lagrangian approach that can handle the full six-dimensional case to some extent is based on a discontinuous Galerkin discretization (SLDG).

Here we present a novel approach to solving the Vlasov-Poisson equation: the Numerical Flow Iteration (NuFI). To evaluate the numerical solution, it stores the low-dimensional electric potentials and then uses them to iteratively reconstruct the characteristics backward in time. This reduces the total memory footprint by several orders of magnitude, as complexity is shifted from memory access to on-the-fly computation. Furthermore, this ansatz yields strong conservation properties by exploiting the structure of the solution.
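As a heavily simplified 1D-1V sketch of the backward-characteristics idea (illustrative only, not the speakers' NuFI implementation; the function names, time stepping, and data are assumptions), f is constant along characteristics, so its value at any phase-space point can be reconstructed by tracing the characteristic backward through the stored electric fields to the initial condition:

    # Simplified 1D-1V sketch of the backward-characteristics idea behind NuFI
    # (illustrative only): f(x, v, t_n) = f0(X(0), V(0)), where (X, V) is found
    # by stepping backward through the stored low-dimensional fields E_0..E_{n-1}.
    import numpy as np

    def f0(x, v):
        """Hypothetical initial distribution (weak perturbation of a Maxwellian)."""
        return (1.0 + 0.01 * np.cos(0.5 * x)) * np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

    def evaluate_f(x, v, stored_E, dt, length):
        """Reconstruct f(x, v, t_n) by tracing the characteristic backward in time.

        stored_E is a list of callables E_k(x); only these one-dimensional fields
        need to be kept in memory, never the six- (here: two-) dimensional f itself.
        """
        for E in reversed(stored_E):      # step backward: t_n -> t_{n-1} -> ... -> t_0
            x = (x - dt * v) % length     # undo the free-streaming part (periodic domain)
            v = v + dt * E(x)             # undo the acceleration (electrons, charge -1)
        return f0(x, v)

    # Hypothetical usage with two made-up stored fields.
    L = 4.0 * np.pi
    stored_E = [lambda x: 0.01 * np.sin(0.5 * x), lambda x: 0.008 * np.sin(0.5 * x)]
    print(evaluate_f(x=1.0, v=0.5, stored_E=stored_E, dt=0.1, length=L))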

The Center of Excellence Performance Optimisation and Productivity (POP) has established a standardized methodology for assessing the performance of parallel applications based on hierarchical efficiency metrics. We will briefly introduce the hierarchical model and demonstrate how we applied it to assess the performance of NuFI compared to the SLDG approach.
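As a minimal sketch of what such hierarchical efficiency metrics look like (following the commonly published POP definitions; the exact hierarchy presented in the talk may differ), the top-level factors can be computed from per-process useful-computation times and the total runtime:

    # Sketch of the top level of POP-style hierarchical efficiency metrics,
    # computed from per-process useful-computation times and the total runtime.
    # The numbers are invented; definitions follow the commonly published POP
    # formulas and may differ in detail from those used in the talk.

    def pop_efficiencies(useful_seconds, runtime_seconds):
        """Return load balance, communication efficiency, and parallel efficiency."""
        avg_useful = sum(useful_seconds) / len(useful_seconds)
        max_useful = max(useful_seconds)
        load_balance = avg_useful / max_useful                # how evenly work is distributed
        comm_efficiency = max_useful / runtime_seconds        # time lost to communication/waiting
        parallel_efficiency = load_balance * comm_efficiency  # equals avg_useful / runtime
        return load_balance, comm_efficiency, parallel_efficiency

    # Hypothetical 4-process run: per-rank useful compute time and total runtime.
    lb, comm, par = pop_efficiencies([9.0, 8.5, 7.0, 6.5], runtime_seconds=10.0)
    print(f"load balance: {lb:.2f}, communication eff.: {comm:.2f}, parallel eff.: {par:.2f}")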

 

 


Prof. Ricardo Vinuesa

is an Associate Professor at KTH Royal Institute of Technology, Stockholm, Sweden. He is the principal investigator of the Vinuesa Lab.

Prof. Ricardo Vinuesa will give the talk: Explaining and controlling turbulent flows through deep learning

In this presentation, we first use a framework for deep-learning explainability to identify the most important Reynolds-stress (Q) events in a turbulent channel (simulated via DNS) and a turbulent boundary layer (obtained experimentally). This objective way to assess importance reveals that the most important Q events are not the ones with the highest Reynolds shear stress. This framework is also used to identify completely new coherent structures, and we find that the most important coherent regions in the flow only have an overlap of 70% with the classical Q events. 

In the second part of the presentation, we use deep reinforcement learning (DRL) to discover completely new strategies for active flow control. We show that DRL applied to a blowing-and-suction scheme significantly outperforms the classical opposition control in a turbulent channel: while the former yields a 30% drag reduction, the latter only 20%. We conclude that DRL has tremendous potential for drag reduction in a wide range of complex turbulent-flow configurations.


Dr. Joachim Jenke


is a member of NHR4CES's Cross-Sectional Group (CSG) Parallelism and Performance.

Dr. Joachim Jenke will give the talk: Performance and Correctness Analysis of an Exascale Application

This presentation highlights correctness and performance analysis results for an open-source neuroscience application. Code developers reported unexpected crashes and hangs of the application. Data race analysis with ThreadSanitizer, available in the GNU and LLVM compilers, in combination with our extension for OpenMP-aware data race analysis, identified possible causes of the application behavior. We will look into the analysis setup and the reports from the tool. Furthermore, we will discuss insights into the performance behavior of the code.


Dr. Temistocle Grenga

has been a Lecturer in Computational Fluid Dynamics for Aerospace at the University of Southampton since February 2023. He is an expert in the numerical simulation of turbulent reacting flows and data-driven modeling. He received an M.Sc. in Aeronautical Engineering from the Sapienza University of Rome (Italy) in 2009, an M.Sc. in Mechanical Engineering from the University of Notre Dame (USA) in 2013, and a Ph.D. in Aerospace and Mechanical Engineering from the same university in 2015. He was a Postdoctoral Research Associate at Princeton University (USA) from September 2015 to August 2018, and at RWTH Aachen (Germany) from September 2018 to January 2023, where he was also Leader of the Multiphase Group and the HPC Group. He has published more than 75 papers in international journals and conference proceedings. In his former position at RWTH Aachen, he supervised six PhD students investigating machine-learning modeling of turbulent flows, conservative numerical methods for interface tracking, a GPU-based HPC library for chemistry in CFD, and reduced-order modeling of multiphase flows. He has been invited as a lecturer on machine-learning applications for fluid dynamics and combustion at several European schools for Ph.D. students.



Dr. Temistocle Grenga will give the talk: Efficient use of computational resources: The Wavelet Adaptive Multi-resolution Method

The intricate coupling between fluid mechanics, heat transfer, and chemistry is particularly difficult to simulate accurately. Diffusive effects and chemical reactions occur on molecular scales, and the resolution required to capture viscous effects can be as fine as one-tenth of a micron. Thus, spatial scales span seven or more orders of magnitude. Similarly, time scales can range from those associated with chemical reactions (nanoseconds) up to macro time scales of the order of seconds or more.

These multiscale problems are generally impractical to solve (in terms of computer time and memory required) on a fixed computational grid. However, in many problems of practical interest, small scales only occur in limited regions of the computational domain and possibly at certain times. 

Wavelet methods are particularly well suited for adaptively solving partial differential equations (PDEs). Wavelets are mathematical functions with compact support in both location and scale. Their amplitudes indicate the local regularity of a function: they are large in regions where the function changes sharply and small where it is smooth.



Dynamically adaptive algorithms for initial value problems use the sparse set of collocation points at a previous time step to generate an updated set of points for the next time step, effectively tracking fine-resolution features as they develop in the domain. Wavelets provide a methodology for controlling grid adaptivity without requiring ad hoc error estimates or heuristics.
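As a toy illustration of this thresholding idea (a single-level 1D Haar transform, much simpler than the interpolating wavelets and multi-level grids used in WAMR-type solvers; all names and data below are invented for the example), only the points whose detail coefficients exceed a tolerance would be retained in the adaptive grid:

    # Toy illustration of wavelet-based grid adaptation: one level of a 1D Haar
    # transform, thresholding of small detail coefficients, and reconstruction.
    # Large detail amplitudes flag the sharp front; elsewhere the points can be dropped.
    import numpy as np

    def haar_level(signal):
        """One level of the Haar transform: pairwise averages and details."""
        even, odd = signal[0::2], signal[1::2]
        averages = (even + odd) / np.sqrt(2.0)
        details = (even - odd) / np.sqrt(2.0)
        return averages, details

    def inverse_haar_level(averages, details):
        out = np.empty(2 * averages.size)
        out[0::2] = (averages + details) / np.sqrt(2.0)
        out[1::2] = (averages - details) / np.sqrt(2.0)
        return out

    # Smooth profile with one sharp front: details are large only near the front.
    x = np.linspace(0.0, 1.0, 256)
    u = np.tanh((x - 0.5) / 0.01)

    averages, details = haar_level(u)
    keep = np.abs(details) > 1e-3                 # points that need fine resolution
    details_compressed = np.where(keep, details, 0.0)

    u_reconstructed = inverse_haar_level(averages, details_compressed)
    print(f"retained detail coefficients: {keep.sum()} of {details.size}")
    print(f"max reconstruction error: {np.max(np.abs(u - u_reconstructed)):.2e}")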

The parallel Wavelet Adaptive Multi-resolution (WAMR) method has been implemented in Fortran 90 and uses the MPI standard for parallelization together with a domain decomposition approach. It has been verified on several test problems in one, two, and three dimensions, including the classic Sod shock tube problem and the Taylor-Sedov blast wave.

The main feature of WAMR is its capability to reach the same accuracy as other methods using a grid four orders of magnitude smaller. The WAMR method has been applied to three compressible flow problems, among them the evolution of a Richtmyer-Meshkov instability and the evolution of a hydrogen bubble struck by a shock in air, which has been simulated in several configurations.


Dr. Daniel Mira

is the Head of the Propulsion Technologies Group at the Computing Applications for Science and Engineering (CASE) Department of the Barcelona Supercomputing Center (BSC). Dr Mira received his Bachelor’s Degree in mechanical engineering from the Universitat Politècnica de Valencia (Spain) in 2008 and his PhD in mechanical engineering from Lancaster University in 2012. His research is focused on the development of advanced simulation methods to investigate the combustion characteristics of propulsion and power systems. The main activities include physical modeling and numerical methods using High-Performance Computing (HPC) and data-driven approaches.

Dr. Daniel Mira will give the talk: Porting CPU-based/optimized combustion codes to CPU-GPU heterogeneous architectures

The increase in computational power over the past decade, coupled with the upcoming Exascale supercomputers, heralds a new era in computational modeling and simulation within combustion science. Given the complex multiscale and multiphysics nature of turbulent reacting flows, combustion simulations stand out as among the most demanding applications for state-of-the-art supercomputers. Exascale computing promises to push the boundaries of combustion system simulation, allowing for more realistic conditions through high-fidelity methods. However, to effectively leverage these computing architectures, it is essential to employ methodologies that can exploit parallelism at all levels. This presentation covers practical aspects to be considered when porting CPU-based codes to accelerators, drawing on experience from the development of the multiphysics code Alya for combustion applications.


Dr. Harald Klimach

graduated in aerospace engineering from the University of Stuttgart in 2005 and afterward worked at the High-Performance Computing Centre Stuttgart (HLRS) in the European DEISA and PRACE projects on user code porting and optimization, the PRACE benchmark suite, and a joint research activity on coupled fluid-fluid simulations. In 2010 he took up a research position at the Chair of Advanced Simulations in Engineering at the German Research School for Simulation Sciences (Prof. Sabine Roller), a collaboration between RWTH Aachen and FZ Jülich. There he started the development of the APES framework with the group as a common basis for the various fluid simulation activities at the chair.

From 2013 he worked as a researcher at the University of Siegen, continuing the development of APES and teaching programming in Fortran and parallelization with MPI. He defended his PhD thesis, "Parallel Multi-Scale-Simulations with Octrees and Coupled Applications", at RWTH Aachen in 2016 and has been working since 2021 at the DLR Institute of Software Methods for Product Virtualization in Dresden. Over the last decade, he has also served as a lecturer and organizer of the introductory CFD course in the HLRS course program.



Dr. Harald Klimach will give the talk: Performance Assessments of the Lattice Boltzmann Solver Musubi

This talk presents the core concepts of Musubi and its deployment in high-performance computing environments for detailed flow simulations, and elaborates on its performance analysis over time on various supercomputing systems.

Musubi is an open-source Lattice Boltzmann method (LBM) solver that has been under development since 2011 within the APES framework, using an octree mesh representation. It is implemented in modern Fortran and parallelized with MPI and OpenMP. I will briefly introduce some computational aspects of the LBM and why it is attractive for some CFD tasks, detail the implementation ideas of Musubi, and explain the wider architecture of APES before diving into the performance analyses conducted for the code. Performance assessments have been performed for the solver repeatedly over time by different means and methods, including an analysis by HLRS within the POP Center of Excellence project.

 

I will recount how those performance analyses shed light on various aspects of the implementation and helped improve the execution on large-scale computing systems. The code has been deployed across a range of architectures, from IBM's BlueGene to NEC's SX vector computers.

At the heart of those performance assessments always lies the evaluation of serial and node-level performance, which establishes a baseline for the parallel execution. An important overall observation is that performance depends on many factors acting together and, depending on the features used in the code, may vary widely. I will address these points and elaborate on common themes and apparent differences.

 


Prof. Christian Hasse

is the principal investigator of our SDL Energy Conversion and a professor at the Technical University of Darmstadt. Since 2017 he has been the head of the Institute for the Simulation of Reactive Thermo-Fluid Systems, with currently 30 PhD students and postdocs. He received his diploma in mechanical engineering in 1998 and his PhD in 2004 (supervisor: Norbert Peters), both from RWTH Aachen University. After working in engine development at BMW in Munich for 5.5 years, he returned to academia in 2010. From 2010 to 2017, he was Professor of Numerical Thermofluid Dynamics at the Technical University of Freiberg before moving to his current position in Darmstadt.

He has published more than 240 scientific papers in peer-reviewed journals and is also a reviewer for more than 20 scientific journals and several national and international funding agencies.

Prof. Christian Hasse will host our panel discussion.

Contact

Fabian Orland

RWTH Aachen University

Felipe González

RWTH Aachen University

Fabian Czappa

TU Darmstadt

Marco Vivenzo

RWTH Aachen University

Thomas Hösgen

RWTH Aachen University

Lukas Rothenberger

TU Darmstadt

Xiaoyu Wang

TU Darmstadt