Systems: Take a look at Lichtenberg II and CLAIX-2023

A cooperation of TU Darmstadt and RWTH Aachen University

Lichtenberg II at TU Darmstadt

Equipped with the latest technology, the Lichtenberg II high-performance computer at TU Darmstadt sets standards for performance and energy efficiency and thus offers excellent conditions for research. The computer is named after the polymath Georg Christoph Lichtenberg (1742–1799).

Sustainable computing for top-level research

The design of sustainable materials, the management of the energy transition, and the security of cyberspace are just a few examples of applications that require data-intensive calculations. The performance of Lichtenberg II enables calculations that could be performed only much more slowly, or not at all, on ordinary computers. The scientific topics are as diverse as the applications and simulation programs that researchers need to tackle them. The flexible architecture of the Lichtenberg II high-performance computer enables adaptable solutions tailored to the needs of the scientists.

The Lichtenberg II system

The first expansion stage of the Lichtenberg II system, with 643 computing nodes, was put into operation in mid-2020; the second expansion stage added 581 computing nodes in 2023. Together, the two expansion stages provide their users with a theoretical peak performance of approximately 8.5 petaflops (PFLOPS) through processors and 1.7 petaflops through accelerators. The main memory totals 563 terabytes, and the data storage system holds around 6 petabytes.
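The figures above can be combined with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in the text; the function name is hypothetical, decimal terabytes (1 TB = 1000 GB) are assumed, and the memory value is a plain arithmetic mean, not the configuration of any real node.

```python
# Figures quoted in the text above.
NODES_STAGE_1 = 643        # first expansion stage (mid-2020)
NODES_STAGE_2 = 581        # second expansion stage (2023)
CPU_PEAK_PFLOPS = 8.5      # theoretical peak through processors
ACCEL_PEAK_PFLOPS = 1.7    # theoretical peak through accelerators
MAIN_MEMORY_TB = 563       # total main memory


def lichtenberg_totals():
    """Return (total nodes, combined peak in PFLOPS, mean memory per node in GB)."""
    total_nodes = NODES_STAGE_1 + NODES_STAGE_2
    combined_peak = CPU_PEAK_PFLOPS + ACCEL_PEAK_PFLOPS
    # Assumption: decimal units, 1 TB = 1000 GB.
    mean_memory_gb = MAIN_MEMORY_TB * 1000 / total_nodes
    return total_nodes, combined_peak, mean_memory_gb


if __name__ == "__main__":
    nodes, peak, mem = lichtenberg_totals()
    print(f"{nodes} nodes, about {peak:.1f} PFLOPS combined, about {mem:.0f} GB per node on average")
```

Under these assumptions, the two stages add up to 1224 computing nodes and a combined theoretical peak of roughly 10.2 PFLOPS.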

The Lichtenberg supercomputer, which is operated by the University Computing Center (HRZ) of TU Darmstadt, is one of the fastest university computers in Germany and is represented twice in the current ranking of the Top500 supercomputers worldwide. Its first expansion stage is ranked 216th. Its new (second) expansion stage is ranked 230th, which makes it the highest-placed of the participating Sapphire Rapids systems (4th generation Intel Xeon Scalable processors).

The investment costs for the Lichtenberg high-performance computer of 15 million euros are borne in equal parts by the federal government and the state of Hesse.

Highly efficient energy management concept

Lichtenberg II is the technical heart of a sustainable overall concept consisting of reliable operation, focused consulting, and cutting-edge research that translates methodological innovations, in the form of first-principles models and data-based simulation programs, into new insights.

In addition, the system is an important component of the highly efficient energy management concept on the Lichtwiese campus of TU Darmstadt. The waste heat from Lichtenberg II is not simply released into the environment; during the heating period, a significant portion is fed into the district heating network that connects all buildings on the Lichtwiese campus.

For this purpose, Lichtenberg II uses direct, highly efficient hot-water cooling, which also allows the processors to run at full power. Special heat exchangers and coolant distributors enable high return temperatures of more than 45 degrees Celsius, so that the energy can be reused sensibly while the system is cooled efficiently. This leads to a significantly improved CO2 and energy balance and is an important step towards sustainable high-performance computing.

CLAIX at RWTH Aachen University

The high-performance computing system CLAIX (Cluster Aix-la-Chapelle) at RWTH Aachen University is operated by the IT Center. CLAIX-2023 has been in operation since this year and offers a CLAIXperience on a new level!

The new CLAIX-2023 offers powerful Intel Xeon 8468 Sapphire Rapids CPUs with a total of 96 cores per computing node for a significant increase in performance. It has 52 special servers for artificial intelligence and machine learning applications, each with four NVIDIA H100 GPUs, which enables an impressive total performance of over 14 PFLOPS in the ML segment alone.

632 directly water-cooled computing nodes for High Performance Computing (HPC) feature enhanced performance, sustainability, and energy efficiency. The two Intel Xeon 8468 Sapphire Rapids CPUs have a total of 96 cores in each computing node. Compared to the previous system, the performance of many applications with a similar configuration is increased by a factor of around two.

The nodes offer memory options of 256, 512, or 1024 GB RAM for precise usage and cost optimization. The HPC segment's peak performance is about 4 PFLOPS, and up to 530 million core hours will be allocated each year.
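Both HPC-segment figures can be checked with back-of-the-envelope arithmetic. The sketch below is not an official calculation: it assumes that all 632 nodes are available around the clock for a 365-day year, and, for the peak estimate, a base clock of 2.1 GHz and 32 double-precision floating-point operations per core per cycle. The clock and per-cycle values are assumptions that do not appear in the text, and the function names are hypothetical.

```python
# Figures quoted in the text above.
HPC_NODES = 632
CORES_PER_NODE = 96

# Assumptions, not stated in the text.
HOURS_PER_YEAR = 365 * 24       # full availability, non-leap year
BASE_CLOCK_HZ = 2.1e9           # assumed CPU base clock
DP_FLOPS_PER_CYCLE = 32         # assumed per core (two AVX-512 FMA units)


def core_hours_per_year():
    """Upper bound on the core hours the HPC segment can deliver in one year."""
    return HPC_NODES * CORES_PER_NODE * HOURS_PER_YEAR


def theoretical_peak_pflops():
    """Theoretical double-precision peak of the HPC segment in PFLOPS."""
    flops = HPC_NODES * CORES_PER_NODE * BASE_CLOCK_HZ * DP_FLOPS_PER_CYCLE
    return flops / 1e15


if __name__ == "__main__":
    print(f"{core_hours_per_year() / 1e6:.1f} million core hours per year")
    print(f"{theoretical_peak_pflops():.2f} PFLOPS theoretical peak")
```

With these assumptions, the segment has 60,672 cores, which yields about 531 million core hours per year and a theoretical peak of about 4.1 PFLOPS. Both values are consistent with the rounded figures quoted above.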

To address current trends in artificial intelligence, 52 servers were acquired for machine learning applications.

In addition to the two CPUs, these computing nodes are each equipped with four very powerful and closely coupled NVIDIA H100 GPUs. With 96 GB of HBM2e memory per GPU and an even more powerful high-speed network in this segment, even very large ML-based models can be computed. Counting the GPUs alone, the total performance of the ML segment is over 14 PFLOPS.
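The ML-segment figures can be broken down per GPU in the same way. This is a rough sketch based only on the numbers quoted in the text; it treats 14 PFLOPS as a lower bound (the text says "over"), makes no assumption about which numeric precision that figure refers to, and uses a hypothetical function name.

```python
# Figures quoted in the text above.
ML_SERVERS = 52
GPUS_PER_SERVER = 4
MEMORY_PER_GPU_GB = 96
ML_SEGMENT_PFLOPS = 14.0   # quoted as "over 14 PFLOPS", so a lower bound


def ml_segment_summary():
    """Return (GPU count, total GPU memory in GB, TFLOPS per GPU)."""
    gpu_count = ML_SERVERS * GPUS_PER_SERVER
    total_gpu_memory_gb = gpu_count * MEMORY_PER_GPU_GB
    # 1 PFLOPS = 1000 TFLOPS
    tflops_per_gpu = ML_SEGMENT_PFLOPS * 1000 / gpu_count
    return gpu_count, total_gpu_memory_gb, tflops_per_gpu


if __name__ == "__main__":
    gpus, memory_gb, tflops = ml_segment_summary()
    print(f"{gpus} GPUs, {memory_gb} GB GPU memory, at least {tflops:.1f} TFLOPS per GPU")
```

The segment therefore comprises 208 GPUs with 19,968 GB of GPU memory in total, and the quoted performance corresponds to at least about 67 TFLOPS per GPU.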

To simulate large models, highly scalable applications generally use a large number of computing nodes in parallel. To ensure that communication between the computing nodes does not become a bottleneck, the entire system was equipped with a very fast NDR InfiniBand RDMA (Remote Direct Memory Access) network. The system is rounded off with new login nodes and a special interactive partition that enables users to start interactive jobs via a JupyterHub without long waiting times. This modern access option makes it easier to enter the world of high-performance computing, especially for the many students and new employees at RWTH. A new high-performance parallel file system (Lustre) with a total capacity of 26 PiB will also be available for storing and processing research data.
