Systems: Take a look at Lichtenberg II and CLAIX-2023

A cooperation of TU Darmstadt
and RWTH Aachen University

Lichtenberg II at TU Darmstadt.

Systems ―

Lichtenberg II at TU Darmstadt

With over 56,000 cores and more than 257 TB of main memory in total, the first phase of TU Darmstadt's system Lichtenberg II has a peak performance of over 3.148 PFLOPS. In June 2020, the cluster was placed at #92 on the list of the world's fastest HPC systems (TOP500), making it the fastest university cluster in Germany at that time.

Moreover, on the Green500 list of the most energy-efficient systems, it was ranked #48, making it the most energy-efficient German university cluster. The operational concept relies on direct hot-water cooling and the reuse of waste heat during the heating period.

See: EnEff:Stadt Campus Lichtwiese.

Lichtenberg II has an MPI section with dual-socket nodes, an MEM section with big-memory nodes holding more than 1 TB of memory each, an ACC section with powerful accelerator nodes hosting at least two accelerators each, and a test section. An additional ACC section containing Nvidia "DGX A100" nodes to support modern research in Artificial Intelligence is currently in preparation. The whole cluster is equipped with measurement hardware for improving the energy efficiency of HPC applications.

The storage and compute systems are procured separately to ensure the best possible continuous storage service. As a first step, a Spectrum Scale system (formerly known as GPFS) with 3 PB of space was procured in 2018.

Phase P1 of Lichtenberg II
The current first compute phase P1 of Lichtenberg II consists of 586 compute nodes with two Intel Cascade Lake AP processors (96 cores SMP) and 384 GB of main memory each, two big-memory nodes with 1.5 TB of main memory each, and 8 accelerator nodes with two Nvidia "V100" GPUs each. All nodes and the file systems are interconnected via EDR and HDR100 InfiniBand.
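As a quick plausibility check, the headline core and memory figures quoted for Lichtenberg II follow from the per-node numbers of phase P1 above (a minimal sketch; node counts and per-node figures are taken from the text, the rest is plain arithmetic):

```python
# Sanity check of the published Lichtenberg II P1 figures (values from the text).
mpi_nodes = 586          # dual-socket Cascade Lake AP nodes
cores_per_node = 96      # 96 cores (SMP) per node

total_cores = mpi_nodes * cores_per_node
print(total_cores)       # 56,256 -> consistent with "over 56,000 cores"

mem_per_node_gb = 384
mpi_memory_tb = mpi_nodes * mem_per_node_gb / 1000
print(round(mpi_memory_tb))  # ~225 TB from the MPI section alone;
                             # big-memory and accelerator nodes add the rest
```

The two big-memory nodes (1.5 TB each) and the accelerator nodes contribute the remaining share of the quoted 257 TB total.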

The Lichtenberg installations reflect the multi-use concept of TU Darmstadt, which aims to procure each phase as homogeneously as possible, e.g. with comparable CPU, memory, and network architectures, including the special-purpose nodes. The goal is to reduce the number of hardware peculiarities a user has to keep track of, thus cutting down on mistakes at the batch-job level.

Special-purpose sections can be used for standard workloads whenever they are temporarily underutilized. Beyond convenience and flexibility, this principle yields economic savings in rack space, idle time, and system administration.

CLAIX at RWTH Aachen University.


CLAIX at RWTH Aachen University

The high-performance computing system CLAIX (Cluster Aix-la-Chapelle) at RWTH Aachen University is operated by the IT Center. CLAIX-2023 has been in operation since this year and offers a CLAIXperience on a whole new level!

The new CLAIX-2023 offers powerful Intel Xeon 8468 Sapphire Rapids CPUs with a total of 96 cores per computing node, providing a significant increase in performance. It also has 52 special servers for artificial intelligence and machine learning applications, each with four NVIDIA H100 GPUs, enabling an impressive total performance of over 14 PFLOPS in the ML segment alone.

632 directly water-cooled computing nodes for High Performance Computing (HPC) deliver enhanced performance, sustainability, and energy efficiency. Each computing node contains two Intel Xeon 8468 Sapphire Rapids CPUs with a total of 96 cores. Compared to the previous system, the performance of many applications with a similar configuration roughly doubles.

The nodes offer memory options of 256, 512, or 1024 GB of RAM so that usage and costs can be optimized precisely. The HPC segment's peak performance is about 4 PFLOPS, and up to 530 million core hours will be allocated each year.
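The quoted core-hour budget of the HPC segment follows directly from the node and core counts above (a back-of-the-envelope sketch; assuming year-round availability of every core is an idealization, which is why the allocatable budget is quoted slightly below the theoretical maximum):

```python
# Cross-check of the CLAIX-2023 HPC-segment figures quoted in the text.
hpc_nodes = 632
cores_per_node = 96      # 2x Intel Xeon 8468, 48 cores each

total_cores = hpc_nodes * cores_per_node
print(total_cores)       # 60,672 cores

hours_per_year = 365 * 24                 # 8,760 hours
core_hours = total_cores * hours_per_year
print(core_hours / 1e6)  # ~531 million core-hours per year at full utilization,
                         # consistent with "up to 530 million" allocated
```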

To address current trends in artificial intelligence, 52 servers were acquired for machine learning applications.

In addition to the two CPUs, each of these computing nodes is equipped with four very powerful, closely coupled NVIDIA H100 GPUs. With 96 GB of HBM2e memory per GPU and an even more powerful high-speed network in this segment, even very large ML models can be computed. In terms of GPU performance, the ML segment therefore delivers a total of over 14 PFLOPS.
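The aggregate ML-segment figure can be cross-checked from the GPU count; note that the per-GPU peak of roughly 67 TFLOPS (FP64 Tensor Core on an H100 SXM) is an assumption for illustration, not a figure stated in the text:

```python
# Back-of-the-envelope check of the ML-segment GPU count and aggregate peak.
ml_nodes = 52
gpus_per_node = 4

total_gpus = ml_nodes * gpus_per_node
print(total_gpus)        # 208 H100 GPUs

# Assumption: ~67 TFLOPS peak per H100 (FP64 Tensor Core, SXM variant).
peak_per_gpu_tflops = 67
aggregate_pflops = total_gpus * peak_per_gpu_tflops / 1000
print(round(aggregate_pflops, 1))  # ~13.9 PFLOPS, i.e. roughly the quoted 14 PFLOPS
```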

To simulate large models, highly scalable applications generally use a large number of computing nodes in parallel. To ensure that communication between the computing nodes does not become a bottleneck, the entire system is equipped with a very fast NDR InfiniBand network with RDMA (Remote Direct Memory Access).

The system is rounded off by new login nodes and a special interactive partition that lets users start interactive jobs via a JupyterHub without long waiting times. This modern access option lowers the entry barrier to high-performance computing, especially for the many students and new employees at RWTH. A new high-performance parallel file system (Lustre) with a total capacity of 26 PiB will also be available for storing and processing research data.