Systems: Take a look at Lichtenberg II and CLAIX-2018

A cooperation of TU Darmstadt
and RWTH Aachen University

Lichtenberg II at TU Darmstadt

With over 56,000 cores and more than 257 TB of main memory in total, the first phase of TU Darmstadt's Lichtenberg II system has a peak performance of over 3.148 PFLOPS. In June 2020, the cluster was placed at #92 on the TOP500 list of the world's fastest HPC systems, making it the fastest university cluster in Germany at that time.

Moreover, on the Green500 list of the most energy-efficient systems, it ranked #48, making it the most energy-efficient German university cluster. The operational concept uses direct hot-water cooling and reuses waste heat during the heating period.

See: EnEff:Stadt Campus Lichtwiese.

Lichtenberg II has an MPI section with dual-socket nodes, an MEM section with big-memory nodes offering more than 1 TB of memory each, an ACC section with powerful accelerator nodes each hosting at least two accelerators, and a test section. An additional ACC section containing Nvidia DGX A100 nodes to support modern research in artificial intelligence is currently in preparation. The whole cluster is fitted with measurement equipment for improving the energy efficiency of HPC applications.

The storage and compute systems are procured separately to ensure the best possible continuous storage service. As a first step, a Spectrum Scale system (formerly known as GPFS) with 3 PB of space was procured in 2018.

Phase P1 of Lichtenberg II
The current first compute phase, P1, of Lichtenberg II consists of 586 compute nodes, each with two Intel Cascade Lake AP processors (96 cores per node, SMP) and 384 GB of main memory; two big-memory nodes with 1.5 TB of main memory each; and 8 accelerator nodes with two Nvidia V100 GPUs each. All nodes and the file systems are interconnected via EDR and HDR100 InfiniBand.
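As a rough cross-check of the headline core count, the node figures for phase P1 can be multiplied out. The sketch below uses only numbers quoted in this section; the constant and function names are illustrative, and 1.5 TB is assumed to mean 1536 GB. The accelerator nodes are left out because their core counts and memory are not stated here, so the memory sum covers only part of the system and comes out below the quoted total.

```python
# Node counts and per-node figures as quoted in the text for phase P1.
MPI_NODES = 586
CORES_PER_MPI_NODE = 96        # two Cascade Lake AP processors per node
MEM_PER_MPI_NODE_GB = 384

BIGMEM_NODES = 2
MEM_PER_BIGMEM_NODE_GB = 1536  # 1.5 TB, assuming 1 TB = 1024 GB


def mpi_section_cores() -> int:
    """Cores provided by the 586 standard compute nodes alone."""
    return MPI_NODES * CORES_PER_MPI_NODE


def listed_memory_gb() -> int:
    """Main memory of the nodes whose memory size is given in the text."""
    return (MPI_NODES * MEM_PER_MPI_NODE_GB
            + BIGMEM_NODES * MEM_PER_BIGMEM_NODE_GB)


if __name__ == "__main__":
    print(mpi_section_cores())   # 56256
    print(listed_memory_gb())    # 228096
```

The standard compute nodes alone account for 56,256 cores, which is consistent with the figure of over 56,000 cores in total.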

The Lichtenberg installations reflect the multi-use concept of TU Darmstadt, which aims to procure each phase as homogeneously as possible, e.g. with comparable CPU, memory and network architectures, including for special-purpose nodes. The goal is to reduce the hardware peculiarities a user has to keep track of, and thereby the number of mistakes made at the batch level.

Special-purpose sections can be used for standard workloads when they are momentarily underutilized. Beyond convenience and flexibility, this principle yields economic savings in rack space, idle time, and system administration.
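The spill-over idea can be illustrated with a small sketch. This is a toy model, not the cluster's actual batch configuration: the section names are borrowed from the text, while the function `pick_section`, the free-node dictionary, and the fallback order are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical setup: section names follow the text, everything else is assumed.
STANDARD_SECTION = "MPI"
SPECIAL_SECTIONS = ["MEM", "ACC"]


def pick_section(nodes_needed: int, free_nodes: dict[str, int]) -> Optional[str]:
    """Return the section a standard job could run in, or None if it must wait."""
    # Prefer the general-purpose section when it has enough free nodes.
    if free_nodes.get(STANDARD_SECTION, 0) >= nodes_needed:
        return STANDARD_SECTION
    # Otherwise fall back to a special-purpose section that is currently idle enough.
    for section in SPECIAL_SECTIONS:
        if free_nodes.get(section, 0) >= nodes_needed:
            return section
    return None
```

For example, `pick_section(4, {"MPI": 0, "MEM": 2, "ACC": 8})` returns `"ACC"`: the standard section is full, so the job lands on idle accelerator nodes. This only works smoothly because the special-purpose nodes have comparable CPU, memory and network architectures.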

CLAIX at RWTH Aachen University

The high-performance computing system CLAIX (Cluster Aix-la-Chapelle) at RWTH Aachen University is operated by the IT Center and currently consists of three parts: the Tier-2 parts from the 2016 and 2018 procurement phases and the Tier-3 part from the 2018 procurement phase.

The Tier-3 part consists of over 220 compute nodes with identical configuration (6 of those with GPUs), which are fully integrated into the overall cluster. CLAIX-2018 started test operation in November 2018, and since January 2019 the system has been available without restriction for use by computing time projects.