The CSG Data Engineering & AI was formed at the beginning of 2026 through the merger of the CSG Data Science and ML and the CSG Data Management. The merger combines the strengths of both groups and streamlines their efforts, because integrating complementary expertise enables more powerful, coherent, and scalable support for data-driven research.
The CSG Data Engineering & AI brings together expertise in research data management, automated data engineering, data science, and machine learning to support seamless, end-to-end research workflows on HPC systems. Its goal is to advance scalable, reproducible, and efficient data pipelines and analytic processes. By uniting these areas of expertise, the group develops methods, tools, and workflows that accelerate research, reduce friction, and deliver insights for the broader scientific community.
Within this framework, the Principal Investigators (PIs) contribute complementary expertise that collectively drives the group’s impact:
- Prof. Dr. Matthias Müller (IT Center, RWTH Aachen) provides deep expertise in research data management for HPC, including methods for metadata handling and efficient data-lifecycle processes.
- Prof. Dr. Carsten Binnig (Data and AI Systems Lab, TU Darmstadt) contributes AI-based approaches to automate data engineering tasks, including data cleaning, transformation, and metadata annotation.
- Prof. Dr. Kristian Kersting (Artificial Intelligence & Machine Learning Lab, TU Darmstadt) brings expertise in probabilistic programming and machine learning automation.
- Prof. Dr. Wil van der Aalst (Process and Data Science Group, RWTH Aachen) contributes process mining methods to analyze and optimize computational workflows, improving HPC efficiency and resource utilization.
- Prof. Dr. Bastian Leibe (Computer Vision Group, RWTH Aachen) adds expertise in deep learning and computer vision, focusing on techniques for object recognition and tracking.
Together, these research areas enable the CSG Data Engineering & AI to deliver integrated methods, tools, and workflows that advance data-driven research across the community, from raw data to actionable insights on HPC systems.