CSG Data Science and Machine Learning

A cooperation of TU Darmstadt
and RWTH Aachen University

Cross-Sectional Group

The CSG Data Science and Machine Learning provides training (e.g., videos, tutorials, and code) in machine learning and process mining techniques.

Our goals are to guide users through machine learning, to provide infrastructure that makes machine learning on HPC easily accessible, and to increase the scalability of process mining techniques.

We can help researchers analyze data to identify patterns or make predictions. Within process mining, we offer support with any data that contains a case identifier, an activity, and a timestamp. In addition, we offer support for the implementation and analysis of scientific workflows.
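To illustrate the kind of data meant here, the following is a minimal sketch using only the Python standard library. The event log, its case identifiers, and the activity names are made up for illustration, and the helper functions `build_traces` and `directly_follows` are hypothetical names, not part of any tool mentioned on this page. The sketch groups events by case identifier, orders them by timestamp, and counts which activity directly follows which.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: one row per event
# (case identifier, activity, timestamp). Rows need not be sorted.
EVENTS = [
    ("job-1", "submit", "2022-03-01T08:00:00"),
    ("job-1", "preprocess", "2022-03-01T08:05:00"),
    ("job-1", "train", "2022-03-01T08:20:00"),
    ("job-2", "submit", "2022-03-01T09:00:00"),
    ("job-2", "train", "2022-03-01T09:10:00"),
    ("job-1", "evaluate", "2022-03-01T10:00:00"),
    ("job-2", "evaluate", "2022-03-01T09:40:00"),
]


def build_traces(events):
    """Group events by case and order each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    return {
        case_id: [activity for _, activity in sorted(case_events)]
        for case_id, case_events in by_case.items()
    }


def directly_follows(traces):
    """Count how often one activity is immediately followed by another."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


traces = build_traces(EVENTS)
dfg = directly_follows(traces)

for (source, target), count in sorted(dfg.items()):
    print(f"{source} -> {target}: {count}")
```

Any data set that can be brought into this three-column shape can, in principle, be handled the same way; the directly-follows counts are a common starting point for process discovery.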

The integration of process mining with HPC has two important parts. The first is PM4SW, which analyzes large scientific workflows (Simulation, ML, AI, PM, etc.) executed on the HPC cluster. This covers analyzing performance and bottlenecks as well as improving scheduling and planning. The workflow systems currently investigated are Camunda, KNIME, RapidMiner, and Integromat.
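As a rough illustration of what bottleneck analysis on a workflow log can look like, the sketch below computes the mean time between consecutive steps of each workflow run and reports the slowest transition. The log, the step names, and the function `mean_transition_seconds` are assumptions made for this example; the sketch does not reflect how the group or any of the listed workflow systems actually implements such an analysis.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical workflow log: (run identifier, step, start timestamp).
LOG = [
    ("run-1", "stage_in", "2022-03-01T10:00:00"),
    ("run-1", "simulate", "2022-03-01T10:02:00"),
    ("run-1", "analyze", "2022-03-01T10:32:00"),
    ("run-2", "stage_in", "2022-03-01T11:00:00"),
    ("run-2", "simulate", "2022-03-01T11:04:00"),
    ("run-2", "analyze", "2022-03-01T11:54:00"),
]


def mean_transition_seconds(log):
    """Mean elapsed seconds between consecutive steps, per (step, next step)."""
    by_run = defaultdict(list)
    for run_id, step, timestamp in log:
        by_run[run_id].append((datetime.fromisoformat(timestamp), step))

    durations = defaultdict(list)
    for run_events in by_run.values():
        run_events.sort()  # order the steps of one run by start time
        for (t0, a), (t1, b) in zip(run_events, run_events[1:]):
            durations[(a, b)].append((t1 - t0).total_seconds())

    return {pair: sum(values) / len(values) for pair, values in durations.items()}


means = mean_transition_seconds(LOG)
# The transition with the largest mean elapsed time is the bottleneck candidate.
bottleneck = max(means, key=means.get)
print(bottleneck, means[bottleneck])
```

In this toy log the gap between the start of `simulate` and the start of `analyze` dominates, so that transition would be the first place to look when revisiting scheduling and planning.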

The second part is SW4PM, which supports process mining workflows for scientific experiments to make them easier to use and improve their performance. The tools investigated for this part are ProM, Celonis, and PM4Py. In addition, we are looking at distributing process mining algorithms, using GPUs, etc.

We also develop new automated machine learning (AutoML) solutions and provide them to users. We support applications in identifying patterns, identifying statistical/causal dependencies, process discovery, and conformance checking.

As prediction techniques, we utilize (classical) Machine Learning, Deep Learning, Probabilistic Graphical Modelling, Process Mining, Data Visualization, and Reinforcement Learning. In addition, we work with TensorFlow & PyTorch, NumPy, SciPy, SimPy, DataFrames (pandas, Spark), scikit-learn, and Dask.

Contact the CSG Data Science and Machine Learning here!


If you have questions for other groups or general questions, such as how to get access to the HPC infrastructure, have a look at our support website.

Current research topics:

  • Data Science Workflows in HPC
  • AutoML
  • Causal Discovery
  • SW4PM: scientific workflows for process mining and other analysis techniques
  • PM4SW: process mining for scientific workflows

Support activities:

  • Learning material for Process Mining & Machine Learning
  • Learning material to provide an understanding of the type of data needed by these techniques
  • Support in overcoming data size/quality/privacy issues (in cooperation with CSG Data Management)
  • Assistance in specific infrastructural challenges
  • Support for scientists with Machine Learning or Process Mining problems applied to their specific research field

Planned teaching activities:

  • Videos on specific aspects of data science/machine learning in HPC (data parallelization in process mining; Python libraries for data science such as TensorFlow, Spark, and Dask)
  • Process Mining Summer School 2022 (Aachen)
  • Hands-on workshop on ML@HPC

Training offers 2022:


Prof. Dr. Kristian Kersting

TU Darmstadt

Prof. Dr. Bastian Leibe

RWTH Aachen University

Gyunam Park

RWTH Aachen University

Zahra Sadeghibogar

RWTH Aachen University

Jonas Seng

TU Darmstadt

Prof. Dr. Wil van der Aalst

RWTH Aachen University