CSG Data Science and Machine Learning

A cooperation of TU Darmstadt and RWTH Aachen University

Cross-Sectional Group

The CSG Data Science and Machine Learning provides training material (e.g., videos, tutorials, and code) on machine learning and process mining techniques.

Our goals are to guide users through machine learning, to provide infrastructure that makes machine learning on HPC easily accessible, and to increase the scalability of process mining techniques.

We can help researchers analyze data to identify patterns or make predictions. Within process mining, we offer support with any data that contains a case identifier, an activity, and a timestamp. In addition, we offer support for the implementation and analysis of scientific workflows.
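As a minimal sketch of the data format mentioned above, the following Python snippet builds a tiny event log with the three required attributes and counts directly-follows relations, a common first step in process discovery. The field names (`case_id`, `activity`, `timestamp`), the helper `directly_follows`, and the sample events are illustrative assumptions, not a format prescribed by the CSG.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: every event carries a case identifier,
# an activity, and a timestamp (names chosen for illustration only).
EVENTS = [
    {"case_id": "job-1", "activity": "submit", "timestamp": "2023-01-01T10:00:00"},
    {"case_id": "job-1", "activity": "start", "timestamp": "2023-01-01T10:05:00"},
    {"case_id": "job-2", "activity": "submit", "timestamp": "2023-01-01T10:01:00"},
    {"case_id": "job-1", "activity": "finish", "timestamp": "2023-01-01T11:00:00"},
    {"case_id": "job-2", "activity": "start", "timestamp": "2023-01-01T10:30:00"},
    {"case_id": "job-2", "activity": "cancel", "timestamp": "2023-01-01T10:45:00"},
]


def directly_follows(events):
    """Count how often one activity is directly followed by another within a case."""
    # Group events by case identifier.
    traces = defaultdict(list)
    for event in events:
        traces[event["case_id"]].append(
            (datetime.fromisoformat(event["timestamp"]), event["activity"])
        )

    counts = Counter()
    for trace in traces.values():
        trace.sort()  # order the events of each case by timestamp
        activities = [activity for _, activity in trace]
        # Pair each activity with its direct successor.
        counts.update(zip(activities, activities[1:]))
    return counts


if __name__ == "__main__":
    for (first, second), n in sorted(directly_follows(EVENTS).items()):
        print(f"{first} -> {second}: {n}")
```

Any table that can be mapped to these three attributes can be analyzed in the same way; further attributes (resources, costs, and so on) are optional.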

The integration of process mining with HPC has two important parts. The first is PM4SW (process mining for scientific workflows), which analyzes large scientific workflows (simulation, ML, AI, PM, etc.) executed on the HPC cluster in order to assess performance, identify bottlenecks, and improve scheduling and planning. The workflow systems currently investigated are Camunda, KNIME, RapidMiner, and Integromat.
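To illustrate the kind of bottleneck analysis described above, the sketch below computes the mean time between consecutive activities across workflow executions; the transition with the largest mean hints at where executions wait longest. The traces, activity names, and the helpers `mean_transition_minutes` and `bottleneck` are hypothetical and do not reflect the schema of any particular workflow system or scheduler log.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical workflow traces: case identifier -> time-ordered
# (activity, timestamp) pairs. Values are made up for illustration.
TRACES = {
    "wf-1": [
        ("submit", "2023-01-01T08:00:00"),
        ("start", "2023-01-01T08:30:00"),
        ("end", "2023-01-01T09:00:00"),
    ],
    "wf-2": [
        ("submit", "2023-01-01T08:10:00"),
        ("start", "2023-01-01T09:40:00"),
        ("end", "2023-01-01T10:00:00"),
    ],
}


def mean_transition_minutes(traces):
    """Return the mean duration in minutes for each pair of consecutive activities."""
    durations = defaultdict(list)
    for events in traces.values():
        for (a, t_a), (b, t_b) in zip(events, events[1:]):
            delta = datetime.fromisoformat(t_b) - datetime.fromisoformat(t_a)
            durations[(a, b)].append(delta.total_seconds() / 60)
    return {pair: mean(values) for pair, values in durations.items()}


def bottleneck(traces):
    """Return the transition with the largest mean duration."""
    means = mean_transition_minutes(traces)
    return max(means, key=means.get)


if __name__ == "__main__":
    for pair, minutes in mean_transition_minutes(TRACES).items():
        print(pair, f"{minutes:.1f} min")
    print("slowest transition:", bottleneck(TRACES))
```

In this toy data the slowest transition is the wait between submission and start, which is the kind of finding that could inform scheduling and planning.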

The second part is SW4PM (scientific workflows for process mining), which supports process mining workflows for scientific experiments to make them easier to use and to improve their performance. The tools investigated for this part are ProM, Celonis, and PM4Py. In addition, we are looking at distributing process mining algorithms, using GPUs, etc.

We also develop new automated machine learning (AutoML) solutions and provide them to users. We support applications in pattern identification, the identification of statistical and causal dependencies, process discovery, and conformance checking.

The techniques we use include (classical) machine learning, deep learning, probabilistic graphical modelling, process mining, data visualization, and reinforcement learning. In addition, we work with TensorFlow, PyTorch, NumPy, SciPy, SimPy, DataFrames (pandas, Spark), scikit-learn, and Dask.

If you have questions for other groups or general questions, for example about access to the HPC infrastructure, have a look at our support website.

Current research topics:

  • Data Science Workflows in HPC
  • AutoML
  • Federated Learning
  • Process Mining in different modalities
  • SW4PM: scientific workflows for process mining and other analysis techniques
  • PM4SW: process mining for scientific workflows

Support activities:

  • Learning material for Process Mining & Machine Learning
  • Learning material to provide an understanding of the type of data needed by these techniques
  • Support in overcoming data size/quality/privacy issues (in cooperation with CSG Data Management)
  • Assistance in specific infrastructural challenges
  • Support for scientists with Machine Learning or Process Mining problems applied to their specific research field

Planned teaching activities:

  • Videos on specific aspects of data science/machine learning in HPC (data parallelization in process mining; Python libraries for data science such as TensorFlow, Spark, and Dask)
  • Process Mining Summer School 2022 (Aachen)
  • Hands-on workshop on ML@HPC

Training offers 2024:

Members

Prof. Dr. Kristian Kersting

TU Darmstadt

Prof. Dr. Bastian Leibe

RWTH Aachen University

Viktor Pfanschilling

TU Darmstadt

Zahra Sadeghibogar

RWTH Aachen University

Jonas Seng

TU Darmstadt

Prof. Dr. Wil van der Aalst

RWTH Aachen University

Publications

2023

  • Treatment Effect Estimation to Guide Model Optimization in Continual Learning (Jonas Seng, Florian P. Busch, Matej Zečević, Moritz Willig), Continual Causality Bridge Program (@AAAI 2023)
  • Causal Concept Identification in Open World Environments (Moritz Willig, Matej Zečević, Jonas Seng, Florian P. Busch), Continual Causality Bridge Program (@AAAI 2023)
  • SLURMminer: A Tool for SLURM System Analysis with Process Mining (Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil MP van der Aalst), BPM 23
  • Continually Updating Neural Causal Models (Florian P. Busch, Jonas Seng, Moritz Willig, Matej Zečević), Continual Causality Bridge Program (@AAAI 2023)
  • Continual Causal Abstractions (Matej Zečević, Moritz Willig, Florian P. Busch, Jonas Seng), Continual Causality Bridge Program (@AAAI 2023)
  • Exploring SLURM Logs through Process Mining: Insights into Scientific Workflows (Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil MP van der Aalst), BPM 23