CSG Data Science and Machine Learning
and RWTH Aachen University
The CSG Data Science and Machine Learning provides training material (videos, tutorials, code, and more) on machine learning and process mining techniques.
Our goals are to guide users through machine learning, to provide infrastructure that makes machine learning on HPC easily accessible, and to increase the scalability of process mining techniques.
We can help researchers analyze data to identify patterns or make predictions. Within process mining, we offer support for any data that contains a case identifier, an activity, and a timestamp. In addition, we offer support for the implementation and analysis of scientific workflows.
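To illustrate the kind of data meant here, the following is a minimal sketch of an event log with a case identifier, an activity, and a timestamp, and of one simple analysis on it: counting how often one activity directly follows another within a case. The events, the activity names, and the function name `directly_follows` are assumptions made up for this example; they are not taken from any of our tools.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: each event is (case identifier, activity, timestamp).
events = [
    ("job-1", "submit", "2023-01-10T08:00:00"),
    ("job-1", "start",  "2023-01-10T08:05:00"),
    ("job-1", "finish", "2023-01-10T09:00:00"),
    ("job-2", "submit", "2023-01-10T08:10:00"),
    ("job-2", "start",  "2023-01-10T08:30:00"),
    ("job-2", "cancel", "2023-01-10T08:45:00"),
]

def directly_follows(events):
    """Count how often activity a is directly followed by activity b within a case."""
    # Group events by case identifier.
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((datetime.fromisoformat(timestamp), activity))

    counts = Counter()
    for trace in traces.values():
        trace.sort()  # order the events of one case by timestamp
        activities = [activity for _, activity in trace]
        # Pair each activity with its successor in the same case.
        counts.update(zip(activities, activities[1:]))
    return counts

dfg = directly_follows(events)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Any data set that can be brought into this three-column shape can, in principle, be analyzed with process mining techniques.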
The integration of process mining with HPC has two important parts. The first is PM4SW (process mining for scientific workflows), which analyzes large scientific workflows (simulation, ML, AI, PM, etc.) executed on the HPC cluster in order to assess performance, find bottlenecks, and improve scheduling and planning. The workflow systems currently under investigation are Camunda, KNIME, RapidMiner, and Integromat.
The second part is SW4PM (scientific workflows for process mining), which supports process mining workflows for scientific experiments to make them easier to use and to improve their performance. The tools investigated for this part are ProM, Celonis, and PM4Py. In addition, we are looking at distributing process mining algorithms and at using GPUs, among other approaches.
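One reason process mining algorithms can be distributed is that many of them work case by case. The sketch below shows this for a simple count of directly-follows pairs: the traces are split into partitions, each partition is counted independently, and the partial counts are summed. The function names and the example traces are hypothetical. Threads are used only to keep the sketch self-contained; they do not give real CPU parallelism in CPython, so on a cluster the same pattern would be run with processes or a framework such as Dask.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_pairs(traces):
    """Count directly-follows pairs in a list of traces (lists of activities)."""
    counts = Counter()
    for activities in traces:
        counts.update(zip(activities, activities[1:]))
    return counts

def partitioned_count(traces, n_parts=2):
    """Split traces into n_parts partitions, count each one, and merge the results."""
    # Round-robin partitioning; every trace stays whole in exactly one partition.
    chunks = [traces[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_pairs, chunks))
    # Merging is a plain sum because partitions share no cases.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical traces, one list of activities per case.
traces = [
    ["submit", "start", "finish"],
    ["submit", "start", "cancel"],
    ["submit", "start", "finish"],
]

result = partitioned_count(traces)
print(result)
```

The merge step is only this simple because no case is split across partitions; partitioning by event instead of by case would lose the pairs that cross partition boundaries.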
We also develop new automated machine learning (AutoML) solutions and provide them to users. We support applications in pattern identification, the identification of statistical and causal dependencies, process discovery, and conformance checking.
For prediction and analysis, we use (classical) machine learning, deep learning, probabilistic graphical modelling, process mining, data visualization, and reinforcement learning. In addition, we work with TensorFlow and PyTorch, NumPy, SciPy, SimPy, dataframes (pandas, Spark), scikit-learn, and Dask.
Current research topics:
- Data Science Workflows in HPC
- Federated Learning
- Process Mining in different modalities
- SW4PM: scientific workflows for process mining and other analysis techniques
- PM4SW: process mining for scientific workflows
- Learning material for Process Mining & Machine Learning
- Learning material to provide an understanding of the type of data needed by these techniques
- Support in overcoming data size/quality/privacy issues (in cooperation with CSG Data Management)
- Assistance in specific infrastructural challenges
- Support for scientists with Machine Learning or Process Mining problems applied to their specific research field
Planned teaching activities:
- Videos on specific aspects of data science/machine learning in HPC (data parallelization in process mining; Python libraries for data science such as TensorFlow, Spark, and Dask)
- Process Mining Summer School 2022 (Aachen)
- Hands-on workshop on ML@HPC
Training offers 2023:
- Treatment Effect Estimation to Guide Model Optimization in Continual Learning (Jonas Seng, Florian P. Busch, Matej Zečević, Moritz Willig), Continual Causality Bridge Program (@AAAI 2023)
- Causal Concept Identification in Open World Environments (Moritz Willig, Matej Zečević, Jonas Seng, Florian P. Busch), Continual Causality Bridge Program (@AAAI 2023)
- SLURMminer: A Tool for SLURM System Analysis with Process Mining (Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil M. P. van der Aalst), BPM 2023
- Continually Updating Neural Causal Models (Florian P. Busch, Jonas Seng, Moritz Willig, Matej Zečević), Continual Causality Bridge Program (@AAAI 2023)
- Continual Causal Abstractions (Matej Zečević, Moritz Willig, Florian P. Busch, Jonas Seng), Continual Causality Bridge Program (@AAAI 2023)
- Exploring SLURM Logs through Process Mining: Insights into Scientific Workflows (Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil M. P. van der Aalst), BPM 2023