Project
Distributed Neural Network Inference over Low-Power Wireless Networks: Design and Real-World Evaluation
Modern machine learning models are growing increasingly large, which poses significant challenges for running and training them on edge devices with limited computing power and memory. This project tackles the problem of deploying machine learning at the edge by exploring how multiple edge devices can cooperatively execute and train such models while communicating over a wireless network. The project is organized into two branches: first, the distributed inference of Transformer networks, and second, distributed learning strategies.
Project Details
Project term
January 6, 2024–January 6, 2025
Affiliations
RWTH Aachen University
Institute
Institute for Data Science in Mechanical Engineering
Principal Investigator
Methods
Distributed Inference of Transformer Networks
Transformer networks have become increasingly vital in processing language, image, and time series data. In this branch of the project, we explore strategies that enable multiple edge devices to collaboratively execute transformer networks that are too large for a single device.
Phase 1: Distributed Inference of Vision Transformers.
In the first phase of our project, we focused on Vision Transformers (ViTs). We trained ViTs of varying sizes and conducted thorough evaluations of the per-device resource consumption during distributed inference. Our experimental results indicate that the distributed inference approach scales efficiently with the number of devices. In other words, as more devices participate, we can deploy larger ViT models, which in turn yield more accurate predictions. However, this scalability comes with a critical trade-off. Owing to the inherent size of most ViTs, each operation must be split among multiple devices, leading to significant communication overhead. This overhead ultimately slows down the inference process, rendering it impractical for many real-world applications.
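To make this communication bottleneck concrete, the following minimal sketch (in PyTorch, illustrative only and not the project's actual implementation) partitions the feed-forward block of a transformer layer column-wise across several simulated devices. Each device computes only a partial result from its weight shard, and the partial outputs then have to be summed across devices; over a low-power wireless link this reduction step is what dominates inference latency.

# Illustrative sketch: column-wise partitioning of a transformer feed-forward
# block across N simulated devices. Each "device" holds only a slice of the
# weights; after the local matmuls, the partial outputs must be summed
# (an all-reduce), which is the communication step that slows inference
# on low-power wireless links.
import torch

class ShardedFeedForward:
    def __init__(self, d_model: int, d_hidden: int, num_devices: int):
        assert d_hidden % num_devices == 0
        shard = d_hidden // num_devices
        # Each list entry simulates one device's weight shard.
        self.w1 = [torch.randn(d_model, shard) for _ in range(num_devices)]
        self.w2 = [torch.randn(shard, d_model) for _ in range(num_devices)]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Local computation on every device: project into its hidden shard.
        partials = [torch.relu(x @ w1) @ w2 for w1, w2 in zip(self.w1, self.w2)]
        # Communication step: partial outputs are exchanged and summed.
        return torch.stack(partials).sum(dim=0)

x = torch.randn(1, 16, 64)                    # (batch, tokens, d_model)
layer = ShardedFeedForward(64, 256, num_devices=4)
print(layer.forward(x).shape)                 # torch.Size([1, 16, 64])

The larger the model, the more such reduction steps are needed per forward pass, which is why the approach scales in accuracy but not in latency on wireless hardware.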
Phase 2: Inference of Time Series Transformers.
To address the limitations encountered with ViTs, the second phase of our project focuses on transformer architectures designed specifically for time series data. These smaller models are more amenable to distribution across edge devices, making them a promising alternative. We have developed a specialized distribution method for these transformers, which we are currently refining through the integration of targeted pruning and dropout techniques. These enhancements are intended to significantly reduce communication overhead and improve the system’s robustness against message loss. Preliminary experimental results are encouraging, and further evaluations are underway as part of the project’s second-year work plan.
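As a rough illustration of this kind of communication reduction, the sketch below (the function names, keep ratio, and loss model are our own assumptions, not the project's exact method) keeps only the largest-magnitude activation entries before transmission and zero-fills anything lost in transit, so a dropped message degrades accuracy gracefully instead of stalling the pipeline.

# Hedged sketch of activation pruning before transmission (illustrative only).
import torch

def prune_for_transmission(x, keep_ratio=0.25):
    # Keep only the top-k activation entries by magnitude before sending.
    flat = x.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices], x.shape    # this is all that goes over the air

def reconstruct(indices, values, shape, received_mask=None):
    # Receiver side: rebuild a dense tensor, zero-filling pruned or lost entries.
    flat = torch.zeros(shape).flatten()
    if received_mask is not None:             # simulate messages lost in transit
        indices, values = indices[received_mask], values[received_mask]
    flat[indices] = values
    return flat.reshape(shape)

activation = torch.randn(16, 64)
idx, vals, shape = prune_for_transmission(activation)
lost = torch.rand(idx.numel()) > 0.1          # roughly 10 % of entries lost
recovered = reconstruct(idx, vals, shape, received_mask=lost)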
Parallel Work: Foundational Models for Industrial Time Series.
In parallel with our work on distributed inference methods, we have been training foundational models specifically tailored for industrial time series data. Preliminary results indicate that these models demonstrate robust zero-shot generalization capabilities along with highly data-efficient fine-tuning behaviors. This promising performance suggests a valuable avenue for further exploration and application in industrial settings.
Distributed Learning on Ultra-Low-Power Hardware
In another branch of the project, we explored distributed learning techniques on ultra-low-power microcontrollers communicating via wireless mesh networks. Broadly, there are two distributed learning techniques:
- Split Learning: The model and its training process are partitioned among multiple devices, distributing the computational workload effectively across the network.
- Federated Learning: Devices collaboratively train a shared model using their local data without exchanging raw data, thereby preserving data privacy.
For each of these techniques, we developed new methodologies that overcome the constraints of limited hardware and communication while ensuring efficient, collaborative training. These are the first split and federated learning methods capable of running on ultra-low-power wireless devices and are currently under review for publication.
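To make the distinction between the two techniques concrete, the following minimal federated-averaging sketch (in PyTorch, illustrative only; it does not show the methods developed in this project) trains small models on per-device data and aggregates only the parameters, never the raw data. A split-learning variant would instead exchange intermediate activations and gradients of a partitioned model.

# Minimal federated-averaging sketch (illustrative only).
import torch
import torch.nn as nn

def local_step(model, x, y, lr=0.1):
    # One local SGD step on a device's private data; nothing leaves the device.
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
            p.grad = None
    return loss.item()

def federated_average(models):
    # "Server" step: average the parameters of all devices, not their data.
    avg = {name: torch.stack([m.state_dict()[name] for m in models]).mean(dim=0)
           for name in models[0].state_dict()}
    for m in models:
        m.load_state_dict(avg)

global_model = nn.Linear(8, 1)
devices = [nn.Linear(8, 1) for _ in range(3)]         # one tiny model per device
for m in devices:
    m.load_state_dict(global_model.state_dict())      # all start from the shared model
    local_step(m, torch.randn(32, 8), torch.randn(32, 1))
federated_average(devices)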
Future Work
In the upcoming second year, our efforts will focus on several key areas:
- A deeper investigation into the distributed inference approach for transformer models through extensive training studies.
- The training and evaluation of additional foundational models for time series data, exploring different architectures and datasets.
- An exploration of federated black-box optimization techniques, whereby multiple devices collaboratively solve coupled optimization problems while ensuring data privacy; a minimal illustration of this setting is sketched below.
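As a rough, purely hypothetical illustration of the federated black-box setting (this is planned work, so no project method is shown): a coordinator proposes candidate parameters, each device evaluates its own private black-box objective locally, and only scalar scores are shared, so raw data never leaves the devices.

# Hypothetical sketch of federated black-box optimization via random search.
import random

def federated_random_search(local_objectives, dim=3, rounds=50):
    # Coordinator proposes candidates; devices return only scalar evaluations.
    best_x, best_val = None, float("inf")
    for _ in range(rounds):
        x = [random.uniform(-1.0, 1.0) for _ in range(dim)]
        val = sum(f(x) for f in local_objectives)      # only scalars are shared
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Hypothetical per-device objectives standing in for private local data.
objectives = [lambda x, c=c: sum((xi - c) ** 2 for xi in x) for c in (0.1, -0.3, 0.5)]
print(federated_random_search(objectives))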
Additional Project Information
DFG classification: 409 Computer Science
Software: TensorFlow, PyTorch, JAX
Cluster: CLAIX
Publications
Thesis:
Ding Huo. Distributed Inference of Transformer Networks in Low-Power Sensor Networks. Master's thesis, 2025.