Project: #IITM-251101-208

Operationalising multimodal fusion architectures for RF-enhanced edge perception

Campus: Burwood
Available

Emerging edge perception systems aim to combine radio-frequency (RF) sensing (e.g., WiFi, mmWave radar, UWB) with complementary modalities such as vision, LiDAR, inertial, and environmental sensors to support safety-critical and context-aware applications, including autonomous driving, smart infrastructure, and human activity monitoring [1]-[3]. RF sensing can enable device-free, privacy-preserving awareness that is resilient to low light and occlusion [1], while cameras and other sensors offer rich semantic information [2], [4]. Integrating these modalities at the network edge promises low-latency, bandwidth-efficient, and privacy-preserving perception, avoiding continuous streaming of raw data to the cloud and enabling timely decisions close to where the data is generated [1], [3].

Work on RF-vision fusion has made significant progress in algorithmic techniques and datasets for tasks such as object detection and activity recognition, and Edge AI research has highlighted the importance of lightweight, on-device and near-device inference [2]-[4]. However, most multimodal RF-vision systems are engineered as bespoke pipelines on powerful servers or in-vehicle platforms, with limited consideration of constrained, heterogeneous edge environments. Existing work typically optimises model accuracy in isolation, rather than providing operational, reusable architectures that specify how modalities are acquired, synchronised, fused, and mapped onto edge resources in real time [1], [4]. As a result, there is a clear gap between algorithm-level RF-vision fusion research and systems-level frameworks that can be instantiated and managed on realistic edge infrastructures [1].

This research will address these limitations by developing and evaluating an operational multimodal fusion architecture for RF-enhanced edge perception. The novelty lies in moving beyond algorithmic fusion to define a systems-level reference architecture that explicitly supports modality management, fusion pattern abstraction, and edge resource mapping under real-world constraints.

The PhD project will (i) define an edge-centric reference architecture that integrates RF, vision, and auxiliary sensing modalities, with abstractions for modality management, synchronisation, and fusion patterns (a minimal illustrative sketch of such abstractions is given after this outline);

(ii) design representative fusion pipelines for selected use cases (e.g., smart intersections, indoor occupancy sensing) that can run on heterogeneous edge platforms under latency and resource constraints;

and (iii) prototype and evaluate these pipelines on a small-scale testbed combining RF and non-RF sensors, measuring end-to-end performance, robustness, and resource usage. The outcome will be a set of experimentally validated design principles and architectural patterns, paving the way for scalable and robust RF-enhanced edge perception frameworks.
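To make the abstractions in (i) more concrete, the Python sketch below illustrates one possible shape such an architecture could take: modality sources behind a uniform interface, simple timestamp-based synchronisation, and a pluggable fusion pattern (decision-level fusion shown). All names here (ModalitySource, SyncBuffer, FusionPattern, LateFusion) and the tolerance-window logic are illustrative assumptions, not components of an existing framework or of the project itself.

```python
# Hypothetical sketch of modality management, synchronisation, and fusion-pattern
# abstractions for an RF-enhanced edge perception pipeline. Names are illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional
import time


@dataclass
class Sample:
    """A single timestamped observation from one modality."""
    modality: str      # e.g. "mmwave", "camera", "imu"
    timestamp: float   # seconds
    payload: object    # raw frame, point cloud, feature vector, ...


class ModalitySource:
    """Uniform wrapper around one sensor; real subclasses would pull from drivers or brokers."""
    def __init__(self, name: str):
        self.name = name

    def read(self) -> Sample:
        # Placeholder: a concrete source would block on a driver, socket, or message queue.
        return Sample(self.name, time.time(), payload=None)


class SyncBuffer:
    """Keeps the latest sample per modality and releases a bundle once all modalities
    fall within a common tolerance window (simple timestamp-based synchronisation)."""
    def __init__(self, modalities: List[str], tolerance_s: float = 0.05):
        self.modalities = modalities
        self.tolerance_s = tolerance_s
        self.latest: Dict[str, Sample] = {}

    def push(self, sample: Sample) -> Optional[Dict[str, Sample]]:
        self.latest[sample.modality] = sample
        if len(self.latest) < len(self.modalities):
            return None
        stamps = [s.timestamp for s in self.latest.values()]
        if max(stamps) - min(stamps) <= self.tolerance_s:
            return dict(self.latest)  # time-aligned bundle, ready for fusion
        return None


class FusionPattern:
    """Abstract fusion pattern; concrete patterns (early, feature-level, late) plug in here."""
    def fuse(self, bundle: Dict[str, Sample]) -> float:
        raise NotImplementedError


class LateFusion(FusionPattern):
    """Decision-level fusion: run a per-modality model, then combine the scores."""
    def __init__(self, per_modality_models: Dict[str, Callable[[object], float]]):
        self.models = per_modality_models

    def fuse(self, bundle: Dict[str, Sample]) -> float:
        scores = [self.models[m](s.payload) for m, s in bundle.items() if m in self.models]
        return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy end-to-end run with dummy per-modality scorers (e.g., indoor occupancy sensing).
    sources = [ModalitySource("mmwave"), ModalitySource("camera")]
    sync = SyncBuffer(["mmwave", "camera"], tolerance_s=0.05)
    fusion = LateFusion({"mmwave": lambda _: 0.8, "camera": lambda _: 0.6})

    for src in sources:
        bundle = sync.push(src.read())
        if bundle is not None:
            print("fused occupancy score:", fusion.fuse(bundle))
```

Decision-level fusion is shown only because it keeps the modalities loosely coupled, which makes it easier to map individual per-modality models onto heterogeneous edge resources; the project itself would investigate which fusion patterns and synchronisation strategies are appropriate under the constraints in (ii).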

References

[1] Cui, Y., Cao, X., Zhu, G., Nie, J. and Xu, J., 2025. Edge perception: Intelligent wireless sensing at network edge. IEEE Communications Magazine, 63(3), pp.166-173.

[2] Wei, Z., Zhang, F., Chang, S., Liu, Y., Wu, H. and Feng, Z., 2022. mmWave radar and vision fusion for object detection in autonomous driving: A review. Sensors, 22(7), p.2542.

[3] Singh, R. and Gill, S.S., 2023. Edge AI: a survey. Internet of Things and Cyber-Physical Systems, 3, pp.71-92.

[4] Wang, H., Liu, J., Dong, H. and Shao, Z., 2025. A survey of the multi-sensor fusion object detection task in autonomous driving. Sensors, 25(9), p.2794.