Project: #IITM-251101-201

Visual Understanding of Group Activities in Dynamic Environments

Campus: Burwood
Available

Understanding and interpreting human activities in dynamic and visually complex environments remains a central challenge in computer vision and artificial intelligence. Real-world scenes often involve multiple interacting entities, changing contexts, and continuous motion, all of which demand models that can reason about spatial relationships and temporal evolution simultaneously.

In sports analytics, for example, visual understanding extends beyond recognising isolated actions to analysing how movements, strategies, and contextual cues unfold over time. Detecting and interpreting events such as passes, shots, or defensive manoeuvres in soccer or basketball requires the ability to track objects, model motion dynamics, and extract meaningful patterns from continuous video streams. These challenges are amplified by variations in camera motion, occlusion, and the high speed of activity, which together make robust visual understanding a complex problem.

This PhD project aims to develop novel computer vision and deep learning methods for activity recognition, representation, and prediction in dynamic visual environments such as sports, surveillance, and human–robot interaction. These settings involve rapidly changing motion patterns, complex spatial relationships, and evolving contextual cues, and current vision systems still struggle to model such spatiotemporal dynamics effectively.

The research will explore advanced deep learning architectures—including transformer-based video models, graph neural networks, and spatiotemporal attention mechanisms—to capture fine-grained contextual and temporal dependencies across multiple scales. These models will be designed to integrate appearance, motion, and relational information, enabling a richer and more holistic understanding of ongoing activities.

The project will combine algorithm development with applied evaluation, both on benchmark datasets and in real-world applications, particularly in sports analytics, crowd behaviour analysis, or intelligent surveillance. Through this research, the candidate will help advance the ability of AI systems to perceive, reason, and make informed predictions in dynamic visual environments, supporting real-time decision-making and strategic insight generation.

The ideal candidate will have a solid background in information technology, computer science, or a related discipline, along with strong programming skills. A good understanding of linear algebra and statistics will be considered an advantage.