Machine Perception and Robotics Group
Chubu University

Image Processing / Journal Paper (E)

Efficient Action Spotting Using Saliency Feature Weighting

Author
Yuzhi SHI, Takayoshi YAMASHITA, Tsubasa HIRAKAWA, Hironobu FUJIYOSHI, Mitsuru NAKAZAWA, Yeongnam CHAE, Björn STENGER
Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E107-D, No.1, pp.105-114, 2024

Download: PDF (English)

Action spotting is a key component of high-level video understanding. The large number of similar frames poses a challenge for recognizing actions in videos. In this paper, we use frame saliency to represent the importance of frames and to guide the model to focus on keyframes. We propose a frame saliency weighting module that improves frame saliency and the video representation at the same time. Our model contains two encoders, for the pre-action and post-action time windows, to encode video context. We validate our design choices and the generality of the proposed method in extensive experiments. On the public SoccerNet-v2 dataset, the method achieves an average mAP of 57.3%, improving over the state of the art. Using embedding features obtained from multiple feature extractors, the average mAP further increases to 75%. We show that reducing the model size by over 90% does not significantly impact performance, and we use ablation studies to demonstrate the effectiveness of the saliency weighting module. Further, we show that our frame saliency weighting strategy is applicable to existing methods on more general action datasets, such as SoccerNet-v1, ActivityNet v1.3, and UCF101.
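To make the core idea concrete, the sketch below shows one way frame-saliency weighting and the two-window design could be wired up: a small scoring head predicts a per-frame saliency weight, the weighted features are fed to separate pre-action and post-action encoders, and the fused context is classified. This is a minimal illustration under assumed choices, not the paper's implementation; the module names, the linear scoring head, the GRU encoders, and all dimensions are hypothetical.

```python
# Hypothetical sketch of frame-saliency weighting for action spotting.
# Not the authors' code: layer choices and dimensions are illustrative only.
import torch
import torch.nn as nn


class SaliencyWeighting(nn.Module):
    """Predict a saliency weight per frame and re-weight frame features."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Assumed scoring head: a single linear layer per frame feature.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor):
        # feats: (batch, frames, feat_dim)
        saliency = torch.sigmoid(self.score(feats))   # (batch, frames, 1)
        weighted = feats * saliency                   # emphasize keyframes
        return weighted, saliency.squeeze(-1)


class TwoWindowSpotter(nn.Module):
    """Encode pre-action and post-action windows separately, then fuse."""

    def __init__(self, feat_dim: int = 512, hidden: int = 256, num_classes: int = 17):
        super().__init__()
        self.weighting = SaliencyWeighting(feat_dim)
        # Assumed encoders: one GRU per time window (the paper's encoders differ).
        self.pre_encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.post_encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, pre: torch.Tensor, post: torch.Tensor) -> torch.Tensor:
        pre_w, _ = self.weighting(pre)
        post_w, _ = self.weighting(post)
        _, h_pre = self.pre_encoder(pre_w)            # final hidden state
        _, h_post = self.post_encoder(post_w)
        ctx = torch.cat([h_pre[-1], h_post[-1]], dim=-1)
        return self.classifier(ctx)                   # per-class action scores


# Example: a batch of 2 clips, 30 frames per window, 512-dim frame features.
model = TwoWindowSpotter()
pre = torch.randn(2, 30, 512)
post = torch.randn(2, 30, 512)
logits = model(pre, post)  # (2, 17); SoccerNet-v2 defines 17 action classes
```

Because the saliency scores are produced by a learned head and multiply the features directly, the weighting can be trained end-to-end with the spotting loss, which is what allows a strategy like this to be bolted onto existing methods as the abstract describes.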
