Action Spotting in Soccer Videos Using Multiple Scene Encoders
- Yuzhi Shi, Hiroaki Minoura, Takayoshi Yamashita, Tsubasa Hirakawa, Hironobu Fujiyoshi, Mitsuru Nakazawa, Yeongnam Chae, Bjorn Stenger
- International Conference on Pattern Recognition, 2022
Action spotting, which temporally localizes specific actions in a video, is an important task for understanding high-level semantic information. In this paper, we formulate action spotting as a scene sequence recognition problem and propose a model with multiple scene encoders that capture scene changes around the timestamp at which an action occurs. We divide the input into multiple subsets to reduce the influence of temporally distant scene context, and feed each subset into a scene encoder that learns the scene context within that subset. Because the optimal temporal length of the time windows (chunks) differs from action to action, we analyze how the chunk size affects action spotting. Experimental results on the public SoccerNet-v2 dataset demonstrate state-of-the-art accuracy: using embedding features, our method achieves an Average-mAP of 75.3%. In addition, we confirm that performance can be improved by using the optimal chunk size for each action.
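The chunking idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoder architecture, the window length, the padding policy, or whether the scene encoders share weights, so everything below is an assumption: the function names (`window_around`, `split_into_chunks`, `mean_pool_encoder`, `encode_window`) are hypothetical, the window is assumed to be centred on the candidate timestamp with edge frames repeated as padding, and mean pooling stands in for a learned scene encoder.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[Vector]], Vector]


def window_around(features: Sequence[Vector], center: int, length: int) -> List[Vector]:
    """Take `length` per-frame feature vectors centred on frame `center`.

    Assumption: out-of-range indices are clamped, i.e. edge frames are repeated.
    """
    n = len(features)
    start = center - length // 2
    return [list(features[min(max(i, 0), n - 1)]) for i in range(start, start + length)]


def split_into_chunks(window: Sequence[Vector], num_chunks: int) -> List[List[Vector]]:
    """Split the window into `num_chunks` contiguous, equally sized subsets (chunks)."""
    if num_chunks <= 0 or len(window) % num_chunks != 0:
        raise ValueError("window length must be a positive multiple of num_chunks")
    size = len(window) // num_chunks
    return [list(window[i * size:(i + 1) * size]) for i in range(num_chunks)]


def mean_pool_encoder(chunk: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for a learned scene encoder: average the frame features."""
    dim = len(chunk[0])
    return [sum(frame[d] for frame in chunk) / len(chunk) for d in range(dim)]


def encode_window(window: Sequence[Vector], num_chunks: int,
                  encoders: Sequence[Encoder]) -> List[Vector]:
    """Encode each chunk with its own encoder, giving one scene embedding per chunk.

    Because each encoder only sees its own chunk, context from temporally distant
    chunks is not mixed in at this stage; the resulting sequence of scene
    embeddings would then be passed to a (not shown) sequence classifier.
    """
    if len(encoders) != num_chunks:
        raise ValueError("need exactly one encoder per chunk")
    chunks = split_into_chunks(window, num_chunks)
    return [encode(chunk) for encode, chunk in zip(encoders, chunks)]


if __name__ == "__main__":
    # Toy per-frame features: frame i has the 2-D feature [i, 2i].
    feats = [[float(i), float(2 * i)] for i in range(10)]
    window = window_around(feats, center=5, length=8)  # frames 1..8
    for num_chunks in (2, 4):  # different chunk sizes, e.g. tuned per action
        scenes = encode_window(window, num_chunks, [mean_pool_encoder] * num_chunks)
        print(num_chunks, scenes)
```

Varying `num_chunks` for a fixed window length changes the chunk size, which is the quantity the abstract reports analysing per action; in a real system the per-action choice would be made on validation data rather than fixed by hand as in this toy loop.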