Dept. of Robotics Science and Technology,
Chubu University

Human Detection Conference

Action Spotting in Soccer Videos Using Multiple Scene Encoders

Authors
Yuzhi Shi, Hiroaki Minoura, Takayoshi Yamashita, Tsubasa Hirakawa, Hironobu Fujiyoshi, Mitsuru Nakazawa, Yeongnam Chae, Bjorn Stenger
Publication
International Conference on Pattern Recognition, 2022

Download: PDF (English)

Action spotting, which temporally localizes specific actions in a video, is an important task for understanding high-level semantic information. In this paper, we formulate action spotting as a scene sequence recognition task and propose a model with multiple scene encoders that captures scene changes around the timestamp where an action occurs. We divide the input into multiple subsets to reduce the influence of temporally distant scene context, and feed each subset into a scene encoder that learns the scene context within that subset. Because the optimal temporal length of the time windows (chunks) differs for each action, we analyze the influence of chunk size on action spotting. Experimental results on the public SoccerNet-v2 dataset demonstrate state-of-the-art accuracy. Using embedding features, our method obtains an Average-mAP of 75.3%. In addition, we confirm that performance can be improved by using the optimal chunk size for each action.
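The chunking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the function names (`split_into_chunks`, `mean_pool_encoder`, `encode_window`) are hypothetical, and a simple mean-pooling function stands in for a learned scene encoder. It only shows how a window of per-frame features might be divided into chunks, each encoded separately, to form a scene sequence.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one per-frame feature vector


def split_into_chunks(frames: Sequence[Frame], chunk_size: int) -> List[List[Frame]]:
    """Split a window of per-frame features into consecutive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(frames[i:i + chunk_size]) for i in range(0, len(frames), chunk_size)]


def mean_pool_encoder(chunk: Sequence[Frame]) -> List[float]:
    """Placeholder for a learned scene encoder (assumption): average the frames in a chunk."""
    n = len(chunk)
    dim = len(chunk[0])
    return [sum(frame[d] for frame in chunk) / n for d in range(dim)]


def encode_window(frames: Sequence[Frame], chunk_size: int) -> List[List[float]]:
    """Encode each chunk independently, giving one scene vector per chunk."""
    return [mean_pool_encoder(chunk) for chunk in split_into_chunks(frames, chunk_size)]


# Toy window of 6 frames with 2-dimensional features.
window = [[float(t), 1.0] for t in range(6)]
scene_sequence = encode_window(window, chunk_size=2)
print(scene_sequence)  # [[0.5, 1.0], [2.5, 1.0], [4.5, 1.0]]
```

Changing `chunk_size` changes how much temporal context each scene vector summarizes, which is the quantity the abstract reports analyzing per action class.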
