Video Object Detection and Tracking based on Angle Consistency between Motion and Flow
- Toshiki Seo, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi
- IEEE Intelligent Vehicles Symposium, 2020
Download: PDF (English)
Detect and Track (D&T) extracts foreground regions using a feature map and a region proposal network (RPN), and estimates the object class with fully connected layers. A correlation layer, a hidden layer that computes the displacement between adjacent frames, estimates how the position and size of an object change between those frames. The object class and bounding-box regression are then estimated from the feature maps produced by the correlation layer and the RPN. Finally, D&T estimates the moving direction and displacement of each bounding box from the detection results obtained from the correlation layer and the adjacent frames. Although D&T achieves accurate object detection and tracking, the object detection and movement estimation of the correlation layer rely on the detection results of the RPN. The correlation layer therefore does not capture local and global pixel changes across video frames and has to estimate the moving direction solely from the similarity of the detected regions. As a result, the estimation of the moving direction tends to fail. In this work, we propose a method that improves moving direction estimation by keeping the estimated direction consistent with optical flow. Experimental results show that the proposed method estimates the moving direction successfully and thereby improves both detection and tracking accuracy.
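
To make the idea of consistency between the estimated direction and optical flow more concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a dense flow field stored as an (H, W, 2) array of per-pixel (dx, dy) vectors, summarizes the flow inside a box by its mean, and scores agreement with a 1 - cos(angle) penalty. The function names `mean_flow_in_box` and `angle_consistency_penalty`, the box format, and the penalty itself are all illustrative assumptions.

```python
import numpy as np


def mean_flow_in_box(flow, box):
    """Average the flow vectors inside box = (x1, y1, x2, y2), in pixels.

    `flow` is assumed to have shape (H, W, 2) with (dx, dy) on the last axis.
    """
    x1, y1, x2, y2 = box
    region = flow[y1:y2, x1:x2]
    return region.reshape(-1, 2).mean(axis=0)


def angle_consistency_penalty(motion, flow_vec, eps=1e-8):
    """Return 1 - cos(angle) between two 2-D vectors.

    Close to 0 when the vectors point the same way, 1 when they are
    perpendicular, and close to 2 when they point in opposite directions.
    This penalty is an assumed stand-in, not the loss from the paper.
    """
    motion = np.asarray(motion, dtype=float)
    flow_vec = np.asarray(flow_vec, dtype=float)
    denom = np.linalg.norm(motion) * np.linalg.norm(flow_vec) + eps
    cos = float(motion @ flow_vec) / denom
    return 1.0 - float(np.clip(cos, -1.0, 1.0))


# Toy flow field: every pixel moves 2 px to the right.
flow = np.zeros((40, 40, 2))
flow[..., 0] = 2.0
box = (10, 10, 30, 30)
avg_flow = mean_flow_in_box(flow, box)

aligned = angle_consistency_penalty((3.0, 0.0), avg_flow)         # box moves right
perpendicular = angle_consistency_penalty((0.0, 3.0), avg_flow)   # box moves down
opposite = angle_consistency_penalty((-3.0, 0.0), avg_flow)       # box moves left
```

In a training setup, a term of this kind could be added to the tracking loss so that a predicted box displacement pointing against the observed flow is penalized more than one pointing along it. How the paper actually combines the two signals is not specified in the abstract.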