MPRG: Machine Perception and Robotics Group, Chubu University

03 Jun 2024 International Conference

Human-like Guidance by Generating Navigation Using Spatial-Temporal Scene Graph

Authors: Hayato Suzuki, Kota Shimomura, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Shota Ohkubo, Takuya Nanri, Wang Siyuan
Publication: IEEE Intelligent Vehicles Symposium, 2024

Download: PDF (English)

Vehicle navigation systems use both GPS and map data, relying primarily on information derived from the map. Conventional navigation systems assume that the user will look directly at the display to check information. The text and voice guidance provided alongside the display often plays only a supplementary role, which can lead to driver distraction and misinterpretation. In contrast, navigation given by a human uses visual information from the scene, potentially reducing the cognitive load on drivers. Human-like Guidance aims to realize a driving assistance system that supports navigation in a way akin to human guidance. Implementing Human-like Guidance requires handling video footage from in-vehicle cameras while the vehicle is in operation, which suggests the need for an approach that combines image recognition with a language model. However, images captured during driving often contain superfluous information, making it challenging to select the objects relevant to navigation. Moreover, relying solely on image information makes it difficult to consider the relationships among surrounding objects. Therefore, this study proposes a Spatial-Temporal Scene Graph that can represent the spatial and temporal information of objects in driving scene videos. Furthermore, we achieve Human-like Guidance through navigation generation using features extracted from the Spatial-Temporal Scene Graph. Our results show that the proposed method improves the accuracy of navigation generation compared to traditional image-based navigation methods. In addition, the use of a Spatial-Temporal Scene Graph enables the generation of human-like navigation that focuses on the movements of surrounding vehicles.
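To make the idea of a graph holding both spatial and temporal information concrete, the following is a minimal illustrative sketch, not the data structure used in the paper. It assumes that each video frame yields detections of the form (track ID, label, bounding box) from some upstream detector and tracker; the class `SpatialTemporalSceneGraph`, the helper `relation`, and the `left_of`/`right_of` relation labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One object observed in one frame (hypothetical layout)."""
    track_id: int   # assumed to come from an upstream tracker
    frame: int
    label: str
    box: tuple      # (x1, y1, x2, y2) in pixels


def relation(box_a, box_b):
    """Toy spatial relation based on horizontal box centers."""
    center_a = (box_a[0] + box_a[2]) / 2
    center_b = (box_b[0] + box_b[2]) / 2
    return "left_of" if center_a < center_b else "right_of"


@dataclass
class SpatialTemporalSceneGraph:
    nodes: dict = field(default_factory=dict)           # (frame, track_id) -> Node
    spatial_edges: list = field(default_factory=list)   # (key_a, key_b, relation)
    temporal_edges: list = field(default_factory=list)  # (key_prev, key_next)

    def add_frame(self, frame, detections):
        """Add one frame; detections is a list of (track_id, label, box)."""
        keys = []
        for track_id, label, box in detections:
            key = (frame, track_id)
            self.nodes[key] = Node(track_id, frame, label, box)
            keys.append(key)
            # Temporal edge: same tracked object in the previous frame.
            prev = (frame - 1, track_id)
            if prev in self.nodes:
                self.temporal_edges.append((prev, key))
        # Spatial edges: every pair of objects within the same frame.
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                rel = relation(self.nodes[a].box, self.nodes[b].box)
                self.spatial_edges.append((a, b, rel))


# Two frames with a moving car and a static traffic light (made-up values).
graph = SpatialTemporalSceneGraph()
graph.add_frame(0, [(1, "car", (100, 200, 180, 260)),
                    (2, "traffic_light", (400, 50, 420, 100))])
graph.add_frame(1, [(1, "car", (120, 200, 200, 260)),
                    (2, "traffic_light", (400, 50, 420, 100))])
```

In this sketch, spatial edges connect objects that appear in the same frame, while temporal edges link the same tracked object across consecutive frames, so an object's movement can be read off by following its temporal edges. How the paper actually defines nodes, edges, and the features passed to navigation generation is not stated in the abstract.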