Acquiring 3D Gaze Scan Path by First Person Vision
The objective of First Person Vision (FPV) is to recognize the surrounding environment from video taken from the viewpoint of a user wearing a head-mounted sensor, and to understand the user's behavioral intention by combining that video with the user's gaze information. We propose an “inside-out camera” that simultaneously acquires video of a person's eyeballs and of that person's visual field, together with a method of estimating the person's gaze point that makes the most of the configuration of that camera system. The inside-out camera can capture visual-field footage through a half mirror, from a position optically equivalent to that of the eyeballs, while also imaging the surfaces of the eyeballs themselves. With this inside-out camera, we can search for the gaze-point position on the visual-field image using the gaze vectors obtained from the eyeball images.
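The search for the gaze point on the visual-field image can be illustrated with a minimal sketch: if the half mirror makes the scene camera's optical center coincide with the eye, an estimated gaze direction can be projected onto the scene image with a standard pinhole model. The intrinsic parameters (`fx`, `fy`, `cx`, `cy`) below are hypothetical calibration values, not figures from this work.

```python
import numpy as np

def gaze_vector_to_pixel(gaze_dir, fx, fy, cx, cy):
    """Project a gaze direction (camera frame, z pointing forward) onto
    the scene image with a pinhole model: u = fx*x/z + cx, v = fy*y/z + cy."""
    x, y, z = gaze_dir
    if z <= 0:
        raise ValueError("gaze direction must point in front of the camera")
    return np.array([fx * x / z + cx, fy * y / z + cy])

# A straight-ahead gaze (0, 0, 1) lands on the principal point (cx, cy).
uv = gaze_vector_to_pixel((0.0, 0.0, 1.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

In practice the mapping between gaze vectors and image coordinates would be established by a per-user calibration rather than by assumed intrinsics.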
An inside-out camera is a goggle-like gaze measurement device consisting of two eye cameras located at the top of the goggles, as shown in the upper figure, which obtain images of the eyeballs, and two scene cameras located at the bottom of the goggles, which obtain images of the visual field. The eye cameras are two infrared cameras that image the left and right eyeballs, with six infrared LEDs arranged around the circumference of each camera. The LEDs emit near-infrared light of wavelength 750 nm to 900 nm (center wavelength 850 nm), with a directionality of 40 degrees. An infrared mirror is a mirror that passes 95% of visible light but reflects 95% of infrared light. Since the infrared mirror is installed at an angle of 45 degrees to the optical axes of the infrared cameras, the infrared cameras installed in the upper part of the goggles can image the surfaces of the user's eyeballs through the mirror. Because infrared light is invisible, the eyeball images can be obtained with no visual stimulus to the user. The scene cameras consist of two small CCD cameras that capture the left and right visual fields, and a half mirror. The half mirror reflects 50% of visible light and transmits the remainder. Since the half mirror is installed at an angle of 45 degrees to the optical axes of the CCD cameras, in a similar manner to the infrared mirror shown in the figure, we can capture footage of the scene through the half mirror from a position that is optically almost the same as the person's viewpoint. In addition, the baseline between the two scene cameras is approximately 6.5 cm, enabling the use of stereo vision.
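The roughly 6.5 cm baseline allows depth to be recovered from the disparity between the two scene-camera images via the standard stereo relation Z = fB/d. A minimal sketch, with an illustrative focal length and disparity (only the baseline comes from the text):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole stereo: depth Z = f * B / d for focal length f (pixels),
    baseline B (meters), and horizontal disparity d (pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Illustrative values: f = 800 px, B = 0.065 m, d = 20 px -> Z = 2.6 m
z = depth_from_disparity(800.0, 0.065, 20.0)
```

Note how the 6.5 cm baseline, matching the human interocular distance, yields useful disparities at the distances at which people typically fixate.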
3D Scan Path
To comprehend the intentions and actions of a dynamically moving user, it is necessary to measure the trajectory of the user's gaze point in three-dimensional space. We propose a system that reproduces a 3D scan path and the scene structure in three-dimensional space, based on self-movements computed from the inside-out camera. Using the image sequences acquired from the scene cameras, we estimate the self-movements of the dynamically moving user and reconstruct a three-dimensional structure of the scene in each frame captured by the stereo cameras. To estimate the three-dimensional gaze point, we first estimate the gaze from the eyeball images of the eye cameras, then use that result to estimate the gaze point in the two-dimensional coordinates of each frame. Finally, we estimate the gaze point in three dimensions by stereo vision from the two-dimensional gaze points of the two eyes. We can reproduce the 3D scan path by integrating the self-movement estimation result and the three-dimensional gaze-point estimation result in each frame.
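The last two steps can be sketched as follows: a 3D gaze point is obtained by intersecting the left and right gaze rays (here with a standard closed-form midpoint triangulation, one common choice, since the two rays rarely intersect exactly), and each per-frame gaze point is then mapped into world coordinates with the self-movement pose for that frame. All numerical values and the pose representation are illustrative assumptions, not this system's calibration.

```python
import numpy as np

def triangulate_gaze(p_left, d_left, p_right, d_right):
    """Closed-form least-squares midpoint of the shortest segment between
    two gaze rays p + t*d, giving the 3D gaze point in the head frame."""
    w = p_left - p_right
    a, b, c = d_left @ d_left, d_left @ d_right, d_right @ d_right
    d, e = d_left @ w, d_right @ w
    denom = a * c - b * b
    if abs(denom) < 1e-12:  # near-parallel gaze rays: fixation depth undefined
        raise ValueError("gaze rays are (nearly) parallel")
    t_l = (b * e - c * d) / denom
    t_r = (a * e - b * d) / denom
    return 0.5 * ((p_left + t_l * d_left) + (p_right + t_r * d_right))

def to_world(point_head, R, t):
    """Map a head-frame gaze point into the world frame using the per-frame
    self-movement pose (rotation R, translation t) from the scene cameras."""
    return R @ point_head + t

# Eyes 6.5 cm apart, both gaze rays converging 1 m straight ahead.
p_l, d_l = np.array([-0.0325, 0.0, 0.0]), np.array([0.0325, 0.0, 1.0])
p_r, d_r = np.array([0.0325, 0.0, 0.0]), np.array([-0.0325, 0.0, 1.0])
gaze_head = triangulate_gaze(p_l, d_l, p_r, d_r)          # -> (0, 0, 1)
gaze_world = to_world(gaze_head, np.eye(3), np.array([0.0, 0.0, 2.0]))
```

Chaining `to_world` over successive frames with their estimated poses yields the 3D scan path as a trajectory of world-frame gaze points.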