MPRG : Machine Perception and Robotics Group / Dept. of Robotics Science and Technology

01 Jun 2011

Human Detection by Joint Features and Its Hardware Implementation

Human detection is the task of determining the position and size of people in video footage, and it is expected to be used in fields such as security, intelligent transport systems (ITS), and marketing. An effective way to detect people is thought to be capturing characteristic shapes: the Ω-like contour running from the head to the shoulders, the continuous contour from the upper body to the lower body, and symmetrical shapes such as those of the feet. We propose an object detection method based on joint features that uses two-stage Real AdaBoost to automatically capture the symmetry and continuity of an object’s shape, as shown in the figure. Because boosting combines multiple low-level features, the method can capture the relationships between features, so highly accurate detection can be expected.

Joint Features and Two-Stage Boosting
In the first stage, joint features are generated: low-level features are extracted from two different local regions (cells), and the co-occurrence of the features from the two cells is encoded by a co-occurrence representation. Real AdaBoost then automatically selects the best combination as a weak classifier. In the second stage, the joint features generated in the first stage are used as input to a second round of Real AdaBoost training, which selects the joint features that are effective for classification. The joint-feature framework is not limited to HOG features, which represent human appearance; other features, such as time-space features or distance features, can be added to it. Joint features also make it easier to analyze which local region caused a detection failure.
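The first-stage selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that each low-level feature has already been binarized to 0 or 1, that the co-occurrence of two features is encoded as a 2-bit code, and that the weak classifier is the standard lookup-table form of Real AdaBoost. The function names and the constant `EPS` are hypothetical.

```python
import numpy as np

EPS = 1e-7  # smoothing constant to avoid log(0); the value is an assumption


def cooccurrence_code(bit_a, bit_b):
    """Combine two binary features into one joint code in {0, 1, 2, 3}."""
    return 2 * bit_a + bit_b


def fit_lookup_weak_classifier(codes, labels, weights, n_bins=4):
    """Fit a Real AdaBoost weak classifier over a discrete code.

    Returns (lut, z): lut[j] is the real-valued output for code j, and z is
    the normalizer 2 * sum_j sqrt(W+_j * W-_j); a smaller z means the code
    separates the two classes better.
    """
    pos = labels == 1
    neg = labels == -1
    w_pos = np.bincount(codes[pos], weights=weights[pos], minlength=n_bins)
    w_neg = np.bincount(codes[neg], weights=weights[neg], minlength=n_bins)
    lut = 0.5 * np.log((w_pos + EPS) / (w_neg + EPS))
    z = 2.0 * np.sum(np.sqrt(w_pos * w_neg))
    return lut, z


def select_best_pair(bits_a, bits_b, labels, weights):
    """Search all feature pairs across two cells and keep the pair with minimum z.

    bits_a, bits_b: integer arrays of shape (n_samples, n_features) holding
    the binarized features of cell A and cell B.
    Returns (z, index_in_a, index_in_b, lut).
    """
    best = None
    for i in range(bits_a.shape[1]):
        for j in range(bits_b.shape[1]):
            codes = cooccurrence_code(bits_a[:, i], bits_b[:, j])
            lut, z = fit_lookup_weak_classifier(codes, labels, weights)
            if best is None or z < best[0]:
                best = (z, i, j, lut)
    return best
```

Under these assumptions, each selected pair would become one candidate joint feature, and a second round of boosting would then choose among the candidates collected over all cell pairs.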

[Figure: decomposition]

Hardware Implementation: Joint-HOG FPGA

When an object detector based on statistical learning is implemented in hardware, the learning results, trained for the detection targets assumed in the deployment environment, are built into the hardware. As a result, the hardware may need to be redesigned if the deployment environment is different or the detection target changes. We have developed a field-programmable gate array (FPGA) system based on Joint HOG that works with a software API so that the detection target can be changed on the same hardware. This Joint-HOG FPGA system uses Altera’s Cyclone III FPGA and performs detection on images input through a Camera Link interface. By processing the stages in a pipeline and processing windows in parallel, the FPGA implementation runs at a practical speed (approximately 20 fps).
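The idea of changing the detection target without changing the detector can be illustrated in software. The sketch below is not the system's actual API. It assumes a flattened classifier stored purely as data (a list of cell pairs with lookup tables over a 2-bit co-occurrence code) and binarized features per window. The names `score_window` and `detect` and the tuple layout are hypothetical.

```python
import numpy as np


def score_window(window_bits, classifier):
    """Sum the lookup-table outputs of all weak classifiers for one window.

    window_bits: integer array of shape (n_cells, n_features) with 0/1 entries.
    classifier: list of (cell_a, feat_a, cell_b, feat_b, lut) tuples, where
    lut holds one real-valued output per 2-bit co-occurrence code.
    """
    total = 0.0
    for cell_a, feat_a, cell_b, feat_b, lut in classifier:
        code = 2 * window_bits[cell_a, feat_a] + window_bits[cell_b, feat_b]
        total += lut[code]
    return total


def detect(windows_bits, classifier, threshold=0.0):
    """Return the indices of windows whose score exceeds the threshold.

    Each window is scored independently of the others, which is what makes
    per-window parallel processing possible.
    """
    return [
        k for k, bits in enumerate(windows_bits)
        if score_window(bits, classifier) > threshold
    ]
```

In this sketch, passing a different `classifier` table changes what is detected while `score_window` and `detect` stay the same, which mirrors in spirit the separation between fixed hardware and replaceable learning results.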
