Efficient Learning of Object Detection (Automatic Generation of Learning Samples and Hybrid-Type Transfer Learning)
Object detection methods based on statistical learning depend strongly on their learning samples, so detection performance degrades when the environment in which the samples were collected differs from the environment in which the camera is installed. The usual remedy is to collect new learning samples in the installation environment and repeat the learning, but collecting samples is costly in personnel and re-learning is costly in time. To address these problems, we aim for higher accuracy and efficiency by concentrating on specific scenes. For sample collection, we use generation-type learning, which generates learning samples by computer graphics (CG) from three-dimensional models of the human body. For learning time, we are working on hybrid-type transfer learning.
Generation-Type Learning Using Three-Dimensional CG Models of Human Bodies
We automatically generate silhouette images of human bodies for the installation environment, using three-dimensional models of the human body. The shape model of the human body consists of 19 parts, and we represent a walking pose by applying walking-action parameters to those 19 parts. To obtain silhouette images specialized for a specific scene, we apply the parameters of the camera installed in the actual environment to the three-dimensional human body model and use the resulting silhouette images as positive samples for learning. Because the learning samples are generated by CG, we can produce a large quantity of human body silhouette images with no positional misalignment. Negative samples are cut out at random from the captured video footage. However, random cropping can inadvertently include images of people among the negative samples. To solve this problem, the classifiers are trained with multiple instance learning boosting (MILBoost), which takes into account that mislabeled samples may be mixed in. This keeps the learning from being adversely affected even when images of people are mixed into the bag of negative samples.
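To illustrate how the parameters of an installed camera determine where a person appears in the image, the following minimal sketch projects the head and foot points of a standing figure through a pinhole camera. The pinhole model, the world axes, the parameter names (focal_px, principal_point, cam_height, tilt_rad), and the numeric values are all assumptions made for this example. The actual system renders a 19-part body model, which is not reproduced here.

```python
import math


def project_point(point_world, focal_px, principal_point, cam_height, tilt_rad):
    """Project a world point (x right, y forward, z up; metres) to pixels (u, v).

    Assumed setup: pinhole camera at (0, 0, cam_height), looking along +y and
    pitched down by tilt_rad; image v grows downward.
    """
    x, y, z = point_world
    dz = z - cam_height
    s, c = math.sin(tilt_rad), math.cos(tilt_rad)
    # Coordinates in the camera frame (x right, y down, z along the optical axis).
    x_cam = x
    y_cam = -s * y - c * dz
    z_cam = c * y - s * dz
    if z_cam <= 0:
        raise ValueError("point is behind the camera")
    cx, cy = principal_point
    return (cx + focal_px * x_cam / z_cam, cy + focal_px * y_cam / z_cam)


def person_pixel_height(distance_m, person_height_m, **camera):
    """Pixel height of a standing person at a given distance in front of the camera."""
    _, v_head = project_point((0.0, distance_m, person_height_m), **camera)
    _, v_foot = project_point((0.0, distance_m, 0.0), **camera)
    return v_foot - v_head


# Hypothetical camera parameters, chosen only for illustration.
CAMERA = dict(focal_px=500.0, principal_point=(320.0, 240.0),
              cam_height=1.0, tilt_rad=0.0)
```

Calling person_pixel_height(5.0, 1.7, **CAMERA) and then person_pixel_height(10.0, 1.7, **CAMERA) with these assumed values shows the projected height halving as the distance doubles, which illustrates how strongly silhouette scale depends on the camera geometry.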
Hybrid-Type Transfer Learning
In human detection based on statistical learning, two costs are a major problem: the personnel cost of collecting learning samples and the time cost of re-learning to suit a specific scene. Transfer learning based on boosting has been proposed as a way to reduce the labor of sample collection, but it has difficulty adapting when the prior learning scene and the specific scene differ greatly. We therefore propose a hybrid-type transfer learning method that provides two feature spaces, one containing the features obtained by transfer and another that searches over all features, as in re-learning, and that selectively switches between them based on a defined learning efficiency. The method retains the rapid classifier construction from a small number of samples that is typical of transfer learning. It improves accuracy by 8.35% over the previous transfer learning method and runs at least 3.2 times faster than re-learning.
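The switching idea can be sketched for a single boosting round as follows. The text does not specify how learning efficiency is defined, so this sketch substitutes a stand-in measure: the edge of the best transferred weak classifier over chance level (0.5 minus its weighted error). The function name, the threshold value, and the full_search callable are likewise hypothetical and only illustrate the control flow, not the proposed method itself.

```python
def select_weak_classifier(transfer_errors, full_search, efficiency_threshold=0.1):
    """Pick one weak classifier for the current boosting round.

    transfer_errors: dict mapping candidate name -> weighted error in the
        transferred feature space (cheap to evaluate).
    full_search: zero-argument callable that searches the full feature space
        (expensive) and returns (name, weighted_error).
    efficiency_threshold: hypothetical minimum edge over chance.
    Returns (space, name, weighted_error).
    """
    if transfer_errors:
        best = min(transfer_errors, key=transfer_errors.get)
        error = transfer_errors[best]
        # Stand-in efficiency measure (an assumption): edge over chance level 0.5.
        if 0.5 - error >= efficiency_threshold:
            return ("transfer", best, error)
    # Transferred features are not efficient enough: fall back to the full search.
    name, error = full_search()
    return ("full", name, error)
```

In this sketch the expensive full search runs only in rounds where the transferred features fall below the threshold, which is one way the speed of transfer learning and the adaptability of re-learning could be combined.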