Decision-making Analysis by Attention Mechanism and Applications
Deep learning, which has recently found use in image recognition, robot control, and other applications, has had remarkable achievements in a variety of fields. However, as deep learning constructs a network from a massive number of parameters, it is extremely difficult to explain the decision making involved in outputting a certain recognition result. In our research, we have taken up the problem of understanding decision making in deep learning through visual explanation that highlights attention regions in a deep-learning recognition process.
Attention Branch Network
Typical visual explanation techniques can visualize the attention regions of a deep-learning model, but they have so far not contributed to improving recognition accuracy. The Attention Branch Network (ABN) that we propose applies an attention mechanism to the attention regions obtained for visual explanation (referred to as the “attention map” below), thereby achieving visual explanation and improved accuracy at the same time. In addition to image classification tasks, the proposed ABN can be applied to a variety of image recognition fields such as multi-task learning.
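As a rough illustration of what applying an attention mechanism to an attention map can look like, the sketch below reweights a feature map with an attention map. The residual form `(1 + M) * f`, the function name `apply_attention`, and the array shapes are assumptions made for this example and are not stated in the text above.

```python
import numpy as np


def apply_attention(features: np.ndarray, attention_map: np.ndarray) -> np.ndarray:
    """Reweight a (C, H, W) feature map with an (H, W) attention map in [0, 1].

    Assumed residual form: f' = (1 + M) * f, so that locations with zero
    attention keep their original features instead of being erased.
    """
    if features.shape[1:] != attention_map.shape:
        raise ValueError("attention map must match the spatial size of the features")
    # Broadcast the single-channel attention map over all feature channels.
    return (1.0 + attention_map[None, :, :]) * features


# Toy example: two channels of ones on a 2x2 grid.
features = np.ones((2, 2, 2))
attention_map = np.array([[0.0, 0.5],
                          [1.0, 0.25]])
weighted = apply_attention(features, attention_map)
```

In this form, highly attended locations are amplified while the rest of the feature map passes through unchanged, which is one way the same map can serve both visualization and recognition.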
Application of ABN to end-to-end learning-based self-driving control
Self-driving control using a Deep Convolutional Neural Network (DCNN) trains a network through end-to-end learning using the input image and car control values. End-to-end learning-based self-driving control obtains car control values directly from the input image, in contrast to a step-by-step process consisting of lane detection, motion planning, etc. Here, however, the car control values are output by a DCNN, so the reason for outputting particular values is unclear. In our research, we are analyzing the decision-making process in self-driving control by using an ABN. Because self-driving control is a regression problem, we introduce Weighted Global Pooling (WGP) into the attention branch of the ABN. At pooling time, WGP weights the feature map with a convolutional kernel of the same size, which allows car control values to be estimated with high accuracy even for a regression problem. In experiments, we used the video game Grand Theft Auto V (GTA V) as an experimental environment. When visualizing the attention map at a curve or a stop, we found that attention was focused on a white line or the car in front, and we confirmed that the reason for a DCNN decision could be output visually.
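The pooling step described above can be sketched as follows. This is a minimal illustration, assuming WGP amounts to an element-wise product between each feature-map channel and a kernel of the same spatial size, summed to one value per channel. The function names and shapes are hypothetical, and in a real network the kernel would be learned.

```python
import numpy as np


def weighted_global_pooling(feature_map: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Pool a (C, H, W) feature map to (C,) values with a same-size (C, H, W) kernel.

    Each spatial location gets its own weight, unlike global average pooling,
    which weights every location equally.
    """
    if feature_map.shape != kernel.shape:
        raise ValueError("kernel must have the same shape as the feature map")
    return (feature_map * kernel).sum(axis=(1, 2))


def global_average_pooling(feature_map: np.ndarray) -> np.ndarray:
    """Plain global average pooling, shown for comparison."""
    return feature_map.mean(axis=(1, 2))


feature_map = np.arange(8, dtype=float).reshape(2, 2, 2)

# With uniform weights 1 / (H * W), WGP reduces to global average pooling.
uniform_kernel = np.full(feature_map.shape, 0.25)
pooled_uniform = weighted_global_pooling(feature_map, uniform_kernel)

# A non-uniform kernel keeps position-dependent information.
corner_kernel = np.zeros(feature_map.shape)
corner_kernel[:, 0, 0] = 1.0
pooled_corner = weighted_global_pooling(feature_map, corner_kernel)
```

The comparison with global average pooling shows the design point: a per-location weight lets the pooled value depend on where a response occurs, which matters when the target is a continuous control value rather than a class score.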
Introduction of human knowledge via attention map
There are cases in which the desired recognition result cannot be obtained, such as when multiple objects are present in the image or when labeling requires advanced expertise, as in medical image recognition systems and other critical applications. In such cases, improvement by a typical relearning method is difficult, so we use the attention map of an ABN, on the premise that the desired recognition result can be obtained by manually modifying that attention map. The proposed method calculates the difference between the output attention map and the manually modified attention map and fine-tunes the ABN using this difference. This approach makes it possible to output an attention map that takes human knowledge into account and, as a result, to improve recognition accuracy. In evaluation experiments, we confirmed that recognition accuracy could be improved and that more precise attention maps could be obtained.
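The difference between the two attention maps can be turned into a training signal in several ways. The sketch below assumes an L2 distance added to the recognition loss with a scalar weight. The function names, the choice of L2, and the `weight` parameter are assumptions for illustration, not details given in the text above.

```python
import numpy as np


def attention_map_loss(output_map: np.ndarray, edited_map: np.ndarray) -> float:
    """L2 distance between the network's attention map and the manually edited one."""
    diff = output_map - edited_map
    return float(np.sqrt(np.sum(diff ** 2)))


def fine_tuning_loss(recognition_loss: float,
                     output_map: np.ndarray,
                     edited_map: np.ndarray,
                     weight: float = 1.0) -> float:
    """Total loss: recognition term plus a weighted attention-map term (assumed form)."""
    return recognition_loss + weight * attention_map_loss(output_map, edited_map)


# The network attends to the top row; the human edit moves attention
# from the top-right cell to the bottom-right cell.
output_map = np.array([[0.8, 0.6],
                       [0.0, 0.0]])
edited_map = np.array([[0.8, 0.0],
                       [0.0, 0.8]])

map_loss = attention_map_loss(output_map, edited_map)
total_loss = fine_tuning_loss(0.5, output_map, edited_map, weight=0.1)
```

Minimizing such a term during fine-tuning pulls the output attention map toward the edited one, while the recognition term keeps the network tied to the original task.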
Improvement of ABN reliability by introducing uncertainty
Convolutional Neural Networks (CNNs) are used in a variety of fields centered on image recognition and achieve high recognition accuracy. However, existing CNNs cannot take into account the uncertainty of a prediction result, that is, the difficulty of the prediction, so the extent to which a prediction is reliable is unclear. This problem is considered to be the cause of erroneous decisions when CNNs are used in practice. In this research, we propose a Bayesian Attention Branch Network (Bayesian ABN) that introduces uncertainty into an ABN. Specifically, the proposed method accounts for uncertainty in the CNN prediction result by introducing a Bayesian Neural Network (Bayesian NN) into the ABN. In addition, the method takes advantage of the ABN structure, which outputs prediction results from two branches, and adopts the result with the lower uncertainty. In evaluation experiments using a standard object recognition dataset, we confirmed that the proposed method improves CNN accuracy and reliability.
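The selection between the two branches can be sketched as below. This is a minimal illustration, assuming each branch produces several stochastic softmax outputs (for example, from repeated forward passes with dropout enabled) and that uncertainty is measured as the entropy of their mean. The branch names, the entropy measure, and the function names are assumptions for this example.

```python
import numpy as np


def predictive_entropy(samples: np.ndarray) -> float:
    """Entropy of the mean class distribution over (T, K) stochastic predictions."""
    mean_probs = samples.mean(axis=0)
    # A small epsilon avoids log(0) for classes with zero probability.
    return float(-np.sum(mean_probs * np.log(mean_probs + 1e-12)))


def select_branch(attention_samples: np.ndarray, perception_samples: np.ndarray):
    """Return (branch name, predicted class) for the branch with lower uncertainty."""
    candidates = {"attention": attention_samples, "perception": perception_samples}
    name = min(candidates, key=lambda k: predictive_entropy(candidates[k]))
    mean_probs = candidates[name].mean(axis=0)
    return name, int(mean_probs.argmax())


# Two stochastic passes per branch over two classes (toy values).
attention_samples = np.array([[0.5, 0.5],
                              [0.6, 0.4]])
perception_samples = np.array([[0.9, 0.1],
                               [0.95, 0.05]])

chosen_branch, predicted_class = select_branch(attention_samples, perception_samples)
```

Here the second branch is far more confident, so its prediction is adopted. Other uncertainty measures, such as the variance across passes, could be substituted without changing the selection logic.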
Domain style conversion for discrimination with attention mechanism
Domain style conversion is a typical application of the Generative Adversarial Network (GAN), a generative technique based on adversarial learning. It is used in domain adaptation, which uses unlabeled images that are targeted for discrimination together with labeled images that are not, and in few-shot learning, where only a very small number of training images is available. However, it is not easy to convert style while preserving the context in the image that is beneficial to discrimination. We therefore propose domain style conversion that incorporates an attention mechanism, so that style conversion is applied to the regions of the input image that are beneficial to discrimination. By focusing on those regions, the method enables training that captures the targets of discrimination, even if the converted images are not necessarily visually appealing. Evaluation experiments showed that it was possible to obtain attention maps of regions beneficial to discrimination and thereby improve discrimination performance.
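One simple way to restrict style conversion to attended regions is to blend the converted image with the input using the attention map as a mask. The sketch below illustrates only that blending step, with the generator output taken as given. The blending rule and all names are assumptions for illustration, and the text above does not specify how the attention mechanism is wired into the GAN.

```python
import numpy as np


def attention_guided_conversion(source: np.ndarray,
                                converted: np.ndarray,
                                attention_map: np.ndarray) -> np.ndarray:
    """Blend an (H, W, 3) converted image into the source using an (H, W) mask in [0, 1].

    Attention 1 takes the converted pixel, attention 0 keeps the source pixel.
    """
    if source.shape != converted.shape or source.shape[:2] != attention_map.shape:
        raise ValueError("image and attention map shapes do not match")
    mask = attention_map[..., None]  # add a channel axis for broadcasting
    return mask * converted + (1.0 - mask) * source


# Toy 2x2 images: an all-black source and an all-white "converted" image.
source = np.zeros((2, 2, 3))
converted = np.ones((2, 2, 3))
attention_map = np.array([[1.0, 0.0],
                          [0.5, 0.0]])

blended = attention_guided_conversion(source, converted, attention_map)
```

Under this assumed rule, regions the attention map marks as irrelevant to discrimination are left as in the input, so the conversion effort is concentrated where it affects the classifier.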