Learning Frequency-Aware Spatial Attention by Reconstructing Images With Different Frequency Responses
- Author: Keisuke Sano, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi
- Publication: IEEE Access, 2025
Download: PDF (English)
Convolutional neural networks (CNNs) are widely used in real-world applications because of their strong performance. To improve that performance further, many approaches use spatial attention, which emphasizes features in the regions most critical for recognition. However, most existing methods compute attention as a single-channel map per input and therefore cannot distinguish between coarse and fine features. This limitation hinders recognition performance, particularly in fine-grained classification, where both global and detailed features are crucial. In this paper, we propose a novel method that computes separate spatial attention maps for high-frequency and low-frequency features. Our approach divides the feature maps into high-frequency and low-frequency groups, reconstructs an image from each group, and applies frequency-aware reconstruction losses to those images, which enables spatial attention to be computed separately for each frequency band. Through comprehensive quantitative and qualitative evaluations on both general and fine-grained image classification datasets, we demonstrate that our method achieves higher accuracy than the Attention Branch Network, a representative spatial attention method. Furthermore, by fine-tuning the frequency-separated spatial attention with human perceptual knowledge, we achieve additional improvements in classification accuracy.
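The idea of supervising two attention branches with band-specific reconstruction targets can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the filter, the loss, or how the feature maps are divided, so the FFT low-pass disc, the `cutoff` parameter, the mean-squared-error loss, the channel-wise split, and all function names below are assumptions chosen for illustration.

```python
import numpy as np


def split_frequencies(image, cutoff=0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `cutoff` is a hypothetical parameter: the radius of an ideal low-pass
    disc in normalized frequency (cycles per pixel, at most 0.5 per axis).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, shape (1, w)
    low_mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    spectrum = np.fft.fft2(image)
    low = np.fft.ifft2(spectrum * low_mask).real
    high = image - low                # the residual carries the fine detail
    return low, high


def frequency_aware_loss(recon_low, recon_high, image, cutoff=0.1):
    """Assumed loss: MSE of each reconstruction against its own band."""
    low, high = split_frequencies(image, cutoff)
    return np.mean((recon_low - low) ** 2) + np.mean((recon_high - high) ** 2)


def apply_split_attention(features, attn_low, attn_high):
    """Weight each half of the channels with its own spatial attention map.

    Assumption: `features` has shape (C, H, W) with the first C // 2
    channels treated as low-frequency and the rest as high-frequency.
    """
    c = features.shape[0] // 2
    low_feat = features[:c] * attn_low[None, :, :]
    high_feat = features[c:] * attn_high[None, :, :]
    return low_feat, high_feat
```

In this sketch the two bands sum back to the original image, so a pair of reconstructions that exactly match their targets gives a loss of zero. In a trained network, the reconstructions would come from decoders attached to each attended feature group, which the sketch leaves out.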