Gradual Sampling Gate for Bidirectional Knowledge Distillation
- Soma Minami, Takayoshi Yamashita, Hironobu Fujiyoshi
- International Conference on Machine Vision Applications (MVA), 2019
Knowledge distillation is an efficient approach to model compression. Conventionally, it follows a unidirectional scheme that transfers knowledge from a large, pre-trained network to a smaller one. A bidirectional scheme, in which networks transfer knowledge to each other, was recently proposed and achieved higher performance than unidirectional distillation. However, in bidirectional distillation, the knowledge transferred between the networks disturbs their training at the early training stage. We propose a “gradual sampling gate” that controls the soft target loss by referring to the training accuracy of each network. Our bidirectional distillation method can improve accuracy without increasing computational cost. To evaluate it, we compare the classification accuracies of several network models (ResNet32, ResNet110, WideResNet, and DenseNet) on various datasets (CIFAR-10, CIFAR-100, SVHN, and Tiny ImageNet). Experimental results show that our method trains networks effectively and achieves higher accuracies.
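The abstract does not specify how the gate is computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. Two networks see the same example; each adds a temperature-softened KL term toward the other's prediction (the soft target loss), and a hypothetical `gate_open` rule decides per step whether that term is applied, opening with probability equal to the peer's current training accuracy so that soft targets are used rarely while accuracy is low. The function names, the gate rule, the choice of reading the peer's (rather than the network's own) accuracy, and the default temperature are all assumptions. Pure Python is used so that only the loss arithmetic is shown, without a deep-learning framework.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max logit avoids overflow."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Hard-label cross-entropy for a single example."""
    return -math.log(max(probs[label], 1e-12))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q): distance of the student distribution q from the soft target p."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def gate_open(peer_train_acc: float, rng: random.Random) -> bool:
    """HYPOTHETICAL gate rule (not taken from the paper): open with probability
    equal to the peer's current training accuracy, clamped to [0, 1]."""
    prob = min(max(peer_train_acc, 0.0), 1.0)
    return rng.random() < prob


def gated_loss(own_logits: Sequence[float], peer_logits: Sequence[float],
               label: int, peer_train_acc: float, rng: random.Random,
               temperature: float = 3.0) -> float:
    """Hard-label loss plus the peer's soft target loss, added only if the gate opens."""
    hard = cross_entropy(softmax(own_logits), label)
    if not gate_open(peer_train_acc, rng):
        return hard
    soft_target = softmax(peer_logits, temperature)
    student = softmax(own_logits, temperature)
    # Multiplying by T^2 keeps the soft term's gradient magnitude roughly
    # independent of the chosen temperature.
    return hard + temperature ** 2 * kl_divergence(soft_target, student)


def bidirectional_losses(logits_a: Sequence[float], logits_b: Sequence[float],
                         label: int, acc_a: float, acc_b: float,
                         rng: random.Random,
                         temperature: float = 3.0) -> Tuple[float, float]:
    """Each network learns from the other; each gate reads the peer's accuracy (assumption)."""
    loss_a = gated_loss(logits_a, logits_b, label, acc_b, rng, temperature)
    loss_b = gated_loss(logits_b, logits_a, label, acc_a, rng, temperature)
    return loss_a, loss_b


if __name__ == "__main__":
    rng = random.Random(0)
    logits_a, logits_b, label = [2.0, 0.5, 0.1], [1.2, 1.0, 0.3], 0
    # Early training: low accuracy, so the gates rarely open.
    print(bidirectional_losses(logits_a, logits_b, label, acc_a=0.35, acc_b=0.3, rng=rng))
    # Later training: high accuracy, so the soft target loss is usually applied.
    print(bidirectional_losses(logits_a, logits_b, label, acc_a=0.95, acc_b=0.9, rng=rng))
```

Sampling a binary gate, instead of multiplying the soft target loss by the accuracy, is one way to interpret "sampling" in the method's name; a deterministic accuracy-based weight would be an equally plausible alternative under the same assumptions.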