Dept. of Robotics Science and Technology,
Chubu University

Gradual Sampling Gate for Bidirectional Knowledge Distillation

Authors
Soma Minami, Takayoshi Yamashita, Hironobu Fujiyoshi
Publication
International Conference on Machine Vision Applications, 2019

Download: PDF (English)

Knowledge distillation is an efficient approach to model compression. It is conventionally based on a unidirectional scheme that transfers knowledge from a large, pre-trained network to a smaller one. A bidirectional scheme was recently proposed and achieves higher performance than unidirectional distillation. In bidirectional distillation, however, training is disturbed in the early stage because the networks exchange knowledge while both are still poorly trained. We propose a “gradual sampling gate” that controls the soft-target loss by referring to the training accuracy of each network. Our bidirectional distillation method improves accuracy without increasing the computational cost. To evaluate the method, we compare the classification accuracies of several network models (ResNet32, ResNet110, WideResNet, and DenseNet) on several datasets (CIFAR-10, CIFAR-100, SVHN, and Tiny ImageNet). Experimental results show that our method trains networks effectively and achieves higher accuracies.
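To give a rough idea of how such an accuracy-based gate can interact with bidirectional distillation, the following is a minimal PyTorch-style sketch of one training step. It assumes the gate simply enables each network's soft-target (KL) loss once the peer network's training accuracy passes a threshold; the threshold, temperature, and gating rule here are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def gated_bidirectional_step(net_a, net_b, opt_a, opt_b, x, y,
                             acc_a, acc_b, gate_threshold=0.5, temperature=4.0):
    """One step of bidirectional distillation with an accuracy-based gate.

    acc_a / acc_b: running training accuracies of the two networks.
    The soft-target loss toward a peer is applied only when that peer's
    accuracy exceeds gate_threshold (an assumed, simplified gate).
    """
    logits_a, logits_b = net_a(x), net_b(x)
    t = temperature

    # Hard-target (cross-entropy) losses are always applied.
    loss_a = F.cross_entropy(logits_a, y)
    loss_b = F.cross_entropy(logits_b, y)

    # Soft-target losses: each network learns from the peer's softened
    # outputs, but only once the peer is accurate enough (the "gate").
    if acc_b >= gate_threshold:
        loss_a = loss_a + (t * t) * F.kl_div(
            F.log_softmax(logits_a / t, dim=1),
            F.softmax(logits_b.detach() / t, dim=1),
            reduction="batchmean")
    if acc_a >= gate_threshold:
        loss_b = loss_b + (t * t) * F.kl_div(
            F.log_softmax(logits_b / t, dim=1),
            F.softmax(logits_a.detach() / t, dim=1),
            reduction="batchmean")

    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
    return loss_a.item(), loss_b.item()
```

In this sketch each network's gradient comes only from its own loss (the peer's logits are detached), which keeps the two optimizers independent while still letting knowledge flow in both directions once the gate opens.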
