Machine Perception and Robotics Group
Chubu University

Journal Paper (E)

Distilling Diverse Knowledge for Deep Ensemble Learning

Authors
Naoki Okamoto, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi
Publication
IEEE Access, 2025

Download: PDF (English)

Bidirectional knowledge distillation improves network performance by sharing knowledge among networks while multiple networks are trained together. Performance is further improved by using an ensemble of the trained networks at inference time. However, the performance gain achieved by an ensemble of networks trained with bidirectional knowledge distillation is smaller than that of a general ensemble trained without knowledge distillation. From this trend, we hypothesize that there is a relationship between network diversity, which is essential for performance improvement through ensembling, and the knowledge shared among the networks. We therefore present a distillation strategy that promotes network diversity for ensemble learning. Because several types of network diversity can be considered, we design loss functions that separate knowledge, and we automatically derive an effective distillation strategy for ensemble learning by performing a hyperparameter search that treats these loss functions as hyperparameters. Furthermore, taking network diversity into account, we design a network compression method for the ensemble and obtain a single network whose performance is equivalent to that of the ensemble. In the experiments, we automatically design distillation strategies for ensemble learning and evaluate ensemble accuracy on five classification datasets.
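For readers unfamiliar with bidirectional knowledge distillation, the sketch below shows one generic mutual-distillation training step in PyTorch, in which two classifiers learn from the labels and from each other's softened predictions. The networks `net_a` and `net_b`, the temperature `T`, and the weight `alpha` are illustrative assumptions; this is not the paper's specific knowledge-separating loss design or hyperparameter search, only a minimal baseline of the underlying idea.

```python
# Minimal sketch of a bidirectional (mutual) knowledge distillation step.
# Assumes two classifiers producing logits over the same label set.
import torch
import torch.nn.functional as F

def mutual_distillation_step(net_a, net_b, x, y, opt_a, opt_b, T=3.0, alpha=0.5):
    """One training step: each network learns from the ground-truth labels
    and from the temperature-softened predictions of its peer."""
    logits_a, logits_b = net_a(x), net_b(x)

    # Supervised cross-entropy for each network.
    ce_a = F.cross_entropy(logits_a, y)
    ce_b = F.cross_entropy(logits_b, y)

    # Bidirectional KL terms; each peer's prediction is detached so that
    # it acts as a fixed teacher for the other network in this step.
    kl_a = F.kl_div(F.log_softmax(logits_a / T, dim=1),
                    F.softmax(logits_b.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)
    kl_b = F.kl_div(F.log_softmax(logits_b / T, dim=1),
                    F.softmax(logits_a.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)

    loss_a = ce_a + alpha * kl_a
    loss_b = ce_b + alpha * kl_b

    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
    return loss_a.item(), loss_b.item()
```

At inference, an ensemble prediction in this setting is typically obtained by averaging the softmax outputs of the trained networks; the paper's contribution concerns how to keep those networks diverse enough for such averaging to pay off.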
