Student Capacity Moderates Knowledge Distillation Effectiveness: A Systematic Study Across ResNet Teacher-Student Pairs on CIFAR-10
Published in arXiv, 2026
A systematic ablation study comparing Logit-KD and Feature-KD across ResNet teacher-student pairs on CIFAR-10. Key findings: student capacity, rather than the teacher-student accuracy gap, is the primary moderator of KD effectiveness, and implementation correctness critically affects Feature-KD performance.
Recommended citation: Yaşar, U. O. (2026). Student Capacity Moderates Knowledge Distillation Effectiveness. arXiv:2605.31191.
