December 19, Deyu Meng: MLR-SNet (Meta-LR-Schedule-Net): Transferable LR Schedules for Heterogeneous Tasks
2024-12-19 15:00:00
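Topic: MLR-SNet (Meta-LR-Schedule-Net): Transferable LR Schedules for Heterogeneous Tasks
Speaker: Deyu Meng
Start time: 2024-12-19 15:00:00
Venue: Room A1114, Science Building, Putuo Campus
Hosts: School of Statistics; Academy of Statistics and Interdisciplinary Sciences
About the Speaker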

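Deyu Meng is a professor and doctoral supervisor at Xi'an Jiaotong University, where he heads the machine learning teaching and research section of the National Engineering Laboratory for Big Data Algorithms and Analysis Technology. He has published more than one hundred papers, with over 31,000 citations on Google Scholar. He currently serves on the editorial boards of seven domestic and international journals, including IEEE Trans. PAMI and NSR. His research focuses on fundamental problems in machine learning, including meta-learning, probabilistic machine learning, and interpretable neural networks.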


Abstract

The learning rate (LR) is one of the most important hyperparameters in the stochastic gradient descent (SGD) algorithm for training deep neural networks (DNNs). However, current hand-designed LR schedules require a fixed form to be specified manually in advance, which limits their ability to adapt to practical non-convex optimization problems, because training dynamics vary widely. Moreover, a proper LR schedule usually has to be searched from scratch for each new task, since suitable schedules often differ substantially across task variations such as data modalities, network architectures, or training data sizes. To address these LR schedule setting issues, we propose to parameterize LR schedules with an explicit mapping formulation, called MLR-SNet. The learnable parameterized structure gives MLR-SNet more flexibility to learn a proper LR schedule that complies with the training dynamics of the DNN. Image and text classification benchmark experiments substantiate the capability of our method to produce proper LR schedules. Moreover, the explicit parameterized structure makes the meta-learned LR schedules transferable and plug-and-play, so they can be readily generalized to new heterogeneous tasks. We transfer our meta-learned MLR-SNet to query tasks that differ from the training ones in the number of training epochs, network architectures, data modalities, and dataset sizes, and achieve comparable or even better performance than hand-designed LR schedules specifically designed for the query tasks. The robustness of MLR-SNet is also substantiated when the training data are corrupted by noise. We further prove the convergence of the SGD algorithm equipped with the LR schedule produced by our MLR-SNet, with a convergence rate comparable to the best-known rates for the problem. The source code of our method is released at https://github.com/xjtushujun/MLR-SNet.
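
To make the idea of an "explicit mapping" concrete, the following is a minimal toy sketch and not the authors' implementation. The abstract does not specify the form of the mapping, so everything here is an assumption for illustration: the function names `lr_from_loss` and `descend_with_mapped_lr`, the sigmoid form with parameters `w` and `b`, the cap `lr_max`, and the use of a one-dimensional quadratic with exact gradients in place of SGD on a DNN. The parameters `w` and `b` are fixed by hand here; in a meta-learning setting they would be the quantities being learned.

```python
import math


def lr_from_loss(loss, w, b, lr_max):
    """Hypothetical explicit mapping from the current loss to a learning rate.

    A scaled sigmoid keeps the output strictly inside (0, lr_max).
    """
    return lr_max / (1.0 + math.exp(-(w * loss + b)))


def descend_with_mapped_lr(theta0, steps, w, b, lr_max=0.5):
    """Minimize the toy objective f(theta) = 0.5 * theta**2.

    At each step the learning rate is produced by the mapping rather than
    by a hand-designed schedule. Exact gradients are used for simplicity.
    """
    theta = theta0
    history = []  # (loss, lr) pairs, one per step
    for _ in range(steps):
        loss = 0.5 * theta ** 2
        lr = lr_from_loss(loss, w, b, lr_max)
        grad = theta  # derivative of 0.5 * theta**2
        theta -= lr * grad
        history.append((loss, lr))
    return theta, history


# Stand-in values for parameters that would otherwise be meta-learned.
theta, history = descend_with_mapped_lr(theta0=3.0, steps=50, w=1.0, b=-1.0)
```

Because the mapping is an explicit function of the training state, the same `(w, b)` pair can be reused unchanged on a different starting point or objective, which is a toy analogue of the plug-and-play transfer described in the abstract.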