<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Liang, Jinxiu</style></author><author><style face="normal" font="default" size="100%">Xu, Yong</style></author><author><style face="normal" font="default" size="100%">Bao, Chenglong</style></author><author><style face="normal" font="default" size="100%">Quan, Yuhui</style></author><author><style face="normal" font="default" size="100%">Ji, Hui</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Barzilai–Borwein-based Adaptive Learning Rate for Deep Learning</style></title><secondary-title><style face="normal" font="default" size="100%">Pattern Recognition Letters (PRL)</style></secondary-title></titles><dates><year><style face="normal" font="default" size="100%">2019</style></year></dates><volume><style face="normal" font="default" size="100%">128</style></volume><pages><style face="normal" font="default" size="100%">197–203</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">The learning rate is arguably the most important hyper-parameter to tune when training a neural network. As manually setting the right learning rate remains a cumbersome process, adaptive learning rate algorithms aim to automate it. Motivated by the success of the Barzilai–Borwein (BB) step-size method in many gradient descent methods for solving convex problems, this paper investigates the potential of the BB method for training neural networks. With strong motivation from related convergence analysis, the BB method is generalized to an adaptive learning rate for mini-batch gradient descent. Experiments showed that, in contrast to many existing methods, the proposed BB method is highly insensitive to the initial learning rate, especially in terms of generalization performance. The BB method also showed advantages in both learning speed and generalization performance over other available methods.</style></abstract></record></records></xml>