MINDS 2021 Winter Symposium - Nicolas Loizou
Abstract: Stochastic gradient descent (SGD) is the workhorse for training modern large-scale supervised machine learning models. In this talk, we will discuss recent developments in the convergence analysis of SGD and propose efficient and practical variants for faster convergence. We will start by presenting a general yet simple theoretical analysis describing the convergence of SGD under the arbitrary sampling paradigm. The proposed analysis describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form minibatches. The result holds under the weakest possible assumptions, providing, for the first time, the best combination of step-size and optimal minibatch size. We will also present a novel adaptive (no tuning needed) learning rate for SGD. We will introduce a stochastic variant of the classical Polyak step-size (Polyak, 1987), commonly used in the subgradient method, and explain why the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for SGD. We will provide theoretical convergence guarantees for the new method in different settings, including strongly convex, convex, and non-convex functions, and demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models. Finally, we will close with a brief presentation of how standard optimization methods can also solve smooth games (min-max optimization problems) through the Hamiltonian viewpoint.
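To make the idea of a stochastic Polyak step-size concrete, here is a minimal sketch rather than the speaker's implementation. The abstract does not state the formula, so the code assumes one commonly cited capped form, step = min((f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_max). It also assumes a consistent least-squares problem, so every per-sample optimum f_i* is 0 (the interpolation regime typical of over-parameterized models). The names `sps_sgd`, `c`, and `gamma_max` are illustrative.

```python
import random


def sps_sgd(A, b, steps=2000, c=0.5, gamma_max=10.0, seed=0):
    """SGD with a capped stochastic Polyak step-size (illustrative sketch).

    Per-sample loss: f_i(x) = 0.5 * (a_i . x - b_i)^2.
    Assumption: the system A x = b is consistent, so each f_i* = 0.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    f_star = 0.0  # assumed per-sample optimal value (interpolation)
    for _ in range(steps):
        i = rng.randrange(n)  # uniform single-sample minibatch
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        loss_i = 0.5 * residual ** 2
        grad_i = [residual * a for a in A[i]]
        grad_sq = sum(g * g for g in grad_i)
        if grad_sq == 0.0:
            continue  # this sample is already fit exactly; nothing to do
        # Polyak-type step computed from the sampled loss and gradient only,
        # capped at gamma_max; no learning rate is tuned by hand.
        step = min((loss_i - f_star) / (c * grad_sq), gamma_max)
        x = [xj - step * g for xj, g in zip(x, grad_i)]
    return x


# Small over-parameterized example: 2 equations, 3 unknowns.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
b = [-1.0, 1.0]
x = sps_sgd(A, b)
residuals = [sum(a * xj for a, xj in zip(row, x)) - t for row, t in zip(A, b)]
print(residuals)
```

With `c = 0.5` and this quadratic loss, the uncapped step reduces to 1 / ||a_i||^2, so each update projects `x` onto the hyperplane a_i . x = b_i. This is only a special case chosen to keep the sketch short.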