Symposium Fall 2020 – MINDS Plenary Sham Kakade

Posted October 8, 2020

October 23, 2020 @ 1:30 pm – 2:10 pm

Title – What are the statistical limits of offline reinforcement learning with function approximation?

Abstract – The area of offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision-making strategies. It is becoming increasingly important in numerous areas of science, engineering, and technology. Here, the hope is that function approximation methods (to deal with the curse of dimensionality), coupled with offline reinforcement learning strategies, can help alleviate the excessive sample complexity burden in modern sequential decision-making problems. However, the extent to which this broader approach can be effective is not well understood; the literature largely consists of sufficient conditions.

This talk will focus on a basic question: what are the necessary representational and distributional conditions that permit provable sample-efficient offline reinforcement learning? Perhaps surprisingly, our main result shows that even if: (i) we have realizability, in that the true value functions of _all_ target policies are linear in a given set of features, and (ii) our off-policy data has good coverage over all these features (in a precisely defined and strong sense), any algorithm still information-theoretically requires an exponential number of offline samples to non-trivially estimate the value of _any_ target policy. Our results highlight that sample-efficient offline RL is simply not possible unless significantly stronger conditions hold: either extremely low distribution shift (where the offline data distribution is close to the distribution of the target policy) or far stronger representational conditions (beyond realizability).
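To make the realizability condition concrete, here is a minimal sketch (not from the talk; all quantities are invented for illustration) of the one-step case: if the true values are exactly linear in known features and the offline data covers those features, ordinary least squares recovers them. The talk's lower bound concerns the genuinely sequential setting, where this naive regression is no longer enough.

```python
import numpy as np

# Illustrative toy setup: linear realizability means the true value
# function satisfies V(s) = phi(s)^T theta_true for known features phi.
rng = np.random.default_rng(0)

n_states, d = 6, 3
Phi = rng.normal(size=(n_states, d))      # rows are feature vectors phi(s)
theta_true = np.array([1.0, -0.5, 2.0])   # unknown true parameter
V_true = Phi @ theta_true                 # realizable true values

# Offline data: states drawn from a behavior distribution with full
# feature coverage; noisy Monte Carlo returns serve as regression targets.
n_samples = 5000
states = rng.integers(0, n_states, size=n_samples)
returns = V_true[states] + rng.normal(scale=0.1, size=n_samples)

# Least-squares estimate: theta_hat = argmin ||Phi[states] @ theta - returns||^2
X = Phi[states]
theta_hat, *_ = np.linalg.lstsq(X, returns, rcond=None)

# With realizability and coverage, the one-step estimate is accurate.
print(np.max(np.abs(Phi @ theta_hat - V_true)))
```

In the multi-step setting, errors compound across time steps, and the result above shows that realizability plus feature coverage no longer suffice for sample efficiency.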

Time permitting, we will also discuss representation learning strategies that may help.

This is joint work with Ruosong Wang and Dean Foster.

Bio – Sham Kakade is a professor in the Department of Computer Science and the Department of Statistics at the University of Washington and is also a senior principal researcher at Microsoft Research. His work is on the mathematical foundations of machine learning and AI. Sham's thesis helped lay the statistical foundations of reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods in reinforcement learning; developing the mathematical foundations for the widely used linear bandit models and the Gaussian process bandit models; the tensor and spectral methodologies for provable estimation of latent variable models; and the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the ICML Test of Time Award, the IBM Pat Goldberg best paper award, and the INFORMS Revenue Management and Pricing Prize. He was program chair for COLT 2011.

Sham was an undergraduate at Caltech, where he studied physics and worked under the guidance of John Preskill in quantum computing. He then completed his Ph.D. with Peter Dayan in computational neuroscience at the Gatsby Unit at University College London. He was a postdoc with Michael Kearns at the University of Pennsylvania. Sham has been a Principal Research Scientist at Microsoft Research, New England; an associate professor in the Department of Statistics at the Wharton School, University of Pennsylvania; and an assistant professor at the Toyota Technological Institute at Chicago.
