Conditional Mean Embeddings (CME) provide a way of learning to estimate expectations under unknown distributions. We consider their application to learning the system dynamics for Markov Decision Processes (MDPs). This results in a model-based approach to their solution that reduces the planning problem to a finite (pseudo-) MDP exactly solvable by dynamic programming. Unfortunately, the size of the finite MDP scales badly with the amount of experience. By approximating the loss function of the CME the size of the induced (pseudo-) MDP can be compressed while maintaining performance guarantees. At the same time, the CME model can itself be approximated using a fast sparse-greedy kernel regression. The performance of the composite method compares favourably with the state-of-the-art methods both in accuracy and efficiency. Extensions of the approach to deep learning will also be presented.

John Shawe-Taylor, University College London