Improving PILCO with Bayesian Neural Network Dynamics Models

Abstract

Model-based reinforcement learning (RL) allows an agent to discover good policies with a small number of trials by generalising observed transitions. Data efficiency can be further improved with a probabilistic model of the agent’s ignorance about the world, allowing it to choose actions under uncertainty. Bayesian modelling offers tools for this task, with PILCO being a prominent example, achieving state-of-the-art data efficiency on low-dimensional RL benchmarks. But PILCO relies on Gaussian processes (GPs), which prohibits its applicability to problems that require a larger number of trials to be solved. Further, PILCO does not consider temporal correlation in model uncertainty between successive state transitions, which results in PILCO underestimating state uncertainty at future time steps. In this paper we extend PILCO’s framework to use Bayesian deep dynamics models with approximate variational inference, allowing PILCO to scale linearly with the number of trials and with observation space dimensionality. Using particle methods, we sample dynamics function realisations and obtain lower cumulative cost than PILCO. We give insights into the modelling assumptions made in PILCO, and show that moment matching is a crucial simplifying assumption made by the model. Our implementation can leverage GPU architectures, offering faster running time than PILCO, and will allow structured observation spaces (such as images or higher-dimensional inputs) to be modelled in the future.
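
The sketch below illustrates the particle-based uncertainty propagation described in the abstract: each particle is assigned its own dropout mask, so it follows a single realisation of the Bayesian neural network dynamics function over the whole rollout, preserving temporal correlation in model uncertainty. All dimensions, the placeholder policy, the quadratic cost, and the randomly initialised weights are hypothetical stand-ins, not the paper's code or benchmarks.

```python
import numpy as np

# Illustrative sketch only: propagate K particles through a dropout BNN
# dynamics model. Weights here are random stand-ins for a trained model
# f(s, a) -> s'; in practice the BNN would be fit to observed transitions.

rng = np.random.default_rng(0)

state_dim, action_dim, hidden_dim = 4, 1, 50   # hypothetical sizes
K, T = 30, 25                                   # particles, rollout horizon
p_drop = 0.1                                    # dropout probability

W1 = rng.normal(0, 0.1, (state_dim + action_dim, hidden_dim))
W2 = rng.normal(0, 0.1, (hidden_dim, state_dim))

def dynamics(s, a, mask):
    """One BNN forward pass with a fixed dropout mask (one dynamics realisation)."""
    h = np.tanh(np.concatenate([s, a]) @ W1) * mask / (1.0 - p_drop)
    return s + h @ W2                            # predict state delta

def policy(s):
    """Placeholder linear-in-state policy; PILCO would optimise its parameters."""
    return np.tanh(s[:1])

def cost(s):
    """Hypothetical quadratic cost around the origin."""
    return float(s @ s)

particles = rng.normal(0, 0.01, (K, state_dim))  # initial state distribution
# One dropout mask per particle, held fixed over the rollout: unlike moment
# matching, this keeps model uncertainty temporally correlated across steps.
masks = (rng.random((K, hidden_dim)) > p_drop).astype(float)

total_cost = 0.0
for t in range(T):
    for k in range(K):
        a = policy(particles[k])
        particles[k] = dynamics(particles[k], a, masks[k])
    total_cost += np.mean([cost(p) for p in particles])

print(f"Estimated cumulative cost over horizon: {total_cost:.3f}")
```

In the full method, this Monte Carlo estimate of the cumulative cost would be differentiated with respect to the policy parameters to perform policy improvement.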

Publication
In the Data-Efficient Machine Learning Workshop at the International Conference on Machine Learning (ICML)