Activity: Talk or presentation › Talk at conference or symposium › Science to public
In this work, we adapt the standard experience replay approach for the task of learning multiple similar environments. Particularly for our task, we consider a robot learning decision making for multiple corridor environments. While using single monocular images for the observation states, the agent learns to predict the reward that is related to the maximum traveled distance that can be reached for the given action. Based on the reward predictions, the agent than decides which action to take.
9 May 2018
BMVA Symposium on Reinforcement Learning in Computer Vision