OpenAI Baselines: DQN
OpenAI is open-sourcing OpenAI Baselines, its internal effort to reproduce reinforcement learning algorithms with performance on par with published results. The algorithms will be released over the coming months; today’s release includes DQN and three of its variants. Reinforcement learning results are tricky to reproduce: performance is very noisy, algorithms have many moving parts that allow for subtle bugs, and many papers omit the tricks required to match reported results. By releasing known-good implementations (and best practices for creating them), OpenAI aims to ensure that apparent RL advances are never due to comparison with buggy or untuned versions of existing algorithms. RL algorithms are challenging to implement correctly; good results typically come only after fixing many seemingly trivial bugs. The post describes some of the best practices OpenAI uses for correct RL implementations, along with the details of the first release: DQN and three of its variants, algorithms originally developed by DeepMind. This will be added to Artificial Intelligence Resources Subject Tracer™.
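To give a sense of what DQN implements, here is a minimal tabular Q-learning sketch on a toy chain environment. This is a hypothetical illustration, not code from the Baselines release: DQN replaces the Q-table below with a neural network (plus a replay buffer and a target network), but it trains toward the same Bellman target shown in the update step.

```python
import random
import numpy as np

# Toy 5-state chain MDP (hypothetical example, not from Baselines).
# The agent starts at state 0 and earns reward 1 for reaching state 4.
N_STATES, ACTIONS = 5, [-1, +1]        # action 0 = move left, action 1 = move right
GOAL = N_STATES - 1
gamma, alpha, eps = 0.9, 0.5, 0.1      # discount, learning rate, exploration rate

rng = random.Random(0)
Q = np.zeros((N_STATES, len(ACTIONS)))

for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection.
        a = rng.randrange(len(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Bellman target: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal.
        target = r + (0.0 if s2 == GOAL else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

# The learned greedy policy should move right from every non-terminal state.
policy = [int(Q[s].argmax()) for s in range(N_STATES - 1)]
print(policy)  # -> [1, 1, 1, 1]
```

Even in this tiny setting, subtle bugs (bootstrapping past the terminal state, mixing up the action indexing) silently degrade results rather than crash, which is exactly the failure mode the Baselines release is meant to guard against at scale.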