Papers
1. Value-based Methods
a) Idea + NFQ:
b) DQN = Experience Replay + Fixed Targets:
c) Dueling DQN + Prioritized Experience Replay
- Prioritized Experience Replay: https://arxiv.org/abs/1511.05952
- Dueling DQN: https://arxiv.org/abs/1511.06581
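The two DQN ingredients named in b) can be sketched in a few lines. This is an illustrative toy, not the papers' implementation; the names and the dict-of-weights representation are our own assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay: store transitions, sample minibatches at random."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation of consecutive steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def sync_target(target_weights, online_weights):
    """Fixed targets: every C steps, copy the online weights into a frozen
    target network so the TD target only moves occasionally."""
    target_weights.update(online_weights)
```

Prioritized replay (the Schaul et al. link above) changes only `sample`: transitions are drawn proportionally to their TD error instead of uniformly.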
2. Policy-gradient Methods:
a) REINFORCE/monte-carlo policy gradient & baseline:
- Chapter 13.3 : https://s3-us-west-1.amazonaws.com/udacity-drlnd/bookdraft2018.pdf
- Article explaining the math: https://towardsdatascience.com/policy-gradients-in-a-nutshell-8b72f9743c5d
- Baselines: Chapter 13.4 : https://s3-us-west-1.amazonaws.com/udacity-drlnd/bookdraft2018.pdf
- Baselines: Article explaining the math: https://towardsdatascience.com/policy-gradients-in-a-nutshell-8b72f9743c5d
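The REINFORCE update from chapters 13.3/13.4 weights each log-probability gradient by the discounted return minus a baseline; a minimal sketch of those weights (function name ours):

```python
def reinforce_weights(rewards, gamma=0.99, baseline=0.0):
    """Per-step weights (G_t - b) for the REINFORCE update
    grad J ~ sum_t (G_t - b) * grad log pi(a_t | s_t).
    Subtracting a baseline b keeps the gradient unbiased while
    reducing its variance (Sutton & Barto, ch. 13.4)."""
    G, weights = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G          # discounted return from step t onward
        weights.append(G - baseline)
    weights.reverse()
    return weights
```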
b) Idea ("Vanilla" Actor-Critics, including math)
- Chapter 13.5 : https://s3-us-west-1.amazonaws.com/udacity-drlnd/bookdraft2018.pdf
- Article explaining the math: https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f
c) A3C + A2C
- A3C: https://arxiv.org/pdf/1602.01783
- A2C: https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f
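Actor-critic methods replace the Monte-Carlo return above with a learned critic. A one-step TD advantage in the style of A2C can be sketched as (a simplification; A3C/A2C actually use n-step returns):

```python
def one_step_advantage(reward, gamma, value_s, value_next, done=False):
    """One-step TD advantage A(s, a) = r + gamma * V(s') - V(s).
    The actor is updated with A * grad log pi(a|s); the critic is
    regressed toward the TD target r + gamma * V(s')."""
    td_target = reward + (0.0 if done else gamma * value_next)
    return td_target - value_s
```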
3. DRL for Continuous Action Spaces
a) Deterministic Policy Gradients (DPG): http://proceedings.mlr.press/v32/silver14.pdf
b) Deep Deterministic Policy Gradients (DDPG) & Improvements: Twin-delayed DPG (TD3):
- https://arxiv.org/abs/1509.02971 (DDPG)
- https://arxiv.org/pdf/1802.09477.pdf, https://spinningup.openai.com/en/latest/algorithms/td3.html#background (TD3)
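Two of TD3's fixes to DDPG fit in a few lines each; a hedged sketch with scalar Q-values standing in for critic networks (function names ours):

```python
import random

def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """Clipped double-Q target: take the minimum of the two target
    critics, curbing the overestimation bias DDPG suffers from."""
    q_min = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * q_min)

def smoothed_action(target_action, noise_std=0.2, noise_clip=0.5,
                    low=-1.0, high=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target
    action so the critic is fit over a neighbourhood of actions."""
    noise = max(-noise_clip, min(noise_clip, random.gauss(0.0, noise_std)))
    return max(low, min(high, target_action + noise))
```

TD3's third trick, delayed policy updates, is just updating the actor once per `d` critic updates.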
c) Proximal Policy Optimization (PPO): https://arxiv.org/pdf/1707.06347.pdf
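PPO's core is a single per-sample objective; a minimal sketch (scalar version of equation 7 in the paper):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: L = min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r = pi_new(a|s) / pi_old(a|s). Clipping removes the incentive
    to push the ratio far outside [1-eps, 1+eps] in a single update."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)
```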
4. Improving Exploration
a) Entropy Regularization & Soft Actor-Critics (SAC):
- https://towardsdatascience.com/entropy-regularization-in-reinforcement-learning-a6fa6d7598df (Entropy Reg.)
- https://arxiv.org/pdf/1801.01290.pdf (SAC)
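The maximum-entropy idea behind SAC, sketched for a discrete policy (SAC itself uses continuous Gaussians; this scalar version is ours):

```python
import math

def entropy(probs):
    """Shannon entropy H(pi(.|s)) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def soft_reward(reward, probs, alpha=0.2):
    """Entropy-regularized per-step objective r + alpha * H(pi(.|s)):
    the agent is paid extra for staying stochastic, which keeps
    exploration alive; alpha trades reward against entropy."""
    return reward + alpha * entropy(probs)
```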
b) Count-based Exploration Bonuses:
- https://arxiv.org/abs/1606.01868 (Pseudo Counts)
- https://arxiv.org/pdf/1703.01310 (follow-up paper improving pseudo-counts)
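The count-based bonus in its simplest tabular form (the papers above generalize `N(s)` to pseudo-counts from a density model):

```python
import math
from collections import Counter

def count_bonus(counts, state, beta=0.1):
    """Count-based exploration bonus beta / sqrt(N(s)): novel states pay
    a large intrinsic reward that decays as they are revisited."""
    counts[state] += 1
    return beta / math.sqrt(counts[state])
```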
c) Prediction-based Exploration Bonuses:
- https://arxiv.org/pdf/1703.01732 (Surprise-based intrinsic motivation)
- https://arxiv.org/pdf/1810.12894 (Random Network Distillation)
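The prediction-error bonus, reduced to its core (in RND both feature vectors come from neural networks; plain lists stand in for them here):

```python
def rnd_bonus(target_features, predicted_features):
    """Prediction-error intrinsic reward, as in Random Network Distillation:
    a predictor is trained to match a fixed, randomly initialized target
    network. The squared error is large for novel states and shrinks as
    the predictor trains on familiar ones."""
    return sum((t - p) ** 2 for t, p in zip(target_features, predicted_features))
```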
5. Special Topics / Model-Based DRL
a) Monte Carlo Tree Search
- Chapter 8.10 + 8.11 : https://s3-us-west-1.amazonaws.com/udacity-drlnd/bookdraft2018.pdf
- Explanations: https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa
- https://towardsdatascience.com/monte-carlo-tree-search-in-reinforcement-learning-b97d3e743d0f
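The selection phase of MCTS is driven by the UCT rule; a minimal sketch (the c=1.4 default is a common convention, not from the book):

```python
import math

def uct_score(q_value, parent_visits, child_visits, c=1.4):
    """UCT: Q(s, a) + c * sqrt(ln N(s) / N(s, a)). The exploration term
    grows for rarely tried children and shrinks as they are visited;
    unvisited children score infinity so every action is tried once."""
    if child_visits == 0:
        return float("inf")
    return q_value + c * math.sqrt(math.log(parent_visits) / child_visits)
```

At each node, selection descends into the child with the highest `uct_score`; AlphaZero (below) replaces the exploration term with a policy-network prior.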
b) Alpha(Go)Zero
- DeepMind paper: https://www.nature.com/articles/nature24270.epdf?author_access_token=VJXbVjaSHxFoctQQ4p2k4tRgN0jAjWel9jnR3ZoTv0PVW4gB86EEpGqTRDtpIz-2rmo8-KG06gqVobU5NSCFeHILHcVFUeMsbvwS-lxjqQGg98faovwjxeTUgZAUMnRQ (https://arxiv.org/pdf/1712.01815.pdf)
- Deep Reinforcement Learning Hands-On, chapter 18 (we have that book, come talk to us!) https://drive.google.com/file/d/1MNS4bmF58MlDlGOxCpS7-IfAFE-hJt2B/view
c) MuZero: https://arxiv.org/pdf/1911.08265.pdf