Papers

1. Value-based Methods

a) Idea + Neural Fitted Q Iteration (NFQ):

b) DQN = Experience Replay + Fixed Targets (see the update sketch after this list):

c) Dueling DQN + Prioritized Experience Replay
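
A minimal sketch of the update behind item b), assuming a small PyTorch Q-network: transitions are sampled from an experience replay buffer and the TD target is computed with a periodically synced (fixed) target network. The dimensions, hyperparameters, and randomly generated transitions are illustrative placeholders, not values from the papers.

    # Sketch of the DQN update: experience replay + fixed target network.
    # All sizes/hyperparameters are illustrative; random transitions stand in
    # for real environment interaction.
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    obs_dim, n_actions, gamma = 4, 2, 0.99

    def make_q_net():
        return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    q_net = make_q_net()
    target_net = make_q_net()
    target_net.load_state_dict(q_net.state_dict())   # fixed targets: a lagged copy of q_net
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

    replay = deque(maxlen=10_000)                     # experience replay of (s, a, r, s', done)
    for _ in range(1_000):                            # dummy transitions for illustration only
        replay.append((torch.randn(obs_dim), random.randrange(n_actions),
                       random.random(), torch.randn(obs_dim), random.random() < 0.05))

    for step in range(200):
        batch = random.sample(replay, 32)             # i.i.d. minibatch from the buffer
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        done = torch.tensor([float(b[4]) for b in batch])
        with torch.no_grad():                         # TD target uses the frozen target network
            y = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, y)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        if step % 50 == 0:                            # periodically sync the fixed target network
            target_net.load_state_dict(q_net.state_dict())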

2. Policy-gradient Methods:

a) REINFORCE / Monte Carlo policy gradient & baseline (see the gradient sketch after this list):

b) Idea ("Vanilla" Actor-Critics, including math)

c) A3C + A2C
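
For items a) and b), the underlying quantities in standard textbook notation (not tied to a particular paper's formulation):

    % REINFORCE / Monte Carlo policy gradient with a baseline b(s_t):
    \nabla_\theta J(\theta) \;\approx\; \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(G_t - b(s_t)\bigr),
    \qquad G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_k

    % "Vanilla" actor-critic: a learned critic V_w supplies the baseline, and G_t is
    % replaced by a one-step bootstrapped (TD) advantage estimate:
    \hat{A}_t = r_t + \gamma\, V_w(s_{t+1}) - V_w(s_t),
    \qquad \nabla_\theta J(\theta) \;\approx\; \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t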


3. DRL for Continuous Action Spaces

a) Deterministic Policy Gradients (DPG): http://proceedings.mlr.press/v32/silver14.pdf

b) Deep Deterministic Policy Gradients (DDPG) & Improvements: Twin Delayed DDPG (TD3):

c) Proximal Policy Optimization (PPO): https://arxiv.org/pdf/1707.06347.pdf
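
For orientation, the two central objectives from the linked papers, written here in compact form: the deterministic policy gradient (also the actor update used by DDPG and TD3) and PPO's clipped surrogate objective, where \epsilon denotes the clipping parameter.

    % Deterministic policy gradient (DPG; also the actor update in DDPG/TD3):
    \nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)} \right]

    % PPO clipped surrogate objective, with probability ratio r_t(\theta):
    r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
    \qquad L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[ \min\bigl( r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t \bigr) \right]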

4. Improving Exploration

a) Entropy Regularization & Soft Actor-Critic (SAC) (see the objective sketch after this list):

b) Count-based Exploration Bonuses:

c) Prediction-based Exploration Bonuses:
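
A compact summary of the three exploration ideas in this block; the temperature \alpha and bonus scale \beta are tuning parameters, and the exact bonus definitions vary between papers:

    % a) Entropy regularization (the maximum-entropy RL objective behind SAC):
    J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \right]

    % b) Count-based bonus: reward visiting rarely seen states, e.g.
    r_t^{+} = r_t + \beta / \sqrt{N(s_t)}

    % c) Prediction-based bonus (e.g. RND-style): the error of a learned predictor
    % \hat{f}_\phi against a fixed random target network f,
    r_t^{+} = r_t + \beta\, \bigl\lVert \hat{f}_\phi(s_{t+1}) - f(s_{t+1}) \bigr\rVert^2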

5. Special Topics / Model-Based DRL

a) Monte Carlo Tree Search

b) Alpha(Go)Zero

c) MuZero: https://arxiv.org/pdf/1911.08265.pdf
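
For items a)–c), the tree-search action selection (PUCT) used by AlphaZero and MuZero, given here in the AlphaGo Zero form; MuZero replaces the constant c_{\mathrm{puct}} with a term that grows slowly with the parent visit count:

    % PUCT action selection inside the search tree (AlphaZero / MuZero):
    a^{*} = \arg\max_{a}\; \Bigl[ Q(s,a) + c_{\mathrm{puct}}\, P(s,a)\, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} \Bigr]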

