Topics & Resources


The proseminar topics are largely based on the RL book by Sutton & Barto.

    1. Multi-armed bandits & Markov Decision Processes (2.1, 3)

    2. Policy Iteration (4.1, 4.2, 4.3)

    3. Value Iteration (4.4)

    4. Monte Carlo Methods (5.1, 5.2, 5.3) and Temporal Difference Methods (6.1, 6.2)

    5. Q-learning and Sarsa (6.4 and 6.5)

The numbers in brackets refer to specific sections of the book, which should serve as good starting points. You are more than welcome to use additional resources and materials when preparing your talk.
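For a taste of the tabular methods behind topics 1–5, here is a minimal sketch of the Q-learning update (topic 5) with epsilon-greedy action selection on a toy deterministic chain MDP; the environment and all hyperparameters are illustrative choices, not taken from the book.

```python
import random

# Toy deterministic chain MDP (illustrative): states 0..4, actions
# 0 = left, 1 = right; reward 1 only when reaching the goal state 4.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection (ties broken randomly)
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the greedy next-state value
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy moves right in every state, and the value of the final right move approaches the terminal reward of 1.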


The seminar topic selection aims to cover some of the most fundamental model-free and model-based deep reinforcement learning algorithms, and to give some insight into one additional key topic in deep reinforcement learning: methods for improving exploration. These topics are mostly based on the respective publications and sometimes on specific sections (again given in brackets) of the RL book by Sutton & Barto.

Transitioning to deep RL

    6. Function Approximation (9.1, 9.2, 9.3)
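Topic 6's jump from tables to function approximation can be illustrated with semi-gradient TD(0) under a linear value function. The sketch below evaluates the uniformly random policy on the classic 5-state random walk; one-hot features make it equivalent to tabular TD(0), but the same update works for any feature vector. Step size and episode count are illustrative.

```python
import random

# Classic 5-state random walk: non-terminal states 1..5, terminals at
# 0 and 6, reward +1 only on reaching the right terminal. The true
# values of states 1..5 are 1/6, 2/6, ..., 5/6.
TERMINAL_L, TERMINAL_R = 0, 6

def features(s):
    x = [0.0] * 5
    x[s - 1] = 1.0  # one-hot over the 5 non-terminal states
    return x

def v_hat(w, s):
    return sum(wi * xi for wi, xi in zip(w, features(s)))

def td0(episodes=3000, alpha=0.05, gamma=1.0, seed=0):
    rng = random.Random(seed)
    w = [0.0] * 5
    for _ in range(episodes):
        s = 3  # start in the middle
        while s not in (TERMINAL_L, TERMINAL_R):
            s2 = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s2 == TERMINAL_R else 0.0
            v_next = 0.0 if s2 in (TERMINAL_L, TERMINAL_R) else v_hat(w, s2)
            delta = r + gamma * v_next - v_hat(w, s)  # TD error
            # semi-gradient update: w += alpha * delta * grad v_hat(s)
            w = [wi + alpha * delta * xi for wi, xi in zip(w, features(s))]
            s = s2
    return w
```

With one-hot features the learned weights approach the true state values directly; with richer features the same loop generalises across states.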

Model-free DRL algorithms

    7. Deep Q-learning (DQN)

    8. DQN improvements (pick 2 out of these 3): Double DQN (useful: RL book 6.7), Dueling DQN, Prioritized Experience Replay

    9. Policy Gradient Methods (13.1, 13.2, 13.3), Advantage Actor Critics (useful, but optional: control variates)

    10. Proximal Policy Optimization
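For topic 10, the heart of Proximal Policy Optimization is its clipped surrogate objective. A per-sample sketch (eps = 0.2 is the paper's default; the function name is mine):

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Negative clipped surrogate objective for one sample.

    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated A(s, a).
    Clipping removes the incentive to push the ratio outside
    [1 - eps, 1 + eps], keeping each policy update conservative."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # maximise min(unclipped, clipped)  ->  minimise its negative
    return -min(unclipped, clipped)
```

For example, with a positive advantage the objective stops growing once the ratio exceeds 1 + eps, so the optimizer gains nothing from moving the new policy further away from the old one.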

Model-based DRL algorithms

    11. Model-based RL and Monte Carlo Tree Search (8.1, 8.10, 8.11)

    12. AlphaGo Zero

    13. PlaNet

    14. Dreamer (v2)
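Topic 11's Monte Carlo Tree Search descends the tree by repeatedly picking the action with the best upper-confidence score. A minimal sketch of the standard UCB1-style selection rule (the exploration constant c = 1.4 is an illustrative choice; the function names are mine):

```python
import math

def uct_score(mean_value, n_action, n_parent, c=1.4):
    """UCB1-style score used in MCTS tree policies: exploitation term
    plus an exploration bonus that shrinks with each visit."""
    if n_action == 0:
        return math.inf  # always try unvisited actions first
    return mean_value + c * math.sqrt(math.log(n_parent) / n_action)

def select(children):
    """Pick the index of the child maximising the UCT score.
    children: list of (mean_value, visit_count) pairs."""
    n_parent = sum(n for _, n in children) or 1
    scores = [uct_score(q, n, n_parent) for q, n in children]
    return scores.index(max(scores))
```

AlphaGo Zero (topic 12) replaces this bonus with a variant weighted by a learned policy prior, but the select-expand-evaluate-backup loop around it stays the same.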


Improving exploration

    15. Random Network Distillation

    16. Hindsight Experience Replay

    17. Adversarially Motivated Intrinsic Goals

    18. Adversarially Guided Actor Critic

    19. Go-Explore
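To give a feel for topic 15, the sketch below boils Random Network Distillation down to its core: a fixed, randomly initialised target network and an online-trained predictor, with the predictor's error on a state serving as the intrinsic exploration bonus, so novel states earn larger bonuses than familiar ones. The linear "networks" and tiny dimensions here are drastic simplifications of the paper's convolutional networks.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 8, 4                    # illustrative state / embedding sizes
A = rng.normal(size=(K, D))    # fixed random target network
B = np.zeros((K, D))           # predictor, trained online

def intrinsic_reward(s):
    """Squared prediction error = exploration bonus for state s."""
    err = A @ s - B @ s
    return float(err @ err)

def train_predictor(s, lr=0.01):
    """One SGD step shrinking the predictor's error on state s,
    so repeatedly visited states lose their bonus over time."""
    global B
    err = A @ s - B @ s
    B += lr * np.outer(err, s)
```

Training the predictor on a state drives that state's bonus toward zero while unrelated states keep a large error, which is exactly the novelty signal RND adds to the agent's reward.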
