DDA-4230: Reinforcement Learning

Course Introduction

This course provides a basic introduction to reinforcement learning algorithms and their applications. The final grade consists of the following components:

  1. Assignments (written and coding homework) (30 points).
  2. Midterm exam (20 points).
  3. Final project (50 points).

For the detailed scoring scheme, please see the Final Project section below.

Course Arrangement

  1. Lectures.
    • Time: Monday and Wednesday, 3:30 PM - 4:50 PM.
    • Classroom: Bldg 204, Teaching A Building.
  2. Tutorials.
    • Time: Monday, 8:00 PM - 8:50 PM.
    • Classroom: Bldg 204, Teaching A Building.
  3. Office Hours.
    • Guiliang Liu (Instructor): Monday, 5:00 PM - 6:00 PM, Bldg 204, Teaching A Building.
    • Xu Sheng (TA): Wednesday, 7:00 PM - 8:00 PM, Room 326, Daoyuan Building.

Important Notes

Announcements will be added here at students' request.

  1. Late Policy. A late submission receives a 10% penalty for each day after the deadline. The penalty accumulates until it reaches 100% (10 days late). If you need special accommodation (e.g., for surgery or other health problems), DO NOT wait until the last moment; please let me know in advance (see my contact below).
  2. Late Drop. A late drop from the course is not encouraged. Under special circumstances, students may apply for a late drop, but there is no guarantee that the request will be approved by the school office.
  3. Honesty in Academic Work. The Chinese University of Hong Kong, Shenzhen places very high importance on honesty in academic work submitted by students, and adopts a policy of zero tolerance on academic dishonesty. Academic dishonesty is an umbrella term covering several sub-categories, which can be found here.

Course Syllabus and Timetable

Topics covered include the following (the instructor will upload slides on an ongoing basis, and the timeline may change according to students' needs):

  1. Week 1 (Sept. 6th) Lecture 0: [Slides].
  2. Week 1 (Sept. 9th) Lecture 1: Markov decision process [Slides] [Notes].
  3. Week 2 (Sept. 11th) Lecture 2: Optimality of MDPs [Slides] [Notes].
  4. Week 2 (Sept. 13th) Lecture 3: Stochastic multi-armed bandits [Slides] [Notes].
  5. Week 3 (Sept. 18th) Lecture 4: Greedy algorithms [Slides] [Notes].
  6. Week 3 (Sept. 20th) Lecture 5: Explore-then-commit algorithms [Slides] [Notes].
  7. Week 3 (Sept. 20th) Lecture 6: UCB algorithms [Slides] [Notes].
  8. Week 4 (Sept. 25th) Lecture 7: Thompson sampling [Slides] [Notes].
  9. Week 4 (Sept. 25th) Lecture 8: Hardness of Bandits [Slides] [Notes].
  10. Week 4 (Sept. 27th) Lecture 9: Discrete MDPs [Slides] [Notes].
  11. Week 6 (Oct. 9th) Lecture 10: Iterative Methods [Slides] [Notes].
  12. Week 6 (Oct. 11th) Lecture 11: UCVI and PSRL [Slides] [Notes].
  13. Week 6 (Oct. 11th) Lecture 12: Q-Learning [Slides] [Notes].
  14. Week 7 (Oct. 16th) Lecture 13: Model-Free Policy Evaluation [Slides] [Notes].
  15. Week 7 (Oct. 18th) Lecture 14: Trial and Error [Slides] [Notes].
  16. Week 8 (Oct. 23rd) Lecture 15: Value Function Approximation [Slides] [Notes].
  17. Week 8 (Oct. 25th) Lecture 16: Deep Q-learning [Slides] [Notes].
  18. Week 9 (Nov. 1st) Lecture 17: Policy Gradient [Slides] [Notes].
  19. Week 11 (Nov. 13th) Lecture 18: Policy Optimization [Slides] [Notes].
  20. Week 11 (Nov. 15th) Lecture 19: Interconnections between policy and value [Slides] [Notes].
  21. Week 12 (Nov. 20th) Lecture 20: Imitation Learning [Slides] [Notes].
  22. Week 13 (Nov. 27th) Lecture 21: Monte Carlo Tree Search [Slides].
  23. Week 14 (Dec. 4th) Lecture 22: Reinforcement Learning from Human Feedback [Slides].

Acknowledgement: The teaching materials use resources from [Previous Course].

Assignments

Submissions should be made through the Blackboard system.
  1. [Assignment 1] (Due 23:59, Oct. 9, 2023.)
  2. [Assignment 2] (Due 23:59, Oct. 30, 2023.)
  3. [Assignment 3] (Due 23:59, Nov. 21, 2023.)
  4. [Assignment 4] (Due 23:59, Dec. 15, 2023.)

Midterm Exam

[Midterm Exam Solution]

Final Project

(Due 23:59, Dec. 25, 2023.)

Submissions should be made through the Blackboard system.

  1. [Format Template in LaTeX].
  2. [How to Write a Literature Review].
  3. [Meeting Registration].
  4. [Final Project Marking Standard].