DDA-4230: Reinforcement Learning
Course Introduction
This course provides a basic introduction to reinforcement learning algorithms and their applications.
Topics include:
- Multi-armed bandits; finite Markov decision processes; dynamic programming; Monte-Carlo methods;
temporal-difference learning; actor-critic methods; off-policy learning.
- Introduction to deep variants of the aforementioned algorithms, including deep Q-learning,
policy gradient methods, and actor-critic methods.
Scoring:
- Assignments (written and coding homework) (30 points).
- Midterm exam (20 points).
- Final project (50 points).
For the detailed scoring scheme, please check the project introduction below.
Course Arrangement
- Lectures.
- Time: Monday and Wednesday, 1:30 PM - 2:50 PM.
- Classroom: Room 302, Teaching Complex C.
- Tutorials.
- Time: Tuesday, 8:00 PM - 8:50 PM.
- Classroom: Bldg 206, Teaching C Building.
- Office Hours.
- Guiliang Liu (Instructor): Monday, 2:50 PM - 3:50 PM, Room 302, Teaching Complex C.
- Bo Yue, Hengming Zhang (TAs): Friday, 5:00-6:00, Room 611, Teaching Complex B (TXB).
Important Notes
News.
News will be added here at students' request.
Policies.
- Late Policy. A late submission receives a 10% penalty for each day after the due date.
The penalty accumulates until it reaches 100% (10 days late).
If you need special accommodation (e.g., for surgery or other health problems),
DO NOT wait until the last moment; please let me know in advance (see my contact below).
- Late Drop. A late drop from the course is not encouraged.
Under special circumstances, students may apply for a late drop,
but there is no guarantee that the school office will approve the request.
- Honesty in Academic Work.
The Chinese University of Hong Kong, Shenzhen places very high importance on honesty in academic
work submitted by students, and adopts a policy of zero tolerance on academic dishonesty.
While academic dishonesty is the overall name, it covers several sub-categories, which can be found here.
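As an illustration of the late policy above, the accumulated penalty can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the official grading procedure: the function names are hypothetical, lateness is counted in whole days, and the penalty is assumed to be deducted as a percentage of the raw score.

```python
def late_penalty_percent(days_late: int) -> int:
    """Penalty in percent: 10% per late day, capped at 100% (assumed whole days)."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return min(10 * days_late, 100)


def score_after_penalty(raw_score: float, days_late: int) -> float:
    """Raw score minus the late penalty (assumption: deducted as a share of the raw score)."""
    return raw_score * (100 - late_penalty_percent(days_late)) / 100


if __name__ == "__main__":
    # Print the penalty for a few example delays.
    for days in (0, 3, 10, 12):
        print(days, late_penalty_percent(days), score_after_penalty(80, days))
```

Under these assumptions, a submission 10 or more days late keeps none of its score, which matches the 100% cap stated in the policy.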
Course Syllabus and Timetable
Topics covered will include the following (the instructor will upload slides regularly, and the timeline may change according to students' needs):
- Week 1 (Sept. 1st)
Lecture 0: Course Introduction [Slides].
- Week 1 (Sept. 3rd)
Lecture 1: Markov decision process [Slides] [Notes].
- Week 2 (Sept. 8th)
Lecture 2: Optimality of MDPs [Slides] [Notes].
- Week 2 (Sept. 10th)
Lecture 3: Stochastic multi-armed bandits [Slides] [Notes].
- Week 3 (Sept. 17th)
Lecture 4: Greedy algorithms [Slides] [Notes].
- Week 4 (Sept. 22nd)
Lecture 5: Explore-then-commit algorithms [Slides] [Notes].
- Week 4 (Sept. 26th)
Lecture 6: UCB algorithms [Slides] [Notes].
- Week 4 (Sept. 28th)
Lecture 7: Thompson sampling [Slides] [Notes].
- Week 4 (Sept. 28th)
Lecture 8: Hardness of Bandits [Slides] [Notes].
- Week 5 (Sept. 29th)
Lecture 9: Iterative Methods [Slides] [Notes].
- Week 5 (Oct. 13th)
Lecture 10: UCVI and PSRL [Slides] [Notes].
- Week 6 (Oct. 15th)
Lecture 11: Q-Learning [Slides] [Notes].
- Week 7 (Oct. 20th)
Lecture 12: Model-Free Policy Evaluation [Slides] [Notes].
- Week 7 (Oct. 22nd)
Lecture 13: Advanced Topic: Monte-Carlo Tree Search [Slides].
- Week 8 (Oct. 27th)
Lecture 14: Trial and Error [Slides] [Notes].
- Week 9 (Nov. 10th)
Lecture 15: Value function Approximation [Slides] [Notes].
- Week 9 (Nov. 12th)
Lecture 16: Deep Q-learning [Slides] [Notes].
- Week 10 (Nov. 16th)
Lecture 17: Policy Gradient [Slides] [Notes].
- Week 10 (Nov. 19th)
Lecture 18: Policy Optimization [Slides] [Notes].
- Week 11 (Nov. 24th)
Lecture 19: Interconnections between policy and value [Slides] [Notes].
- Week 11 (Nov. 26th)
Lecture 21: Reinforcement Learning from Human Feedback [Slides].
- Week 12 (Dec. 1st)
Lecture 22: Reinforcement Learning in Embodied AI [Slides].
Acknowledgement: The teaching materials use resources from
[Previous Course].
Course Survey
Please fill in the survey so that we can understand your concerns.
[Survey Link]
About Reference Letters
I am happy to provide reference letters for students in my class.
However, please keep the following guidelines in mind before approaching me:
- I will write letters only for students who have earned an A or A- (around top 40%) in my class.
- I will also write a letter for a student who does not meet the above grade criterion but has demonstrated strong academic engagement, for example, by regularly attending class, actively participating in discussions, and allowing me to get to know them well.
For other students, I generally do not recommend requesting a letter from me,
as I may not be able to provide any positive comments that would strengthen your application.
In any case, if you want me to provide you with a letter, please put the code
[IHaveReadYourMsg] in the email subject.
I cannot reply to any email without this code.