upload pdfs
James Brusey committed Jun 4, 2023
1 parent 41104b0 commit f4866347fad2bd095f4ffb62bc4edca4251c9853
Showing 7 changed files with 35 additions and 5 deletions.
@@ -1,9 +1,6 @@
/dp.tex
/intro.tex
/mdp.tex
/monte.pdf
/mdp.pdf
/intro.pdf
/ltximg/
/dp.pdf
/_minted-dp/
2 dp.org
@@ -3,8 +3,6 @@
#+author: Prof. James Brusey
#+options: toc:nil h:2
#+startup: beamer
#+language: dot
#+latex_header: \usepackage{algorithmicx}
* Policy evaluation (prediction)
** Why study Dynamic Programming?
+ DP assumes a perfect model and is computationally expensive
BIN +10.3 MB intro.pdf
Binary file not shown.
13 mdp.org
@@ -108,3 +108,16 @@ where $T=\infty$ or $\gamma = 1$ (but not both).

* Optimal policies and optimal value functions
** Optimal policies and value functions
+ Given some MDP, what is the best value we can achieve?
$$
v_*(s) = \max_\pi v_\pi (s),
$$
for all $s \in \mathcal{S}$
+ What is the best state action value achievable?
$$
q_*(s, a) = \max_\pi q_\pi (s, a),
$$
for all $s \in \mathcal{S}, a \in \mathcal{A}(s)$
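These definitions can be made concrete with a small numerical sketch (not part of the slides): value iteration on a hypothetical two-state MDP, repeatedly applying $v(s) \leftarrow \max_a \sum_{s',r} p(s',r \mid s,a)\,[r + \gamma v(s')]$ until the values stop changing, which converges to $v_*$. The MDP, state names, and rewards below are invented for illustration only.

```python
# Minimal value-iteration sketch on a toy two-state MDP (hypothetical example).
# P[s][a] is a list of (probability, next_state, reward) triples.
GAMMA = 0.9

P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}


def value_iteration(P, gamma=GAMMA, theta=1e-8):
    """Approximate v_*(s) = max_a q_*(s, a) by repeated sweeps."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # q(s, a) = sum over outcomes of p * (r + gamma * v(s'))
            q = {a: sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                 for a in P[s]}
            best = max(q.values())
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:  # stop once the largest update is tiny
            return v


v_star = value_iteration(P)
print(v_star)  # state 1 loops on reward 2: v=2/(1-0.9)=20; state 0: 1+0.9*20=19
```

Here staying in state 1 yields reward 2 forever, so $v_*(1) = 2/(1-\gamma) = 20$, and the best move from state 0 is "go", giving $1 + \gamma \cdot 20 = 19$, which matches the fixed point the loop converges to.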
** Exercise
+ Develop a recursive expression for $v_*(s)$ and $q_*(s,a)$ from what we know so far
+ Feel free to look at the book for help
BIN +178 KB mdp.pdf
Binary file not shown.
BIN +134 KB monte.pdf
Binary file not shown.
@@ -103,3 +103,25 @@ system("rsync rl-course.ics cogentee:public_html/rl-course.ics")
#+RESULTS:
: # Out[12]:
: : 0
#+BEGIN_SRC ipython :session ical :results output
for ix, row in slots.iterrows():
print(f"D{row.day}S{row.slot} {row.title}")
#+END_SRC

#+RESULTS:
#+begin_example
D1S1 Introduction to the course
D1S2 Intro to RL
D1S3 OpenAI Gym, Gymnasium
D1S4 Lab: Frozen-Lake play
D2S1 MDPs
D2S2 Dynamic Programming
D2S3 Lab: Solving Frozen-Lake
D2S4 Monte Carlo Methods
D3S1 Lab: Blackjack with MC
D3S2 Function Approximation
D3S3 DQN, SAC, PPO
D3S4 Lab: Breakout
D4S1 Lab: Demo day
D4S2 Demos and wrap up
#+end_example
