upload pdfs
James Brusey committed Jun 4, 2023
1 parent 41104b0 commit f4866347fad2bd095f4ffb62bc4edca4251c9853
Showing 7 changed files with 35 additions and 5 deletions.
@@ -1,9 +1,6 @@
/dp.tex
/intro.tex
/mdp.tex
/monte.pdf
/mdp.pdf
/intro.pdf
/ltximg/
/dp.pdf
/_minted-dp/
2 dp.org
@@ -3,8 +3,6 @@
#+author: Prof. James Brusey
#+options: toc:nil h:2
#+startup: beamer
#+language: dot
#+latex_header: \usepackage{algorithmicx}
* Policy evaluation (prediction)
** Why study Dynamic Programming?
+ DP assumes a perfect model and is computationally expensive
BIN +10.3 MB intro.pdf
Binary file not shown.
13 mdp.org
@@ -108,3 +108,16 @@ where $T=\infty$ or $\gamma = 1$ (but not both).

* Optimal policies and optimal value functions
** Optimal policies and value functions
+ Given some MDP, what is the best value we can achieve?
$$
v_*(s) = \max_\pi v_\pi (s),
$$
for all $s \in \mathcal{S}$
+ What is the best state action value achievable?
$$
q_*(s, a) = \max_\pi q_\pi (s, a),
$$
for all $s \in \mathcal{S}, a \in \mathcal{A}(s)$
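These definitions can be made concrete with a small numerical sketch (not part of the slides): value iteration on a hypothetical two-state MDP, repeatedly applying $v(s) \leftarrow \max_a \sum_{s',r} p(s',r \mid s,a)\,[r + \gamma v(s')]$ until the values stop changing, which converges to $v_*$. The MDP, state names, and rewards below are invented for illustration only.

```python
# Minimal value-iteration sketch on a toy two-state MDP (hypothetical example).
# P[s][a] is a list of (probability, next_state, reward) triples.
GAMMA = 0.9

P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}


def value_iteration(P, gamma=GAMMA, theta=1e-8):
    """Approximate v_*(s) = max_a q_*(s, a) by repeated sweeps."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # q(s, a) = sum over outcomes of p * (r + gamma * v(s'))
            q = {a: sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
                 for a in P[s]}
            best = max(q.values())
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:  # stop once the largest update is tiny
            return v


v_star = value_iteration(P)
print(v_star)  # state 1 loops on reward 2: v=2/(1-0.9)=20; state 0: 1+0.9*20=19
```

Here staying in state 1 yields reward 2 forever, so $v_*(1) = 2/(1-\gamma) = 20$, and the best move from state 0 is "go", giving $1 + \gamma \cdot 20 = 19$, which matches the fixed point the loop converges to.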
** Exercise
+ Develop a recursive expression for $v_*(s)$ and $q_*(s,a)$ from what we know so far
+ Feel free to look at the book for help
BIN +178 KB mdp.pdf
Binary file not shown.
BIN +134 KB monte.pdf
Binary file not shown.
@@ -103,3 +103,25 @@ system("rsync rl-course.ics cogentee:public_html/rl-course.ics")
#+RESULTS:
: # Out[12]:
: : 0
#+BEGIN_SRC ipython :session ical :results output
for ix, row in slots.iterrows():
print(f"D{row.day}S{row.slot} {row.title}")
#+END_SRC

#+RESULTS:
#+begin_example
D1S1 Introduction to the course
D1S2 Intro to RL
D1S3 OpenAI Gym, Gymnasium
D1S4 Lab: Frozen-Lake play
D2S1 MDPs
D2S2 Dynamic Programming
D2S3 Lab: Solving Frozen-Lake
D2S4 Monte Carlo Methods
D3S1 Lab: Blackjack with MC
D3S2 Function Approximation
D3S3 DQN, SAC, PPO
D3S4 Lab: Breakout
D4S1 Lab: Demo day
D4S2 Demos and wrap up
#+end_example
