diff --git a/.gitignore b/.gitignore
index 956e3a6..699c1b7 100644
--- a/.gitignore
+++ b/.gitignore
@@ -28,3 +28,4 @@
 /.DS_Store
 *.aux
 *.log
+.DS_Store
diff --git a/2023-02-rl/why-rl-exciting.html b/2023-02-rl/why-rl-exciting.html
index 9eea1c1..266f460 100644
--- a/2023-02-rl/why-rl-exciting.html
+++ b/2023-02-rl/why-rl-exciting.html
@@ -23,8 +23,8 @@
 Therefore
@@ -902,8 +902,8 @@ y(t+\Delta t) &\approx y(t) + \frac{\Delta y}{\Delta t}\cdot \Delta t \\
 ]]
@@ -918,8 +918,8 @@ y(t+\Delta t) &\approx y(t) + \frac{\Delta y}{\Delta t}\cdot \Delta t \\
 Questions?
diff --git a/2023-02-rl/why-rl-exciting.org b/2023-02-rl/why-rl-exciting.org
index 1504316..f084c96 100644
--- a/2023-02-rl/why-rl-exciting.org
+++ b/2023-02-rl/why-rl-exciting.org
@@ -22,7 +22,7 @@
 ** What is Reinforcement Learning?
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + How would you control a helicopter to perform this stunt?
 #+END_NOTES
@@ -58,7 +58,7 @@
 ** Example: maze with pitfalls
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + example problem
   - simple enough to derive a solution
@@ -72,7 +72,7 @@
 ** Example problem: Balance a pole
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 - State: pole angle, angular momentum, cart position, velocity
 - Actions: force on cart to left or right
 - Reward: +1 for each time step that the pole is upright
@@ -81,7 +81,7 @@
 #+END_NOTES
 ** Example problem: Playing football
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 - States: where am I? other players? ball?
 - Actions: turn, run, pass, shoot, tackle
 - Reward: 1 for win, 0 for draw, -1 for loss
@@ -92,13 +92,13 @@
 ** A Brief History of RL
 *** Where does the term "reinforcement" come from?
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + Pavlov introduced conditioning which says that experience of rewards /reinforces/ that action happening in the same situation next time
 #+END_NOTES
 *** TOBY (1951) - W. Grey Walter
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + By 1950, cybernetics theorised that behaviour was driven by /simple/ rules
@@ -106,7 +106,7 @@
 *** Bellman equation (1957) and dynamic programming
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + Recursive form became theoretical basis for RL
@@ -219,7 +219,7 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
 #+END_NOTES
 *** Andrew Ng and Pieter Abbeel's Helicopter (2004)
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + Key for this talk is how they learnt each stunt in /simulation/
@@ -235,7 +235,7 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
 #+END_NOTES
 *** Atari DQN Google DeepMind (2016) - Start of DeepRL
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + prior to this - full access to internal state
 + this RL agent just sees pixel values
@@ -254,13 +254,13 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
 #+END_NOTES
 *** Sim to real: Quadruped robots
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + small problems in simulator lead to problems with real world performance
 + however potential for simulator issues to be overcome
 #+END_NOTES
 *** OpenAI Rubik's cube robot
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + training in simulation starts with deterministic simulation
@@ -318,7 +318,7 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
 There is a lot of data being collected already but it is not always openly accessible
 #+END_NOTES
 *** Consider the human element
-#+REVEAL_HTML:
+#+REVEAL_HTML:
 #+BEGIN_NOTES
 + simple rules yield complex behaviour
 + we shouldn't ignore this problem just because it is hard
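Note on the patched slides: the cart-pole hunk specifies a full RL problem (state = pole angle, angular velocity, cart position, cart velocity; actions = push left or right; reward = +1 per upright timestep), and the HTML hunks reference the forward-Euler update y(t+Δt) ≈ y(t) + (Δy/Δt)·Δt. As a minimal sketch only, those two pieces fit together as below; the `CartPole` class, its constants, and the crude dynamics are illustrative assumptions, not the presentation's code:

```python
import math
import random

class CartPole:
    """Toy cart-pole stepped with forward Euler: y(t+dt) ~= y(t) + dy/dt * dt."""

    def __init__(self, dt=0.02):
        self.dt = dt  # timestep (assumed value)
        self.reset()

    def reset(self):
        # State from the slide: pole angle (rad), angular velocity,
        # cart position, cart velocity; start slightly off-vertical.
        self.theta, self.theta_dot, self.x, self.x_dot = 0.05, 0.0, 0.0, 0.0
        return (self.theta, self.theta_dot, self.x, self.x_dot)

    def step(self, action):
        # Actions from the slide: force on the cart to the left or right.
        force = 10.0 if action == 1 else -10.0
        # Crude illustrative dynamics: gravity topples the pole,
        # the applied force counteracts it and accelerates the cart.
        theta_acc = 9.8 * math.sin(self.theta) - 0.1 * force * math.cos(self.theta)
        x_acc = force
        # Forward-Euler update, as in the slide's y(t + dt) approximation.
        self.theta += self.theta_dot * self.dt
        self.theta_dot += theta_acc * self.dt
        self.x += self.x_dot * self.dt
        self.x_dot += x_acc * self.dt
        upright = abs(self.theta) < 0.21  # within ~12 degrees of vertical
        reward = 1.0 if upright else 0.0  # +1 for each upright timestep
        done = not upright
        return (self.theta, self.theta_dot, self.x, self.x_dot), reward, done

# Episode loop with a random policy, purely for illustration.
env = CartPole()
state = env.reset()
total_reward = 0.0
for _ in range(200):
    action = random.randint(0, 1)
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

A random policy accumulates little reward before the pole falls; the point of the RL methods surveyed in the slides is to learn a policy that maximises the sum of these +1 rewards.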