diff --git a/2023-02-rl/why-rl-exciting.html b/2023-02-rl/why-rl-exciting.html
index 0bc0926..2c4e4d9 100644
--- a/2023-02-rl/why-rl-exciting.html
+++ b/2023-02-rl/why-rl-exciting.html
@@ -23,8 +23,8 @@

James Brusey

Overview

  • What is Reinforcement Learning?
@@ -38,10 +38,10 @@
What is Reinforcement Learning?


helicopter_tail_rotor_thrust_antitorque_compensation.jpeg

@@ -57,8 +57,8 @@
What is Reinforcement Learning?


RLvsML.jpeg

@@ -90,8 +90,8 @@
Some definitions

  • policy—how an agent behaves
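An illustrative aside (toy code, not from the talk): concretely, a policy is just a mapping from states to actions, here derived greedily from a made-up value table.

#+BEGIN_SRC python
# "policy" made concrete: a mapping from states to actions.
# Toy value table for a 5-state world; all numbers are invented.
Q = {(s, a): 0.0 for s in range(5) for a in ("left", "right")}
Q[(2, "right")] = 1.0  # suppose we learned that "right" pays off in state 2

def greedy_policy(state):
    """Behave by picking the highest-value action in this state."""
    return max(("left", "right"), key=lambda a: Q[(state, a)])

print(greedy_policy(2))  # -> right
#+END_SRC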
@@ -138,8 +138,8 @@ So let's summarise the key aspects of RL
Example: maze with pitfalls

Example problem: Balance a pole

  • State: pole angle, pole angular velocity, cart position, cart velocity (see the interaction-loop sketch below)
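An illustrative aside (not part of the original slides): the cart-pole task takes a few lines to set up, assuming the Gymnasium package is available. A random policy drops the pole almost immediately, which is exactly the behaviour RL must improve on.

#+BEGIN_SRC python
# Minimal RL interaction loop on cart-pole (sketch; requires gymnasium).
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()  # random policy: push left or right
    obs, reward, terminated, truncated, info = env.step(action)
    # obs = [cart position, cart velocity, pole angle, pole angular velocity]
    total_reward += reward
    done = terminated or truncated
print(f"Episode return under a random policy: {total_reward}")
#+END_SRC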
@@ -180,8 +180,8 @@ So let's summarise the key aspects of RL
Example problem: Playing football

  • States: where am I? other players? ball?
@@ -200,13 +200,13 @@ So let's summarise the key aspects of RL
A Brief History of RL
Where does the term "reinforcement" come from?

TOBY (1951) - W. Grey Walter

Bellman equation (1957) and dynamic programming
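An illustrative aside (a toy MDP, not from the talk): the Bellman optimality equation \[ V(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a)\, V(s') \right] \] turns directly into the dynamic-programming method of value iteration.

#+BEGIN_SRC python
# Value iteration on a tiny made-up two-state MDP (illustrative only).
import numpy as np

gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],                 # R[s, a] expected rewards
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)   # Q[s, a] = R[s, a] + gamma * E[V(s')]
    V_new = Q.max(axis=1)     # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print("values:", V, "greedy policy:", Q.argmax(axis=1))
#+END_SRC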

Barto, Sutton and Anderson: Actor Critic (1983)

figtmp34.png

sutton-head5.jpg

barto_andrew_crop.jpeg

Charles-Anderson.jpg

@@ -296,10 +296,10 @@ So let's summarise the key aspects of RL
Watkins Q-learning (1989)


cw090311.jpg

@@ -322,10 +322,10 @@ Q^{new}(s_{t},a_{t}) \leftarrow \underbrace{Q(s_{t},a_{t})}_{\text{old}} + \unde
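The update rule truncated in the hunk context above is the standard tabular Q-learning rule, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. A minimal sketch follows (illustrative, assuming a Gymnasium-style discrete environment; the hyperparameters are placeholders).

#+BEGIN_SRC python
# A minimal tabular Q-learning sketch (illustrative, not from the talk).
import numpy as np

def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1,
                       rng=np.random.default_rng()):
    """Run one episode, updating the table Q[s, a] in place."""
    s, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy exploration
        if rng.random() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, terminated, truncated, _ = env.step(a)
        # TD target: no future value once the episode terminates
        target = r + (0.0 if terminated else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])   # the Watkins update
        s, done = s_next, terminated or truncated
#+END_SRC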
Tesauro's TD Gammon (1992)


td-gammon.png

@@ -341,10 +341,10 @@ Q^{new}(s_{t},a_{t}) \leftarrow \underbrace{Q(s_{t},a_{t})}_{\text{old}} + \unde
RL parallels in Neuroscience (1994-)


dopamine.png

@@ -363,10 +363,10 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
My PhD work - RoboCup


socbot1.png

@@ -378,10 +378,10 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Move to point (hand coded)


phys-hc-1.png

@@ -395,10 +395,10 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Move to point (RL)


phys-mcsoft-1.png

@@ -412,10 +412,10 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Ball dribbling


sym2-0.png

@@ -428,10 +428,10 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Ball dribbling (hand coded)


t61.2.png

@@ -443,10 +443,10 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Ball dribbling (RL)


t62.12.png

@@ -459,8 +459,8 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Andrew Ng and Pieter Abbeel's Helicopter (2004)

Atari DQN Google DeepMind (2015) - Start of DeepRL

AlphaGo and AlphaZero (Google DeepMind 2016)


alphago.png

@@ -517,8 +517,8 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Sim to real: Quadruped robots

OpenAI Rubik's cube robot

-Champion level drone racing using Deep RL
+Learning to walk in 1 hour (Dreamer v3)
+Champion level drone racing using Deep RL (Oct 23)
Key challenges for RL for real-world problems

  • Common framework
  • Resolve the environment problem
@@ -562,8 +581,8 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Common framework

  • RL is based on a well-structured problem formulation
@@ -587,8 +606,8 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Resolve the environment problem

  • Simple environments are easy - results are fast
  • Bugs in the simulator can lead to poor control behaviour
@@ -610,8 +629,8 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
Collect open data

  • Simulating environments from first principles tends to miss key characteristics
@@ -630,8 +649,8 @@ There is a lot of data being collected already but it is not always openly acces
Consider the human element

RL applied to electric vehicle comfort control


car-air-conditioning-service.jpeg

@@ -668,10 +687,10 @@ There is a lot of data being collected already but it is not always openly acces
EV range issue


46-51_Cabin-Conditioning_atrApr19_1.jpeg

@@ -685,10 +704,10 @@ There is a lot of data being collected already but it is not always openly acces
Seat heating


heated-seats-button.jpeg

@@ -704,10 +723,10 @@ There is a lot of data being collected already but it is not always openly acces
Natural ventilation


Coventry_University_Lanchester_Library_6933825422.jpeg

@@ -722,10 +741,10 @@ There is a lot of data being collected already but it is not always openly acces
I've been working on it a while


DSCF0052.jpg

@@ -737,10 +756,10 @@ There is a lot of data being collected already but it is not always openly acces
H2020 EU Project - DOMUS


domus-partners.jpg

@@ -752,10 +771,10 @@ There is a lot of data being collected already but it is not always openly acces
Climate control as an RL problem


comfort-problem.png

@@ -770,8 +789,8 @@ There is a lot of data being collected already but it is not always openly acces
Producing a fast thermal cabin model

  • Let's focus on one aspect - the thermal cabin model
  • Past work suggests that learning a comfort controller requires about 8 years of simulated experience
@@ -787,10 +806,10 @@ There is a lot of data being collected already but it is not always openly acces
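For scale, a back-of-envelope aside (assuming a 1 Hz control rate, which the slides do not specify): 8 years of experience is roughly a quarter of a billion environment steps, hence the need for a fast surrogate model.

#+BEGIN_SRC python
# Back-of-envelope: 8 years of simulated experience at 1 step per second.
seconds_per_year = 365 * 24 * 3600   # 31,536,000
steps = 8 * seconds_per_year         # assumed 1 Hz control rate
print(f"{steps:,} steps")            # 252,288,000
#+END_SRC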
Gathering data from the Climatic Wind Tunnel


cwt.png

@@ -804,8 +823,8 @@ There is a lot of data being collected already but it is not always openly acces
Accelerating the cabin model

  • Key idea: it's possible to learn the cabin model from data \[ \mathbf{x}_{t+1} \approx \mathbf{f}_\theta \left( \mathbf{x}_t, \mathbf{u}_t, \mathbf{x}_{t-1},\ldots \right) \]
@@ -828,8 +847,8 @@ where
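An illustrative aside (placeholder data and names, not the DOMUS code): a one-step linear model of this form can be fitted from logged data and scored with NRMSE, here taken as RMSE normalised by each signal's range.

#+BEGIN_SRC python
# Fit x_{t+1} ~ f([x_t, u_t]) with linear regression, then report NRMSE.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_state = rng.normal(size=(1000, 4))  # placeholder logged states x_t
U_ctrl = rng.normal(size=(1000, 2))   # placeholder control inputs u_t
X_next = rng.normal(size=(1000, 4))   # placeholder next states x_{t+1}

features = np.hstack([X_state, U_ctrl])
model = LinearRegression().fit(features, X_next)
pred = model.predict(features)

rmse = np.sqrt(np.mean((pred - X_next) ** 2, axis=0))
nrmse = 100 * rmse / (X_next.max(axis=0) - X_next.min(axis=0))
print("NRMSE per state variable (%):", nrmse)
#+END_SRC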
Intuition for cabin model

  • Lumped thermal model is based on Newton's law of cooling \[ \frac{dy}{dt} = -k(y-y_0) \]
@@ -852,8 +871,8 @@ where
Intuition for cabin model

  • Therefore \[ y(t+\Delta t) \approx y(t) + \frac{\Delta y}{\Delta t}\cdot \Delta t \]
@@ -883,8 +902,8 @@ y(t+\Delta t) &\approx y(t) + \frac{\Delta y}{\Delta t}\cdot \Delta t \\
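An illustrative aside (parameter values are assumptions, not from the talk): applying this forward-Euler step to Newton's law of cooling gives a one-line update.

#+BEGIN_SRC python
# Forward-Euler step for dy/dt = -k (y - y0); k, y0, dt are illustrative.
def euler_cooling_step(y, k=0.05, y0=22.0, dt=1.0):
    """One discrete update: y(t + dt) ~ y(t) + dy/dt * dt."""
    return y + (-k * (y - y0)) * dt

y = 5.0  # cabin air temperature (deg C) on a cold start, made up
for _ in range(10):
    y = euler_cooling_step(y)
print(f"after 10 steps: {y:.2f} C")  # relaxes toward y0 = 22
#+END_SRC

Note that the update is linear in the state, y(t+\Delta t) = (1 - k\Delta t)\, y(t) + k\Delta t\, y_0, which gives some intuition for why the linear-regression cabin model reported later works as well as it does.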

Simulator results - driver foot, torso, head

cwt-driver-head-foot.png

@@ -899,8 +918,8 @@ y(t+\Delta t) &\approx y(t) + \frac{\Delta y}{\Delta t}\cdot \Delta t \\
Results from this simulator

  • Linear Regression-based model NRMSE 1.8% overall
@@ -924,10 +943,10 @@ y(t+\Delta t) &\approx y(t) + \frac{\Delta y}{\Delta t}\cdot \Delta t \\
Preliminary results using RL


energyweight.png

@@ -943,8 +962,8 @@ y(t+\Delta t) &\approx y(t) + \frac{\Delta y}{\Delta t}\cdot \Delta t \\
Conclusions

  • RL is a very active and exciting domain
  • Surprisingly, it has made few inroads into real-world systems
@@ -963,8 +982,8 @@ Focus on optimality
Thank you

Questions?

diff --git a/2023-02-rl/why-rl-exciting.org b/2023-02-rl/why-rl-exciting.org
index ef098ee..0de5b4c 100644
--- a/2023-02-rl/why-rl-exciting.org
+++ b/2023-02-rl/why-rl-exciting.org
@@ -265,8 +265,20 @@ The dopamine response coding an error in the prediction of reward (Eq. 1) closel
 + progressively adds more randomisation during learning so that when transferred to real robot, behaviour is more robust
 #+END_NOTES
-*** Champion level drone racing using Deep RL
-#+REVEAL_HTML:
+*** Learning to walk in 1 hour (Dreamer v3)
+#+REVEAL_HTML:
+#+BEGIN_NOTES
++ why isn't the walking behaviour better?
++ look at how well it recovers from being knocked over
+#+END_NOTES
+
+*** Champion level drone racing using Deep RL (Oct 23)
+#+REVEAL_HTML:
+
+#+BEGIN_NOTES
++ This is amazing because it solves the problem of sim2real for a difficult problem
++ There are still a lot of technical challenges here to do with state estimation
+#+END_NOTES

 ** Key challenges for RL for real-world problems
 - Common framework
diff --git a/2024-01-ctpsr-ai/ctpsr-ai.org b/2024-01-ctpsr-ai/ctpsr-ai.org
new file mode 100644
index 0000000..6b71075
--- /dev/null
+++ b/2024-01-ctpsr-ai/ctpsr-ai.org
@@ -0,0 +1,62 @@
+#+title: Rough intro to AI
+#+date: 11 January 2024
+#+property: header-args:ipython :session session1 :results output raw drawer :exports both
+#+options: toc:nil H:1
+#+startup: beamer
+#+latex_class: beamer
+#+latex_class_options:
+#+beamer_theme: Boadilla
+#+latex_header: \usepackage{natbib}
+#+description:
+#+keywords:
+#+subtitle:
+#+latex_compiler: pdflatex
+
+* Some talking points
++ Understanding of the main AI tools currently available and what might change over the next few months
++ Current uses in research
++ Opportunities and risks
++ Ethical considerations
++ CU capabilities and how these are evolving
+
+* What AI thinks of AI
+
+#+attr_latex: :height 0.8\textheight
+[[file:figures/DALL-E 2024-01-11 10.54.12 - An educational illustration showcasing different types of artificial intelligence. The image is divided into several sections, each representing a dif.png]]
+* Overview of AI
++ Machine learning
+  + Supervised
+    + Take a picture and recognise a digit, dog, tank, \ldots
+  + Generative
+  + Unsupervised
+  + Reinforcement Learning
+
+* Importance of Openness for LLMs
++ Technology for OpenAI GPT-4 is proprietary (as are Bard / Claude)
++ Open source systems (e.g., Mistral) getting better though
++ Need to avoid lock-in
++ Need to know what is inside
+
+
+* Similarities between LLMs and the Human Brain
+  - Physical and Functional Differences
+  \note{Acknowledge that the physical hardware and many mechanisms of neural networks are distinct from the human brain.}
+  - Large-Scale Complexity
+  \note{Emphasize how, at a macro level, the complexities and capabilities of Large Language Models (LLMs) can appear remarkably similar to certain functions of the human brain.}
+  - Pattern Recognition and Learning
+  \note{Highlight the similarities in how both LLMs and the human brain learn from vast amounts of data and recognize patterns.}
+  - Limitations in Comparison
+  \note{Caution against overstating the comparison, as the human brain's workings are vastly more complex and less understood.}
+
+* Current Limitations and Future Possibilities of LLMs
+  - Lack of Memory and Contextual Understanding
+  \note{Explain how LLMs, unlike the human brain, do not possess real memory but use context to create an illusion of continuity and understanding.}
+  - Output Restrictions
+  \note{Note that LLMs are currently limited to text output, lacking the ability to perform actions or interact with the environment.}
+  - No Embodiment or Sensory Perception
+  \note{Highlight the absence of a physical or sensory presence in LLMs, limiting their understanding of the real world.}
+  - Absence of Emotional Intelligence
+  \note{Discuss the lack of emotional capacity in LLMs, differentiating them significantly from human cognitive and emotional processes.}
+  - Potential for Future Advancements
+  \note{Speculate on the future evolution of AI, suggesting that current limitations like memory, embodiment, sensory perception, and emotional intelligence might be overcome in the next 20 years, leading to more advanced and human-like AI capabilities.}
diff --git a/2024-01-ctpsr-ai/ctpsr-ai.pdf b/2024-01-ctpsr-ai/ctpsr-ai.pdf
new file mode 100644
index 0000000..ebf84bf
Binary files /dev/null and b/2024-01-ctpsr-ai/ctpsr-ai.pdf differ
diff --git a/2024-01-ctpsr-ai/figures/DALL-E 2024-01-11 10.54.12 - An educational illustration showcasing different types of artificial intelligence. The image is divided into several sections, each representing a dif.png b/2024-01-ctpsr-ai/figures/DALL-E 2024-01-11 10.54.12 - An educational illustration showcasing different types of artificial intelligence. The image is divided into several sections, each representing a dif.png
new file mode 100644
index 0000000..0660fb9
Binary files /dev/null and b/2024-01-ctpsr-ai/figures/DALL-E 2024-01-11 10.54.12 - An educational illustration showcasing different types of artificial intelligence. The image is divided into several sections, each representing a dif.png differ