Reinforcement Learning

Function approximation

  • What if we don’t have a finite (or at least small) number of states?
  • We’ve also noticed that nearby states often have close values
  • We could make a fine mesh over possible states
  • We could linearly interpolate
  • We could use a generic function approximator
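
A minimal sketch of the first two options, assuming a 1-D state space on $[0, 1]$: a fine mesh becomes state aggregation (one learned value per bucket), and linear interpolation blends neighbouring bucket values. The bucket count and names are illustrative, not from the notes.

```python
import numpy as np

# Hypothetical 1-D continuous state space on [0, 1].
# "Fine mesh" = state aggregation: n_bins buckets, one learned value each.
n_bins = 10
bin_values = np.zeros(n_bins)   # one value estimate per bucket

def aggregate_value(s):
    """Value of state s under state aggregation (piecewise constant)."""
    idx = min(int(s * n_bins), n_bins - 1)
    return bin_values[idx]

def interpolated_value(s):
    """Value of state s by linear interpolation between bucket centres."""
    centres = (np.arange(n_bins) + 0.5) / n_bins
    return np.interp(s, centres, bin_values)
```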

Function approximator

  • For some set of weights $\vec{w} ∈ \mathbb{R}^d$, we write $$ \hat{v}(s, \vec{w}) ≈ v_π (s) $$
  • Our objective is to minimise the error $$ \overline{\mathrm{VE}}(\vec{w}) \doteq \sum_{s ∈ \mathcal{S}} μ(s) \left[v_π(s) - \hat{v}(s,\vec{w})\right]^2 $$
  • Note that this is a weighted mean with weights $μ(s)$
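
A minimal sketch of evaluating $\overline{\mathrm{VE}}(\vec{w})$ on a tiny finite state set; the true values, the distribution $μ(s)$, and the one-hot features are all assumptions made for illustration.

```python
import numpy as np

# Tiny illustrative problem: 5 states, assumed true values and state distribution.
states = np.arange(5)
true_v = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # assumed v_pi(s)
mu = np.array([0.4, 0.3, 0.15, 0.1, 0.05])      # mu(s), sums to 1
features = np.eye(5)                            # x(s): one-hot features

def v_hat(s, w):
    """Linear approximation v_hat(s, w) = w^T x(s)."""
    return features[s] @ w

def value_error(w):
    """VE(w) = sum_s mu(s) * [v_pi(s) - v_hat(s, w)]^2."""
    return sum(mu[s] * (true_v[s] - v_hat(s, w)) ** 2 for s in states)

w = np.zeros(5)
print(value_error(w))   # weighted squared error of the all-zero weights
```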

Stochastic Gradient Descent

  • SGD moves the weight vector a small amount in the direction of the negative of the gradient $$ \vec{w}_{t+1} \doteq \vec{w}_t - \frac{1}{2}α ∇ \left[ v_π (S_t) - \hat{v}(S_t, \vec{w}_t)\right]^2 $$ $$ = \vec{w}_t + α \left[ v_π (S_t) - \hat{v} (S_t, \vec{w}_t)\right] ∇ \hat{v} (S_t, \vec{w}_t) $$
  • Note that $∇ f(\vec{w})$ means the vector of partial derivatives $$ ∇ f(\vec{w}) \doteq \left( \frac{ ∂ f(\vec{w})}{∂ w_1},\frac{ ∂ f(\vec{w})}{∂ w_2},\cdots,\frac{ ∂ f(\vec{w})}{∂ w_d} \right)^\top $$
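
A minimal sketch of a single SGD update, assuming the target $v_π(S_t)$ is somehow available; in practice it is replaced by a sample such as the Monte Carlo return $G_t$. The feature vector and step size are illustrative.

```python
import numpy as np

alpha = 0.01   # step size (illustrative)

def sgd_update(w, x_s, target):
    """One step of w_{t+1} = w_t + alpha * [target - v_hat(S_t, w)] * grad v_hat(S_t, w).
    For a linear v_hat, grad v_hat(S_t, w) = x(S_t)."""
    v = w @ x_s
    return w + alpha * (target - v) * x_s

w = np.zeros(4)
x_s = np.array([1.0, 0.0, 0.5, 0.0])   # illustrative feature vector x(S_t)
w = sgd_update(w, x_s, target=2.0)     # moves w a small amount towards the target
```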

Linear Function Approximator

  • One simple approximator is $$ \hat{v}(s,\vec{w}) \doteq \vec{w}^\top \vec{x}(s), $$ where $\vec{x}(s)$ is $s$ expressed as a feature vector
  • The gradient is then simply $$ ∇ \hat{v}(s, \vec{w}) = \vec{x}(s). $$
  • When we use a linear function approximator, we refer to the algorithm as linear
    • e.g., Linear Sarsa, Linear TD
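
A minimal sketch of linear semi-gradient TD(0), where the bootstrapped target $R_{t+1} + γ\hat{v}(S_{t+1}, \vec{w})$ stands in for $v_π(S_t)$. The environment interface (`reset`, `step`, `sample_action`) and the feature map `x` are assumptions, not part of the notes.

```python
import numpy as np

def linear_td0(env, x, d, alpha=0.01, gamma=0.99, episodes=100):
    """Linear semi-gradient TD(0) prediction under the environment's policy."""
    w = np.zeros(d)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = env.sample_action(s)               # assumed fixed policy pi
            s_next, r, done = env.step(a)
            v_s = w @ x(s)
            v_next = 0.0 if done else w @ x(s_next)
            # Semi-gradient update: the gradient of the linear v_hat is just x(s)
            w += alpha * (r + gamma * v_next - v_s) * x(s)
            s = s_next
    return w
```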

The problem of discontinuities

  • Consider the game of chess
  • Two positions may be very similar (e.g., only one piece different)
  • However, one position may lead to checkmate (a win) whereas the other may lead to a loss
  • This is a sharp discontinuity
  • For this reason, we want a function approximator complex enough to represent such sharp changes

The problem of sparseness

  • In principle, we can converge on the value of a state-action pair if our search is guaranteed to visit that pair an infinite number of times
  • However, as the state-action space grows, it becomes hard to visit every instance
  • Thus we need a smooth and simple function approximator that can generalise from visited states to unvisited ones

Non-linear function approximators

  • Linear approximators are simple and convergence proofs are possible
  • Non-linear approximators might seem better, especially when it is difficult to design $\vec{x}(s)$ by hand
  • Artificial Neural Networks might be used
  • The problem is that successive samples are correlated with prior samples, which destabilises training
  • One solution is Experience Replay (as used by the Deep Q-Network (DQN)); a sketch follows below
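
A minimal sketch of an experience-replay buffer in the style used by DQN: transitions are stored as they arrive and later sampled in random minibatches, which breaks the correlation between consecutive samples. Capacity and batch size are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions, sampled uniformly at random."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are discarded

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        """Uniformly sample a minibatch; random sampling decorrelates updates."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```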