TD Learning:
δ = R + γV(s') - V(s)
V(s) ← V(s) + α·δ
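The two update lines above can be sketched as a single function. This is a minimal illustration, not the demo's own code: the name `td_update` and the default values for `alpha` (learning rate α) and `gamma` (discount factor γ) are assumptions chosen for the example.

```python
def td_update(v_s, v_next, reward, alpha=0.1, gamma=0.9):
    """One TD(0) step for a single state (hypothetical helper).

    v_s    : current estimate V(s)
    v_next : current estimate V(s')
    reward : reward R received on the transition s -> s'
    Returns the prediction error delta and the updated V(s).
    """
    delta = reward + gamma * v_next - v_s   # δ = R + γV(s') - V(s)
    new_v_s = v_s + alpha * delta           # V(s) ← V(s) + α·δ
    return delta, new_v_s


# An unexpected reward: both value estimates start at zero.
delta, v = td_update(v_s=0.0, v_next=0.0, reward=1.0)
print(delta, v)
```

With both estimates at zero, the whole reward shows up as prediction error, and V(s) moves a fraction α of the way toward it.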
Schultz (1997) found that
dopamine (DA) neurons fire to:
• An unexpected reward
• A reward-predicting cue
(and no longer to the reward itself, once the cue is learned)
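The shift of the response from reward to cue can be reproduced with the TD rule on a two-state trial (cue, then reward). The sketch below is an illustration under stated assumptions, not the demo's implementation: the function `run_trials`, the parameter values, the terminal value of 0 after reward, and the choice to hold the pre-cue baseline at 0 (the cue arrives unpredictably, so nothing predicts it) are all assumptions.

```python
def run_trials(n_trials, alpha=0.2, gamma=0.9, reward=1.0):
    """Simulate repeated cue -> reward trials with TD(0) (hypothetical sketch).

    Returns the final values and, per trial, the prediction error (RPE)
    at cue onset and at reward delivery.
    """
    v = {"cue": 0.0, "reward": 0.0}
    history = []
    for _ in range(n_trials):
        # Cue onset: the baseline before the cue is assumed to stay at 0,
        # so the RPE here is just the discounted value of the cue state.
        rpe_cue = gamma * v["cue"] - 0.0

        # Cue -> reward state: no reward is delivered on this step.
        delta = 0.0 + gamma * v["reward"] - v["cue"]
        v["cue"] += alpha * delta

        # Reward delivery, after which the trial ends (terminal value 0).
        rpe_reward = reward + gamma * 0.0 - v["reward"]
        v["reward"] += alpha * rpe_reward

        history.append((rpe_cue, rpe_reward))
    return v, history


values, history = run_trials(200)
print("first trial RPE (cue, reward):", history[0])
print("last trial RPE (cue, reward):", history[-1])
print("V(cue), V(reward):", values["cue"], values["reward"])
```

On the first trial the error appears only at reward delivery. After many trials the error at reward shrinks toward zero while a positive error appears at cue onset, which matches the pattern listed above.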
Trial: 0
Last RPE: —
V(cue): —
V(reward): —