What this explores
Deep Q-Learning with experience replay. A neural network learns optimal actions through trial and error.
Principles demonstrated
- Q-Learning algorithm with neural network
- Epsilon-greedy exploration strategy
- Experience replay for stable learning (both sketched in code after this list)
- Real-time training visualization
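To make the second and third principles concrete, here is a minimal sketch of epsilon-greedy action selection and an experience replay buffer. The action count and the `ReplayBuffer` class are illustrative placeholders, not the experiment's actual code:

```python
import random
from collections import deque

# Hypothetical action space for illustration; the real experiment's
# state encoding and action set may differ.
N_ACTIONS = 4  # e.g. up, down, left, right

def select_action(q_values, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)  # explore: pick a random action
    # exploit: pick the action with the highest predicted Q-value
    return max(range(N_ACTIONS), key=lambda a: q_values[a])

class ReplayBuffer:
    """Fixed-size memory of past transitions, sampled at random to break
    the correlation between consecutive game frames during training."""
    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)
```

Epsilon is typically started near 1.0 (mostly random moves) and decayed toward a small floor as the agent improves, so early games explore broadly and later games exploit what the network has learned.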
Build It Yourself
Step-by-step guide to recreate this experiment
Advanced · 60-90 min
Prerequisites
- Python basics
- Understanding of neural networks
- Basic calculus (gradients)
- PyTorch fundamentals
Q-Learning is a reinforcement learning algorithm that learns the value (Q) of taking an action in a given state. The "Q" stands for "quality" - how good is this action? The agent learns by trial and error, updating Q-values based on rewards received.
```python
# The Bellman equation - the core of Q-Learning
# Q(s,a) = r + γ * max over a' of Q(s',a')
#
# Where:
# - Q(s,a) = value of taking action 'a' in state 's'
# - r = immediate reward
# - γ (gamma) = discount factor in [0, 1]; weights future rewards
# - s' = next state
# - max over a' of Q(s',a') = best achievable value from the next state

# Example: the snake eats food
# reward = +10
# gamma = 0.9
# max Q of next state = 5
# New Q = 10 + 0.9 * 5 = 14.5
```

Gamma (γ) controls how much the agent cares about future rewards: γ=0 means only immediate rewards matter, while γ=1 weights future rewards as heavily as immediate ones.
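To see γ's effect concretely, here is a tiny worked comparison of the Bellman target for the food-eating transition above under different discount factors (plain Python, no framework needed):

```python
def bellman_target(reward, gamma, max_next_q):
    """TD target from the Bellman equation: r + γ * max Q(s', a')."""
    return reward + gamma * max_next_q

# Same transition as above: reward +10, best next-state Q of 5
for gamma in (0.0, 0.5, 0.9, 1.0):
    print(f"gamma={gamma}: target = {bellman_target(10, gamma, 5)}")

# gamma=0.0: target = 10.0  (only the immediate reward counts)
# gamma=0.5: target = 12.5
# gamma=0.9: target = 14.5  (the value from the example above)
# gamma=1.0: target = 15.0  (future value counts in full)
```

In practice a γ between 0.9 and 0.99 is a common starting point: high enough that the agent plans ahead, low enough that distant, uncertain rewards don't dominate the update.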