What this explores
Deep Q-Learning with experience replay. A neural network learns optimal actions through trial and error.
Principles demonstrated
- Q-Learning algorithm with neural network
- Epsilon-greedy exploration strategy
- Experience replay for stable learning (both sketched in code after this list)
- Real-time training visualization
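To make the second and third principles concrete, here is a minimal sketch of epsilon-greedy action selection and an experience replay buffer. The action count and the `ReplayBuffer` class are illustrative placeholders, not the experiment's actual code:

```python
import random
from collections import deque

# Hypothetical action space for illustration; the real experiment's
# state encoding and action set may differ.
N_ACTIONS = 4  # e.g. up, down, left, right

def select_action(q_values, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)  # explore: pick a random action
    # exploit: pick the action with the highest predicted Q-value
    return max(range(N_ACTIONS), key=lambda a: q_values[a])

class ReplayBuffer:
    """Fixed-size memory of past transitions, sampled at random to break
    the correlation between consecutive game frames during training."""
    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)
```

Epsilon is typically started near 1.0 (mostly random moves) and decayed toward a small floor as the agent improves, so early games explore broadly and later games exploit what the network has learned.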
Build It Yourself
Step-by-step guide to recreate this experiment
Advanced · 60-90 min
Prerequisites
- Python basics
- Understanding of neural networks
- Basic calculus (gradients)
- PyTorch fundamentals
Q-Learning is a reinforcement learning algorithm that learns the value (Q) of taking an action in a given state. The "Q" stands for "quality" - how good is this action? The agent learns by trial and error, updating Q-values based on rewards received.
```python
# The Bellman equation - the core of Q-Learning
# Q(s,a) = r + γ * max over a' of Q(s',a')
#
# Where:
# - Q(s,a) = value of taking action 'a' in state 's'
# - r = immediate reward
# - γ (gamma) = discount factor in [0, 1]; weights future rewards
# - s' = next state
# - max over a' of Q(s',a') = best achievable value from the next state

# Example: the snake eats food
# reward = +10
# gamma = 0.9
# max Q of next state = 5
# New Q = 10 + 0.9 * 5 = 14.5
```

Gamma (γ) controls how much the agent cares about future rewards: γ=0 means only immediate rewards matter, while γ=1 weights future rewards as heavily as immediate ones.
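To see γ's effect concretely, here is a tiny worked comparison of the Bellman target for the food-eating transition above under different discount factors (plain Python, no framework needed):

```python
def bellman_target(reward, gamma, max_next_q):
    """TD target from the Bellman equation: r + γ * max Q(s', a')."""
    return reward + gamma * max_next_q

# Same transition as above: reward +10, best next-state Q of 5
for gamma in (0.0, 0.5, 0.9, 1.0):
    print(f"gamma={gamma}: target = {bellman_target(10, gamma, 5)}")

# gamma=0.0: target = 10.0  (only the immediate reward counts)
# gamma=0.5: target = 12.5
# gamma=0.9: target = 14.5  (the value from the example above)
# gamma=1.0: target = 15.0  (future value counts in full)
```

In practice a γ between 0.9 and 0.99 is a common starting point: high enough that the agent plans ahead, low enough that distant, uncertain rewards don't dominate the update.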