What this explores
Deep Q-Learning with experience replay: a neural network learns optimal actions through trial and error.
Principles demonstrated
- Q-Learning algorithm with neural network
- Epsilon-greedy exploration strategy
- Experience replay for stable learning
- Real-time training visualization
Build It Yourself
Step-by-step guide to recreate this experiment
Advanced · 60-90 min
Prerequisites
- Python basics
- Understanding of neural networks
- Basic calculus (gradients)
- PyTorch fundamentals
Q-Learning is a reinforcement learning algorithm that learns the value (Q) of taking a particular action in a given state. The Q stands for quality: how good is this action? The agent learns by trial and error, updating its Q-values based on the rewards it actually receives.
python
# The Bellman Equation - core of Q-Learning
# Q(s,a) = r + γ * max(Q(s',a'))
#
# Where:
# - Q(s,a) = value of taking action 'a' in state 's'
# - r = immediate reward
# - γ (gamma) = discount factor between 0 and 1; weights future rewards
# - s' = next state
# - max(Q(s',a')) = highest Q-value over all actions a' available in the next state
# Example: Snake eats food
# reward = +10
# gamma = 0.9
# max Q of next state = 5
# New Q = 10 + 0.9 * 5 = 14.5
Gamma (γ) sets how much the agent cares about future rewards. With γ=0, only immediate rewards matter. With γ=1, future rewards count just as much as immediate ones.
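The worked example and the effect of γ can be checked with a few lines of plain Python. This is a minimal sketch, not the experiment's actual code. The function name `bellman_target`, the `done` flag for terminal states, and the list of next-state Q-values are all assumptions made for illustration.

```python
def bellman_target(reward, gamma, next_q_values, done=False):
    """Compute r + gamma * max(Q(s', a')) for one transition.

    `done` is an assumed flag: when the episode has ended there is no
    next state, so the target is just the immediate reward.
    """
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical Q-values for the actions available in the next state;
# the best one is 5, matching the snake example.
next_q = [5.0, 2.0, -1.0]

# Same reward (+10), three different discount factors.
for gamma in (0.0, 0.9, 1.0):
    target = bellman_target(10.0, gamma, next_q)
    print(f"gamma={gamma}: target={target}")
```

With γ=0 the target equals the reward alone (10), with γ=0.9 it reproduces the 14.5 from the example, and with γ=1 the future value is added at full weight (15).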