
Snake Q-Learning

Watch an AI learn to play Snake using reinforcement learning.

AI · Q-Learning · Neural Network


What this explores

Deep Q-Learning with experience replay: a neural network learns, through trial and error, which action is most valuable in each game state.
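Experience replay stores past transitions and trains on random mini-batches drawn from them, which breaks up the correlation between consecutive game frames. A minimal sketch, assuming each transition is a tuple `(state, action, reward, next_state, done)`; the class name `ReplayBuffer` and the capacity are illustrative choices, not taken from the demo.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions (hypothetical helper for illustration)."""

    def __init__(self, capacity: int = 10_000) -> None:
        # A deque with maxlen silently drops the oldest transition once full
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done) -> None:
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sample without replacement, so a batch mixes
        # transitions from different moments of play
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=0.0, next_state=step + 1, done=False)
print(len(buffer))  # 3: only the most recent transitions are kept
batch = buffer.sample(2)
```

Training on `batch` instead of only the latest move lets the network revisit rare events, such as eating food, many times.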

Principles demonstrated

  • Q-Learning with a neural network approximating the Q-values
  • Epsilon-greedy exploration strategy
  • Experience replay for stable learning
  • Real-time training visualization
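Epsilon-greedy exploration can be sketched in a few lines: with probability epsilon the agent tries a random move, otherwise it picks the move with the highest Q-value. A minimal version, assuming the network outputs one Q-value per action; the three-action layout and the decay constants (0.995 per episode, floor 0.01) are illustrative assumptions, not values from the demo.

```python
import random


def choose_action(q_values: list[float], epsilon: float) -> int:
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: random action
    # Exploit: index of the highest estimated Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon: float, decay: float = 0.995, minimum: float = 0.01) -> float:
    """Shrink epsilon (assumed schedule) so the agent explores less over time."""
    return max(minimum, epsilon * decay)


# Hypothetical Q-values for three actions: straight, turn right, turn left
q = [0.2, 1.5, -0.3]
print(choose_action(q, epsilon=0.0))  # 1: always greedy when epsilon is 0
```

Starting epsilon near 1 and decaying it lets the agent explore widely early on and rely on what it has learned later.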

Build It Yourself

Step-by-step guide to recreate this experiment

Advanced · 60-90 min

Prerequisites

  • Python basics
  • Understanding of neural networks
  • Basic calculus (gradients)
  • PyTorch fundamentals

Q-Learning is a reinforcement learning algorithm that learns the value (Q) of taking a particular action in a given state. The Q stands for "quality": how good is this action in this state? The agent learns by trial and error, updating its Q-values based on the rewards it actually receives.
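The trial-and-error update is easiest to see with a plain lookup table before a neural network replaces it. A minimal tabular sketch, assuming three actions and a learning rate `alpha` (the standard Q-Learning step size, which this page does not specify); `q_update`, the state labels, and the numbers are hypothetical.

```python
from collections import defaultdict

N_ACTIONS = 3  # hypothetical: straight, turn right, turn left

# Q-table: maps each state to one Q-value per action, starting at zero
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """One Q-Learning step: move Q(s, a) a fraction alpha toward the target."""
    # A terminal state (the snake died) has no future value
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Hypothetical transition: eating food (+10) from a state never seen before
print(q_update(state="s0", action=1, reward=10, next_state="s1", done=False))  # 1.0
```

Because `alpha` is small, a single lucky reward only nudges the estimate; repeated experience is what pulls Q-values toward their true values.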

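```python
# The Bellman equation: core of Q-Learning.
# The learning target for Q(s, a) is:
#   Q(s, a) = r + γ * max_a' Q(s', a')
#
# Where:
# - Q(s, a)          = value of taking action a in state s
# - r                = immediate reward
# - γ (gamma)        = discount factor in [0, 1]; weights future rewards
# - s'               = next state
# - max_a' Q(s', a') = best possible value from the next state
#
# Q-Learning moves Q(s, a) toward this target by a learning rate α
# rather than overwriting it outright.

# Example: the snake eats food
reward = 10       # reward for eating food
gamma = 0.9       # discount factor
max_next_q = 5    # best Q-value among actions in the next state

target_q = reward + gamma * max_next_q
print(target_q)   # 10 + 0.9 * 5 = 14.5
```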
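The example above computes the target for a single transition. In Deep Q-Learning the network is trained by regression toward such targets over a mini-batch, and a transition that ends the game (the snake dies) has no next state, so its target is just the reward. A minimal pure-Python sketch of that arithmetic; the names `td_targets` and `mse_loss`, the batch values, and the choice of mean squared error are illustrative assumptions (some implementations use a Huber loss instead).

```python
def td_targets(rewards, next_max_q, dones, gamma=0.9):
    """Bellman targets for a batch; terminal transitions keep only the reward."""
    return [r + gamma * q * (1 - d) for r, q, d in zip(rewards, next_max_q, dones)]


def mse_loss(predicted, targets):
    """Mean squared error between predicted Q(s, a) and the targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(predicted)


# Hypothetical mini-batch of three transitions
rewards = [10, 0, -10]          # ate food, ordinary move, hit a wall
next_max_q = [5, 2, 7]          # max_a' Q(s', a') estimated by the network
dones = [0, 0, 1]               # the third transition ended the game
predicted = [12.0, 1.5, -9.0]   # Q(s, a) the network currently predicts

targets = td_targets(rewards, next_max_q, dones)
loss = mse_loss(predicted, targets)
```

In a PyTorch implementation the same arithmetic runs on tensors, and gradients flow only through the predicted Q-values; the targets are treated as constants.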

Gamma (γ) sets how much the agent cares about future rewards. With γ=0, only the immediate reward matters; with γ=1, future rewards count just as much as the immediate one. In between, a reward k steps in the future is weighted by γ^k.
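The effect of γ shows up clearly in the discounted return of a short episode. A minimal sketch; the reward sequence is hypothetical.

```python
def discounted_return(rewards: list[float], gamma: float) -> float:
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


# Hypothetical episode: two empty moves, then the snake eats food (+10)
rewards = [0, 0, 10]
for gamma in (0.0, 0.9, 1.0):
    print(gamma, discounted_return(rewards, gamma))
```

With γ=0 the food two moves away is worth nothing, with γ=0.9 it is worth about 8.1, and with γ=1 it counts in full.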