
Snake Q-Learning

Watch an AI learn to play Snake using reinforcement learning.

AI · Q-Learning · Neural Network


What this explores

Deep Q-Learning with experience replay: a neural network learns optimal actions through trial and error.

Principles demonstrated

  • Q-Learning algorithm with neural network
  • Epsilon-greedy exploration strategy (sketched below)
  • Experience replay for stable learning (sketched below)
  • Real-time training visualization
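
Two of these ideas translate directly into a few lines of Python. Here is a minimal sketch of epsilon-greedy action selection and a replay buffer; the names (`choose_action`, `ReplayBuffer`) and the buffer capacity are illustrative assumptions, not taken from this experiment's source.

```python
import random
from collections import deque

# Epsilon-greedy: with probability epsilon take a random action (explore),
# otherwise take the action with the highest Q-value (exploit).
def choose_action(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Experience replay: store transitions and train on random minibatches,
# so consecutive (highly correlated) frames don't destabilize learning.
class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```

Epsilon is typically decayed over the course of training, so the agent explores heavily at first and relies on its learned policy later.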

Build It Yourself

Step-by-step guide to recreate this experiment

Advanced · 60-90 min

Prerequisites

  • Python basics
  • Understanding of neural networks
  • Basic calculus (gradients)
  • PyTorch fundamentals

Q-Learning is a reinforcement learning algorithm that learns the value (Q) of taking an action in a given state. The "Q" stands for "quality" - how good is this action? The agent learns by trial and error, updating Q-values based on rewards received.

```python
# The Bellman Equation - core of Q-Learning
# Q(s,a) = r + γ * max(Q(s',a'))
#
# Where:
# - Q(s,a) = value of taking action 'a' in state 's'
# - r = immediate reward
# - γ (gamma) = discount factor (0-1), weights future rewards
# - s' = next state
# - max(Q(s',a')) = best possible value from the next state

# Example: Snake eats food
reward = 10      # +10 for eating food
gamma = 0.9      # discount factor
max_next_q = 5   # best Q-value reachable from the next state
new_q = reward + gamma * max_next_q
print(new_q)     # 14.5
```
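
The Bellman target above tells us what Q(s,a) should move toward. In tabular Q-Learning the update blends the old value with that target using a learning rate α; deep Q-Learning replaces the table with a neural network and a gradient step toward the same target. The sketch below is a minimal, assumed implementation (the dict-based `q_table`, `alpha=0.1`, and four actions are illustrative, not the experiment's actual code):

```python
# Tabular Q-Learning: nudge Q(s,a) toward the Bellman target with
# learning rate alpha instead of overwriting it outright.
def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, n_actions=4):
    # Best value obtainable from the next state (0.0 for unseen pairs)
    best_next = max(q_table.get((next_state, a), 0.0) for a in range(n_actions))
    target = reward + gamma * best_next
    current = q_table.get((state, action), 0.0)
    q_table[(state, action)] = current + alpha * (target - current)

q = {}
q_update(q, state="s0", action=1, reward=10, next_state="s1")
print(q)  # {('s0', 1): 1.0} -> first nudge of alpha * target = 0.1 * 10
```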

Gamma (γ) controls how much the agent cares about future rewards: γ=0 means only immediate rewards matter, while γ=1 means future rewards matter just as much as immediate ones.
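
To build intuition, here is a short, hypothetical illustration (not part of the experiment) of how γ discounts a +10 reward that arrives n steps in the future:

```python
# Present value of a +10 reward arriving n steps from now: gamma**n * 10
for gamma in (0.0, 0.5, 0.9, 0.99):
    values = [round(gamma**n * 10, 2) for n in range(5)]
    print(f"gamma={gamma}: {values}")

# gamma=0.0: [10.0, 0.0, 0.0, 0.0, 0.0]    -> only the immediate reward counts
# gamma=0.9: [10.0, 9.0, 8.1, 7.29, 6.56]  -> distant food is still worth chasing
```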