Deep Q-Learning Snake Agent

Overview

Final assessment project for an MSc in Artificial Intelligence: a Deep Q-Network (DQN) agent that learns to play Snake through trial and error. The agent uses an 11-dimensional state representation, experience replay for stable learning, and epsilon-greedy exploration. The project includes experiments comparing neural network architectures (hidden layers of 256 and 512 neurons, plus deeper configurations), replay memory sizes (10K to 200K transitions), and environment variants (with and without wall collisions).
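Experience replay can be illustrated with a small fixed-capacity buffer. The following is a minimal sketch rather than the project's actual code: the `ReplayBuffer` class, the `Transition` fields, and the method names are all hypothetical, and it uses only the Python standard library.

```python
import random
from collections import deque, namedtuple

# Hypothetical transition record; the field names are illustrative only.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-capacity memory that evicts the oldest transition when full."""

    def __init__(self, capacity):
        # A deque with maxlen drops items from the left once capacity is hit.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive game steps; never ask for more than is stored.
        batch_size = min(batch_size, len(self.memory))
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)
```

In this sketch the capacity argument plays the role of the memory size varied in the experiments (10K to 200K): a larger buffer keeps older transitions available for sampling for longer.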

The Problem

Develop an intelligent agent capable of learning optimal gameplay strategies for Snake without any explicitly programmed rules. The challenge involves handling sparse rewards (a positive reward arrives only when the snake eats food), learning long-term strategies (such as avoiding traps), and balancing exploration against exploitation during training.

The Approach

Implemented Deep Q-Learning with experience replay using PyTorch. The neural network takes an 11-feature state vector (danger detection in 3 directions, current direction, food location) and outputs Q-values for 3 actions (go straight, turn left, turn right). Training uses the Bellman equation to update Q-values, with epsilon-greedy exploration whose exploration rate decays over time. Conducted 12 systematic experiments varying architecture depth, hidden layer width, and replay memory size.
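The pieces described above can be sketched in a few framework-free functions. This is an illustration under stated assumptions, not the project's PyTorch code: the 3 + 4 + 4 feature layout, the function names, the discount factor, and the decay schedule are all assumed, and in the real agent the Q-values would come from the network's forward pass.

```python
import random

ACTIONS = ("straight", "left", "right")  # the 3 outputs named in the text


def encode_state(danger, heading, food):
    """Build the 11-feature state vector.

    Assumed layout (not confirmed by the source): 3 danger flags
    (straight, left, right), 4 one-hot heading flags, and 4 flags for
    the food's position relative to the head.
    """
    state = [int(flag) for flag in (*danger, *heading, *food)]
    if len(state) != 11:
        raise ValueError("expected 3 + 4 + 4 = 11 features")
    return state


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.9):
    """Bellman target: r if terminal, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def decay_epsilon(epsilon, rate=0.995, floor=0.01):
    """Multiplicative decay with a lower bound (the schedule is an assumption)."""
    return max(floor, epsilon * rate)
```

During training, the network's prediction for the action actually taken would be regressed toward `td_target(...)`, typically with a mean-squared-error loss, while `decay_epsilon` gradually shifts the agent from exploring to exploiting.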

Outcome

The best configuration achieved consistent scores of 40+ after 200 training episodes. Experiments revealed that wider networks (256 neurons) outperformed deeper ones for this task, and that larger replay buffers improved training stability. The wall-collision variant proved more challenging and required separate hyperparameter tuning. The full analysis is documented with training curves, architecture comparisons, and statistical summaries.
