Deep Q-Learning Snake Agent

Overview

My MSc AI final project: a Deep Q-Network (DQN) agent that learns to play Snake from scratch. The agent observes the game through an 11-feature state vector, stores past experiences in a replay buffer, and uses epsilon-greedy exploration to decide when to try something new and when to exploit what it already knows. I ran 12 experiments varying network width (256 vs 512 hidden units), depth, replay buffer size (10K to 200K experiences), and a wall-collision variant of the environment.
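
Below is a minimal sketch of two of the mechanisms mentioned above: an experience replay buffer and epsilon-greedy action selection. It uses plain Python rather than PyTorch, and every name in it (`Transition`, `ReplayBuffer`, `epsilon_greedy`) is hypothetical, not taken from the project's code. The action ordering is also an assumption. The `capacity` argument is where buffer sizes like 10K or 200K would go.

```python
import random
from collections import deque
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One stored experience (hypothetical layout)."""
    state: Sequence[float]
    action: int          # 0 = straight, 1 = turn left, 2 = turn right (assumed order)
    reward: float
    next_state: Sequence[float]
    done: bool


class ReplayBuffer:
    """Fixed-capacity experience store; once full, the oldest transitions are evicted first."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops items from the left when appending at capacity
        self.memory: deque = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform random minibatch without replacement (smaller if the buffer isn't full yet)
        return random.sample(list(self.memory), min(batch_size, len(self.memory)))

    def __len__(self) -> int:
        return len(self.memory)


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon take a random action, otherwise the highest-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3)
    for i in range(5):
        buffer.push(Transition([0.0] * 11, i % 3, 0.0, [0.0] * 11, False))
    print(len(buffer))                        # 3: the two oldest transitions were evicted
    print([t.action for t in buffer.memory])  # [2, 0, 1]
    print(epsilon_greedy([0.1, 0.9, 0.3], 0.0, random.Random(0)))  # 1 (pure exploitation)
```

Sampling uniformly from a large buffer breaks up the correlation between consecutive frames. That is the usual reason a bigger buffer can make training steadier.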

The Problem

Build an agent that figures out how to play Snake without me hard-coding any rules. Rewards are sparse (food only), the agent has to think more than one step ahead to avoid trapping itself, and there's the usual exploration vs exploitation trade-off to manage during training.

The Approach

Deep Q-Learning with experience replay in PyTorch. The network takes the 11-feature state (danger in 3 directions, current heading, food position) and outputs Q-values for 3 actions (straight, turn left, turn right). It is trained toward Bellman targets (the reward plus the discounted best Q-value of the next state), and epsilon decays over training so the agent gradually shifts from exploring to exploiting. I ran the 12 experiments to see what actually mattered rather than guessing.
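
Here is a framework-free sketch of two pieces of this approach: one plausible way to assemble the 11 features, and the Bellman target the network is trained toward. The exact feature layout is an assumption, since only the three feature groups are listed above. Encoding heading and food position as four flags each is one common way to reach 3 + 4 + 4 = 11. The other details are hypothetical too: the grid conventions (y grows downward, clockwise heading order), the helper names and the default `gamma=0.9`.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[int, int]

# Headings in clockwise order, so turning right is +1 and turning left is -1 (assumption)
HEADINGS: List[Point] = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # up, right, down, left


def is_danger(p: Point, body: Set[Point], width: int, height: int) -> bool:
    """A cell is dangerous if it lies off the board or is occupied by the snake."""
    x, y = p
    return not (0 <= x < width and 0 <= y < height) or p in body


def encode_state(head: Point, heading: int, food: Point,
                 body: Set[Point], width: int, height: int) -> List[int]:
    """11 binary features: 3 danger flags, 4 heading flags, 4 food-direction flags."""
    hx, hy = head
    straight = HEADINGS[heading]
    left = HEADINGS[(heading - 1) % 4]
    right = HEADINGS[(heading + 1) % 4]
    # Danger one cell ahead for each relative action: straight, turn left, turn right
    danger = [is_danger((hx + dx, hy + dy), body, width, height)
              for dx, dy in (straight, left, right)]
    heading_onehot = [i == heading for i in range(4)]  # up, right, down, left
    fx, fy = food
    food_flags = [fy < hy, fx > hx, fy > hy, fx < hx]  # food is up / right / down / left
    return [int(f) for f in danger + heading_onehot + food_flags]


def bellman_target(reward: float, next_q: Sequence[float],
                   done: bool, gamma: float = 0.9) -> float:
    """Q-learning target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q)


if __name__ == "__main__":
    print(encode_state((0, 5), 0, (3, 2), {(0, 5), (0, 6)}, 10, 10))
    # [0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]
    print(bellman_target(1.0, [0.5, 2.0, -1.0], done=False, gamma=0.5))  # 2.0
```

In a DQN update, only the Q-value of the action actually taken is regressed toward this target, for example with a mean-squared-error or Huber loss. The discount `gamma` is what lets a food reward several moves away influence the current decision.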

Outcome

Best configuration hit consistent scores of 40+ after 200 training episodes. A few clear findings: wider beat deeper for this task, and bigger replay buffers stabilised learning. The wall-collision variant was noticeably harder and needed different hyperparameters. Everything is documented with training curves, architecture comparisons, and statistical summaries.
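
A quick parameter count helps put "wider beat deeper" in context. This sketch assumes plain fully connected layers, with 11 inputs and 3 Q-value outputs. The layouts below are hypothetical: the 256 and 512 widths come from the experiments, but the exact depths tested are not given here. Under these assumptions, the two-hidden-layer 256 network has roughly nine times the parameters of the single-layer 512 one. If the tested layouts resembled these, parameter count alone would not explain the width result.

```python
from typing import Sequence


def mlp_param_count(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical layouts: 11 state features in, 3 Q-values out
layouts = {
    "wide (11-512-3)": [11, 512, 3],
    "base (11-256-3)": [11, 256, 3],
    "deep (11-256-256-3)": [11, 256, 256, 3],
}
for name, sizes in layouts.items():
    print(f"{name}: {mlp_param_count(sizes):,} parameters")
# wide (11-512-3): 7,683 parameters
# base (11-256-3): 3,843 parameters
# deep (11-256-256-3): 69,635 parameters
```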
