Heart Disease Classification

Overview

MSc AI mid-module assessment. Binary classifier for heart disease risk using the UCI Heart Disease dataset. I compared three model families (Random Forest, SVM, Neural Network) with three configurations each, nine in total, to see how much the choice of algorithm and hyperparameters actually changed the results.

The Problem

Predict heart disease risk from patient health metrics. The dataset is small (303 samples), there are 13 clinical variables to weigh up, and picking the right algorithm and hyperparameters for a medical task isn't a one-size-fits-all decision.

The Approach

Full ML pipeline: exploratory analysis with correlation heatmaps, feature importance ranking, and distribution plots. Preprocessing used StandardScaler standardisation (zero mean, unit variance per feature) and a stratified train-test split, so both splits keep the original class balance. The nine configurations were Random Forest (default, depth-limited, sample-constrained), SVM (RBF, linear, tuned RBF), and Neural Network (simple, with dropout, wider). I evaluated each one on accuracy, precision, recall, F1, ROC-AUC, and its confusion matrix.
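To make the preprocessing and evaluation steps concrete, here is a minimal standard-library sketch of what a stratified split, StandardScaler-style standardisation, and the listed metrics compute. It is not the notebook's code: the function names (`stratified_split`, `fit_standardiser`, `transform`, `binary_metrics`, `roc_auc`) are hypothetical, labels are assumed to be 0/1 with 1 meaning heart disease, and the scaler is fitted on the training rows only, which is the usual way to avoid leaking test statistics into training.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx), sampling each class separately so both
    splits keep roughly the original class proportions. Rounds per class, so
    totals can differ by a row from sklearn's train_test_split."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_standardiser(rows):
    """Per-column mean and population std (ddof=0), as StandardScaler uses.
    A zero-variance column gets scale 1.0 so it is only centred."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def transform(rows, means, stds):
    """Apply statistics fitted on the training rows to any rows."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def binary_metrics(y_true, y_pred):
    """Confusion matrix (rows = true 0/1, columns = predicted 0/1) and scores."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"confusion": [[tn, fp], [fn, tp]],
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In scikit-learn the same steps would typically be `train_test_split(X, y, test_size=..., stratify=y)`, `StandardScaler().fit_transform` on the training rows and `.transform` on the test rows, and `confusion_matrix`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` from `sklearn.metrics`. Recall matters most here, since a false negative is a missed at-risk patient.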

Outcome

The sample-constrained Random Forest came out on top at 85.25% test accuracy. Chest pain type (cp), maximum heart rate (thalachh), and number of major vessels (caa) were the most predictive features. The neural networks overfit on a dataset this small even with dropout, as I expected. Everything sits in one Jupyter notebook with visualisations and reproducible runs.
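A quick piece of arithmetic shows how coarse accuracy is on a test set this small. The write-up does not state the split ratio, so the sketch assumes a 20% hold-out; under that assumption, 303 rows give 61 test patients, and 85.25% is exactly 52 of them classified correctly. The names `N_SAMPLES`, `TEST_SIZE`, and `accuracy_pct` are illustrative only.

```python
import math

N_SAMPLES = 303   # UCI Heart Disease rows, from the write-up
TEST_SIZE = 0.2   # assumption: a 20% hold-out; the actual ratio is not stated

# sklearn's train_test_split rounds a fractional test size up to whole rows
n_test = math.ceil(N_SAMPLES * TEST_SIZE)


def accuracy_pct(correct, total):
    """Accuracy as a percentage rounded to two decimals, as reported above."""
    return round(100 * correct / total, 2)


# Which whole number of correct predictions reproduces the reported score?
matches = [c for c in range(n_test + 1) if accuracy_pct(c, n_test) == 85.25]

# How far a single extra (or missed) correct prediction moves the score
step = 100 / n_test

print(n_test, matches, round(step, 2))  # 61 [52] 1.64
```

If the split was indeed 20%, each test patient is worth about 1.6 percentage points, so gaps of a point or two between configurations amount to one or two predictions; cross-validation would give a steadier comparison.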
