Twitter Sentiment Classification

Entity-level sentiment classifier across six architectures, from a plain MLP up to fine-tuned transformers. Twelve runs, head-to-head: a tuned MLP topped the leaderboard at 98.6%, beating RoBERTa at a fraction of the training cost.

Python, PyTorch, Hugging Face, Transformers, NLP, scikit-learn

Overview

MSc NLP final assessment: entity-level sentiment classification on the Twitter Entity Sentiment dataset (positive / negative / neutral, with irrelevant rolled into neutral). I ran six architectures back to back (MLP, BiLSTM, 1-D CNN, DistilBERT, RoBERTa, ALBERT) with two configurations each, for twelve experiments in total. Every run was scored on accuracy, macro and weighted F1, precision, recall, and wall-clock training time.
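The label handling mentioned above (irrelevant rolled into neutral) can be sketched in a few lines. This is an illustration only, not the project's code: the raw label strings and the helper name `collapse_label` are assumptions.

```python
# Assumed raw label strings; the actual dataset values may be spelled differently.
LABEL_MAP = {
    "positive": "positive",
    "negative": "negative",
    "neutral": "neutral",
    "irrelevant": "neutral",  # irrelevant is rolled into neutral
}


def collapse_label(raw: str) -> str:
    """Map a raw four-way label onto the three-way scheme (hypothetical helper)."""
    key = raw.strip().lower()
    if key not in LABEL_MAP:
        raise ValueError(f"unexpected label: {raw!r}")
    return LABEL_MAP[key]


labels = ["Positive", "Irrelevant", "Negative", "Neutral"]
collapsed = [collapse_label(x) for x in labels]
print(collapsed)
```

Failing loudly on an unknown label, rather than defaulting to neutral, keeps a typo in the data from silently inflating the neutral class.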

The Problem

The interesting question with sentiment classification isn't 'can a transformer do this', it's 'when do you actually need one?'. Transformers are expensive to train and serve, and a lot of NLP tasks don't earn that cost back. I wanted concrete numbers on where the trade-off sits for entity-level sentiment.

The Approach

Same preprocessing and train/val split across every model. The non-transformer baselines (MLP, BiLSTM, 1-D CNN) used trained embeddings. The three transformers (DistilBERT, RoBERTa, ALBERT) were fine-tuned from Hugging Face checkpoints. I ran two configs per model, varying widths, layers, and learning rates, and logged accuracy, macro F1, weighted F1, precision, recall, and training time for each run.
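To make the logged metrics concrete, here is a minimal pure-Python sketch of accuracy, macro F1, weighted F1, and a wall-clock timer. It is not the project's evaluation code. The function names (`per_class_f1`, `summarize`, `timed`) and the toy labels are assumptions, and in practice scikit-learn's `f1_score` with `average="macro"` or `average="weighted"` would be the usual choice.

```python
import time
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true label t was missed
    support = Counter(y_true)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def summarize(y_true, y_pred):
    """Accuracy, macro F1 (unweighted mean), weighted F1 (mean by support)."""
    scores = per_class_f1(y_true, y_pred)
    n = len(y_true)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "macro_f1": sum(f1 for f1, _ in scores.values()) / len(scores),
        "weighted_f1": sum(f1 * s for f1, s in scores.values()) / n,
    }


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Toy labels, purely illustrative.
y_true = ["pos", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu"]
metrics, seconds = timed(summarize, y_true, y_pred)
print(metrics)
```

Macro F1 weights every class equally, so it drops faster than weighted F1 when a small class is handled badly. That is why both are worth logging on a dataset where neutral absorbs a second class.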

Outcome

MLP Config A topped the leaderboard at 98.6% accuracy and 0.986 macro F1, trained in 54 seconds. RoBERTa Config A came in at 97.5% but took over an hour to train. DistilBERT Config B managed 97.1% in 20 minutes. The headline finding for this dataset: a well-tuned MLP matched or beat every transformer at a fraction of the training cost, roughly 1/70th of RoBERTa's training time and about 1/22 of DistilBERT's. That is a useful, concrete data point about when transformer overhead is worth paying for.
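The cost comparison can be checked with quick arithmetic on the figures quoted above. RoBERTa's time is only given as "over an hour", so the sketch below treats 3,600 seconds as a lower bound rather than the measured value. The constant and function names are illustrative.

```python
MLP_SECONDS = 54
DISTILBERT_SECONDS = 20 * 60  # 20 minutes
ROBERTA_SECONDS_LOWER_BOUND = 60 * 60  # "over an hour": a floor, not the measured time


def cost_ratio(slow_seconds: float, fast_seconds: float) -> float:
    """How many times longer the slower run took to train."""
    return slow_seconds / fast_seconds


roberta_ratio = cost_ratio(ROBERTA_SECONDS_LOWER_BOUND, MLP_SECONDS)
distilbert_ratio = cost_ratio(DISTILBERT_SECONDS, MLP_SECONDS)

print(f"RoBERTa vs MLP: at least {roberta_ratio:.1f}x")       # at least 66.7x
print(f"DistilBERT vs MLP: about {distilbert_ratio:.1f}x")    # about 22.2x
```

Since RoBERTa ran for more than an hour, the true ratio sits somewhat above 66.7, which is consistent with the "roughly 1/70th" figure.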
