Overview
Real-time computer vision tool that watches your posture and phone use to help break the doomscrolling habit. MediaPipe handles the face mesh (468 landmarks) and hand tracking (21 landmarks per hand). YOLOv8 detects the phone itself. The tool calibrates to your neutral head position, smooths tracking with an exponential moving average, and escalates the warnings if you keep ignoring them.
The Problem
App timers only work inside the app you're already lost in. I wanted something that catches the physical act of head-down phone scrolling regardless of which app is open.
The Approach
Full 3D pose estimation is too jittery for this, so I track the nose's Y position relative to a calibrated neutral. Once the nose drops past a threshold below that neutral, you're looking down. YOLO is the primary phone signal: a high-confidence detection triggers on its own. Hand grip detection is secondary and only fires alongside looking down, which kept the false positive rate sensible. Hysteresis over a buffer of frames stops the warnings from flickering when a signal wavers from one frame to the next.
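To make the pipeline concrete, here is a minimal sketch of how the three pieces could fit together: EMA smoothing of the nose position, the YOLO-or-(grip-and-looking-down) fusion rule, and frame-count hysteresis. This is one reading of the description above, not the project's actual code. The names `Ema` and `DoomscrollDetector`, and every numeric default (`drop_threshold`, `alpha`, `yolo_conf_min`, `frames_on`, `frames_off`), are hypothetical placeholders. It also assumes normalized image coordinates where Y grows downward, so a lowered head shows up as a larger nose Y.

```python
from dataclasses import dataclass


class Ema:
    """Exponential moving average over a scalar signal."""

    def __init__(self, alpha: float):
        self.alpha = alpha  # weight of the newest sample, in (0, 1]
        self.value = None

    def update(self, x: float) -> float:
        # The first sample seeds the average; later samples are blended in.
        if self.value is None:
            self.value = x
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value


@dataclass
class DoomscrollDetector:
    neutral_y: float              # calibrated neutral nose Y (normalized 0..1)
    drop_threshold: float = 0.08  # assumed: how far past neutral counts as "down"
    alpha: float = 0.3            # assumed EMA weight
    yolo_conf_min: float = 0.6    # assumed "high-confidence" cutoff
    frames_on: int = 5            # assumed: frames needed to switch on
    frames_off: int = 10          # assumed: frames needed to switch off

    def __post_init__(self):
        self._ema = Ema(self.alpha)
        self._streak = 0     # consecutive frames disagreeing with current state
        self.active = False  # debounced output

    def update(self, nose_y: float, yolo_conf: float, hand_grip: bool) -> bool:
        smoothed = self._ema.update(nose_y)
        # Image Y grows downward, so a lowered head means a larger Y.
        looking_down = (smoothed - self.neutral_y) > self.drop_threshold

        # Fusion: YOLO triggers alone; grip only counts with looking down.
        phone_seen = yolo_conf >= self.yolo_conf_min
        raw = phone_seen or (hand_grip and looking_down)

        # Hysteresis: flip state only after enough consecutive disagreeing frames.
        if raw == self.active:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.frames_on if raw else self.frames_off
            if self._streak >= needed:
                self.active = raw
                self._streak = 0
        return self.active
```

In a real loop, `nose_y` would come from the face mesh, `yolo_conf` from the phone detection's confidence, and `hand_grip` from whatever grip heuristic runs on the hand landmarks. Making `frames_off` larger than `frames_on` is a design choice in this sketch, so that a brief dropout does not cancel a warning that is already running.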
Outcome
Working detector at around 30 FPS on modern hardware. Warnings escalate from a gentle chime at 10 seconds to a flashing red screen at 40 seconds. Picked up some real lessons along the way about sensor fusion, signal smoothing, and how often the simpler approach is the right one.
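The escalation can be sketched as a lookup from continuous detection time to a warning level. Only the two levels stated above (chime at 10 seconds, flashing red at 40 seconds) are included; any intermediate levels the tool has are not shown. The names `ESCALATION`, `warning_level`, and `WarningTimer` are hypothetical, and resetting the timer as soon as detection stops is an assumption about the behaviour.

```python
from typing import Optional

# (seconds of continuous detection, warning name); only the two known levels.
ESCALATION = [(10.0, "chime"), (40.0, "flash_red")]


def warning_level(elapsed_s: float) -> Optional[str]:
    """Return the highest warning whose threshold has been reached, if any."""
    level = None
    for threshold, name in ESCALATION:
        if elapsed_s >= threshold:
            level = name
    return level


class WarningTimer:
    """Tracks how long the detector has been continuously active."""

    def __init__(self):
        self._started_at = None

    def update(self, active: bool, now: float) -> Optional[str]:
        # Assumption: the clock resets whenever detection stops.
        if not active:
            self._started_at = None
            return None
        if self._started_at is None:
            self._started_at = now
        return warning_level(now - self._started_at)
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the timer easy to test without waiting in real time.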
