Hawk
A self-correcting ML system trading live financial markets.
An end-to-end machine-learning system that ingests live market data 24/7, predicts short-horizon outcomes with a calibrated model, and retrains itself behind automated drift gates — running unattended in production for months.
The problem
Most ML projects die in a notebook: a model trained once on a static dataset, evaluated on a clean test split, and never deployed. The hard part of machine learning in the real world isn't fitting a model — it's keeping one accurate, calibrated, and trustworthy as live data drifts underneath it.
Hawk is my answer to that. It targets a deliberately unforgiving domain — short-horizon prediction on live, adversarial markets — as a forcing function for genuine production ML engineering. The goal was never a one-off backtest; it was a system that operates for months and corrects itself.
What I built
- Live data pipeline: a collector running 24/7 on a Linux VPS snapshots market and order-book state on a fixed interval into a columnar (Parquet) store — 130k+ snapshots and counting.
- Engineered feature set built on domain physics (distance-to-strike normalized by time and volatility, order-book imbalance, momentum) rather than throwing raw inputs at the model.
- LightGBM classifier with isotonic probability calibration and group-aware walk-forward validation, so reported confidence reflects real-world frequencies instead of overfit optimism.
- Automated retraining behind safety gates: the system retrains on new data but deploys a candidate only when it passes the gate, rejecting any candidate whose calibration (Brier score) regresses beyond a strict threshold.
- Hot model-reload: the live process swaps in a newly accepted model on file change — no downtime, no restart.
- Observability built in: drift monitoring, a model-change audit log, and alerting, so every model swap is recorded and surfaced rather than happening silently.
- Risk discipline as a first-class feature — fixed-dollar sizing, position limits, and staged validation (paper-first) before any real capital. The system is engineered to scale capital deliberately, not recklessly.
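To make the retraining gate above concrete, here is a minimal sketch of a Brier-score regression check. It assumes both the live model and the candidate are scored on the same held-out window. The function names, the sample data, and the `max_regression` tolerance of 0.002 are illustrative assumptions, not Hawk's actual code or threshold.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect forecast.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def accept_candidate(live_brier: float, candidate_brier: float,
                     max_regression: float = 0.002) -> bool:
    """Reject the candidate if its Brier score is worse (higher) than the
    live model's by more than max_regression (a hypothetical tolerance)."""
    return candidate_brier - live_brier <= max_regression


# Hypothetical held-out window scored by both models.
outcomes = [1, 0, 1, 1, 0]
live_probs = [0.7, 0.3, 0.6, 0.8, 0.4]
candidate_probs = [0.9, 0.6, 0.5, 0.6, 0.7]

live = brier_score(live_probs, outcomes)       # about 0.108
cand = brier_score(candidate_probs, outcomes)  # about 0.254
decision = accept_candidate(live, cand)        # False: candidate is rejected
```

Gating on a proper scoring rule such as the Brier score, rather than on accuracy alone, means a candidate that gets more predictions right but reports overconfident probabilities can still be rejected.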
Architecture
1. Coinbase / market WebSocket + REST → live price & order-book feed
2. Collector (systemd, 24/7) → snapshot every interval → Parquet store
3. Feature engineering → LightGBM + isotonic calibration → walk-forward eval
4. Retrain job → Brier-regression gate → accept / reject candidate model
5. Live engine hot-reloads accepted model → calibrated prediction + risk sizing
6. Drift monitor + model-change audit log + alerting
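The hand-off between the retrain job and the live engine can be sketched as follows. This is a simplified illustration under stated assumptions, not Hawk's implementation: it polls the file's modification time rather than using filesystem events, it uses `pickle` as a stand-in for whatever model format the system actually stores, and `ModelWatcher`, `publish_model`, and the JSON-lines audit format are hypothetical names chosen for the example.

```python
import json
import os
import pickle
import time
from typing import Any, Optional


def publish_model(model: Any, model_path: str) -> None:
    """Write to a temp file, then rename it over the live path.

    os.replace is atomic on POSIX when both paths are on the same
    filesystem, so a reader never sees a half-written model file.
    """
    tmp_path = model_path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(model, fh)
    os.replace(tmp_path, model_path)


class ModelWatcher:
    """Reloads the model when the file's modification time changes."""

    def __init__(self, model_path: str, audit_path: str) -> None:
        self.model_path = model_path
        self.audit_path = audit_path
        self.model: Optional[Any] = None
        self._mtime_ns: Optional[int] = None

    def poll(self) -> bool:
        """Check the file once; return True if a new model was loaded."""
        mtime_ns = os.stat(self.model_path).st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        # Load fully before swapping, so a failed load keeps the old model.
        with open(self.model_path, "rb") as fh:
            new_model = pickle.load(fh)
        self.model = new_model
        self._mtime_ns = mtime_ns
        # Append one JSON line per swap so no reload goes unrecorded.
        with open(self.audit_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"ts": time.time(), "mtime_ns": mtime_ns}) + "\n")
        return True
```

In this sketch the live loop would call `poll()` between predictions, and the retrain job would call `publish_model()` only after a candidate passes the gate. Only unpickle files the system itself wrote, since `pickle.load` can execute arbitrary code.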
What it demonstrates
Hawk is the full MLOps lifecycle in one system: data engineering, feature engineering, calibrated modeling, time-series validation, automated retraining with regression guards, hot deployment, and production monitoring.
It reflects how I think about ML in production — that a model is only as good as the system keeping it honest, and that calibration, drift detection, and risk controls matter more than a single impressive metric.
Stack
In production
