Live/2026

Hawk

A self-correcting ML system trading live financial markets.

An end-to-end machine-learning system that ingests live market data 24/7, predicts short-horizon outcomes with a calibrated model, and retrains itself behind automated drift gates — running unattended in production for months.

Uptime: months, unattended
Live snapshots: 130k+
Retrain safety: drift-gated
Calibration: isotonic

The problem

Most ML projects die in a notebook: a model trained once on a static dataset, evaluated on a clean test split, and never deployed. The hard part of machine learning in the real world isn't fitting a model — it's keeping one accurate, calibrated, and trustworthy as live data drifts underneath it.

Hawk is my answer to that. It targets a deliberately unforgiving domain — short-horizon prediction on live, adversarial markets — as a forcing function for genuine production ML engineering. The goal was never a one-off backtest; it was a system that operates for months and corrects itself.

What I built

  • Live data pipeline: a collector running 24/7 on a Linux VPS snapshots market and order-book state on a fixed interval into a columnar (Parquet) store — 130k+ snapshots and counting.
  • Engineered feature set built on domain physics (distance-to-strike normalized by time and volatility, order-book imbalance, momentum) rather than throwing raw inputs at the model.
  • LightGBM classifier with isotonic probability calibration and group-aware walk-forward validation, so reported confidence reflects real-world frequencies instead of overfit optimism.
  • Automated retraining behind safety gates: the system retrains on new data but promotes a candidate model only when it improves on the current one, rejecting any candidate whose calibration (Brier score) regresses beyond a strict threshold.
  • Hot model-reload: the live process swaps in a newly accepted model on file change — no downtime, no restart.
  • Observability built in: drift monitoring, a model-change audit log, and alerting so a silent model swap can never go unnoticed.
  • Risk discipline as a first-class feature — fixed-dollar sizing, position limits, and staged validation (paper-first) before any real capital. The system is engineered to scale capital deliberately, not recklessly.
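
The engineered features named in the list above can be illustrated with a minimal sketch. The exact formulas Hawk uses are not given here, so everything below is an assumption: the function names are hypothetical, distance-to-strike is taken as log-moneyness divided by annualized volatility times the square root of time remaining, and order-book imbalance is the usual (bid - ask) / (bid + ask) ratio.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def normalized_distance_to_strike(price, strike, annual_vol, seconds_to_expiry):
    """Log-distance to the strike, in units of the volatility expected
    over the remaining time (assumed formula, not taken from Hawk)."""
    if price <= 0 or strike <= 0:
        raise ValueError("price and strike must be positive")
    if annual_vol <= 0 or seconds_to_expiry <= 0:
        raise ValueError("annual_vol and seconds_to_expiry must be positive")
    years_left = seconds_to_expiry / SECONDS_PER_YEAR
    return math.log(price / strike) / (annual_vol * math.sqrt(years_left))


def order_book_imbalance(bid_sizes, ask_sizes):
    """(bid - ask) / (bid + ask) over the supplied levels, in [-1, 1].
    Returns 0.0 for an empty book rather than dividing by zero."""
    bid, ask = sum(bid_sizes), sum(ask_sizes)
    total = bid + ask
    return 0.0 if total == 0 else (bid - ask) / total


def log_momentum(prices, lookback):
    """Log return between the latest snapshot and the one `lookback` steps back."""
    if lookback <= 0 or len(prices) <= lookback:
        raise ValueError("need more than `lookback` prices and lookback > 0")
    return math.log(prices[-1] / prices[-1 - lookback])


# Hypothetical snapshot values, for illustration only.
features = {
    "dist_to_strike": normalized_distance_to_strike(101.0, 100.0, 0.5, 3600),
    "book_imbalance": order_book_imbalance([3.0, 1.0], [1.0, 1.0]),
    "momentum_1": log_momentum([100.0, 101.0], 1),
}
```

Normalizing the distance by time and volatility puts snapshots taken at different times to expiry on a comparable scale, which is the usual motivation for a feature like this over the raw price gap.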

Architecture

  1. Coinbase / market WebSocket + REST → live price & order-book feed
  2. Collector (systemd, 24/7) → snapshot every interval → Parquet store
  3. Feature engineering → LightGBM + isotonic calibration → walk-forward eval
  4. Retrain job → Brier-regression gate → accept / reject candidate model
  5. Live engine hot-reloads accepted model → calibrated prediction + risk sizing
  6. Drift monitor + model-change audit log + alerting
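
Step 4, the Brier-regression gate, can be sketched as follows. This is an illustration under stated assumptions, not Hawk's actual code: `GateDecision`, `gate_candidate`, and the `max_regression` value are hypothetical, and both models are assumed to be scored on the same held-out outcomes.

```python
from dataclasses import dataclass


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.
    Lower is better; 0.0 is a perfect forecast."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


@dataclass
class GateDecision:
    accepted: bool
    current_brier: float
    candidate_brier: float


def gate_candidate(current_probs, candidate_probs, outcomes, max_regression=0.001):
    """Reject the candidate if its Brier score is worse than the current
    model's by more than `max_regression` (a hypothetical threshold)."""
    current = brier_score(current_probs, outcomes)
    candidate = brier_score(candidate_probs, outcomes)
    accepted = (candidate - current) <= max_regression
    return GateDecision(accepted, current, candidate)


# Toy held-out set: the candidate is sharper and still correct, so it passes.
outcomes = [1, 0, 1, 0]
decision = gate_candidate(
    current_probs=[0.7, 0.3, 0.7, 0.3],
    candidate_probs=[0.8, 0.2, 0.8, 0.2],
    outcomes=outcomes,
)
```

Returning both scores alongside the verdict gives the audit log something concrete to record for every accept or reject.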

What it demonstrates

Hawk is the full MLOps lifecycle in one system: data engineering, feature engineering, calibrated modeling, time-series validation, automated retraining with regression guards, hot deployment, and production monitoring.
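
The time-series validation mentioned above is described as group-aware and walk-forward. A minimal sketch of that idea, assuming rows are in chronological order and each group (for example, one market or contract; the grouping key is an assumption) occupies a contiguous block of rows. The function name and parameters are hypothetical.

```python
def walk_forward_group_splits(groups, n_folds, min_train_groups=1):
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Whole groups are assigned to one side of each split, so snapshots from
    the same group never appear in both train and test, and every test
    group comes after every training group in time.
    """
    ordered = list(dict.fromkeys(groups))  # unique groups, first-seen order
    n_test_groups = len(ordered) - min_train_groups
    if n_folds < 1 or n_test_groups < n_folds:
        raise ValueError("not enough groups for the requested folds")
    fold_size = n_test_groups // n_folds
    for k in range(n_folds):
        start = min_train_groups + k * fold_size
        # The last fold absorbs any leftover groups.
        stop = start + fold_size if k < n_folds - 1 else len(ordered)
        train_groups = set(ordered[:start])
        test_groups = set(ordered[start:stop])
        train_idx = [i for i, g in enumerate(groups) if g in train_groups]
        test_idx = [i for i, g in enumerate(groups) if g in test_groups]
        yield train_idx, test_idx


# Seven snapshots across four hypothetical markets, in time order.
row_groups = ["a", "a", "b", "b", "c", "c", "d"]
splits = list(walk_forward_group_splits(row_groups, n_folds=3))
```

Splitting by group rather than by row matters because consecutive snapshots of the same market are highly correlated; letting them straddle the train/test boundary would inflate the reported scores.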

It reflects how I think about ML in production — that a model is only as good as the system keeping it honest, and that calibration, drift detection, and risk controls matter more than a single impressive metric.

Stack

Python · LightGBM · pandas / NumPy · Parquet · scikit-learn · Linux / systemd · VPS (24/7)

In production

Hawk's live operations dashboard — a real-time terminal UI surfacing session P&L, open positions, recent market resolutions (with the model's predicted probability per trade), and a per-strategy performance breakdown. This is the observability layer that makes the system safe to run unattended. (Single live session, May 2026.)