ML System Design
ML System Design (Feature Store, Model Serving)
An ML system in production is mostly a data system with a model in the middle. The model is the smallest, most-discussed, and least troublesome part. The hard parts are the training data pipelines, feature freshness, parity between the features used in training and those used in serving, the feature store that enforces that parity, model deployment and rollback, online and offline evaluation, and the operational risk that the model silently degrades as the world drifts. This lesson covers the canonical reference architecture: training pipeline, feature store with online and offline halves, model registry, serving infrastructure, monitoring, and the feedback loop. It is the senior-level mental model for designing "add ML to product X" without falling into the standard traps.
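To make the "online and offline halves" idea concrete, here is a minimal in-memory sketch. It assumes a single numeric feature per entity and integer timestamps; the class `ToyFeatureStore` and its method names are hypothetical and do not correspond to any real feature store's API. The point it illustrates is that one write path feeds both halves, so training (which reads history as of a past timestamp) and serving (which reads the latest value) cannot diverge in how the feature is computed.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyFeatureStore:
    """Hypothetical in-memory feature store, for illustration only."""

    # Offline half: entity_id -> list of (timestamp, value), sorted by timestamp.
    _offline: dict = field(default_factory=lambda: defaultdict(list))
    # Online half: entity_id -> (timestamp, value), latest value only.
    _online: dict = field(default_factory=dict)

    def write(self, entity_id, ts, value):
        """Single write path: every value lands in both halves."""
        history = self._offline[entity_id]
        idx = bisect_right([t for t, _ in history], ts)
        history.insert(idx, (ts, value))  # keeps history sorted, even for late data
        latest = self._online.get(entity_id)
        if latest is None or ts >= latest[0]:
            self._online[entity_id] = (ts, value)

    def get_online(self, entity_id):
        """Serving read: the freshest value, or None if the entity is unknown."""
        entry = self._online.get(entity_id)
        return None if entry is None else entry[1]

    def get_as_of(self, entity_id, ts):
        """Training read: the value that was known at time ts (inclusive)."""
        history = self._offline.get(entity_id, [])
        idx = bisect_right([t for t, _ in history], ts)
        return None if idx == 0 else history[idx - 1][1]


store = ToyFeatureStore()
store.write("user_1", 10, 3)
store.write("user_1", 20, 5)
store.write("user_1", 30, 8)

# A training example labeled at t=25 must see the value from t=20, not t=30.
training_value = store.get_as_of("user_1", 25)
# A live request sees the latest value.
serving_value = store.get_online("user_1")
```

A real system would put the offline half in a warehouse or object store and the online half in a low-latency key-value store, but the contract is the same: point-in-time reads for training, latest-value reads for serving, and one definition of the feature behind both.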
Community
Building RAG: The Pipeline and Its Failure Modes
The full RAG pipeline (ingest, chunk, embed, retrieve, generate, evaluate), the seven failure modes I have actually hit, and the eval discipline that has kept my retrieval-augmented features honest in production.
ML Engineer Onsite: The Whiteboard Math Round
An ML onsite at a Series D recommendation-systems company, anchored on the math round where I had to derive a logistic regression gradient on a whiteboard.
Feature Store and Vector DB Tradeoff Quiz
A four-question reference set on the most common feature store and vector DB tradeoffs: online vs offline parity, point-in-time correctness, approximate nearest neighbor recall, and hybrid retrieval with metadata filters.
ML Engineer Pipeline Questions I Prep For
Five pipeline questions I bring with me to ML engineer loops. Training-serving skew, label leakage, batch vs streaming features, retraining cadence, and a small idempotent upsert into the feature store.
