Machine Learning
Behavioral Interviews
Behavioral for ML / Data Engineers
ML and data engineering loops grade for a cluster of behavioral signals that other engineering loops weight less heavily: experimentation rigor, the craft of being wrong with data and catching it yourself, data-ethics judgement under tradeoffs, tolerance for ambiguity on problems where the right answer is not knowable in advance, and substantive collaboration with research and platform teams. These signals are graded in a dedicated behavioral round and are also woven heavily into the technical rounds (the ML system design round and the applied ML deep dive). This lesson defines the cross-cutting ML and data signals interviewers grade, walks through how the loop probes for experimentation discipline rather than storytelling about results, maps the signals to the questions interviewers ask, and shows two model answers, one for each of two story shapes: the experiment that was wrong and the data-ethics judgement call.
Community
AI Coding Assistants: Where They Help and Where They Hurt
Two years of using AI coding assistants daily, the four tasks where they have made me measurably faster, the three places they have actively cost me time, and the workflow I have settled on.
Building RAG: The Pipeline and Its Failure Modes
The full RAG pipeline (ingest, chunk, embed, retrieve, generate, evaluate), the seven failure modes I have actually hit, and the eval discipline that has kept my retrieval-augmented features honest in production.
ML Engineer Onsite: The Whiteboard Math Round
An ML onsite at a Series D recommendation-systems company, anchored on the math round where I had to derive a logistic regression gradient on a whiteboard.
Prompt Engineering Patterns That Survived Six Months of Prod
The five prompting techniques that have actually held up across model upgrades, the four that I tried and dropped, and the eval discipline that lets me tell which is which.
ML Engineer Pipeline Questions I Prep For
Five pipeline questions I bring with me to ML engineer loops: training-serving skew, label leakage, batch vs streaming features, retraining cadence, and a small idempotent upsert into the feature store.
LLM Fundamentals: Tokens, Context, and Cost
Tokens are not characters or words. Context is not free. Cost is per-token in both directions. The three fundamentals that determine 80% of how an LLM-backed feature performs and bills.
Embeddings and Vector Search, Explained for Devs
What an embedding actually is, why cosine similarity is the metric you reach for, and the production decisions (chunking, hybrid search, dimension count) that determine whether a vector search ships or sits.
