Embedding
System Design
Recommendation Systems Architecture
A recommendation system at scale is a multi-stage funnel: candidate generation narrows millions of items to a few thousand, light ranking trims those to a few hundred, heavy ranking scores the survivors, and a re-ranking stage applies business and policy constraints. Each stage has a different latency budget, a different model, and a different operational profile. This lesson covers the canonical architecture (retrieval + ranking + re-ranking), the core algorithmic families (collaborative filtering, content-based, two-tower neural retrieval, sequential models), the embedding store and vector ANN serving stack, the cold-start problem, ranking objectives and the metrics that measure them, and the rollout and monitoring discipline that keeps the system honest. The goal is to leave you able to design the recommendation system for any consumer product and defend the choices made at every layer.
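As a rough illustration of the funnel described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function names (`top_k`, `recommend`), the stage sizes, and the use of plain scoring callables in place of real retrieval and ranking models. A production system would typically use an ANN index for retrieval and learned models for ranking rather than a full sort.

```python
from typing import Callable, Iterable, List

# Hypothetical types: an item is an integer ID, a scorer maps an ID to a score.
Scorer = Callable[[int], float]


def top_k(items: Iterable[int], score: Scorer, k: int) -> List[int]:
    """Return the k highest-scoring items, best first."""
    return sorted(items, key=score, reverse=True)[:k]


def recommend(
    catalog: Iterable[int],
    retrieval_score: Scorer,   # cheap; stands in for candidate generation
    light_score: Scorer,       # stands in for the light ranker
    heavy_score: Scorer,       # stands in for the heavy ranker
    is_allowed: Callable[[int], bool],  # business / policy constraint
    n_retrieve: int = 1000,
    n_light: int = 100,
    n_final: int = 10,
) -> List[int]:
    # Stage 1: candidate generation narrows the catalog.
    candidates = top_k(catalog, retrieval_score, n_retrieve)
    # Stage 2: light ranking trims the candidates.
    shortlist = top_k(candidates, light_score, n_light)
    # Stage 3: heavy ranking orders everything that survived.
    ranked = top_k(shortlist, heavy_score, len(shortlist))
    # Stage 4: re-ranking applies constraints, then truncates.
    return [item for item in ranked if is_allowed(item)][:n_final]
```

Each stage only ever sees what the previous stage passed on, which is why the expensive scorer can afford to be slow: in this sketch it runs on at most `n_light` items regardless of catalog size.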
Community
Building RAG: The Pipeline and Its Failure Modes
The full RAG pipeline (ingest, chunk, embed, retrieve, generate, evaluate), the seven failure modes I have actually hit, and the eval discipline that has kept my retrieval-augmented features honest in production.
An Embedding Cache With Content-Hash Keys
Re-embedding the same paragraphs on every deploy was costing us $400 a month. This is the SQLite-backed cache I shipped: the key is sha256(model + normalized text), TTL is per-row, and a single batch call backfills misses.
Embeddings and Vector Search, Explained for Devs
What an embedding actually is, why cosine similarity is the metric you reach for, and the production decisions (chunking, hybrid search, dimension count) that determine whether a vector search ships or sits.
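For readers who want a concrete picture of the cosine similarity mentioned above, here is a minimal sketch in pure Python. The helper names (`cosine_similarity`, `nearest`) and the brute-force search are assumptions for illustration and are not taken from the article; a real vector search would normally use an ANN index rather than scoring every document.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[str]:
    """Brute-force search: IDs of the k documents most similar to the query."""
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine_similarity(query, docs[doc_id]),
        reverse=True,
    )
    return ranked[:k]
```

Because the score depends only on direction, scaling a vector leaves its similarity to any other vector unchanged.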
