System Design Article

Microservices vs Monolith: When to Choose What

Difficulty: Medium

Microservices are not a maturity badge. Monoliths are not a code smell. The honest interview answer is that architecture is a continuum (monolith, modular monolith, services, microservices) and the right point on it is set by team size, deployment frequency, and the cost of distribution, not by what the cool kids at Netflix did. This lesson walks through the trade-offs concretely: latency tax, operational overhead, organizational coupling (Conway's Law), data consistency, and the migration paths that work. By the end you can defend either choice for a given product without reaching for buzzwords.

Motivation

The microservices vs monolith debate is the single most overcooked topic in system design. The truth is unglamorous: the right architecture is the one that lets your team ship safely at the rate the business demands. For a 5-engineer startup that means a monolith. For Amazon (thousands of services and a widely cited 2011 figure of one production deploy every 11.6 seconds) it means microservices. For most companies, it means somewhere in between.

Why do interviewers care so much? Because picking the wrong point on the continuum is enormously expensive in two directions:

Microservices too early: a 5-engineer team adopting Kubernetes, a service mesh, distributed tracing, and 8 services to ship a CRUD app can easily spend most of its time on infra instead of product. The startup graveyard is full of these.
Monolith too long: a 200-engineer team on one Rails monolith where every deploy is a 4-hour merge train, every change risks breaking unrelated features, and onboarding a new team takes 6 months. The brownfield consulting market is full of these.

A senior engineer can name both failure modes, place the current system on the continuum, and propose a credible move along it. That is the bar this lesson teaches.

Deep Dive

The architecture continuum

Microservices vs monolith is a false binary. The real spectrum has at least four points:


[ Monolith ] -> [ Modular Monolith ] -> [ Services ] -> [ Microservices ]
   single        single deploy,           ~5-30           50+ services,
   deploy        strict module            independent     independent teams,
   one DB        boundaries,              services,       independent data,
                 maybe one DB             shared infra    polyglot OK

Monolith. One codebase, one deploy artifact, one database. Functions call functions. No network in the critical path inside the app.

Modular monolith. Same single deploy and (often) single DB, but the codebase is split into modules with hard boundaries that talk only through explicit APIs. Modules can call each other, but they cannot reach into another module's tables. Shopify, GitHub, and Stack Overflow famously run this way.

Services (a few). A handful (5-30) of independently deployable services, often built around a shared library, a shared schema, or a single team owning multiple services. Less ceremony than full microservices.

Microservices. Many small services (often 1 service per team or per bounded context), independently deployed, independently scaled, with their own data store. Inter-service communication is over the network. Polyglot is allowed.

The tax you pay for distribution

The cheap mental model: every function call you turn into a network call gets four to five orders of magnitude slower (nanoseconds become milliseconds) and picks up failure modes that an in-process call never had.

Aspect	In-process call	Cross-service call
Latency	~10 ns	~1 ms (same DC) to ~50 ms (cross-region)
Failure modes	Exception or process crash	Timeout, retry, partial failure, cascading failure
Debugging	Single stack trace	Distributed trace across N services
Data consistency	ACID transaction	Saga, eventual consistency, or 2PC (slow)
Deploy coordination	One artifact	N artifacts, version skew

This 'distribution tax' is the single most under-estimated cost. A function call inside a Python process that took 50 nanoseconds becomes a 1 ms HTTP call, plus a possible timeout, plus a retry, plus a circuit breaker, plus a metric, plus a span in your tracing system. Multiply by the number of edges in your service graph and you get an idea of the operational surface area.
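To make the tax concrete, here is a back-of-the-envelope sketch using the rough figures from the table above. The function names and the three-hop request are illustrative assumptions, not measurements; with the table's ~10 ns baseline, a same-DC hop comes out around 100,000 times slower than an in-process call.

```python
# Rough figures from the table above; real numbers vary by language, network, and payload.
IN_PROCESS_CALL_S = 10e-9      # ~10 ns
SAME_DC_CALL_S = 1e-3          # ~1 ms
CROSS_REGION_CALL_S = 50e-3    # ~50 ms


def slowdown(network_call_s: float, in_process_call_s: float = IN_PROCESS_CALL_S) -> float:
    """How many times slower a network hop is than an in-process call."""
    return network_call_s / in_process_call_s


def added_latency_ms(sequential_hops: int, per_hop_s: float = SAME_DC_CALL_S) -> float:
    """Latency added to one request by sequential cross-service hops, in milliseconds."""
    return sequential_hops * per_hop_s * 1000


if __name__ == "__main__":
    print(f"same-DC hop: {slowdown(SAME_DC_CALL_S):,.0f}x slower")
    print(f"cross-region hop: {slowdown(CROSS_REGION_CALL_S):,.0f}x slower")
    print(f"3 sequential same-DC hops add {added_latency_ms(3):.1f} ms")
```

The sketch counts only the happy path. Timeouts and retries push the tail latency well above these numbers.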

Conway's Law: org chart drives architecture

Melvin Conway observed in 1968 that organizations which design systems end up producing designs that mirror their own communication structure.

The modern reading: if you have 4 teams, you will end up with 4 services (or 4 modules in a monolith), no matter what diagram you draw on day one. Trying to fight this is futile; the right move is to design teams around the boundaries you want.

The pragmatic corollary, called the Inverse Conway Maneuver: change the org chart to get the architecture you want.


+-------------------+         +-------------------+
| 4 teams,          |  ===>   | 4 services with   |
| 1 monolith        |  6 mo   | 1 owner each      |
+-------------------+         +-------------------+

If you have 5 engineers total, you do not have 5 teams. You have 1 team. Therefore you should have 1 deploy unit. Therefore you should have a monolith.

Data ownership: the hardest part

Microservices means each service owns its data. No shared database. No reaching across to another service's tables. If service A needs data from service B, it calls B's API.

This sounds clean and is brutal in practice:

Joins disappear. A query that was SELECT users.*, orders.* FROM users JOIN orders ON orders.user_id = users.id becomes 'call user-service, then call order-service for each user, then merge in app code' (the classic N+1 problem).
Transactions disappear. You cannot wrap two services in a BEGIN; ... COMMIT;. You need sagas, outbox patterns, or compensating transactions.
Reporting becomes a separate problem. You build a data warehouse / ETL pipeline because no single DB has all the data anymore.
Foreign keys are gone. The DB no longer enforces 'this order_id must exist'. You do it in app code. Race conditions appear.
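The lost join can be sketched as follows. This is a minimal illustration, assuming a hypothetical order-service client that offers both a per-user and a batched lookup; `FakeOrderService` is an in-memory stand-in that only counts calls, not a real client library.

```python
from collections import defaultdict


class FakeOrderService:
    """Stand-in for a remote order-service; counts how many calls it receives."""

    def __init__(self, orders):
        self._orders = orders
        self.calls = 0

    def orders_for_user(self, user_id):
        self.calls += 1
        return [o for o in self._orders if o["user_id"] == user_id]

    def orders_for_users(self, user_ids):
        self.calls += 1
        wanted = set(user_ids)
        return [o for o in self._orders if o["user_id"] in wanted]


def join_n_plus_one(users, order_service):
    """One remote call per user: N users cost N network round trips."""
    return {u["id"]: order_service.orders_for_user(u["id"]) for u in users}


def join_batched(users, order_service):
    """One batched remote call, then group the results in application code."""
    grouped = defaultdict(list)
    for order in order_service.orders_for_users([u["id"] for u in users]):
        grouped[order["user_id"]].append(order)
    return {u["id"]: grouped[u["id"]] for u in users}
```

Both functions return the same mapping. The difference is the number of network round trips, which is why services that are queried this way usually need a batch endpoint.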

If your team is not ready to own this, microservices will hurt more than help.
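As a rough idea of what replaces BEGIN; ... COMMIT; across services, here is a minimal saga sketch. It assumes each step is a pair of plain callables (an action and its compensation). The `run_saga` helper and the step names are hypothetical, and a production saga would also persist its progress and handle compensations that fail.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo the completed ones in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            completed.append(step)
        except Exception as exc:
            print(f"step {name!r} failed ({exc}); compensating")
            for _, _, compensate in reversed(completed):
                compensate()
            return False
    return True
```

Unlike a database rollback, compensation is visible to the outside world: the stock really was reserved for a moment, and other requests may have seen it.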

What you actually buy with microservices

For all the cost, microservices give real things back:

Independent deploys. Team A ships every hour without coordinating with team B. Deployment frequency scales with team count, not codebase size.
Independent scaling. The video-encoding service runs on 200 GPU instances. The user-profile service runs on 4 small CPU instances. In a monolith you would scale the whole thing.
Fault isolation. A bug in the recommendations service should not take down checkout. (In practice, only with serious circuit-breaker discipline.)
Polyglot. Use Python for ML, Go for high-throughput pipes, Rust for the hot loop. In a monolith you mostly pick one language.
Team autonomy. A team owns their service end-to-end (build, deploy, on-call). This is the real org win and probably the single biggest reason microservices took off.

Notice how the items that matter most (independent deploys and team autonomy) are about organizations, not technology. That is the real signal.
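The fault-isolation point above depends on circuit-breaker discipline, so a minimal sketch of one may help. This is an illustrative, single-threaded version with assumed defaults (3 failures, 30-second reset). It is not the API of Hystrix or any other library, and the injectable `clock` parameter exists only to make it testable.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("downstream marked unhealthy; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, checkout gets an immediate error it can handle (for example, by rendering the page without recommendations) instead of waiting on a timeout.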

What people THINK they buy and don't

Common false promises:

'Microservices are easier to test.' They are not. Unit tests are easier. Integration tests are dramatically harder because you have to spin up N services or use brittle contract tests.
'Microservices are easier to understand.' Each one is, but the system as a whole is much harder. The graph of who calls whom is the new complexity.
'Microservices avoid the big rewrite.' Sometimes. Often the rewrite just becomes a 4-year incremental migration with both systems running.
'Microservices give you cloud portability.' No, that is just running in containers. You can do that with a monolith.

Implementation

Patterns and tooling required for microservices to work

If you go microservices, here is the minimum kit you need (and the cost of NOT having each piece):

Capability	Tooling examples	What breaks without it
Service discovery	Consul, etcd, Kubernetes DNS	Hardcoded URLs, deploys break callers
API gateway	Kong, Envoy, AWS API Gateway	No single edge, every client knows internal topology
Distributed tracing	OpenTelemetry, Jaeger, Datadog APM	One slow request, no idea which service caused it
Centralized logging	ELK, Loki, Datadog	Cannot grep across services, debugging takes hours
Metrics + alerting	Prometheus + Grafana, Datadog	No SLOs, on-call is reactive
CI/CD per service	GitHub Actions, ArgoCD	Deploys are manual and risky
Container orchestration	Kubernetes, ECS, Nomad	Manual scaling and placement
Service mesh (optional)	Istio, Linkerd	mTLS, retries, circuit breakers all hand-rolled
Async messaging	Kafka, RabbitMQ, SQS	Every interaction is sync, cascading failures
Schema registry / contract tests	Confluent Schema Registry, Pact	Breaking changes silently break consumers
Distributed transaction handling	Saga orchestrator, outbox pattern	Data inconsistency on failure

This is the microservices premium that Martin Fowler talks about: you pay all these costs upfront before you see ANY benefit. If your team is small, you are paying for capacity you do not need.

Migration: the strangler fig

When you do need to move from a monolith to services, the technique that works is Martin Fowler's strangler fig (named after the tropical plant that grows around an existing tree until the original is gone).


Phase 1: only monolith
+----------+      +----------+
|  Client  | ---> | Monolith |
+----------+      +----------+

             Phase 2: introduce a proxy
+----------+      +-------+      +----------+
|  Client  | ---> | Proxy | ---> | Monolith |
+----------+      +-------+      +----------+

             Phase 3: extract one capability
+----------+      +-------+      +----------+
|  Client  | ---> | Proxy | ---> | Monolith |
+----------+      +---+---+      +----------+
                      |
                      v
                 +-----------+
                 | Service A |
                 +-----------+

             Phase N: monolith is empty, can be deleted
+----------+      +-------+      +-----------+
|  Client  | ---> | Proxy | -+-> | Service A |
+----------+      +-------+  |   +-----------+
                             +-> | Service B |
                             |   +-----------+
                             +-> | Service C |
                                 +-----------+
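The proxy in the diagrams above is, at its core, a routing table that grows one entry per extracted capability. A minimal sketch, assuming path-prefix routing; the hostnames and the `pick_upstream` helper are hypothetical, and in practice this logic usually lives in the configuration of a gateway such as Envoy or Kong rather than in application code.

```python
from typing import Dict

MONOLITH = "http://monolith.internal"  # hypothetical upstream

# Path prefixes that have been extracted so far, mapped to their new owner.
EXTRACTED: Dict[str, str] = {
    "/notifications": "http://notifications.internal",
}


def pick_upstream(path: str, extracted: Dict[str, str] = EXTRACTED, default: str = MONOLITH) -> str:
    """Return the upstream for a request path; the longest matching prefix wins."""
    matches = [p for p in extracted if path == p or path.startswith(p + "/")]
    if not matches:
        return default
    return extracted[max(matches, key=len)]
```

Everything not explicitly extracted falls through to the monolith, so each phase of the migration is a one-line change that can be reverted just as easily.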

Key rules of a strangler migration:

Pick a leaf capability first (notifications, search, billing) where the data dependencies on the rest of the monolith are minimal.
Extract via the API, not the database. If service A reads the monolith's tables directly, you are coupled forever. The new service must own its own storage.
Use Change Data Capture to seed the new service. Stream from the monolith DB into the new service's DB until you cut over writes.
Run dual-write during the transition. Write to both the monolith and the new service, compare reads, fix gaps. This is operationally annoying but catches data drift early.
Cut over reads first, then writes. Reads can flip with a feature flag; writes need careful migration of foreign-key data.
Delete the old code. The death of a strangler migration is leaving the monolith path 'just in case'. It will live forever.
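The dual-write rule above can be sketched as a thin wrapper around both stores. This is a simplified illustration that uses plain dicts as stand-ins for the monolith's storage and the new service's storage; the `DualWriter` class is hypothetical, and a real version would report mismatches to metrics rather than keep them in memory.

```python
class DualWriter:
    """Writes go to both stores; reads come from the old one and are compared."""

    def __init__(self, old_store: dict, new_store: dict):
        self.old = old_store
        self.new = new_store
        self.mismatches = []  # keys where the two stores disagreed

    def write(self, key, value):
        self.old[key] = value          # the monolith stays the source of truth
        try:
            self.new[key] = value      # best effort; a failure here must not fail the request
        except Exception as exc:
            self.mismatches.append((key, f"write failed: {exc}"))

    def read(self, key):
        old_value = self.old.get(key)
        if self.new.get(key) != old_value:
            self.mismatches.append((key, "read mismatch"))
        return old_value
```

A mismatch list that stays empty under production traffic is the evidence you want before flipping reads to the new service.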

Service boundaries: Domain-Driven Design

The most common technique for picking service boundaries is bounded contexts from Domain-Driven Design (Eric Evans). Group code that changes together for the same business reason, separate code that changes for different reasons.

A simple heuristic: a good service boundary is one where two adjacent teams almost never need to coordinate on a release. If every change in service A requires a coordinated change in service B, the boundary is wrong: either merge them, or move the misplaced code.
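One way to check this heuristic against real history is to measure how often two services change together. The sketch below assumes you can export, for each release or change, the set of services it touched; the `co_change_ratio` function and the sample service names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_ratio(changes):
    """For each pair of services, the share of changes touching either that touched both.

    `changes` is an iterable of sets of service names, one set per release or change.
    """
    touched = Counter()
    together = Counter()
    for services in changes:
        touched.update(services)
        together.update(frozenset(pair) for pair in combinations(sorted(services), 2))
    ratios = {}
    for pair, both in together.items():
        a, b = sorted(pair)
        either = touched[a] + touched[b] - both
        ratios[(a, b)] = both / either
    return ratios
```

A pair with a ratio close to 1.0 is behaving like one service with two deploy pipelines, which suggests merging them or moving the misplaced code.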

Another heuristic, popularized by Sam Newman's Building Microservices: a service should be small enough that a team could rewrite it in about two weeks. If it cannot be, it is too big.

The modular monolith: the underrated middle ground

The most underused architecture in 2026 is the modular monolith. You get most of the benefits of a monolith (single deploy, easy debugging, ACID transactions, no network tax) AND you preserve the ability to extract services later.

The rules:

One process, one deploy. Same as a monolith.
Hard module boundaries. Use language features (Java packages with interface-only exports, Go internal packages, TypeScript module fences, Rust crates) to make cross-module access impossible at compile time except through declared APIs.
Shared database is OK, but with a per-module schema. Each module owns a set of tables. Other modules cannot touch them.
Module-to-module calls go through the same API surface a future external service would. This is what makes later extraction cheap.
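In languages without compile-time enforcement, the same rule is usually enforced by a linter in CI. Below is a minimal sketch of such a check, assuming a hypothetical layout where each top-level package keeps its private code under `<package>.internal`. The `boundary_violations` function is illustrative and is not the API of any existing linter.

```python
import ast


def boundary_violations(source: str, own_module: str, internal_name: str = "internal"):
    """Return imports in `source` that reach into another module's internal package.

    Assumes each top-level package (billing, orders, ...) keeps private code
    under `<package>.internal` and treats everything else as its public API.
    """
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if parts[0] != own_module and internal_name in parts[1:]:
                violations.append(name)
    return violations
```

Run over every file in a module as part of CI, a check like this turns "other modules cannot touch them" from a convention into a failing build.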

Shopify scaled to billions of dollars of GMV on a Rails modular monolith called the 'Componentized Monolith'. They have written publicly about the discipline required. It works.

When to Use

Decision matrix

Signal	Stay monolith / modular	Move to microservices
Team size	< 30 engineers	> 50 engineers across 5+ teams
Deploy pain	Daily deploys are fine	Merge trains take hours, deploys block each other
Codebase build time	< 5 min	> 30 min and growing
Independent scaling needs	All components scale together	One component is the bottleneck (encoding, ML inference)
Fault isolation	Outage of one feature acceptable	Some flows must not be allowed to fail others
Org maturity	No platform team, no SRE	Dedicated platform / SRE team exists
Data consistency	Cross-feature transactions important	Each domain has clear bounded data
Polyglot needs	Mostly one language is fine	ML in Python, infra in Go, frontend in TS, all needed

If you have two or fewer signals in the right column, do not microservice. Modular monolith.

If you have five or more, you probably should.

If you are in between, services (a few, not many) is the sweet spot.
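The counting rule above is simple enough to write down directly. In this sketch the signal identifiers are hypothetical labels for the eight rows of the matrix, and the thresholds are the ones stated above (two or fewer, five or more).

```python
SIGNALS = (
    "team_size", "deploy_pain", "build_time", "independent_scaling",
    "fault_isolation", "org_maturity", "data_consistency", "polyglot",
)


def recommend(right_column_signals: set) -> str:
    """Map the set of right-column signals from the matrix to a recommendation."""
    unknown = right_column_signals - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    count = len(right_column_signals)
    if count <= 2:
        return "modular monolith"
    if count >= 5:
        return "microservices"
    return "a few services"
```

Treat the output as a starting point for discussion: the signals are not equally weighted in practice, and a missing platform team can outweigh several others.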

Anti-patterns

Distributed monolith: the worst of both worlds. Many services, but they all deploy together because of tight coupling. You pay every microservices cost and get none of the benefits. This is shockingly common.
Nano-services: services so small (one endpoint each) that the network call dwarfs the work. The function should have stayed a function.
Shared database: many 'microservices' all writing to the same DB. There is no isolation, no independent deploy, just operational overhead. This is a monolith pretending to be microservices.
Synchronous chains: A calls B calls C calls D. Latency adds up, failures cascade, and one slow service times out everyone upstream. Async messaging or fewer hops is the cure.
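The synchronous-chain problem can be quantified with a few lines of arithmetic. This sketch assumes failures are independent, which is optimistic, and the 99.9% availability figure is an illustrative assumption rather than a measurement.

```python
from math import prod


def chain_availability(per_service: list) -> float:
    """Availability of a synchronous chain in which every hop must succeed.

    Assumes failures are independent, which is optimistic for real systems.
    """
    return prod(per_service)


def chain_latency_ms(per_hop_ms: list) -> float:
    """Sequential hops add their latencies."""
    return sum(per_hop_ms)


if __name__ == "__main__":
    four_hops = [0.999] * 4  # A -> B -> C -> D, each 99.9% available
    print(f"chain availability: {chain_availability(four_hops):.4%}")
```

Four services at 99.9% each give a chain that is only about 99.6% available, which is roughly four times the downtime of any single service in it.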

Alternative: serverless

Serverless (FaaS) is its own answer to 'how do I avoid running infrastructure?'. It overlaps with microservices but solves a different problem (no servers, pay per request) at the cost of cold starts and vendor lock-in. Covered in a separate lesson, but worth flagging in interviews when the requirement is bursty workload, not engineering scale.

Case Studies

Amazon: the original microservices story

Amazon famously moved from a monolithic Obidos codebase to thousands of services starting around 2002, driven by Jeff Bezos's mandate that all teams expose their data as services and all services be designed as if external. The numbers people cite are staggering, though they come from different years and should not be read as a single statistic: a production deploy roughly every 11.6 seconds (a 2011 figure, which works out to about 2.7 million deploys per year) and, in later reports, on the order of 50 million deploys per year. Services are owned by 'two-pizza teams' (small enough to be fed with two pizzas).

What is rarely told: this took 5 to 7 years of intense platform investment. AWS itself was partly born from this internal platform work. Without that investment, the migration would have been a disaster.

Lesson: microservices at scale work. The platform investment to get there is enormous.

Netflix: cloud-native microservices

Netflix moved from a single Oracle-backed monolith to ~700 microservices on AWS after a database corruption in 2008 halted DVD shipments for 3 days. The migration began soon afterwards and took years; Netflix announced its completion in early 2016. They open-sourced large parts of their platform (Hystrix, Eureka, Zuul) in the process.

Netflix is the poster child but their context is unusual: huge engineering org, predictable workload (video streaming), willingness to invest in chaos engineering and platform tooling. Most companies are not Netflix.

Lesson: microservices fit Netflix's combination of scale, predictable load, and engineering investment. Do not assume the same fit applies to your CRUD SaaS.

Shopify: modular monolith

Shopify processes billions of dollars of GMV on a Rails monolith they call the 'Componentized Monolith'. They invested heavily in tooling (custom linters, runtime boundary enforcement, package isolation) to keep modules from leaking into each other. They have a small number of supporting services around the core, but the core is intentionally not microservices.

Lesson: a modular monolith can scale to enormous business volume if you treat the boundaries as seriously as service boundaries.

Segment: rolled back from microservices

In 2018 Segment publicly described moving 140+ destination microservices BACK into a monolith. The microservices version had become unmanageable: 140 services meant 140 sets of dashboards, 140 deploy pipelines, and a shared infrastructure cost that grew faster than the business. Consolidating cut their per-destination cost dramatically and let a small team operate the system.

Lesson: the right architecture changes as you scale up AND down. Senior engineers know when to merge, not just split.

Uber: split, then DOMA

Uber rapidly grew to 2200+ microservices around 2017, then introduced the 'DOMA' (Domain-Oriented Microservice Architecture) framework around 2020 to consolidate adjacent services back into bigger 'domain' groupings. They kept microservices for genuine team-boundary reasons but stopped splitting for the sake of splitting.

Lesson: 'just keep splitting' is not a strategy. Periodic consolidation is part of microservice maturity.

Stack Overflow: famously monolithic

Stack Overflow ran (and largely still runs) one of the most-trafficked sites on the internet on a small set of large servers, a single .NET monolith, and a SQL Server cluster. Their per-request cost is much lower than equivalent microservice deployments. They have publicly defended the choice as deliberate.

Lesson: high traffic does not require microservices. Operational simplicity is a real and underrated business asset.

Quick Review

Architecture is a continuum, not a binary. Modular monolith is the underrated middle.
The distribution tax (network latency, partial failures, distributed transactions, observability) is real and large. Pay it only when you need to.
Conway's Law is the strongest force in the room. Design teams around the architecture you want.
Microservices buy you team autonomy and independent deploys, mostly. They do not buy you simplicity.
The strangler fig is the migration pattern that works. Big-bang rewrites do not.
Senior teams move in BOTH directions on the continuum. Splitting and merging are both legitimate.
If your team is under 30 engineers, you almost certainly do not need microservices. Build a modular monolith and earn the right to split.

Real-World Examples

How real systems implement this in production

Amazon

Moved from the monolithic Obidos codebase to thousands of services starting around 2002, driven by Jeff Bezos's API mandate. Reportedly performs roughly one production deploy every 11.6 seconds across the company, organized around 'two-pizza teams'. The migration required 5+ years of platform investment and effectively gave birth to AWS.

Trade-off: 5+ years of platform investment (deployment infrastructure, internal tooling) preceded the architecture paying off; smaller orgs lack the runway to copy this directly.

Netflix

Migrated from an Oracle-backed monolith to ~700 microservices on AWS, starting after a 3-day outage from database corruption in 2008 and finishing in early 2016. Open-sourced their platform stack (Hystrix circuit breakers, Eureka service discovery, Zuul gateway). Their context (huge engineering org, predictable streaming workload, heavy chaos-engineering investment) is rarely matched by other companies that copy the architecture.

Trade-off: The microservices win required massive investment in resilience tooling that became the de-facto industry standard; companies adopting microservices without similar investment frequently regret it.

Shopify

Runs the core commerce platform on a single Rails monolith they call the 'Componentized Monolith', processing billions in GMV. Invested heavily in linters and runtime checks to enforce module boundaries inside the monolith. Demonstrates that a disciplined modular monolith scales to massive business volume without splitting into microservices.

Trade-off: Keeping the monolith works only because they invest continuously in custom linters and module boundaries; without that discipline, a monolith of this size becomes a tangled mess.

Segment

Publicly described in 2018 how they consolidated ~140 destination microservices back into a single monolith. Operating 140 services meant 140 dashboards, 140 deploy pipelines, and rising infra cost per destination. Consolidation cut costs and let a small team manage the system. A widely cited counter-example to 'microservices always'.

Trade-off: Splitting into 140 services was premature; consolidation cut operational toil but required engineering effort to undo decisions that had seemed locked in.

Uber

Grew to 2200+ microservices by around 2017, then introduced 'DOMA' (Domain-Oriented Microservice Architecture) around 2020 to consolidate adjacent services back into larger domain groupings. Demonstrates that splitting is not a one-way ratchet and that periodic consolidation is part of microservices maturity.

Trade-off: Both directions of the migration were expensive; the lesson is that 'microservices' is not the goal, the right service-to-team mapping is.

Quick Interview Phrases

Key terms to use in your answer

distribution tax

Conway's Law

modular monolith

strangler fig migration

microservices premium

distributed monolith

Common Interview Questions

Questions you might be asked about this topic

When would you choose a monolith over microservices for a new product?

Default to a monolith (or modular monolith) when the team is under 30 engineers, deployment cadence is daily or slower, no platform team exists, and product requirements are still evolving. The microservices premium (service mesh, tracing, CI/CD per service, on-call rotations) is too expensive to pay before the team is large enough to benefit from it. Mention that you would design with hard module boundaries so you can extract services later without rewriting.

How would you migrate an existing monolith to microservices?

What is Conway's Law and why does it matter for service boundaries?

What is a 'distributed monolith' and how do you avoid it?

How do you handle a transaction that spans multiple services?

Interview Tips

How to discuss this topic effectively

Always frame the choice as a continuum, not a binary. Mentioning the modular monolith as the middle ground signals seniority immediately.

Quote concrete distribution-tax numbers (in-process call ~10 ns, cross-service call ~1 ms, plus retries and timeouts). Hand-waving 'microservices add overhead' sounds junior.

Reach for Conway's Law. Saying 'architecture follows team boundaries' shows you have built systems with humans, not just code.

Name the migration path (strangler fig with a proxy, extract by capability, dual-write then cut over). Interviewers want to know you would not propose a big-bang rewrite.

Mention companies that went BOTH ways (Segment back to monolith, Uber's DOMA consolidation). It demonstrates you understand splitting is not always the answer.

Common Mistakes

Pitfalls to avoid in interviews

Recommending microservices for a small team to 'future-proof' the architecture

A 5-engineer team running 8 microservices can easily spend most of its time on infra rather than product. Build a modular monolith with clean boundaries, and extract services only when team and traffic growth demand it.

Treating shared databases between services as acceptable microservices

If two services share a database, they are not independent. You have a distributed monolith: all the operational cost of services and none of the deployment independence.

Assuming microservices automatically improve fault isolation

Without circuit breakers, timeouts, and async messaging, microservices cascade failures more easily than monoliths. A naive synchronous service chain (A -> B -> C -> D) is more fragile than the equivalent monolith.

Forgetting that transactions stop being free once they span services

A single ACID transaction spanning two services does not exist (short of slow two-phase commit). You need sagas, outbox patterns, or a willingness to accept eventual consistency. This is one of the largest hidden costs of splitting services along data boundaries.
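If asked to make the outbox pattern concrete, a sketch along these lines is usually enough. It uses Python's built-in sqlite3 module as a stand-in for the service's own database. The table names, the 'order.placed' topic, and the `publish` callback are assumptions for illustration; a real relay would publish to a broker such as Kafka.

```python
import json
import sqlite3


def create_schema(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
    db.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL, payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(db: sqlite3.Connection, order_id: int, total_cents: int) -> None:
    """Write the order and its event in ONE local transaction."""
    with db:  # commits on success, rolls back if either insert raises
        db.execute("INSERT INTO orders (id, total_cents) VALUES (?, ?)", (order_id, total_cents))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )


def relay_once(db: sqlite3.Connection, publish) -> int:
    """Publish unpublished events in order, marking each one after it is sent."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, json.loads(payload))  # may be sent twice if we crash before the UPDATE
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)
```

The key property is that the order row and the event row commit together, so an event is never lost and never refers to an order that was rolled back. Delivery is at-least-once, so consumers must be idempotent.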

Pitching a big-bang rewrite from monolith to microservices

Big-bang rewrites take years and usually fail. The strangler-fig pattern (introduce a proxy, extract one capability at a time, retire the monolith over months or years) is the only migration approach that consistently works at scale.
