Imagine a user opens your page on a mid-range Android phone over 4G. They see a blank white screen for 4 seconds, then a flash of unstyled text for half a second, then the layout shifts twice as images load, then everything finally settles around the 6-second mark. You have a Lighthouse report telling you LCP is bad and CLS is bad and the JavaScript bundle is 380 KB. Where exactly do you intervene?
My answer, every time: figure out which stage of the rendering pipeline is the bottleneck before you change anything. "Make it faster" is not actionable; "the layout stage is taking 800 ms because we are forcing synchronous layout in a scroll handler" is. The five stages of the browser rendering pipeline are the mental model that turns vague performance complaints into specific code changes, and once I learned to think in those stages, my Chrome DevTools traces became readable instead of intimidating.
This article walks through what each stage does, what makes it slow, and what code shape triggers expensive recomputation in that stage. The stances are deliberate. By the end I want you to be able to look at any web performance problem and place it on the pipeline.
The pipeline at a glance
The pipeline runs once for the initial page load and then partially for every change after that. A change that flips a transform re-runs style computation and then skips straight to composite (stages 2 and 5). A change that toggles a display: none re-runs stages 2 through 5. A change that swaps a stylesheet re-runs all five. Knowing which change triggers which stages is the lever for performance work.
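Here is a minimal sketch of that mapping as a lookup table. It counts the style recalculation that every property change pays before the later stages run. The property groupings simplify commonly cited Chromium behavior. For instance, transform and opacity only skip paint when the element already has its own compositing layer. stagesRerunFor is a hypothetical helper, not a browser API.

```javascript
// The five stages, in pipeline order.
const STAGES = ["parse", "style", "layout", "paint", "composite"];

// Simplified, non-exhaustive grouping of CSS properties by the earliest
// stage after style that a change to them forces the browser to redo.
const FIRST_DIRTY_STAGE = {
  transform: "composite",
  opacity: "composite",
  color: "paint",
  "background-color": "paint",
  "box-shadow": "paint",
  width: "layout",
  height: "layout",
  top: "layout",
  left: "layout",
  margin: "layout",
  padding: "layout",
  display: "layout",
};

// Returns the stages that re-run after a change. A property change always
// costs a style recalculation first; a swapped stylesheet must also be
// parsed, so it re-runs everything.
function stagesRerunFor(change) {
  if (change === "stylesheet") return [...STAGES];
  const first = FIRST_DIRTY_STAGE[change];
  if (!first) throw new Error(`unknown change: ${change}`);
  return ["style", ...STAGES.slice(STAGES.indexOf(first))];
}
```

So stagesRerunFor("opacity") returns ["style", "composite"], while stagesRerunFor("width") returns ["style", "layout", "paint", "composite"].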
One note before I start: the actual implementation in modern Chromium and Firefox is more sophisticated than this five-stage view (parser-blocking vs render-blocking resources, off-thread parsing, rasterization split into tiles and run on worker threads or the GPU). But the conceptual five-stage model is what the DevTools timeline labels each frame with, and it is the model that maps cleanly to the optimizations you actually do as a web developer.
Stage 1: parse HTML and CSS
The browser receives HTML bytes from the network, decodes them, tokenizes, and builds the DOM tree. In parallel (with constraints) it parses CSS into the CSSOM. JavaScript execution can interleave with this depending on script tags and where they sit in the document.
The parser is fast; what makes this stage slow is rarely the parsing itself but what blocks it. The two big classes of blocker:
Render-blocking CSS. A <link rel="stylesheet"> in <head> blocks rendering until the CSS arrives and parses. The browser will not paint without the CSSOM, because painting an unstyled DOM and then re-painting after CSS arrives produces a visible flash of unstyled content (FOUC). So the stylesheet's network time and parse time both count against your first paint.
Render-blocking and parser-blocking JavaScript. A synchronous <script> tag in <head> (no async, no defer) blocks the parser at that point: the parser stops, the script downloads, the script runs, and only then does parsing resume. A <script> with defer downloads in parallel and waits until parsing is done to run. One with async also downloads in parallel but runs as soon as it arrives, possibly mid-parse. Neither blocks parsing while it downloads, though an async script's execution does pause the parser while it runs.
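Those rules fit in a small decision function. It also encodes two details from the HTML spec. Async and defer are ignored on inline classic scripts, and module scripts are deferred unless marked async. This is a sketch: scriptTiming and its labels are made up for illustration. It models when a script runs relative to parsing, not fetch priority.

```javascript
// Describes how a <script> element interacts with the HTML parser.
// Fields mirror the HTML attributes; any type other than "module" is
// treated as a classic script.
function scriptTiming({ src = null, type = "classic", async: isAsync = false, defer: isDefer = false } = {}) {
  const isModule = type === "module";

  // Inline classic scripts ignore async/defer: they run right away and
  // the parser waits for them.
  if (!isModule && !src) {
    return { blocksParser: true, runs: "immediately" };
  }
  // async (external classic, or any module): fetched in parallel and
  // executed as soon as it is ready, possibly mid-parse.
  if (isAsync) {
    return { blocksParser: false, runs: "when downloaded" };
  }
  // defer, or any module script without async: executed after parsing
  // finishes, in document order.
  if (isDefer || isModule) {
    return { blocksParser: false, runs: "after parsing" };
  }
  // Plain external classic script: the parser stops, the script downloads
  // and runs, then parsing resumes.
  return { blocksParser: true, runs: "immediately after download" };
}
```

For example, scriptTiming({ src: "/analytics.js", async: true }) reports that the parser does not wait and the script runs when downloaded.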
The optimization moves are well-known but worth repeating:
- Inline critical CSS in the HTML head, and defer the rest with media="print" onload="this.media='all'" or similar.
- Use <script defer> or <script type="module"> (which defers by default) for everything not critical to first paint.
- Use <link rel="preload"> to start critical fetches earlier in the parse, or a 103 Early Hints response to start them before the HTML even arrives. (HTTP/2 server push used to fill that role, but Chrome has removed support for it.)
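Here is the markup this helper produces for a page head that follows those moves. It is a sketch. buildHead and its parameters are hypothetical, and the inputs are assumed to be trusted, so nothing is escaped. The noscript fallback is a common companion to the media="print" trick, not something the list requires.

```javascript
// Builds a <head> that inlines critical CSS, preloads one critical resource,
// loads the rest of the CSS without blocking render, and defers the app script.
// Inputs are assumed trusted: nothing here is HTML-escaped.
function buildHead({ criticalCss, stylesheetHref, preloadHref, preloadAs, scriptSrc }) {
  // Fonts are fetched in CORS mode; a preload without crossorigin would not
  // match that request and the font would be downloaded twice.
  const crossorigin = preloadAs === "font" ? " crossorigin" : "";
  return [
    "<head>",
    // Critical CSS inline: no extra round trip before first paint.
    `  <style>${criticalCss}</style>`,
    // Preload: start the fetch before the parser discovers the resource.
    `  <link rel="preload" href="${preloadHref}" as="${preloadAs}"${crossorigin}>`,
    // A print-only stylesheet does not block screen rendering; once it has
    // loaded, onload switches it to apply to all media.
    `  <link rel="stylesheet" href="${stylesheetHref}" media="print" onload="this.media='all'">`,
    // Fallback for visitors with JavaScript disabled.
    `  <noscript><link rel="stylesheet" href="${stylesheetHref}"></noscript>`,
    // Module scripts are deferred by default.
    `  <script type="module" src="${scriptSrc}"></script>`,
    "</head>",
  ].join("\n");
}
```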
A tell that this stage is your bottleneck: in the DevTools Network panel, a long blue bar before any rendering starts; a flame graph in Performance that shows "Parse HTML" running for hundreds of milliseconds with the rest of the timeline empty.
Stage 2: style computation
With the DOM and CSSOM both built, the browser runs style computation. For each DOM element, it collects the CSS rules whose selectors match and orders them by the cascade (origin and importance, cascade layers, specificity, then source order). It then resolves inheritance from the parent and produces a ComputedStyle for that element. Every element gets one. The output of this stage is a per-element style record that downstream stages read from.
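Here is that per-element step as a minimal sketch, with several simplifications:

- every declaration comes from the author stylesheet (no user-agent origin, no cascade layers);
- specificity is a precomputed [ids, classes, types] triple;
- only color and font-size count as inherited;
- it stops at picking values rather than resolving units into true computed values.

computeStyle and the declaration shape are hypothetical.

```javascript
// Compare specificity triples [ids, classes/attributes/pseudo-classes, types]
// left to right, the way CSS does.
function compareSpecificity(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Does declaration a beat declaration b for the same property?
function wins(a, b) {
  if (Boolean(a.important) !== Boolean(b.important)) return Boolean(a.important);
  const bySpecificity = compareSpecificity(a.specificity, b.specificity);
  if (bySpecificity !== 0) return bySpecificity > 0;
  return a.order > b.order; // equal specificity: later in the source wins
}

// matched: declarations from every rule whose selector matched this element,
// e.g. { property: "color", value: "red", specificity: [0, 1, 0], order: 3 }.
function computeStyle(matched, parentStyle = {}, inherited = ["color", "font-size"]) {
  const winners = new Map();
  for (const decl of matched) {
    const current = winners.get(decl.property);
    if (!current || wins(decl, current)) winners.set(decl.property, decl);
  }
  const style = {};
  // Inherited properties start from the parent's value...
  for (const prop of inherited) {
    if (prop in parentStyle) style[prop] = parentStyle[prop];
  }
  // ...and the cascade winners override them.
  for (const [prop, decl] of winners) style[prop] = decl.value;
  return style;
}
```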
The cost is roughly proportional to (number of elements) times (cost of matching). Cost of matching depends on selector shape: deep descendant selectors (.sidebar .menu li a) are slower than direct selectors (.sidebar-menu-item), and universal selectors (*) and attribute selectors ([data-foo]) are slower than class selectors. Modern browsers are very fast at this even for complicated selectors, so I almost never optimize selectors directly.
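Browsers match selectors right to left. They start from the element being styled and check the rightmost part (the "key" selector), and only then walk up the ancestors for the rest. That is why a descendant selector costs more than a single class: every candidate element can trigger an ancestor walk. Here is a minimal sketch of that matching for descendant combinators only. Nodes are plain { tag, classes, parent } objects rather than DOM types, and the function names are hypothetical.

```javascript
// A node is { tag, classes, parent }. A selector part is "tag" or ".class".
function matchesPart(node, part) {
  return part.startsWith(".")
    ? node.classes.includes(part.slice(1))
    : node.tag === part;
}

// Match a descendant selector such as ".sidebar .menu li a", right to left.
// Returns { matched, steps }, where steps counts part-vs-node checks as a
// rough proxy for matching cost.
function matchesDescendantSelector(node, selector) {
  const parts = selector.trim().split(/\s+/);
  let i = parts.length - 1;
  let steps = 1;

  // The rightmost (key) part must match the element itself.
  if (!matchesPart(node, parts[i])) return { matched: false, steps };
  i--;

  // Each remaining part must match some ancestor above the previous match.
  // Greedy nearest-ancestor matching is correct for descendant combinators.
  let ancestor = node.parent;
  while (i >= 0) {
    if (!ancestor) return { matched: false, steps }; // walked past the root
    steps++;
    if (matchesPart(ancestor, parts[i])) i--;
    ancestor = ancestor.parent;
  }
  return { matched: true, steps };
}
```

The single-class selector settles in one check. The four-part descendant selector needs four checks when it matches, and a failed match can walk all the way to the root. Real engines cut most of that cost with tricks such as ancestor Bloom filters, which is part of why selector cost rarely matters in practice.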
What I do watch for:
- Massive stylesheets. A 4 MB CSS file (yes, I have seen one) parses slow and computes slower because the matching stage has to consider everything.
- Large DOM with frequent style invalidation. Toggling a class on <body> invalidates style on every descendant. Doing it inside a scroll handler runs style computation 60 times a second on a tree of 5,000 elements.
- CSS containment. The contain: layout style and contain: content properties tell the browser "changes inside this subtree do not affect anything outside it". That lets the browser keep invalidation and re-layout from spilling past the boundary. For widgets that update independently (a sidebar with its own state, a popup, a chart), this is real money.
The DevTools tell: a long "Recalculate Style" entry on the Performance flame graph, often triggered by a JavaScript event you can identify in the same frame.
Stage 3: layout (reflow)
Layout is the geometry phase. With computed styles in hand, the browser walks the box tree and figures out where each box goes, how big it is, and where its children sit relative to it. The output is a layout tree where every visible box has a position and size (x, y, width, height) plus more; transforms are applied later.
Layout is expensive because it is fundamentally global. A change to one element can ripple into siblings (an element growing pushes neighbors down), into parents (a child changing its preferred width can shrink the parent's flex container), and into descendants (the parent's width changing forces children to re-flow). Modern browsers do incremental layout (recomputing only the subtrees that changed), but in the worst case a single style change forces a full document layout.
The code shape that thrashes layout is a loop over, say, 100 elements that reads each element's offsetHeight and then writes its style.height based on that value.
The offsetHeight read is a layout-forcing call: the browser must complete any pending layout to give you an accurate value. The write that follows invalidates layout. The next iteration reads offsetHeight again, forcing layout again. 100 iterations means 100 layout passes. Now restructure the same loop so that all the reads happen first and all the writes happen second.
The result is one layout pass at the end of the frame instead of 100 in the middle. The pattern generalizes: read all the geometry you need, then write all the styles. Mixing reads and writes is what produces layout thrashing, and DevTools flags it as "Forced reflow" warnings.
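A toy model demonstrates the difference outside a browser. FakeDocument and FakeElement are hypothetical stand-ins, not the DOM. Writing style.height only marks layout dirty, and reading offsetHeight runs any pending layout first, which is the lazy behavior described above. The two update functions have the same shape you would use on real elements.

```javascript
// A toy model of lazy layout, NOT the real DOM. FakeDocument and FakeElement
// are hypothetical stand-ins that only count how often "layout" runs.
class FakeDocument {
  constructor() {
    this.layoutDirty = false; // set when a geometry-affecting style is written
    this.layoutPasses = 0; // number of times layout actually ran
  }

  // Run layout only if something invalidated it since the last pass.
  flushLayout() {
    if (this.layoutDirty) {
      this.layoutPasses += 1;
      this.layoutDirty = false;
    }
  }
}

class FakeElement {
  constructor(doc, height) {
    this.doc = doc;
    this.height = height;
    const el = this;
    this.style = {
      // Writing a style only invalidates layout; it does not run it.
      set height(value) {
        el.height = parseFloat(value);
        el.doc.layoutDirty = true;
      },
    };
  }

  // Reading geometry must return a fresh value, so any pending layout runs
  // first: a forced synchronous layout.
  get offsetHeight() {
    this.doc.flushLayout();
    return this.height;
  }
}

// Interleaved: every read after the first finds layout dirty and forces a pass.
function growInterleaved(elements) {
  for (const el of elements) {
    const h = el.offsetHeight;
    el.style.height = `${h + 10}px`;
  }
}

// Batched: all reads first (layout is clean, so no pass), then all writes.
function growBatched(elements) {
  const heights = elements.map((el) => el.offsetHeight);
  elements.forEach((el, i) => {
    el.style.height = `${heights[i] + 10}px`;
  });
}

// One simulated frame: run the update, then the browser's own layout pass.
function runFrame(update, count = 100) {
  const doc = new FakeDocument();
  const elements = Array.from({ length: count }, () => new FakeElement(doc, 20));
  update(elements);
  doc.flushLayout();
  return doc.layoutPasses;
}

console.log(runFrame(growInterleaved)); // 100
console.log(runFrame(growBatched)); // 1
```

In this model the interleaved loop costs 99 forced layouts plus the frame's own pass, 100 in total. The batched loop costs only the frame's single pass.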
Properties that trigger layout when changed: width, height, top, left, padding, margin, border-width, display, position, anything that affects the box geometry. Properties that skip layout fall into two groups. Changing color, background-color, or box-shadow only repaints. Changing transform or opacity (and in many cases filter) skips paint too and only re-composites, as long as the element has its own layer. The optimization rule is: animate and update via transform and opacity whenever possible; touch the layout-affecting properties only when geometry actually needs to change.
Stage 4: paint
With layout done, the browser paints. Painting fills in the pixels for each visible box: backgrounds, borders, text, images, shadows. The output is a set of bitmaps, organized into compositing layers. Different boxes can paint to different layers depending on properties like transform, position: fixed, will-change, video, and canvas.
Paint is per-layer. A change to a layer that does not change its geometry repaints just that layer. A change that adds a new layer, or changes the layer tree, costs more.
What makes paint slow:
- Large painted area. Hero images that fill the viewport, full-page backgrounds with gradients, shadows that span the full page.
- Expensive painting effects. box-shadow with a large blur radius, filter: blur(), complex SVGs, large background gradients with many color stops. These are GPU-accelerated on modern browsers but still cost more than a flat color.
- Frequent invalidation of large layers. A scroll-driven animation that repaints the whole hero on every frame burns a lot of GPU.
The optimization patterns:
- Promote frequently-animating elements to their own compositing layer with transform: translateZ(0) or will-change: transform. The browser allocates them a dedicated layer; transform changes only re-composite, they do not repaint.
- Avoid box-shadow and filter: blur on elements that animate or scroll. Replace them with pre-rendered shadow images or simpler effects when possible.
- Use content-visibility: auto on offscreen sections so the browser can skip their rendering work (style, layout, and paint) entirely until they scroll near the viewport.
The DevTools tell: long "Paint" entries in the Performance flame graph, especially if they correlate with scroll or animation events.
Stage 5: composite
The final stage assembles the painted layers into the frame the user sees. Compositing is GPU-accelerated; it is fast and runs off the main thread. Properties that only affect compositing are the cheapest things you can animate, because they do not block the main thread at all.
The two properties that matter here: transform and opacity. Animating those is essentially free for the main thread. The compositor runs on a different thread, reads the existing painted layer, applies a transform matrix or an opacity value, and hands the frame to the GPU. No layout, no paint, no main-thread JavaScript impact.
This is why the perf-conscious advice for animations is always "animate transform and opacity, not top/left or width/height". A modal that fades and slides in via transform: translateY(-20px) to transform: translateY(0) plus an opacity transition runs at 60 fps on a phone. The same animation via top and a non-composited fade triggers layout and paint every frame and stutters.
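Here is a minimal sketch of that modal entrance with the Web Animations API. element.animate(keyframes, options) is the standard method; animateModalIn and compositorOnly are hypothetical helpers. The 200 ms duration and ease-out easing are arbitrary choices. The sketch assumes the modal's resting style is no transform and full opacity, so the last keyframe matches it and no fill mode is needed.

```javascript
// Entrance keyframes: slide down 20px and fade in. Only transform and
// opacity change, so the browser can run the animation on the compositor.
const MODAL_ENTER_KEYFRAMES = [
  { transform: "translateY(-20px)", opacity: 0 },
  { transform: "translateY(0)", opacity: 1 },
];

const COMPOSITOR_FRIENDLY = new Set(["transform", "opacity"]);

// True if every animated property in the keyframes is compositor-friendly.
// offset, easing, and composite are keyframe bookkeeping, not animated properties.
function compositorOnly(keyframes) {
  const meta = new Set(["offset", "easing", "composite"]);
  return keyframes.every((frame) =>
    Object.keys(frame).every((key) => meta.has(key) || COMPOSITOR_FRIENDLY.has(key))
  );
}

// Starts the entrance and returns the Animation object from element.animate.
function animateModalIn(modal) {
  return modal.animate(MODAL_ENTER_KEYFRAMES, {
    duration: 200,
    easing: "ease-out",
  });
}
```

A top-based version, [{ top: "-20px" }, { top: "0" }], fails compositorOnly, which is exactly the variant that re-runs layout and paint every frame.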
The limits: only certain properties composite, and there is a cost to creating too many layers (each layer takes GPU memory). will-change should be used sparingly and removed when the animation finishes; declaring everything as will-change: transform defeats the optimization by exhausting memory.
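A common way to keep will-change short-lived is to set it from script when an animation becomes likely and clear it when the animation ends. This sketch assumes the element animates via a CSS transition. hintWillChange is a hypothetical helper, and pointerenter is just one reasonable trigger.

```javascript
// Sets will-change when the user is about to interact and clears it when the
// transition finishes, so the extra layer exists only while it is useful.
// Returns a cleanup function that detaches the listeners and clears the hint.
function hintWillChange(el, property = "transform") {
  const add = () => {
    el.style.willChange = property;
  };
  const remove = () => {
    el.style.willChange = "auto";
  };
  el.addEventListener("pointerenter", add); // an animation is likely soon
  el.addEventListener("transitionend", remove); // done: release the layer
  return () => {
    el.removeEventListener("pointerenter", add);
    el.removeEventListener("transitionend", remove);
    remove();
  };
}
```

In a page this would be called once per animated element, for example hintWillChange(document.querySelector(".card")).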
The pipeline as a debugging tool
The value of knowing the five stages is that performance complaints become localized. "The page is slow" is not a question I can answer; "the page spends 600 ms in style computation after every search keystroke" is. The DevTools Performance tab labels every frame with which stages ran and how long each took. Reading those labels tells you where to look:
- Long parse / network time: shrink bundles, defer non-critical resources, preload critical ones.
- Long style: trim selectors, contain subtrees, reduce DOM size.
- Long layout: batch reads and writes, avoid offsetHeight in tight loops, animate transform, not width.
- Long paint: smaller painted area, avoid blur and shadow on animations, use content-visibility.
- Long composite: rare; usually means too many layers or layers that are too large.
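The triage list above is essentially a lookup from the slowest stage to a first move. Here is a sketch of that lookup. It assumes you have already summed per-stage time for one slow frame by hand from the Performance panel. The input shape and triageFrame are hypothetical, and this is not the format DevTools exports.

```javascript
// First moves per stage, condensed from the triage list.
const ADVICE = {
  parse: "shrink bundles, defer non-critical resources, preload critical ones",
  style: "trim selectors, contain subtrees, reduce DOM size",
  layout: "batch reads and writes, animate transform instead of width/top",
  paint: "shrink painted area, drop blur/shadow on animated elements, use content-visibility",
  composite: "check for too many or oversized layers",
};

// frame: per-stage milliseconds for one slow frame, e.g. { style: 12, layout: 600 }.
// Returns the slowest known stage, its share of the frame, and the first move.
function triageFrame(frame) {
  const entries = Object.entries(frame).filter(([stage]) => stage in ADVICE);
  if (entries.length === 0) throw new Error("no known stages in frame");
  const [stage, ms] = entries.reduce((worst, entry) => (entry[1] > worst[1] ? entry : worst));
  const total = entries.reduce((sum, [, t]) => sum + t, 0);
  return { stage, ms, share: ms / total, advice: ADVICE[stage] };
}
```

For a frame like { parse: 20, style: 45, layout: 600, paint: 30, composite: 5 }, it points at layout, about 86% of the frame, no matter how large the bundle is.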
The wrong move is to chase generic advice ("reduce JavaScript bundle size") without knowing which stage is the bottleneck. A 200 KB bundle is not your problem if your slow frame is 600 ms in layout.
Where the rendering pipeline is heading next
The core of the pipeline is decades old. Parse, style, layout, and paint go back to the early browser engines, and compositing became a separate, GPU-accelerated stage later. It is still how rendering works in Blink, WebKit, and Gecko in 2026. What changed in the last few years is the toolchain around it. The View Transitions API moves cross-page transitions onto the compositor. The Speculation Rules API prefetches and prerenders pages so the parse and style stages can run before the user clicks. CSS containment and content-visibility give us more direct control over which subtrees re-run which stages.
Those APIs are evolving the surface, but they are not changing the model. Parse, style, layout, paint, composite. Five stages, five flavors of slow. Once you can name them, the next time DevTools shows you a long red bar, you can read it and act on it instead of guessing.
