I once measured a marketing landing page that loaded a 1.4 MB hero image: a JPEG at 4000 by 2500 pixels, served from origin, no srcset, no width or height attributes, no lazy loading, no caching headers. The page took two seconds to render on a phone, shifted layout twice as the image came in, and the LCP element was that hero. Optimizing it took an hour. The new version was a 110 KB AVIF, sized correctly per viewport, with reserved space and proper caching. LCP went from 4.2 seconds to 1.3 seconds on the same network. None of the changes were exotic. All of them are things every web team should be doing on every image, and most teams do not.
My stance: image optimization in 2026 is mostly four mechanical decisions, applied per image, that move you from "unoptimized" to "good enough that the user does not notice". Format selection, responsive sizing, lazy loading, reserved space. After those four are right, you can chase the long tail (CDN edge transforms, blur-up placeholders, priority hints), but the bulk of the win lives in the basics. Skip the basics and no library or CDN will save you.
The four mechanical decisions
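For every image you add to a page, ask four questions:
- What format should it be served in?
- What size does this viewport actually need?
- Should it load now, or only when the user scrolls near it?
- How much space should the layout reserve before it arrives?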
The rest of this article walks through each.
Format: pick AVIF first, WebP as fallback, JPEG/PNG only when needed
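The format hierarchy for photos and complex images, in 2026:
- AVIF first, for every browser that supports it.
- WebP as the fallback.
- JPEG as the last resort.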
AVIF is the right answer for most photos. The encoder is slower than JPEG's, which matters at build time but not at delivery. Decoding is fast enough in every modern browser. The compression-quality curve beats WebP and JPEG noticeably. As a rough guide (quality scales are not directly comparable across encoders), an AVIF at quality 50 typically looks as good as or better than a JPEG at quality 80 at about half the file size.
The standard approach is to serve AVIF when the browser supports it, fall back to WebP otherwise, and use JPEG as a last resort. The <picture> element does this declaratively:
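A minimal sketch of that fallback chain. The file names (hero.avif, hero.webp, hero.jpg), the alt text, and the dimensions are placeholders I picked for illustration, not values from a real project:

```html
<picture>
  <!-- Sources are tried in order; the browser takes the first type it can decode -->
  <source srcset="hero.avif" type="image/avif">
  <source srcset="hero.webp" type="image/webp">
  <!-- Fallback image, and the single source of alt, width, and height -->
  <img src="hero.jpg" alt="Product dashboard on a laptop" width="1200" height="750">
</picture>
```

The order of the source elements matters: putting WebP first would mean AVIF-capable browsers never reach the AVIF file.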
The browser picks the first <source> it understands. Any modern browser gets AVIF; any browser from the last six years gets WebP; everything else gets JPEG. The <img> is the fallback and the source of alt, width, and height for all three.
For logos, icons, and any image with sharp edges or text, use SVG. Vector graphics scale to any DPR for free and are usually smaller than the equivalent PNG rendered at retina resolution. Inline simple SVG icons (under a kilobyte) directly in markup; reference larger SVGs as files so they can be cached.
For screenshots, diagrams, or anything that needs precise pixels and an alpha channel, PNG is still the right call. AVIF handles transparency, but PNG's lossless compression with palette optimization is hard to beat for diagrams with few colors and large flat regions. Run pngquant or oxipng over them at build time and they shrink further.
GIF is dead for video; replace it with <video> and an MP4 or WebM. A 5-second GIF is often 3 MB; the same content as MP4 is 200 KB. The conversion is one ffmpeg command and the win is enormous.
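On the markup side, the replacement might look like the sketch below. The file names and dimensions are hypothetical; the attribute combination is what makes a video behave like an animated GIF:

```html
<!-- muted is required for autoplay in most browsers; playsinline keeps iOS from going fullscreen -->
<video autoplay loop muted playsinline width="480" height="270" poster="demo-poster.jpg">
  <!-- Sources are tried in order, same as <picture> -->
  <source src="demo.webm" type="video/webm">
  <source src="demo.mp4" type="video/mp4">
</video>
```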
Responsive: srcset and sizes
The second mechanical decision: do not ship a 4000-pixel-wide image to a phone with a 400-pixel-wide screen. The browser has to download the full file, decode it, and downscale it to display, which wastes bandwidth, CPU, and memory.
The srcset attribute lets you provide multiple sizes and a sizes attribute lets you tell the browser how wide the image will be at different viewport widths. The browser picks the smallest variant that is still big enough.
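A sketch of what that looks like for a hero image, assuming three pre-generated variants named hero-800.jpg, hero-1200.jpg, and hero-2400.jpg, and a layout that caps the image at 1024 CSS pixels:

```html
<img
  src="hero-1200.jpg"
  srcset="hero-800.jpg 800w, hero-1200.jpg 1200w, hero-2400.jpg 2400w"
  sizes="(min-width: 1024px) 1024px, 100vw"
  alt="Product dashboard on a laptop"
  width="1200" height="750">
<!-- src is the fallback for browsers that ignore srcset;
     the w descriptors are each file's real pixel width, not a target size -->
```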
The srcset lists the available variants and their natural widths. The sizes describes how wide the image will be in the layout: at viewports under 1024 pixels it fills the viewport (100vw), at 1024 and above it caps at 1024 pixels.
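Combined with the device's pixel ratio, the browser picks. A retina iPhone with a 390 CSS-pixel viewport at 3 DPR needs a 1170-pixel-wide source, so the browser fetches hero-1200.jpg. A desktop with a 1440-pixel viewport at 1 DPR lays the image out at the 1024-pixel cap, which hero-800.jpg cannot cover, so it also gets hero-1200.jpg. A 4K display at 2 DPR with the same 1024-pixel layout width needs 2048 pixels and fetches hero-2400.jpg. The hero-800.jpg variant goes to the smaller cases, such as a 360 CSS-pixel phone at 2 DPR, which needs 720 pixels. Browsers are allowed some latitude in the choice (reusing a larger cached variant, for example), so treat these as the typical outcomes.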
Generating the variants is a build-time step. Framework components like Next.js's next/image, Astro's <Image>, and Nuxt's <NuxtImg>, along with community image components for Remix, do this for you. If you are not using a framework that handles it, write a small script using a library like sharp and check in the variants, or generate them at request time via a CDN with image transforms.
The pixel-density alternative for fixed-display-size images (avatars, icons rendered at a fixed CSS size):
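A sketch with x descriptors, assuming three hypothetical avatar files exported at 48, 96, and 144 pixels:

```html
<img
  src="avatar-48.jpg"
  srcset="avatar-48.jpg 1x, avatar-96.jpg 2x, avatar-144.jpg 3x"
  alt="Profile photo of the comment author"
  width="48" height="48">
<!-- No sizes attribute: x descriptors describe density, so the layout width is not needed -->
```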
The browser picks 1x, 2x, or 3x based on the device DPR. For a 48x48 avatar this is simpler than the w variant; the avatar is always 48 CSS pixels regardless of viewport.
Lazy loading: opt out for the LCP image, opt in for everything else
Native lazy loading is built into the browser:
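For a below-the-fold image it is one attribute. The file name and dimensions here are placeholders:

```html
<!-- Fetched only when the image approaches the viewport -->
<img src="gallery-03.jpg" alt="Workshop floor, wide shot" width="800" height="600" loading="lazy">
```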
The browser delays loading the image until it is near the viewport. For pages with many images below the fold (a long blog post, a product grid), this saves the user from downloading images they may never scroll to.
The critical exception: do not lazy-load the LCP image. The LCP image is the largest visible element above the fold; lazy-loading it would defer its fetch behind layout, which makes the LCP slower, not faster. Leave loading="lazy" off the hero image. Better: add fetchpriority="high" to it, which tells the browser to prioritize that fetch above other resources.
The combination of fetchpriority="high" on the LCP image and loading="lazy" on the rest is one of the single highest-leverage performance changes I make on landing pages. The hero loads as fast as the network allows; the rest loads on demand.
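Put together, the pattern might look like this on a landing page. The class name and file names are made up for the example:

```html
<!-- LCP hero: no loading attribute (eager is the default), raised fetch priority -->
<img src="hero-1200.jpg" alt="Product dashboard on a laptop"
     width="1200" height="750" fetchpriority="high">

<!-- Everything below the fold: deferred until the user scrolls near it -->
<section class="features">
  <img src="feature-reports.jpg" alt="Reports screen" width="600" height="400" loading="lazy">
  <img src="feature-alerts.jpg" alt="Alerts screen" width="600" height="400" loading="lazy">
</section>
```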
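A related pattern for the hero: <link rel="preload" as="image" href="hero.jpg" fetchpriority="high"> in the head. The preload tells the browser to start fetching the hero before it has discovered the image itself, which helps most when the hero is referenced late in the document, from CSS, or injected by script. On slow connections that is often worth something on the order of a hundred milliseconds of LCP. If the <img> uses srcset and sizes, mirror them on the preload with imagesrcset and imagesizes so the preloaded file is the one the image actually uses.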
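For a responsive hero, a preload sketch could look like the following. It assumes the same hypothetical hero-800/1200/2400 variants and a layout capped at 1024 CSS pixels; the imagesrcset and imagesizes values have to match the img element exactly, or the browser may download two different files:

```html
<head>
  <!-- imagesrcset / imagesizes let the preload pick the same candidate the <img> will -->
  <link rel="preload" as="image"
        href="hero-1200.jpg"
        imagesrcset="hero-800.jpg 800w, hero-1200.jpg 1200w, hero-2400.jpg 2400w"
        imagesizes="(min-width: 1024px) 1024px, 100vw"
        fetchpriority="high">
</head>
```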
Reserved space: width and height attributes, always
Every <img> tag should have width and height attributes. Not CSS, attributes. The browser uses them to compute the aspect ratio and reserve space in the layout before the image loads, which prevents the layout shift when the image arrives.
The attributes do not lock the image to that pixel size; CSS can resize freely, as long as one dimension is left to follow the ratio (width: 100% paired with height: auto, for example). They are aspect-ratio hints, not size declarations. The image still fills its container or whatever your CSS says. The point is that the browser knows the ratio before the bytes arrive.
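A small sketch of attributes and CSS working together. The selector and file name are placeholders:

```html
<style>
  /* Width is fluid; height: auto lets the browser derive height from the attribute ratio */
  .article img { max-width: 100%; height: auto; }
</style>

<article class="article">
  <!-- 1600x900 gives the browser a 16:9 ratio before any bytes arrive -->
  <img src="chart.png" alt="Monthly signups, January to June" width="1600" height="900">
</article>
```

Without height: auto, the height attribute would hold the image at 900 pixels tall while the width shrinks, which distorts it.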
For images where the dimensions are not known at write-time (user-uploaded photos, CMS content), use the aspect-ratio CSS property on a wrapper:
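One way to sketch that wrapper, assuming a 16:9 slot and a hypothetical upload path. The object-fit rule is my addition so that photos with a different native ratio are cropped instead of stretched:

```html
<style>
  /* The wrapper owns the space, so layout is stable before the image loads */
  .media-frame { aspect-ratio: 16 / 9; overflow: hidden; }
  /* The image fills the reserved box; cover crops any ratio mismatch */
  .media-frame img { display: block; width: 100%; height: 100%; object-fit: cover; }
</style>

<div class="media-frame">
  <img src="/uploads/photo.jpg" alt="Customer-submitted photo" loading="lazy">
</div>
```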
The wrapper reserves the 16:9 space; the image fills it once loaded. CLS stays at zero for that image even though we did not know its native dimensions at the time we wrote the HTML.
The CDN: where most teams should land
For a small site, hand-generating image variants and serving them from origin is fine. For anything larger, an image CDN is a real productivity and performance multiplier. The two flavors:
- Image CDN with on-the-fly transforms. Cloudinary, Imgix, Bunny.net's Bunny Optimizer, Cloudflare Images. You upload one source image; the CDN serves any size, format, and quality you ask for via URL parameters. Cache hit ratio is high, the transforms are cached, and you do not maintain build-time variant generation.
- Edge transforms in your framework. Next.js's next/image with the default loader, Vercel's Image Optimization, Netlify's image CDN. The framework wraps the image and the platform handles transforms at the edge. Less control than a dedicated CDN, but zero configuration.
For most product teams, the framework's built-in handling is enough. For media-heavy sites (e-commerce, photography, social), a dedicated image CDN is worth the configuration.
A caveat: the image CDN handles delivery, not the source asset. A 5 MB raw photo uploaded with no compression at the origin still costs the CDN more storage and the first-fetch transform takes longer. Compress source images at upload time too; do not rely on the CDN to fix everything.
SVG and the fonts question
A short note on two adjacent topics that come up every time I talk to a team about image optimization.
Inline SVGs vs file SVGs. Inline (in the HTML) for icons under a kilobyte that change with the page (theme-aware colors, hover states, animations). External (referenced via <img src="icon.svg"> or <use href>; the older xlink:href form is deprecated) for larger SVGs that benefit from caching across pages. The inline version saves a request; the external version saves bytes on repeat views. Match the choice to the asset's size and reuse pattern.
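The three options side by side, as a sketch. The paths, the sprite file, and the icon id are hypothetical:

```html
<!-- Inline: stroke follows the text color via currentColor, so themes and hover states just work -->
<button class="icon-button" aria-label="Close">
  <svg width="16" height="16" viewBox="0 0 16 16" aria-hidden="true">
    <path d="M3 3l10 10M13 3L3 13" stroke="currentColor" stroke-width="2" fill="none"/>
  </svg>
</button>

<!-- External file: cached across pages, but page CSS cannot reach inside it -->
<img src="/img/illustration.svg" alt="Diagram of the upload flow" width="640" height="360">

<!-- External sprite: one cached file, many icons, selected by fragment id -->
<svg width="24" height="24" aria-hidden="true">
  <use href="/img/sprite.svg#icon-search"></use>
</svg>
```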
Web fonts as image-shaped concerns. Web fonts raise the same performance questions as images: they take real bandwidth, they can hold up text rendering while they load, and that delay affects LCP when the LCP element is text. Use font-display: swap to render fallback text immediately. Subset fonts to only the characters you use (a Latin-only subset is often around a quarter the size of a full multilingual font). Use variable fonts (one file, many weights) instead of shipping six static weights. Self-host fonts when possible for predictable caching; the shared-cache advantage of the Google Fonts CDN was real in 2018 and has been largely erased since browsers began partitioning their caches per site around 2020.
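A sketch of a self-hosted, subsetted variable font. The family name, file path, and unicode-range are illustrative assumptions, not a recommendation for a specific font:

```html
<style>
  @font-face {
    font-family: "Example Sans";
    /* One variable-font file covers every weight from 100 to 900 */
    src: url("/fonts/example-sans-latin-var.woff2") format("woff2");
    font-weight: 100 900;
    /* Show fallback text immediately, swap in the web font when it arrives */
    font-display: swap;
    /* Declares which characters this subset covers (Basic Latin and Latin-1 here) */
    unicode-range: U+0000-00FF;
  }
  body { font-family: "Example Sans", system-ui, sans-serif; }
</style>
```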
Where image optimization is heading
The modern stack already handles the basics well. The areas I expect to see continued improvement in over the next couple of years:
- Better default formats. AVIF adoption is now near-universal in browsers; the next codec (likely JPEG XL, if Chromium reverses course on the support it removed in Chrome 110, or a successor to AV1) might land another 20% reduction. The <picture> fallback chain handles this gracefully when it does.
- More framework-level integration. Frameworks adding built-in loading="lazy", fetchpriority="high" on detected LCP images, and aspect-ratio-aware components by default. The trend is clear: the right things become defaults, and you opt out only when you have a reason.
- Edge transforms becoming more capable. Cropping, focal-point detection, AI-aided composition, all moving to the edge. Cloudinary and Imgix have been there for years; the rest of the ecosystem is catching up.
For now, the four mechanical decisions are what matter. Pick the right format, ship the right size, lazy-load below the fold, reserve the space. In my experience those four cover roughly 95% of the image-related performance win on a typical page. The 5% that remains is fine-tuning, and you do that after the basics are in place. Do them out of order and you optimize the long tail while the basics are still on fire.
