AWS Lambda Cold Starts: What Actually Helps

Where the cold-start time really comes from, the four levers that have moved my p99 down by hundreds of milliseconds, and the optimizations I have tried and abandoned because they did not pay back.

By @liamsuzuki

December 23, 2025 · Updated May 18, 2026

Cold starts are the single most discussed topic about Lambda and the most misunderstood. Half the advice on the internet is from 2019 and assumes the JVM. Half of the rest is for a workload that is not yours. After running production Lambda services across Node, Python, and Go for several years, I have a much shorter list of things that actually move the number than the conventional wisdom suggests, and a longer list of optimizations I tried and abandoned because they did not pay back.

This is the writeup I wish someone had handed me on day one of the migration. It covers where the cold-start time actually comes from, the four levers that have made the most difference in the services I have run, and a few popular tricks I no longer use.

Where the cold-start time actually goes

A cold start is what happens when AWS has to spin up a new execution environment to serve your function: there is no warm container available, so the platform creates one. The total user-visible latency for that first request decomposes into roughly four phases. The exact numbers depend on your runtime, package size, and what you do at init, but the shape is consistent.

A cold-start budget I have measured (rough order of magnitude)
  1. Container provisioning           50 to 200 ms   (AWS, you cannot tune)
  2. Code download + extract           50 to 500 ms   (depends on package size)
  3. Runtime + dependency init        100 to 2000 ms  (this is your code's biggest lever)
  4. Handler first invocation          1 to 50 ms     (your handler body)

The shape that surprises people: phase 3 (runtime and dependency init) is almost always the dominant cost. The thing you actually want to optimize is what your code does between the execution environment becoming ready and the handler being ready to run. Phase 1 is fixed, phase 2 only matters if your zip is large, and phase 4 is rarely the bottleneck.

A concrete example from a Node Lambda I worked on. The function imported the AWS SDK, the Stripe SDK, a Postgres client, an ORM, and a logger. Each of those did some module-load work (require chains, polyfill installation, schema parsing). The cumulative cost at first import was around 1.4 seconds on a 256 MB Lambda. The handler itself was a 40 ms DB call. The user saw 1500 ms on a cold start and 60 ms on a warm one. Lever-by-lever, we got that 1400 ms down to about 350 ms without touching the handler.
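
To find out which imports dominate a budget like this, one option is to time each init step at module scope. The following is a minimal sketch, not the instrumentation used on that service: `timeInit` and `slowestInits` are hypothetical helper names, and the module names in the usage comment are placeholders.

```typescript
// Hypothetical helper: runs an init step once and records how long it took.
type InitTiming = { label: string; ms: number };

const initTimings: InitTiming[] = [];

function timeInit<T>(label: string, init: () => T): T {
  const start = performance.now();
  const result = init();
  initTimings.push({ label, ms: performance.now() - start });
  return result;
}

// Returns a copy of the recorded timings, slowest first.
function slowestInits(): InitTiming[] {
  return [...initTimings].sort((a, b) => b.ms - a.ms);
}

// Usage at module scope (module names are illustrative):
//   const pg = timeInit('pg', () => require('pg'));
//   const stripe = timeInit('stripe', () => require('stripe'));
//   console.log(JSON.stringify(slowestInits()));
```

Logging the sorted list once per cold start is usually enough to see which dependency deserves attention first.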

The four levers that actually moved the number

Lever 1: smaller deploy package

My starter heuristic is that anything under 10 MB has negligible code-download cost. Anything over 50 MB starts to hurt. A zip deployment cannot exceed 250 MB unzipped (layers included), so if you are approaching that limit you should be looking at container images instead.

What actually balloons a Node Lambda:

  • The full aws-sdk v2 (around 60 MB unminified). Switching to the v3 SDK and importing only the clients I use cut my package by 50 MB. This was the single biggest change.
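  • A bundled Prisma client with all engines for all platforms. Pinning the engine to the one binary target Lambda needs and shipping only that took another 30 MB off. That target is rhel-openssl-3.0.x on the current Node runtimes; linux-musl targets Alpine, not Lambda's glibc-based Amazon Linux.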
  • Source maps and TypeScript source files committed to the deploy artifact. The .map and .ts files are dead weight in production.

A practical bundle command for an esbuild-based Lambda:

esbuild src/handler.ts \
  --bundle \
  --platform=node \
  --target=node22 \
  --minify \
  --external:@aws-sdk/* \
  --external:aws-sdk \
  --outfile=dist/handler.js

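The --external flags exclude the AWS SDK from the bundle, which alone takes about 20 MB off a typical handler. This works because the managed Node.js runtimes from 18 onward ship the v3 SDK (@aws-sdk/*). The v2 aws-sdk package is not preinstalled on those runtimes, so the --external:aws-sdk flag is only safe if nothing in your code still imports v2.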

Lever 2: defer expensive imports until you need them

Node loads imports eagerly at module scope. If my handler talks to Stripe in 5% of invocations and to S3 in the other 95%, importing Stripe at module top is paying init cost on every cold start for nothing.

// before: pays the import cost on every cold start, even if Stripe is unused
import Stripe from 'stripe';
const stripe = new Stripe(process.env.STRIPE_KEY!);

export async function handler(event: Event) {
  if (event.path === '/checkout') {
    return await stripe.charges.create(...);
  }
  return await handleSomethingElse(event);
}

// after: pays the cost only on the path that needs it, then caches
import type Stripe from 'stripe'; // type-only import, erased at compile time

let stripe: Stripe | null = null;
function getStripe(): Stripe {
  if (!stripe) {
    // require() runs the module's load-time work here, on first use
    const StripeClient = require('stripe');
    stripe = new StripeClient(process.env.STRIPE_KEY!);
  }
  return stripe!;
}

export async function handler(event: Event) {
  if (event.path === '/checkout') {
    return await getStripe().charges.create(...);
  }
  return await handleSomethingElse(event);
}

The lazy version pays the Stripe import cost only on requests that actually hit the checkout path, and the cached stripe value means warm invocations on that path skip the cost entirely. On the same Lambda above, deferring three optional dependencies took about 600 ms off cold starts for the 80% of paths that did not use them.
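
When several optional dependencies need the same treatment, the cache-on-first-use pattern can be factored into one small helper. This is a sketch under assumptions: `lazy` is a hypothetical name, and the object returned in the example is a stand-in for a real SDK client.

```typescript
// Hypothetical helper: wraps an expensive initializer so it runs at most once,
// on first use, and the result is reused by later warm invocations.
function lazy<T>(init: () => T): () => T {
  let cached: { value: T } | null = null;
  return () => {
    if (cached === null) {
      // If init throws, nothing is cached and the next call retries.
      cached = { value: init() };
    }
    return cached.value;
  };
}

// Stand-in for an expensive client. A real initializer would call
// require(...) and construct the client inside this function.
let initCount = 0;
const getClient = lazy(() => {
  initCount += 1;
  return { name: 'expensive-client' };
});
```

Each handler path then calls `getClient()` where it needs the dependency, and paths that never call it never pay for it.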

Lever 3: more memory (which buys you more CPU)

Lambda allocates CPU proportionally to memory. A 128 MB function gets a fraction of a vCPU; a 1769 MB function gets one full vCPU; above that, you get multiple vCPUs. Init code is mostly synchronous and CPU-bound (parsing, compiling, building dependency graphs), so giving the function more memory makes init faster.

The counterintuitive consequence: on cold-start-sensitive workloads, going from 256 MB to 1024 MB often makes the function cheaper, not more expensive. The price per millisecond rises in proportion to memory, but (a) cold starts are shorter and (b) warm invocations finish faster, and you pay per ms of execution, so the bill drops whenever duration falls by more than the memory increase. I run almost everything at 1024 MB or above unless I have measured otherwise. The starter heuristic I would suggest is: bump memory until cold-start time stops decreasing, do the same for warm-invocation duration, then settle at whichever setting gives the cheaper combined cost. AWS Lambda Power Tuning, an open-source Step Functions state machine, automates this sweep; running it on a representative event payload takes about 10 minutes and is the closest thing to a free win in this whole article.
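
The memory-versus-cost trade-off is plain arithmetic, sketched below. The per-GB-second price is an assumption (the x86 on-demand list price I believe applies in us-east-1; check the current pricing page for your region and architecture), and the durations are made-up illustrative measurements, not numbers from the services described here.

```typescript
// Assumed on-demand price per GB-second (x86, us-east-1 list price).
const PRICE_PER_GB_SECOND = 0.0000166667;

// Compute cost of one invocation, ignoring the flat per-request fee.
function invocationCost(memoryMb: number, durationMs: number): number {
  return (memoryMb / 1024) * (durationMs / 1000) * PRICE_PER_GB_SECOND;
}

// Duration the larger configuration has to beat to cost no more
// than the smaller one.
function breakEvenDurationMs(fromMb: number, fromMs: number, toMb: number): number {
  return fromMs * (fromMb / toMb);
}

// Hypothetical measurements for illustration only.
const costAt256 = invocationCost(256, 200);
const costAt1024 = invocationCost(1024, 45);
const target = breakEvenDurationMs(256, 200, 1024); // 50 ms
```

With these made-up numbers the 1024 MB setting comes out cheaper, because 45 ms beats the 50 ms break-even. An I/O-bound handler that only drops from 200 ms to 150 ms would not.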

Lever 4: keep the runtime simple

The runtime you pick has a built-in floor. From measurements I have run on equivalent functions:

Approximate cold-start floor by runtime, on a 1 GB Lambda with a small handler
  Go (provided.al2 + native binary)    ~150 ms
  Node 22.x                            ~250 ms
  Python 3.13                          ~280 ms
  Java (corretto-21)                   ~600 ms before SnapStart, ~200 ms with SnapStart
  .NET (8.0)                           ~700 ms before optimization

These are floors with essentially no user code. Once you start adding heavyweight frameworks (Spring, .NET DI containers, NestJS with reflection-based metadata), the numbers balloon, on Java and .NET most of all. Go and small Node functions are kindest to cold starts; in either, a sub-300 ms cold start is realistic if the rest of this article's advice is followed.

If you are on Java specifically, SnapStart (the snapshot-based init that caches initialized state) cut a Spring Boot Lambda I worked on from 4 seconds to 250 ms cold-start, with no code changes. It is the most impactful single feature AWS has shipped for cold starts on the JVM.

The optimizations I have tried and abandoned

A short list of things I tested, measured, and decided were not worth the complexity for the workloads I run:

Provisioned concurrency for everything. Provisioned concurrency keeps N execution environments warm and pre-initialized so cold starts disappear. It works. Because it is billed continuously whether or not requests arrive, the effective per-invocation cost on a lightly used function works out to about 10 to 15x on-demand. Use it for one or two latency-critical, low-volume endpoints (a checkout API, a user-facing auth callback). Do not blanket-apply it. I have seen bills triple from a well-meaning provisioned-concurrency rollout that was not load-tested.
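
A rough way to sanity-check a provisioned-concurrency rollout before the bill arrives is to compare the two pricing models at your expected utilization. The sketch below assumes the x86 us-east-1 list prices as I understand them; treat all three constants as assumptions and verify them against current pricing. Per-request fees are ignored.

```typescript
// Assumed list prices per GB-second (x86, us-east-1); verify before relying on them.
const ON_DEMAND = 0.0000166667;   // on-demand duration
const PC_RESERVED = 0.0000041667; // provisioned concurrency, billed while enabled
const PC_DURATION = 0.0000097222; // duration on provisioned environments

// utilization = fraction of time a provisioned environment is actually executing.
// Returns provisioned cost divided by on-demand cost for the same work.
function costRatio(utilization: number): number {
  const provisioned = PC_RESERVED + utilization * PC_DURATION;
  const onDemand = utilization * ON_DEMAND;
  return provisioned / onDemand;
}

// Utilization above which provisioned concurrency becomes the cheaper option.
function breakEvenUtilization(): number {
  return PC_RESERVED / (ON_DEMAND - PC_DURATION);
}
```

Under these assumed prices the break-even sits at around 60% utilization, and an environment that is busy only 2% of the time costs roughly 13 times what on-demand would for the same work.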

Lambda layers for shared dependencies. The pitch is that putting common dependencies in a layer keeps the deployment package small. The reality is that layers are still part of the cold-start download, the size limit is the same (250 MB unzipped), and managing layer versions across functions adds a deployment dimension that is not free. I no longer use layers; bundling per-function with esbuild has been simpler and at least as fast.

Custom runtimes. Building a custom runtime (provided.al2 with a native handler) gets you a 50 to 100 ms cold-start improvement over the managed Node runtime. For me, the gain has not been worth losing the managed runtime updates. Exception: Go, which compiles to a single native binary and runs natively on provided.al2 or provided.al2023; that is a great fit and the only custom-runtime case I still recommend without hesitation.

Pinging the function on a CloudWatch schedule to keep it warm. This was 2018 advice and it is mostly obsolete. Lambda's auto-scaling already keeps environments warm under steady load; the keepalive ping costs you a real invocation every minute, runs your code, and only marginally helps an environment that was about to be reaped. For low-volume functions, just accept the cold start or use provisioned concurrency.

A measurement-first checklist

The biggest mistake I see (and have made) is optimizing without measuring. The cold-start contribution to your latency budget can be the dominant factor or completely negligible depending on traffic shape. Before spending engineering time on it, I check three things:

  1. What fraction of my requests are cold starts? Print a cold_start: true|false field on every log line (track it via a module-scope flag flipped on first invocation). Compute the cold-start fraction over a week. If it is under 1%, the user-visible impact is small and you should be optimizing warm-path latency.
  2. What is my p50 vs p99 latency? If p50 is fine but p99 is bad, cold starts are likely the cause. If both are bad, fix the warm path first.
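  3. Where in the cold start does my time actually go? On a cold start, the REPORT log line carries an Init Duration field next to Duration. Duration covers the handler invocation and Init Duration covers the initialization before it, so comparing the two separates init time from handler time. Most of the time you will find init time is the bigger problem.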
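
The first and third checks can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `logLine` JSON logger and a `parseReport` helper that reads the standard REPORT line; the field names in the log output are my own choice, not a convention the platform requires.

```typescript
// Module scope runs once per execution environment, so this flag is true
// only for the first invocation that environment serves.
let isColdStart = true;

// Hypothetical structured logger; field names are illustrative.
function logLine(message: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ message, cold_start: isColdStart, ...fields });
}

// In a real Lambda this function would be exported as the handler.
async function handler(event: unknown): Promise<{ statusCode: number }> {
  console.log(logLine('request received'));
  try {
    return { statusCode: 200 }; // real work goes here
  } finally {
    isColdStart = false; // later invocations in this environment are warm
  }
}

// Extracts durations from a Lambda REPORT log line. "Init Duration" is only
// present on cold starts, so initMs is null for warm invocations.
function parseReport(line: string): { durationMs: number; initMs: number | null } | null {
  // The lookbehind skips the "Billed Duration" and "Init Duration" fields.
  const duration = /(?<!Billed |Init )Duration: ([\d.]+) ms/.exec(line);
  if (!duration) return null;
  const init = /Init Duration: ([\d.]+) ms/.exec(line);
  return { durationMs: Number(duration[1]), initMs: init ? Number(init[1]) : null };
}
```

Counting log lines with `cold_start: true` against the total over a week gives the fraction for the first check, and running `parseReport` over the REPORT lines gives the init-versus-handler split for the third.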

Only after those three numbers are in front of me do I start applying levers. Without them, I have shipped optimizations that did nothing measurable.

The advice that survives revision

I have rewritten my Lambda cold-start playbook three times in the last few years. Almost everything specific has been wrong on at least one platform. SnapStart appeared and rewrote the JVM section; the AWS SDK v2 to v3 transition rewrote the Node section; ARM Graviton (arm64) execution rewrote the price-performance section by giving me roughly 20% lower cost and 15% faster cold starts on the workloads I tested. The four levers in this article have survived all of those revisions: smaller package, lazy imports, more memory, simple runtime. Everything else has either been a special case or an optimization that did not survive contact with the next platform release. If you do those four things and instrument the result, you will spend less time worrying about cold starts than you spent reading this article.
