Three years ago I added type hints to a function on my first Python codebase and immediately filed a bug against myself when production blew up on bad input. The function was annotated def process(records: list[Record]), the caller passed in a list of dicts, the function ran, and four lines in it crashed when it tried to call record.name. My confusion: I had "typed" the function. Why did Python not check?
The answer is the most important fact about Python's type system, and the one most articles bury at the bottom: annotations are not enforced at runtime. Python parses them, stores them on the function object, and never looks at them again. The static checker (mypy, pyright, pyre) is a separate program that reads them and tells you about mismatches. The runtime treats def f(x: int) and def f(x) identically.
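A minimal reconstruction of that failure, for illustration only. The Record class and the body of process here are hypothetical stand-ins, since the original function is not shown:

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str


def process(records: list[Record]) -> list[str]:
    # The annotation promises Record objects, but nothing checks that at call time.
    return [record.name.upper() for record in records]


# A caller passes plain dicts. The call is accepted; the crash comes later,
# at the first attribute access inside the body.
try:
    process([{"name": "ada"}])
except AttributeError as exc:
    failure = str(exc)
```

The exception names the missing attribute on dict, not the mismatched argument, which is why the bug looks unrelated to the annotation.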
The argument I want to make is that once you internalise that single fact, the rest of Python's type system becomes useful instead of confusing. Hints are documentation that a tool checks. They make refactors safer, IDE autocomplete sharper, and code reviews faster. They do not make your runtime safer. The day you stop expecting them to is the day you start writing them well.
What annotations actually are
A Python annotation is a value attached to a function parameter, return type, or variable. The interpreter evaluates the annotation expression at function definition time (with PEP 563 / from __future__ import annotations, evaluation is deferred to a string form, but the principle is the same). The result is stashed in __annotations__.
That is the entire runtime story. The annotations are values in a dict. The interpreter does not check them when the function is called, does not check them when the function returns, and does not raise on mismatches. greet(123, "oops") runs, fails inside the function when it tries to multiply a string by a string, and produces a runtime error far away from the type mistake.
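A small sketch of both points. The greet function is an assumed shape (a greeting repeated a number of times), chosen so that the bad call ends up multiplying a string by a string:

```python
def greet(name: str, times: int) -> str:
    # str * int repeats the string; str * str raises TypeError.
    return f"Hello, {name}! " * times


# The annotations are just entries in a dict on the function object.
stored = greet.__annotations__

# Wrong argument types are accepted at call time. The failure only appears
# inside the body, when the multiplication runs.
try:
    greet(123, "oops")
except TypeError as exc:
    failure = str(exc)
```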
This is by design. Python's type system is optional, gradual, and external (and mostly nominal, with structural typing available through Protocols). Your code chooses how much of it to use; the runtime does not.
One wrinkle worth knowing: from __future__ import annotations (PEP 563) changes the storage representation. With that import, every annotation is stored as a string rather than the evaluated object, which lets you reference classes defined later in the module and skips evaluating expensive type expressions at import time. The trade-off is that __annotations__ hands you strings like 'list[Record]' instead of live classes. The standard resolver at runtime is typing.get_type_hints(obj) (or inspect.get_annotations(obj, eval_str=True)). Libraries that consume annotations at runtime, such as pydantic, have to perform this kind of resolution themselves, while dataclasses and attrs mostly read __annotations__ as stored.
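A sketch of the string form and its resolution. To keep the snippet independent of where it sits in a file, it uses explicitly quoted annotations, which are stored the same way the future import stores every annotation. Record and process are illustrative names:

```python
import typing
from dataclasses import dataclass


# Record is referenced before it is defined, which only works because the
# annotation is a string that nobody evaluates at definition time.
def process(records: "list[Record]") -> "int":
    return len(records)


@dataclass
class Record:
    name: str


raw = process.__annotations__              # plain strings
resolved = typing.get_type_hints(process)  # evaluated against the module globals
```

Here raw maps each parameter to a string, while resolved maps it to the real list[Record] and int objects.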
What mypy and pyright add
mypy and pyright are static analysers. They read the code and the annotations, build a model of what types flow where, and report mismatches. Run them as a CI step (or in your editor as you type) and you catch type errors without running the code.
mypy never runs the code. It reads the source and reasons about it. The error it reports is the same problem the runtime would hit (eventually, in production), but it surfaces at lint time, before anyone ships.
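As an illustration, here is a hypothetical module with a bad call on a path that is never executed at import time. The perimeter function is an invented example, and the checker message in the comment is paraphrased from memory rather than copied from a real run:

```python
def perimeter(width: float, height: float) -> float:
    return 2 * (width + height)


def report() -> float:
    # A static checker flags this line without running it, with a message along
    # the lines of: Argument 1 to "perimeter" has incompatible type "str";
    # expected "float"
    return perimeter("3", 4)


# Nothing has failed yet at runtime: report() has only been defined, not called.
```

Importing this module succeeds. The TypeError only appears when something finally calls report(), which is exactly the gap the checker closes.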
The choice of checker matters less than people make it sound. Pyright (used by Pylance in VS Code) is faster, has better incremental performance, and is generally a touch stricter on inference. mypy is the original, has wider library-stub coverage, and is the de facto standard for CI. I use Pyright in the editor for instant feedback and mypy in CI for the canonical pass; the two agree on most of what they report.
A concrete moment this paid off: a teammate refactored a config helper to return Optional[str] instead of str, updated the obvious callers, and shipped a green PR. The next morning, mypy on main failed because one caller three modules over concatenated the result with another string. The path was gated behind a feature flag the failing test never enabled, so the runtime would have hit TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' only when a customer flipped the flag. Tests prove the paths you exercised work; mypy proves the type contracts hold across paths you forgot to exercise.
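The shape of that bug, sketched with invented names (region_suffix and bucket_name are hypothetical, not the real helper):

```python
from typing import Optional


def region_suffix(env: dict[str, str]) -> Optional[str]:
    # After the refactor this can return None when the key is missing.
    return env.get("REGION_SUFFIX")


def bucket_name(env: dict[str, str]) -> str:
    # The forgotten caller: a checker reports an unsupported-operand error here,
    # because the left operand may be None.
    return region_suffix(env) + "-uploads"


def bucket_name_fixed(env: dict[str, str]) -> str:
    # Handling the None case explicitly satisfies the checker and the runtime.
    suffix = region_suffix(env)
    if suffix is None:
        return "default-uploads"
    return suffix + "-uploads"
```

With the key present, both callers behave the same, which is why a test that never exercises the missing-key path stays green.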
The shape of the type system, briefly
The core types are familiar.
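A quick tour, using made-up variable and function names:

```python
from typing import Optional, Union

# Scalars and built-in collections.
retries: int = 3
timeout_seconds: float = 2.5
service_name: str = "billing"
enabled: bool = True
tags: list[str] = ["prod", "eu"]
limits: dict[str, int] = {"cpu": 2}
origin: tuple[float, float] = (0.0, 0.0)


def find_owner(owners: dict[str, str], service: str) -> Optional[str]:
    # Optional[str] means "str or None"; on 3.10+ it can be spelled str | None.
    return owners.get(service)


def to_port(value: Union[int, str]) -> int:
    # Union[int, str] accepts either type; on 3.10+ it can be spelled int | str.
    return int(value)
```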
Generics show up everywhere once you have a typed collection.
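A minimal sketch with one generic function and one generic class. first and Stack are illustrative names:

```python
from collections.abc import Sequence
from typing import Generic, TypeVar

T = TypeVar("T")


def first(items: Sequence[T]) -> T:
    # The checker ties the return type to the element type of the argument.
    return items[0]


class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: list[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


numbers: Stack[int] = Stack()
numbers.push(1)
numbers.push(2)
```

A checker infers that first(["a", "b"]) is a str and that numbers.pop() is an int, with no casts at the call site.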
Protocols (PEP 544) are the structural-typing escape hatch when you want "any object that quacks like X" without an inheritance relationship.
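For example, assuming two unrelated classes that both happen to carry a name attribute (all names here are invented):

```python
from dataclasses import dataclass
from typing import Protocol


class Named(Protocol):
    # Anything with a str attribute called name satisfies this protocol.
    name: str


@dataclass
class User:
    name: str
    email: str


@dataclass
class Service:
    name: str
    port: int


def label(thing: Named) -> str:
    return f"<{thing.name}>"
```

Neither User nor Service inherits from Named, yet a checker accepts both as arguments to label because their shape matches.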
This is where the type system starts to feel powerful: you can describe the shape of a function's contract without forcing every passing object into a class hierarchy. For Python, where duck typing is the norm, Protocols feel native; for engineers coming from Java, the structural-vs-nominal distinction is the headline.
The library that breaks the rule on purpose: pydantic
There is one important class of libraries that DOES use annotations at runtime: data-validation libraries like pydantic, attrs (with validators), and msgspec. These libraries read your class's annotations and use them to validate and coerce input data.
The trick is that pydantic does NOT depend on Python checking annotations. pydantic reads them itself, generates validation code, and runs that code on every constructor call. The annotations are still metadata as far as the interpreter is concerned; pydantic is just one of the consumers of that metadata.
This is a huge deal in practice. Anywhere external data crosses into your program (HTTP request bodies, queue messages, config files, database rows from a typed driver), pydantic gives you actual runtime validation, with errors that point at the field. The type system goes from documentation to a real boundary checker.
The limitation: validation only happens at the pydantic boundary. Once you have a User instance, accessing u.name does not re-check anything. The contract is enforced at construction, not at field access. This is the right trade-off; the alternative would be ruinous.
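To make the mechanism concrete without depending on any third-party package, here is a toy stand-in written with the standard library only. It is not pydantic's API or implementation; Validated and User are hypothetical, and the check handles plain classes only, with no coercion:

```python
from typing import get_type_hints


class Validated:
    """Toy validator: checks constructor data against the class annotations."""

    def __init__(self, **data: object) -> None:
        # The library code, not the interpreter, reads the annotations here.
        hints = get_type_hints(type(self))
        for field, expected in hints.items():
            if field not in data:
                raise ValueError(f"{field}: field required")
            value = data[field]
            if not isinstance(value, expected):
                raise TypeError(
                    f"{field}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
            setattr(self, field, value)


class User(Validated):
    name: str
    age: int


u = User(name="Ada", age=36)  # passes validation

try:
    User(name="Ada", age="thirty-six")
except TypeError as exc:
    error_message = str(exc)

# Validation ran at construction only; a later assignment is not re-checked.
u.age = "not checked"
```

The error points at the offending field, and the last line shows the boundary-only nature of the check.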
What I run in CI
For any Python project I take seriously now, the CI type check is mypy run with --strict.
--strict turns on most of mypy's optional strictness checks: untyped function definitions are errors, missing return annotations are errors, returning Any from a typed function is an error, and so on. It is more annoying at first but uniformly worth it. The day-one pain is fixing the existing untyped code; the day-365 win is that nothing gets shipped without going through the type checker.
I also configure mypy to disallow specific patterns project-wide:
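A sketch of what such a configuration can look like in pyproject.toml. The option names are real mypy settings, but this particular selection is an assumption rather than the author's actual file, and legacy_vendor_sdk is a placeholder module name:

```toml
[tool.mypy]
disallow_untyped_defs = true   # every function needs annotations
disallow_any_generics = true   # list must be written as list[int], not bare list
no_implicit_optional = true    # a default of None does not imply Optional
warn_return_any = true         # flag Any leaking out of typed functions
warn_unused_ignores = true     # stale "type: ignore" comments become errors

# Per-module override for a dependency that ships no type information.
[[tool.mypy.overrides]]
module = ["legacy_vendor_sdk.*"]
ignore_missing_imports = true
```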
The override syntax is for libraries without type stubs. ignore_missing_imports says "trust me, this library has no types, do not error out about it". Use sparingly; it is a hole in the safety net.
The five hint patterns I write most
By frequency:
The rule of thumb: prefer abstract types on inputs (Sequence, Mapping, Iterable) so callers can pass any compatible structure, and prefer concrete types on outputs (list, dict) so callers know exactly what they get. This is the same advice as in any typed language; Python just lets you ignore it for years and find out later.
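A short sketch of that rule, with an invented function: the inputs are declared as Iterable and Mapping, the output as a plain dict.

```python
from collections.abc import Iterable, Mapping


def total_by_customer(
    orders: Iterable[tuple[str, int]],
    discounts: Mapping[str, int],
) -> dict[str, int]:
    # Abstract inputs: any iterable of (customer, cents) pairs and any
    # read-only mapping will do. Concrete output: callers get a real dict.
    totals: dict[str, int] = {}
    for customer, cents in orders:
        totals[customer] = totals.get(customer, 0) + cents
    return {name: total - discounts.get(name, 0) for name, total in totals.items()}


# A generator works as input because only iteration is required.
order_stream = ((name, cents) for name, cents in [("ada", 500), ("bob", 200), ("ada", 250)])
result = total_by_customer(order_stream, {"ada": 50})
```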
Where the type system genuinely cannot help
There is code where the static checker cannot reason about what is happening, and a pretend annotation is worse than admitting Any. Four categories I keep hitting:
- Metaprogramming. Anything that overrides __getattr__ or __getattribute__, or builds attributes from a registry at class-creation time. SQLAlchemy 1.x declarative classes were the canonical example (the 2.x rewrite added typed mappings to fix this); pydantic v1's dynamic field access fits the same pattern. mypy sees no statically-declared attributes and either flags every access or silently treats it as Any.
- Heavy **kwargs plumbing. A function that forwards arbitrary kwargs cannot be typed without knowing the inner signature. ParamSpec (PEP 612) helps for decorator wrapping, but for the common "accept, mutate, forward" pattern, **kwargs: Any is the truthful annotation.
- Plugin systems and late binding. importlib.import_module(name) where name comes from config, or pulling a class out of a runtime-populated registry, breaks static analysis. The checker has no way to know what type comes back.
- Runtime-driven config schemas. A dict loaded from YAML is dict[str, Any] to mypy, however regular its shape. The fix is not better hints; load it through a pydantic model at the boundary and pass the typed model around.
Type hints are static documentation, and static documentation cannot describe a system whose shape is decided at runtime. Knowing where the analyser stops helping is part of writing types well.
Where hints are not worth it
A few cases where I do not bother annotating, on purpose.
- Throwaway scripts. A 30-line ETL one-shot does not need types. The script is read once, debugged once, deleted once.
- Tests, in moderation. I annotate the public API of test helpers, not every fixture or assert helper. The test body is short enough to read.
- Heavy **kwargs plumbing. If a function exists to forward arbitrary kwargs to another function, the annotations get ridiculous. **kwargs: Any and a comment about the actual constraints is better than fighting the type system.
- Legacy untyped libraries with no stubs. If the library has no annotations and no stubs, you can either write your own stubs (a real time investment) or # type: ignore[no-any-return] the boundary calls. The latter is fine.
The thing I tell teams adopting types
The path that works is gradual. Turn on mypy with no strictness, get to zero errors, then ratchet up: disallow_untyped_defs, then --strict. Annotate new code as you write it; annotate old code when you touch it. Do not stop the world to type a 200,000-line codebase top-down. That project never finishes.
The payoff is real. Refactors stop being scary, IDE autocomplete becomes accurate, code reviews stop arguing about whether a function returns None or False, and the class of bugs where one caller passed a string and another passed a Path just disappears. None of that is the runtime safety many people hope hints will bring; that is on you, on pydantic, on whatever validates the boundary. Type hints are a tool for the developer reading the code, not a tool for the program executing it. Once that distinction sticks, the system is easy to like and easy to write well.
