Tests are code that runs on every push. They earn their keep when they fail meaningfully on real regressions and pass quietly otherwise.
class FixedClock:
def __init__(self, now: datetime) -> None:
self._now = now
def now(self) -> datetime:
return self._now
@pytest.fixture
def clock() -> FixedClock:
return FixedClock(datetime(2025, 1, 1, tzinfo=timezone.utc))
@pytest.fixture
def repo() -> FakeUserRepository:
return FakeUserRepository()
def test_register_persists_user_and_stamps_created_at_when_email_is_new(
repo: FakeUserRepository, clock: FixedClock
) -> None:
# arrange
service = RegistrationService(repo, clock)
# act
user = service.register(email="t@example.com")
# assert
assert repo.find(user.id) == user
assert user.created_at == datetime(2025, 1, 1, tzinfo=timezone.utc)The test name reads as an English sentence (11.2); arrange/act/assert sections are blank-line-separated with one concern (11.3); clock and repo are deliberately scoped fixtures, fresh per test (11.5, 11.6); FakeUserRepository is a hand-rolled domain fake while time arrives through an injected FixedClock rather than datetime.now() (11.8, 11.9); and the paired assertions check both persistence and the stamped timestamp (11.13).
Reasoning, step by step:
pytestis the de facto standard. It supports plainassert, fixtures, parameterization, plugins, and minimal ceremony.unittest-styleTestCaseclasses work in pytest but lose half the benefits — fixtures, parametrize, plain assertions.- Write tests as top-level functions. Group with
class Test*:only when fixtures are class-scoped and shared. - Migrate existing
unittestcode when you touch it.
Enforcement: review; CI runs pytest, and a flake8/ruff check flags unittest.TestCase subclasses in tests/.
Reasoning, step by step:
def test_returns_404_when_user_does_not_exist(): ...reads as the failure message you'll see at 2 am.def test_user_404():reads as a code.- Pattern:
test_<action>_<expected outcome>_when_<condition>. Variations are fine; reading aloud as English is not optional. - Test files:
tests/test_*.py. pytest discovers automatically.
Enforcement: review; pytest's python_files = test_*.py discovery enforces the file naming.
Reasoning, step by step:
- Three sections: set up the world, run the operation, check the outcome. Blank lines separate them.
- One concern per test. If a test asserts five unrelated facts, splitting improves failure messages.
- Acceptable: multiple assertions that all verify the same concern.
- Anti-pattern: god-tests that exercise an entire feature in one function. They fail in unhelpful ways and resist refactoring.
Enforcement: review; oversized test bodies flagged by a function-length lint.
Reasoning, step by step:
- Don't copy-paste tests with different inputs. Parametrize.
- Pattern:
@pytest.mark.parametrize( "raw, expected", [ ("2025-01-01", date(2025, 1, 1)), ("2025-12-31", date(2025, 12, 31)), ], ids=["new-year", "year-end"], ) def test_parse_iso_date(raw: str, expected: date) -> None: assert parse_iso_date(raw) == expected
- Each row is a separate test with its own failure context. The
ids=parameter labels failures readably. - For complex inputs, use a
@dataclassparameter value rather than a tuple. Readable failure output.
Enforcement: review; copy-pasted near-identical test bodies rejected in favor of a parametrized table.
Reasoning, step by step:
setUp/tearDownrun for every test, even when the setup is shared. Wasteful and noisy.- pytest fixtures: scope per-function (default), per-class, per-module, per-session. Pick by what's safe to share.
- Pattern:
@pytest.fixture def user() -> User: return User(id=UserId("u-1"), email="t@example.com", active=True)
- Yield-fixtures for cleanup:
@pytest.fixture def temp_dir() -> Iterator[Path]: d = Path(tempfile.mkdtemp()) yield d shutil.rmtree(d)
- Share fixtures across files via
conftest.pyat the test directory's root.
Enforcement: review; setUp/tearDown methods flagged alongside the unittest.TestCase lint (11.1).
Reasoning, step by step:
- Tests must be runnable in any order, alone or in groups, on any machine, in parallel.
- Shared mutable state (module-level
_cache: dict = {}) creates cross-test coupling. One test's failure masks another's success. - Each test gets fresh fixtures. Session-scoped fixtures are for immutable shared data only (a parsed schema, a fixed clock).
- Parallel-by-default: assume
pytest -n auto(withpytest-xdist) will run. Anything assuming serial order is a future flake.
Enforcement: CI runs pytest -n auto and a randomized-order plugin (pytest-randomly); order-dependent tests fail there.
Reasoning, step by step:
- The best test uses the real code with a real input. If real code does I/O, fake the I/O — don't mock the layer above.
- Mocking everything makes the test re-state the implementation. The test passes when the implementation is wrong in the way you mocked.
- The genuine seam is the external boundary: HTTP, database, clock, filesystem.
- Tools:
unittest.mock.Mock/MagicMockfrom stdlib;pytest-mockfor themockerfixture; hand-rolled Protocol-conforming fakes are often clearest.
Enforcement: review; mocks reaching past the external boundary into domain code are rejected.
Reasoning, step by step:
- A
FakeUserRepositorythat stores users in a dict is clearer than aMagicMockconfigured to return canned values. - Mocks shine when the dependency is hard to fake (an HTTP client, a database connection) — let the mocking framework do the call-recording.
- Domain-shaped fakes can grow to be tiny implementations: in-memory, deterministic, fully behavioral.
- Pattern:
class FakeUserRepository: def __init__(self) -> None: self._by_id: dict[UserId, User] = {} def save(self, user: User) -> None: self._by_id[user.id] = user def find(self, user_id: UserId) -> User | None: return self._by_id.get(user_id)
Enforcement: review; a MagicMock standing in for a domain repository or value object is rejected in favor of a fake.
Reasoning, step by step:
datetime.now()inside production code is untestable. Inject aClockProtocol.- Same for
random.random(),uuid.uuid4(), any source of non-determinism. - Tests use a deterministic clock and seeded random. The fixture provides them.
- The injection cost is one constructor parameter. The testability payoff is permanent.
Enforcement: review; a lint flags datetime.now, uuid.uuid4, and random.* reached directly inside production modules.
Reasoning, step by step:
hypothesisgenerates inputs and checks invariants. It finds edge cases your manual tests miss.- Good properties: round-trip (
decode(encode(x)) == x), idempotence (f(f(x)) == f(x)), monotonicity (a <= b ⇒ f(a) <= f(b)). - Don't use it as a clever way to run the same test against random nonsense.
- Shrinking is the killer feature: failed tests shrink to a minimal failing case.
hypothesisshrinks well.
Enforcement: review; round-trip, idempotence, and monotonicity invariants expected to carry a hypothesis property test.
Reasoning, step by step:
pytest-asynciolets you writeasync def test_*. Configure once inpyproject.toml:[tool.pytest.ini_options] asyncio_mode = "auto"
- Alternative:
anyio's test plugin if you target multiple async libraries. - For time-based async tests, use
freezegunor a fake clock — don'tawait asyncio.sleep(1)in tests.
Enforcement: asyncio_mode = "auto" set in pyproject.toml; review confirms async tests use the plugin, not ad-hoc event-loop wiring.
Reasoning, step by step:
time.sleep(1)is a flake waiting to happen — too short on slow CI, too long for fast tests.- For async timing: fake the clock or use
anyio.move_on_after. - For waiting on a condition: poll with a timeout.
- A test that needs to sleep "to give the thread time to start" has the wrong synchronization model.
Enforcement: review; a lint flags time.sleep under tests/.
Reasoning, step by step:
- A test with one assertion checks one fact. Often that's right.
- Complex outcomes assert both the positive (what happened) and the negative (what didn't). "User was created" implies
user.id is not Noneandrepo.count == 1andaudit.called_once. - Pair-assertion: verify the same property two ways. After
sort(xs), assert (a)len(xs)unchanged and (b) ordering invariant holds. - Don't pile up unrelated assertions to hit a number.
Enforcement: review; single-assertion tests of complex outcomes questioned for a missing negative or paired check.
11.14 — Coverage is a floor, not a ceiling. 100% coverage with 0% meaning is worse than 70% with deliberate tests.
Reasoning, step by step:
- Code coverage measures lines executed. It doesn't measure assertion quality.
- Set a minimum (often 80%) and fail builds below. Don't gloat about 100%.
- Focus tests on: behavior at boundaries, edge cases (empty, max, off-by-one), error paths, regressions.
- Mutation testing (
mutmut,cosmic-ray) is the next level — but expensive. Use selectively.
Enforcement: CI fails the build below the configured --cov-fail-under threshold.
- Protocols for testable seams: chapter 03, chapter 06.
- Async testing primitives: chapter 09.
- Test fixtures and the testability of constructor injection: chapter 10.