Testing strategy
A sourdough starter for what (and what NOT) to test. The AI's default is to generate exhaustive unit tests for everything; that's the wrong shape for indie / vibe-coded work. The pyramid below trades coverage for velocity in places where the trade is worth making.
This document is opinionated. Adjust the percentages to your team's risk tolerance, but make the call explicitly — defaulting to "test everything" is a decision you should make on purpose, not by accident.
The shape
┌─────┐
│ e2e │ ~5% — critical paths only
└─────┘
┌───────┐
│ int │ ~25% — anywhere a wrong contract bites
└───────┘
┌───────────┐
│ unit │ ~70% — pure functions, business logic
└───────────┘
Counts to total roughly 100% is the wrong frame — these are proportions of effort, not absolute numbers. A repo with 50 unit tests, 18 integration tests, and 4 e2e tests is in the right shape if the underlying logic justifies those counts.
What to test (priority order)
1. Pure functions doing business logic
The highest-leverage tests in the codebase. Pure functions are:
- Deterministic — same input, same output
- Side-effect-free — don't touch a database, network, or filesystem
- Easy to assert on — return a value you compare
Examples in a vibe-coded codebase that are worth unit-testing:
- Pricing calculations (
calculatePrice(items, discounts, tax)) - Optimisation logic (
optimise(wanted) → cheapest covering SKU set) - Parsers (
parseEnvFile(text) → { vars, errors }) - Validators (any schema validation that's project-specific)
- Date / currency / unit conversions
Rule: if a function is pure and non-trivial, write a unit test. The cost is low (fast, no setup), the value is high (the regression caught here can't be caught later cheaply).
2. The integration boundary: API routes + database
The shape: hit the route, assert the database changed (or the response is right). One test per route per major code path (happy path + 1-2 error paths).
What this catches:
- The contract drift between client and server
- Auth checks being absent / wrong
- Validation bugs at the boundary
- Database constraints fighting your app logic
Tools: Vitest / Jest + a test database (Postgres in Docker, or Supabase local). Don't mock the database — let it run; it's fast enough.
Rule: every protected API route has at least one integration test that asserts "without auth, this returns 401."
3. End-to-end: the critical paths
What "critical path" means:
- Money paths. Checkout, billing webhook reconciliation, refund.
- Auth paths. Sign up, sign in, sign out, password reset.
- The core promise. Whatever your product's main job is — for Fizzgig that's "AI editor calls a tool, gets a result."
Three to five e2e tests total for most indie projects. Each one is expensive to write and maintain, so the bar for adding one is high.
Tools: Playwright is the standard.
Rule: if it'd be embarrassing for this to break, e2e-test it. Otherwise integration tests cover the ground.
What NOT to test
The AI will offer to write these. Decline.
- Trivial components (a
<Button>that renders itschildren— there's nothing to test). The framework guarantees this works. - Configuration values (testing that
const FOO = 5returns 5 — you're testing the language, not the code). - Third-party library behaviour (the Stripe SDK works; you don't need a test that confirms it).
- Snapshot tests of every component. Snapshots in this shape become noise — devs blindly update them. Use snapshots only for genuinely deterministic output (e.g. a markdown-to-HTML transformer's output for a known input).
- Mock-heavy unit tests where the mocks are most of the test. If the test is "given these 12 mocks, the function calls them in this order," it's a tautology — it asserts the implementation, not behaviour.
- Tests for code you're about to delete. Throwaway code gets throwaway treatment.
When to write the test (the timing question)
Three valid moments:
- Before the code (TDD). Write the failing test, then make it pass. Best for pure-function logic where the inputs and outputs are clear.
- Alongside the code. Most common — implement, test as you go.
- After a bug. A failing test that reproduces the bug, then the fix that makes it pass. The regression test stays for life.
The fourth option — never — is valid for the "what NOT to test" list above.
Anti-pattern: writing tests because "we should have tests" without a specific failure mode in mind. Those tests don't catch real bugs; they catch the test author saying "this thing I just wrote behaves the way I just wrote it."
Test characteristics that matter
- Fast. Unit tests under 10ms each, integration tests under 1s, e2e tests under 30s. Slow tests don't get run.
- Deterministic. Flaky tests are worse than no tests — they erode trust in the suite. If a test flakes, fix it or delete it. Don't ignore it.
- Independent. Each test can run in isolation, in any order, in parallel.
- One concern per test. "It does X under condition Y." If a test has three
expect()blocks each asserting unrelated things, split it. - Readable. The test name describes the behaviour, not the implementation. "returns 401 when no session cookie" beats "test_auth_middleware_branch_3".
CI gating
What blocks a merge?
- Type check passes (
tsc --noEmit). Always. - Unit + integration tests pass. Always.
- Lint passes (if you have one). Configurable strictness.
- E2e tests pass. On every PR, or on-demand — depends on how stable yours are. Flaky e2e tests blocking merges is worse than no e2e gate.
What doesn't block:
- Coverage thresholds. Coverage is a guideline, not a gate; chasing a number incentivises writing the wrong tests.
- Build performance regressions. Worth monitoring; not worth blocking.
How to feed this starter
Add to it when:
- You introduce a new test category worth documenting (visual regression, contract tests, load tests).
- A pattern proves valuable enough to recommend (e.g. "we test our Worker handlers against a real Miniflare instance instead of mocking — caught 3 bugs in the first month").
- A pattern proves wasteful and you've retired it.
Remove from it when:
- A rule turns out wrong. Better to delete than to have stale guidance.
Companion starters
- project-starter — parent shape; the project-level test posture lives there too
- security-posture — security-shaped tests (auth gates, validation parity) sit alongside the playbook there
- adr — the testing-strategy decision is exactly an ADR shape if you want to lock it more formally