What gets tested, what doesn't, the test pyramid that suits a vibe-coded codebase, and the rules that stop tests becoming the work itself. Optimised for indie / solo / small-team velocity, not enterprise certainty.

Testing strategy

A sourdough starter for what (and what NOT) to test. The AI's default is to generate exhaustive unit tests for everything; that's the wrong shape for indie / vibe-coded work. The pyramid below trades coverage for velocity in places where the trade is worth making.

This document is opinionated. Adjust the percentages to your team's risk tolerance, but make the call explicitly — defaulting to "test everything" is a decision you should make on purpose, not by accident.

The shape

       ┌─────┐
       │ e2e │   ~5%   — critical paths only
       └─────┘
      ┌───────┐
      │  int  │  ~25%  — anywhere a wrong contract bites
      └───────┘
    ┌───────────┐
    │   unit    │  ~70%  — pure functions, business logic
    └───────────┘

Counts to total roughly 100% is the wrong frame — these are proportions of effort, not absolute numbers. A repo with 50 unit tests, 18 integration tests, and 4 e2e tests is in the right shape if the underlying logic justifies those counts.

What to test (priority order)

1. Pure functions doing business logic

The highest-leverage tests in the codebase. Pure functions are:

Deterministic — same input, same output
Side-effect-free — don't touch a database, network, or filesystem
Easy to assert on — return a value you compare

Examples in a vibe-coded codebase that are worth unit-testing:

Pricing calculations (calculatePrice(items, discounts, tax))
Optimisation logic (optimise(wanted) → cheapest covering SKU set)
Parsers (parseEnvFile(text) → { vars, errors })
Validators (any schema validation that's project-specific)
Date / currency / unit conversions

Rule: if a function is pure and non-trivial, write a unit test. The cost is low (fast, no setup), the value is high (the regression caught here can't be caught later cheaply).

2. The integration boundary: API routes + database

The shape: hit the route, assert the database changed (or the response is right). One test per route per major code path (happy path + 1-2 error paths).

What this catches:

The contract drift between client and server
Auth checks being absent / wrong
Validation bugs at the boundary
Database constraints fighting your app logic

Tools: Vitest / Jest + a test database (Postgres in Docker, or Supabase local). Don't mock the database — let it run; it's fast enough.

Rule: every protected API route has at least one integration test that asserts "without auth, this returns 401."

3. End-to-end: the critical paths

What "critical path" means:

Money paths. Checkout, billing webhook reconciliation, refund.
Auth paths. Sign up, sign in, sign out, password reset.
The core promise. Whatever your product's main job is — for Fizzgig that's "AI editor calls a tool, gets a result."

Three to five e2e tests total for most indie projects. Each one is expensive to write and maintain, so the bar for adding one is high.

Tools: Playwright is the standard.

Rule: if it'd be embarrassing for this to break, e2e-test it. Otherwise integration tests cover the ground.

What NOT to test

The AI will offer to write these. Decline.

Trivial components (a <Button> that renders its children — there's nothing to test). The framework guarantees this works.
Configuration values (testing that const FOO = 5 returns 5 — you're testing the language, not the code).
Third-party library behaviour (the Stripe SDK works; you don't need a test that confirms it).
Snapshot tests of every component. Snapshots in this shape become noise — devs blindly update them. Use snapshots only for genuinely deterministic output (e.g. a markdown-to-HTML transformer's output for a known input).
Mock-heavy unit tests where the mocks are most of the test. If the test is "given these 12 mocks, the function calls them in this order," it's a tautology — it asserts the implementation, not behaviour.
Tests for code you're about to delete. Throwaway code gets throwaway treatment.

When to write the test (the timing question)

Three valid moments:

Before the code (TDD). Write the failing test, then make it pass. Best for pure-function logic where the inputs and outputs are clear.
Alongside the code. Most common — implement, test as you go.
After a bug. A failing test that reproduces the bug, then the fix that makes it pass. The regression test stays for life.

The fourth option — never — is valid for the "what NOT to test" list above.

Anti-pattern: writing tests because "we should have tests" without a specific failure mode in mind. Those tests don't catch real bugs; they catch the test author saying "this thing I just wrote behaves the way I just wrote it."

Test characteristics that matter

Fast. Unit tests under 10ms each, integration tests under 1s, e2e tests under 30s. Slow tests don't get run.
Deterministic. Flaky tests are worse than no tests — they erode trust in the suite. If a test flakes, fix it or delete it. Don't ignore it.
Independent. Each test can run in isolation, in any order, in parallel.
One concern per test. "It does X under condition Y." If a test has three expect() blocks each asserting unrelated things, split it.
Readable. The test name describes the behaviour, not the implementation. "returns 401 when no session cookie" beats "test_auth_middleware_branch_3".

CI gating

What blocks a merge?

Type check passes (tsc --noEmit). Always.
Unit + integration tests pass. Always.
Lint passes (if you have one). Configurable strictness.
E2e tests pass. On every PR, or on-demand — depends on how stable yours are. Flaky e2e tests blocking merges is worse than no e2e gate.

What doesn't block:

Coverage thresholds. Coverage is a guideline, not a gate; chasing a number incentivises writing the wrong tests.
Build performance regressions. Worth monitoring; not worth blocking.

How to feed this starter

Add to it when:

You introduce a new test category worth documenting (visual regression, contract tests, load tests).
A pattern proves valuable enough to recommend (e.g. "we test our Worker handlers against a real Miniflare instance instead of mocking — caught 3 bugs in the first month").
A pattern proves wasteful and you've retired it.

Remove from it when:

A rule turns out wrong. Better to delete than to have stale guidance.

Companion starters

project-starter — parent shape; the project-level test posture lives there too
security-posture — security-shaped tests (auth gates, validation parity) sit alongside the playbook there
adr — the testing-strategy decision is exactly an ADR shape if you want to lock it more formally