Data modelling
A sourdough starter for the data-shape decisions that bind everything downstream. The AI's default behaviour is to invent per-table patterns; this document is what makes them consistent.
Naming
| Thing | Convention | Example |
|---|---|---|
| Tables | snake_case, plural | users, tool_runs, audit_log_entries |
| Columns | snake_case | created_at, email_verified_at |
| Primary keys | id (UUID by default) |
id uuid primary key default gen_random_uuid() |
| Foreign keys | <singular_referenced>_id |
user_id, project_id |
| Booleans | positive verb, no negation | is_active (not is_disabled), is_admin (not not_admin) |
| Timestamps | <event>_at, always with timezone |
created_at timestamptz, published_at timestamptz |
| Enums | snake_case values | `status: 'pending' |
| Indexes | <table>_<columns>_idx |
tool_runs_user_id_created_at_idx |
Plural vs singular for table names is the only debate worth having (some teams prefer singular: user, tool_run). Pick one, lock it. Mixing is the bug.
Money
Rule: integers, base unit (pence / cents), never floats.
-- Right
price_pence integer not null default 0
-- Wrong (floating point arithmetic is lossy on money)
price_pounds numeric(10, 2)
Why: float arithmetic introduces rounding errors that compound. 0.1 + 0.2 ≠ 0.3 in IEEE 754 floats. Money in floats is a known bug class.
At the display boundary: convert pence → display. Never store the display value.
function formatPrice(pence: number): string {
return `£${(pence / 100).toFixed(2)}`;
}
Currency: if you're multi-currency, store amount_minor (integer) + currency (ISO 4217 code), and never mix. Conversion happens at presentation time, with FX rates frozen at transaction time.
Identifiers
- UUIDs by default (
uuidtype,gen_random_uuid()default). Don't expose sequential integers as primary keys to the public — they leak order-of-creation, count, and let attackers enumerate. - External IDs (e.g. Stripe customer ID, GitHub user ID) stored as columns alongside, never as the primary key. The external system can churn; your primary key shouldn't.
- Slug columns for user-facing URL identifiers. Indexed unique; lowercase, kebab-case; immutable once published (or with explicit redirect handling on change).
create table tools (
id uuid primary key default gen_random_uuid(),
slug text not null unique,
stripe_product_id text unique,
created_at timestamptz not null default now()
);
Timestamps
Every table has created_at. Most tables have updated_at. Both timestamptz (not timestamp).
create table whatever (
-- ...
created_at timestamptz not null default now(),
updated_at timestamptz not null default now()
);
-- Trigger to maintain updated_at:
create trigger whatever_updated_at
before update on whatever
for each row execute function moddatetime('updated_at');
Deletion: soft-delete with deleted_at timestamptz instead of hard DELETE for any table where the row matters historically (orders, audit logs, user-generated content). Hard-delete for ephemeral tables (sessions, idempotency keys, cache).
RLS posture (Supabase / Postgres)
This belongs in security-posture primarily, but it shapes every table you write.
- RLS enabled by default on every table. Even tables not yet user-facing.
- Scope by
auth.uid()for user-owned rows.auth.role() = 'authenticated'is NOT scoping — it just checks the user is logged in. WITH CHECKon INSERT + UPDATE. Without it, a row can be reassigned to another user (the row-reassignment bug).- Tables without an obvious owner (lookups, public catalogues) get
using (true)for reads — explicit, not by omission.
rls_checker covers 8 misconfiguration patterns.
Denormalisation
Default: don't. Normal forms exist for a reason.
When to break the rule:
- Computed fields used in 90%+ of reads (e.g.
comment_counton apoststable). Maintain via triggers, document the staleness window. - Pre-joined data for performance (e.g. embedding the author's display name on a
postsrow to avoid the JOIN). Worth it when read latency is the bottleneck and the duplicated field is rarely changed. - Audit-trail tables that capture state-at-time (e.g.
order_line_items.product_name_at_purchase— the product can change, the historical purchase shouldn't).
If you do denormalise, document why in a comment on the table (or in an ADR). The next session needs to know it's intentional.
JSON columns
jsonb (not json) when you use them. They're useful for:
- Per-row variable shape (e.g. event payloads, tool outputs, config blobs).
- Append-only logs where the shape evolves over time.
They're not useful for:
- Anything you'll query/filter regularly. Use real columns; you get indexes and type safety.
- Anything joined to other tables. Foreign keys can't reference JSON.
Rule: if you query a JSON field with a WHERE clause more than once, promote it to its own column.
Migrations
- Numbered, sequential, never reused.
001_initial_schema.sql,002_add_user_avatar.sql. Same convention as ADRs. - One concern per migration. Don't bundle "add column + rename column + drop table" into one file.
- Reversible where possible. Down-migrations are good hygiene; some changes are one-way (dropping columns with data, splitting tables) and that's fine — just be explicit.
- Idempotent where possible.
create table if not exists,add column if not exists. Makes re-running a migration safe. - No data migrations mixed with schema migrations. Schema goes in
supabase/migrations/. Data backfills go in scripts (scripts/backfill-X.ts) that you run once, version, and commit.
How to feed this starter
Add to it when:
- You introduce a new pattern (first auditable table, first JSON-heavy table, first FX-converted money table). Document the choice.
- You hit a data-modelling bug worth recording. ("We used
textfor currency codes, then someone stored'GBP 'with trailing space — switched to a CHECK constraint.")
Remove from it when:
- A convention turns out wrong. Better to delete than to have a dead rule.
Companion starters
- security-posture — the RLS layer that sits on every table
- adr — schema-shaping decisions are exactly what ADRs are for
- project-starter — parent shape