// starters/data/data-modelling

data modelling

Naming, types, money, identifiers, soft-delete, denormalisation — the schema-shaping decisions that get expensive to reverse. The defaults the AI should treat as constraints.

last fed 28 may 2026962 words · 5 min read
// when to use this starter

Fork this at the start of any project with a database. Run through it again any time you're about to add a table that introduces a new pattern (first money table, first auditable table, first user-content table). The decisions below shape every later schema choice.

dataschemapostgressupabasenamingmoneyids

Data modelling

A sourdough starter for the data-shape decisions that bind everything downstream. The AI's default behaviour is to invent per-table patterns; this document is what makes them consistent.


Naming

Thing Convention Example
Tables snake_case, plural users, tool_runs, audit_log_entries
Columns snake_case created_at, email_verified_at
Primary keys id (UUID by default) id uuid primary key default gen_random_uuid()
Foreign keys <singular_referenced>_id user_id, project_id
Booleans positive verb, no negation is_active (not is_disabled), is_admin (not not_admin)
Timestamps <event>_at, always with timezone created_at timestamptz, published_at timestamptz
Enums snake_case values `status: 'pending'
Indexes <table>_<columns>_idx tool_runs_user_id_created_at_idx

Plural vs singular for table names is the only debate worth having (some teams prefer singular: user, tool_run). Pick one, lock it. Mixing is the bug.


Money

Rule: integers, base unit (pence / cents), never floats.

-- Right
price_pence integer not null default 0
-- Wrong (floating point arithmetic is lossy on money)
price_pounds numeric(10, 2)

Why: float arithmetic introduces rounding errors that compound. 0.1 + 0.2 ≠ 0.3 in IEEE 754 floats. Money in floats is a known bug class.

At the display boundary: convert pence → display. Never store the display value.

function formatPrice(pence: number): string {
  return `£${(pence / 100).toFixed(2)}`;
}

Currency: if you're multi-currency, store amount_minor (integer) + currency (ISO 4217 code), and never mix. Conversion happens at presentation time, with FX rates frozen at transaction time.


Identifiers

  • UUIDs by default (uuid type, gen_random_uuid() default). Don't expose sequential integers as primary keys to the public — they leak order-of-creation, count, and let attackers enumerate.
  • External IDs (e.g. Stripe customer ID, GitHub user ID) stored as columns alongside, never as the primary key. The external system can churn; your primary key shouldn't.
  • Slug columns for user-facing URL identifiers. Indexed unique; lowercase, kebab-case; immutable once published (or with explicit redirect handling on change).
create table tools (
  id uuid primary key default gen_random_uuid(),
  slug text not null unique,
  stripe_product_id text unique,
  created_at timestamptz not null default now()
);

Timestamps

Every table has created_at. Most tables have updated_at. Both timestamptz (not timestamp).

create table whatever (
  -- ...
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now()
);

-- Trigger to maintain updated_at:
create trigger whatever_updated_at
  before update on whatever
  for each row execute function moddatetime('updated_at');

Deletion: soft-delete with deleted_at timestamptz instead of hard DELETE for any table where the row matters historically (orders, audit logs, user-generated content). Hard-delete for ephemeral tables (sessions, idempotency keys, cache).


RLS posture (Supabase / Postgres)

This belongs in security-posture primarily, but it shapes every table you write.

  • RLS enabled by default on every table. Even tables not yet user-facing.
  • Scope by auth.uid() for user-owned rows. auth.role() = 'authenticated' is NOT scoping — it just checks the user is logged in.
  • WITH CHECK on INSERT + UPDATE. Without it, a row can be reassigned to another user (the row-reassignment bug).
  • Tables without an obvious owner (lookups, public catalogues) get using (true) for reads — explicit, not by omission.

rls_checker covers 8 misconfiguration patterns.


Denormalisation

Default: don't. Normal forms exist for a reason.

When to break the rule:

  • Computed fields used in 90%+ of reads (e.g. comment_count on a posts table). Maintain via triggers, document the staleness window.
  • Pre-joined data for performance (e.g. embedding the author's display name on a posts row to avoid the JOIN). Worth it when read latency is the bottleneck and the duplicated field is rarely changed.
  • Audit-trail tables that capture state-at-time (e.g. order_line_items.product_name_at_purchase — the product can change, the historical purchase shouldn't).

If you do denormalise, document why in a comment on the table (or in an ADR). The next session needs to know it's intentional.


JSON columns

jsonb (not json) when you use them. They're useful for:

  • Per-row variable shape (e.g. event payloads, tool outputs, config blobs).
  • Append-only logs where the shape evolves over time.

They're not useful for:

  • Anything you'll query/filter regularly. Use real columns; you get indexes and type safety.
  • Anything joined to other tables. Foreign keys can't reference JSON.

Rule: if you query a JSON field with a WHERE clause more than once, promote it to its own column.


Migrations

  • Numbered, sequential, never reused. 001_initial_schema.sql, 002_add_user_avatar.sql. Same convention as ADRs.
  • One concern per migration. Don't bundle "add column + rename column + drop table" into one file.
  • Reversible where possible. Down-migrations are good hygiene; some changes are one-way (dropping columns with data, splitting tables) and that's fine — just be explicit.
  • Idempotent where possible. create table if not exists, add column if not exists. Makes re-running a migration safe.
  • No data migrations mixed with schema migrations. Schema goes in supabase/migrations/. Data backfills go in scripts (scripts/backfill-X.ts) that you run once, version, and commit.

How to feed this starter

Add to it when:

  • You introduce a new pattern (first auditable table, first JSON-heavy table, first FX-converted money table). Document the choice.
  • You hit a data-modelling bug worth recording. ("We used text for currency codes, then someone stored 'GBP ' with trailing space — switched to a CHECK constraint.")

Remove from it when:

  • A convention turns out wrong. Better to delete than to have a dead rule.

Companion starters