Give the LLM a constrained surface to author programs in.
Not free-form Go.

pithkit is an experimental compiler. The LLM emits a small, typed JSON program (entities, single-responsibility leaves, entry-point composites) in one call. A four-layer validator catches misuses, an edit loop converges in 1–2 turns, and a deterministic compiler emits a Go binary that builds.

Try the program viewer → · Read the 12-page essay · GitHub
  • 1 call: authors a whole program (entities + leaves + composites)
  • Deterministic validator: pure code, no LLM in the gate between revisions
  • Human-readable structure: acceptance criteria → leaves/composites traceability in one screen
  • Haiku throughout: the constraint does the work, not the model size

Three primitives. Nineteen verbs. Twelve types. Six step kinds.

The closure is the contribution. The LLM cannot invent new vocabulary — it must compose what exists. The validator enforces the closure; the codegen reads it. Everything else falls out.
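To make the closure concrete, here is a minimal sketch of one such check, assuming a validator layer that rejects any step whose kind is outside the six named on this page. The `Step` type and `CheckStepKinds` function are hypothetical names for illustration, not pithkit's actual API.

```go
package main

import "fmt"

// stepKinds is the closed set of six step kinds named in the text.
var stepKinds = map[string]bool{
	"let": true, "call": true, "check": true,
	"return": true, "decide": true, "map": true,
}

// Step is a hypothetical, trimmed-down view of one composite body step.
type Step struct {
	Kind string
}

// CheckStepKinds returns one error per step whose kind falls outside the closed set.
func CheckStepKinds(body []Step) []error {
	var errs []error
	for i, s := range body {
		if !stepKinds[s.Kind] {
			errs = append(errs, fmt.Errorf("step %d: unknown kind %q", i, s.Kind))
		}
	}
	return errs
}

func main() {
	// "loop" is not in the vocabulary, so the check reports it.
	body := []Step{{Kind: "let"}, {Kind: "check"}, {Kind: "loop"}, {Kind: "return"}}
	for _, err := range CheckStepKinds(body) {
		fmt.Println(err)
	}
}
```

The same pattern would extend to verbs and types: a fixed table, a lookup, and a diagnostic that names the offending position.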

Entities

Data shapes, by name

Field-name → abstract type. No methods, no inheritance. Link in one leaf's output is identifiably the same Link in another leaf's input.

Leaves

Single-responsibility atoms

Verb-first (one of 19), typed I/O, ≥ 3 examples. Cannot call other leaves. The atom codegen fills the body; examples become Go tests.
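As a rough illustration of examples acting as tests, the sketch below runs a leaf body against its examples and reports mismatches. The body of `ValidateURL`, the `Example` shape, and the `"ok"` key encoding are assumptions made for this sketch; the real leaf body is filled by the atom codegen.

```go
package main

import (
	"fmt"
	"net/url"
)

// Example pairs one input with the expected value of the "ok" key.
// This shape is an assumption for illustration.
type Example struct {
	URL    string
	WantOK string
}

// ValidateURL is a hypothetical body for the validate_url leaf. It returns a
// Map<Text,Text>-like value; the "ok" key mirrors the $v.ok check in the sample.
func ValidateURL(raw string) map[string]string {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" || (u.Scheme != "http" && u.Scheme != "https") {
		return map[string]string{"ok": "false"}
	}
	return map[string]string{"ok": "true"}
}

// RunExamples returns a message for every example the leaf body fails.
func RunExamples(examples []Example) []string {
	var failures []string
	for _, ex := range examples {
		if got := ValidateURL(ex.URL)["ok"]; got != ex.WantOK {
			failures = append(failures,
				fmt.Sprintf("%q: got ok=%s, want ok=%s", ex.URL, got, ex.WantOK))
		}
	}
	return failures
}

func main() {
	// Three examples, matching the "at least 3" rule for leaves.
	examples := []Example{
		{"https://example.com/a", "true"},
		{"ftp://example.com/a", "false"},
		{"not a url", "false"},
	}
	fmt.Println("failures:", len(RunExamples(examples)))
}
```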

Composites

Entry points with bodies

HTTP routes, browser events, CLI commands. The body composes leaf calls using 6 step kinds: let, call, check, return, decide, map.

# URL shortener — one file, one LLM call
intent: "Backend API that shortens long URLs into short slugs and redirects."
acceptance_criteria:
  - "POST /shorten generates a slug (auto or user-supplied)"
  - "GET /{slug} redirects to original URL"

entities:
  Link: { slug: Text, original_url: Text, expires_at: Optional<Timestamp> }

leaves:
  - id: validate_url
    verb: validate
    output: { type: "Map<Text,Text>" }
    supports_criteria: [0]

composites:
  - id: create_short_link
    trigger: "POST /shorten"
    body:
      - { kind: let,    bind: v,     value: { fn: validate_url, args: { url: $input.original_url } } }
      - { kind: check,  cond: $v.ok }
      - { kind: let,    bind: slug,  value: { fn: create_slug, args: {} } }
      - { kind: return, value: { fn: compose_shorten_response, args: { slug: $slug } } }
    satisfies_criteria: [0]
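One plausible shape for the Go a deterministic compiler could emit from the `create_short_link` body is sketched below. Each step maps to one statement. The leaf bodies, the fixed slug, and the error value are placeholders invented for this sketch, not pithkit's generated output.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical leaf bodies. In pithkit these would be filled by the atom codegen.
func validateURL(u string) map[string]string {
	ok := strings.HasPrefix(u, "http://") || strings.HasPrefix(u, "https://")
	return map[string]string{"ok": fmt.Sprint(ok)}
}

// createSlug returns a fixed value so the sketch stays deterministic.
func createSlug() string { return "abc123" }

func composeShortenResponse(slug string) map[string]string {
	return map[string]string{"slug": slug}
}

var errCheckFailed = errors.New("check failed: $v.ok")

// CreateShortLink mirrors the four steps of the create_short_link body.
func CreateShortLink(originalURL string) (map[string]string, error) {
	v := validateURL(originalURL) // kind: let, bind: v
	if v["ok"] != "true" {        // kind: check, cond: $v.ok
		return nil, errCheckFailed
	}
	slug := createSlug()                     // kind: let, bind: slug
	return composeShortenResponse(slug), nil // kind: return
}

func main() {
	resp, err := CreateShortLink("https://example.com/long/path")
	fmt.Println(resp, err)

	_, err = CreateShortLink("nope")
	fmt.Println(err)
}
```

Because the body is a flat list of typed steps, the mapping needs no inference: the step kind selects the statement form, and the bindings become local variables.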

Why we stopped building an 8-stage compiler

We spent six months on a staged compiler — eight LLM calls in a row, each emitting an intermediate artifact for the next. It worked on small intents and ran out of cross-stage budget on hard ones. The diagnosis took us a while.

Before — Path A (frozen)

  • 8 LLM calls in a sequence
  • Stage 1 lexer → Stage 8 codegen
  • Cross-stage retries: $5–15 / hard intent
  • Hard intents often didn't build
  • LLM never saw the whole program at once

After — Path B (this repo)

  • 1 LLM call emits the whole program
  • Validator rejects misuses
  • Edit-loop converges in 1–2 turns
  • Stages 1–6 become internal cognitive moves, never crystallised
  • The LLM holds the whole program in working memory
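The validate-then-revise loop in Path B can be sketched as follows. This is a generic outline under stated assumptions: `EditLoop`, its parameters, and the stub reviser standing in for the LLM call are hypothetical, and the toy validator only checks step kinds.

```go
package main

import (
	"errors"
	"fmt"
)

// EditLoop alternates a deterministic validator with a reviser (the LLM call)
// until the program is clean or maxTurns revisions have been spent.
// It returns the last program, the number of revisions made, and an error
// if diagnostics remain.
func EditLoop[P any](
	p P,
	validate func(P) []string,
	revise func(P, []string) P,
	maxTurns int,
) (P, int, error) {
	for turn := 0; ; turn++ {
		diags := validate(p)
		if len(diags) == 0 {
			return p, turn, nil
		}
		if turn == maxTurns {
			return p, turn, errors.New("edit loop did not converge")
		}
		p = revise(p, diags)
	}
}

func main() {
	allowed := map[string]bool{
		"let": true, "call": true, "check": true,
		"return": true, "decide": true, "map": true,
	}
	// Toy validator: one diagnostic per unknown step kind.
	validate := func(kinds []string) []string {
		var diags []string
		for i, k := range kinds {
			if !allowed[k] {
				diags = append(diags, fmt.Sprintf("step %d: unknown kind %q", i, k))
			}
		}
		return diags
	}
	// Stub standing in for the LLM: rewrites unknown kinds to "call".
	revise := func(kinds []string, _ []string) []string {
		out := append([]string(nil), kinds...)
		for i, k := range out {
			if !allowed[k] {
				out[i] = "call"
			}
		}
		return out
	}
	p, turns, err := EditLoop([]string{"let", "loop", "return"}, validate, revise, 2)
	fmt.Println(p, turns, err)
}
```

Keeping the validator as plain code means the gate between revisions behaves identically on every turn; only the reviser is nondeterministic.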

Honest caveat: Path A was our own work. The numbers reported here are architectural evidence from our development log, not a controlled comparison against an independent baseline. Read the 12-page essay for the discussion of threats to validity.

See it on a real program

The viewer renders any pithkit Program JSON. Click an acceptance criterion to highlight the leaves and composites that claim it.
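The highlight behaviour amounts to an index from criterion number to the ids that claim it. A minimal sketch, assuming the JSON field names match the sample above (`acceptance_criteria`, `supports_criteria`, `satisfies_criteria`); the `Node`, `Program`, and `Claimants` names are hypothetical and the viewer's real implementation may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node holds the fields needed for traceability. Leaves use
// supports_criteria; composites use satisfies_criteria.
type Node struct {
	ID        string `json:"id"`
	Supports  []int  `json:"supports_criteria"`
	Satisfies []int  `json:"satisfies_criteria"`
}

// Program is a trimmed-down view of a pithkit Program JSON.
type Program struct {
	Criteria   []string `json:"acceptance_criteria"`
	Leaves     []Node   `json:"leaves"`
	Composites []Node   `json:"composites"`
}

// Claimants maps each criterion index to the leaf and composite ids that claim it.
func Claimants(p Program) map[int][]string {
	out := make(map[int][]string)
	for _, l := range p.Leaves {
		for _, i := range l.Supports {
			out[i] = append(out[i], l.ID)
		}
	}
	for _, c := range p.Composites {
		for _, i := range c.Satisfies {
			out[i] = append(out[i], c.ID)
		}
	}
	return out
}

// sample is the URL-shortener excerpt re-encoded as JSON.
const sample = `{
  "acceptance_criteria": [
    "POST /shorten generates a slug (auto or user-supplied)",
    "GET /{slug} redirects to original URL"
  ],
  "leaves": [{"id": "validate_url", "supports_criteria": [0]}],
  "composites": [{"id": "create_short_link", "satisfies_criteria": [0]}]
}`

func main() {
	var p Program
	if err := json.Unmarshal([]byte(sample), &p); err != nil {
		panic(err)
	}
	claims := Claimants(p)
	for i, text := range p.Criteria {
		// An empty list marks a criterion that nothing in the excerpt claims.
		fmt.Printf("[%d] %s -> %v\n", i, text, claims[i])
	}
}
```

An empty entry is useful in its own right: it flags an acceptance criterion that no leaf or composite has claimed yet.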

URL shortener → · Event attendance → · Mini e-commerce →