

Self-Improvement Loop

A repeatable loop that turns "it works" into "it's a developed product," one page at a time. You point it at a set of pages; each round it screenshots every page, looks at it, makes one small improvement, and tests it — then commits and (optionally) deploys. Run it again for the next layer. It converges: each round the easy wins shrink, the skip rate rises, and you stop when honest skips dominate.

It was built and run for 5 rounds against the Squiggles admin panel (46 pages) and produced ~194 improvements, with tsc 0 / eslint 0 on every batch.


1. The philosophy

  • One small thing per page, per round. Not a redesign. A refresh button, an empty state, a filter chip, a validation hint. Small changes compound; big changes break.
  • Look before you touch. A real screenshot of the current state catches what code reading misses (a button floating in whitespace, a table with no empty state, a header that doesn't match the rest).
  • The code is the source of truth for "what's already done." Screenshots can be stale; the page source is not. Read it to avoid re-doing last round's work.
  • A hard gate, every batch. tsc --noEmit = 0 and eslint errors = 0, no exceptions. A change that doesn't compile didn't happen.
  • Honest skips beat forced changes. Once a page is feature-saturated, changed=false with a note is the correct output. Never force a marginal/risky change to "do something."
  • Parallelize the slog. 46 pages × (look + improve + test) is hours sequentially. Fan it out: many agents, each owning a small batch, each page a distinct file → no conflicts.

2. The loop (one round)

┌─────────────────────────────────────────────────────────────────┐
│  1. CAPTURE   screenshot every page (authenticated)              │
│  2. FAN OUT   N parallel agents, each owns a batch of pages      │
│       for each page:  look at screenshot → read code →          │
│                       make ONE improvement  (or skip honestly)   │
│  3. GATE      tsc --noEmit == 0  AND  eslint errors == 0         │
│  4. FIX       repair anything the gate caught                    │
│  5. COMMIT    one commit per round, with a per-page summary      │
│  6. (DEPLOY)  once, at the end of a round or a few rounds        │
└─────────────────────────────────────────────────────────────────┘
                          ▲                              │
                          └──────── run again ───────────┘
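The box above can be read as a small driver function. The following is a minimal sketch, assuming each step is supplied as an async callback; `runRound`, the `steps` object, and `maxFixAttempts` are hypothetical names for illustration, not part of the original orchestrator.

```javascript
// Hypothetical driver for one round. Every step is injected, so this sketch
// makes no assumption about how capture, the agents, or git are implemented.
async function runRound(steps, { maxFixAttempts = 3 } = {}) {
  const shots = await steps.capture();        // 1. CAPTURE
  const results = await steps.fanOut(shots);  // 2. FAN OUT

  // 3 + 4. GATE, then FIX and re-check until the gate passes or attempts run out.
  let gate = await steps.gate();
  for (let attempt = 0; !gate.ok && attempt < maxFixAttempts; attempt++) {
    await steps.fix(gate.errors);
    gate = await steps.gate();
  }
  if (!gate.ok) {
    throw new Error(`gate still failing after ${maxFixAttempts} fix attempts`);
  }

  await steps.commit(results);                // 5. COMMIT (deploy stays a separate step)
  return results;
}
```

Refusing to commit when the gate never turns green keeps the "a change that doesn't compile didn't happen" rule enforceable in code rather than by convention.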

3. Component A — the screenshot harness

The hard part is screenshotting auth-gated pages headlessly. The trick: mint a real admin session with the Firebase Admin SDK (a custom token), then let the browser sign in with it — no auth-bypass code added to the app.

scripts/admin-screenshots.mjs (run via npm run admin:shots):

import { chromium } from "playwright";
import { initializeApp, cert, getApps } from "firebase-admin/app";
import { getAuth } from "firebase-admin/auth";
import { getFirestore } from "firebase-admin/firestore";

// BASE (overridable via SHOT_BASE), ROUTES, and PUBLIC_FIREBASE_CONFIG are defined elsewhere in the script.

// 1. Admin SDK from env (.env.local): find an admin uid, mint a custom token.
const adminApp = getApps()[0] ??
  initializeApp({ credential: cert(/* service account from FIREBASE_ADMIN_* + project id */) });
const db = getFirestore(adminApp);
const admin = (await db.collection("users").where("isAdmin", "==", true).limit(1).get()).docs[0];
if (!admin) throw new Error("no user with isAdmin == true found");
const customToken = await getAuth(adminApp).createCustomToken(admin.id);

// 2. Headless browser, sign in by injecting Firebase via CDN on the app's origin.
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
await page.goto(BASE, { waitUntil: "domcontentloaded" });
await page.evaluate(async ({ config, token }) => {
  const { initializeApp, getApps } = await import("https://www.gstatic.com/firebasejs/12.6.0/firebase-app.js");
  const { getAuth, signInWithCustomToken, setPersistence, indexedDBLocalPersistence } =
    await import("https://www.gstatic.com/firebasejs/12.6.0/firebase-auth.js");
  // CRITICAL: use the DEFAULT app name so auth persists under
  // firebase:authUser:<apiKey>:[DEFAULT], exactly the key the app reads on restore.
  const app = getApps().length ? getApps()[0] : initializeApp(config);
  const auth = getAuth(app);
  await setPersistence(auth, indexedDBLocalPersistence);
  await signInWithCustomToken(auth, token);
  await new Promise(r => setTimeout(r, 1800)); // let IndexedDB persist
}, { config: PUBLIC_FIREBASE_CONFIG, token: customToken });

// 3. Capture each page.
for (const route of ROUTES) {
  // Assumed file naming: nested routes become dash-separated slugs.
  const slug = route.replaceAll("/", "-");
  await page.goto(`${BASE}/admin/${route}`, { waitUntil: "domcontentloaded" });
  await page.waitForTimeout(4500); // auth restore + client data + charts
  await page.screenshot({ path: `out/admin-shots/${slug}.png`, fullPage: true });
}
await browser.close();

Hard-won gotchas (each one cost a debugging cycle)

| Symptom | Cause | Fix |
|---|---|---|
| All screenshots are byte-identical (every one is the sign-in page) | CDN Firebase app was given a custom name → persisted under a different IndexedDB key than the app reads | Initialize the CDN app with the default name |
| `page.goto` times out on every admin page | Authenticated pages hold Firestore real-time listeners open → `networkidle` never fires | Use `waitUntil: "domcontentloaded"` + a fixed `waitForTimeout` |
| A cookie-consent banner covers every shot | Persisted in localStorage | After sign-in, click "Accept all/Essential only" once; it persists |
| Want to verify uncommitted changes | Production is stale (commit ≠ deploy) | Screenshot local dev (no CSP, injection works); prod is faster for before shots |
| Vercel deploy ballooned to 700 MB | Local `.next-dev/` dev-server artifact wasn't gitignored | Add `.next-dev/` to `.gitignore` (Vercel respects `.gitignore`; secrets in `.env*` were already excluded) |
| Dev servers pile up across rounds | `npm run dev` detaches a server that survives the wrapper exiting | Kill them between rounds (`netstat -ano` …) |

Prod vs local for the before shots: prod is fast and its CSP already permits Firebase auth (real users sign in there). Use local dev only when you must screenshot uncommitted changes (then there's no CSP and the injection works freely).

Crash detection is free here. Attach page.on("pageerror", …) and page.on("console", …) during the capture loop and record errors per page. A page that throws renders the framework's full-page error boundary — easy to miss in a thumbnail, invisible to an API-only smoke test. A real run caught a production tab that crashed on undefined.contentMix this way; the screenshot pass flagged it automatically.
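A minimal sketch of that per-page error recording follows. It relies only on Playwright's `pageerror` and `console` page events; the `attachErrorRecorder` helper and its `setRoute`/`report` methods are hypothetical names, not taken from the original script.

```javascript
// Collect uncaught exceptions and console errors for whichever route is
// currently loading. `page` is a Playwright Page (any object with .on works).
function attachErrorRecorder(page) {
  const errorsByRoute = new Map();
  let currentRoute = "(before first route)";

  const record = (kind, text) => {
    if (!errorsByRoute.has(currentRoute)) errorsByRoute.set(currentRoute, []);
    errorsByRoute.get(currentRoute).push({ kind, text });
  };

  // "pageerror" fires with the Error from an uncaught exception in the page.
  page.on("pageerror", (err) => record("pageerror", err.message));
  // "console" fires for every console call; keep only console.error output.
  page.on("console", (msg) => {
    if (msg.type() === "error") record("console", msg.text());
  });

  return {
    setRoute(route) { currentRoute = route; }, // call just before each page.goto
    report() { return errorsByRoute; },        // Map of route -> recorded errors
  };
}
```

In the capture loop, calling `recorder.setRoute(route)` before `page.goto(...)` attributes each error to the page being loaded; routes that never appear in the report rendered without throwing.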


4. Component B — the fan-out improvement pass

Orchestrate N parallel agents (one Workflow run). Each agent owns a small batch (~5 pages), processes them sequentially within the agent, edits each page's own page.tsx, and returns a structured result. Distinct files per agent → no write conflicts → no worktree isolation needed.

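// chunk(), parallel(), and agent() are the orchestrator's helpers (not shown here).
const BATCH = 5;                         // pages per agent
const batches = chunk(routes, BATCH);    // 10 agents for 46 pages

// Shape each agent must return: { pages: [{ route, file, changed, improvement, note? }] }
// The exact schema notation depends on the orchestrator; this is descriptive shorthand.
const PAGE_FIELDS = { route: "string", file: "string", changed: "boolean", improvement: "string", "note?": "string" };
const SCHEMA = { pages: [PAGE_FIELDS] };

await parallel(batches.map((batch, i) => () =>
  agent(`${ROUND_GUIDE}\n\nYOUR PAGES:\n${batch.map(b =>
    `- route "${b.route}"\n    screenshot: ${b.shot}\n    source: ${b.page}`).join("\n")}`,
    { label: `improve:batch${i+1}`, schema: SCHEMA })
));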
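The snippet above uses `chunk` and `parallel` without defining them. One way such helpers might look is sketched below; these are assumed implementations for illustration, and the real orchestrator may batch and schedule differently.

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Start every task at once and wait for all of them. Promise.allSettled keeps
// one failed agent from discarding the results of the others.
async function parallel(tasks) {
  const settled = await Promise.allSettled(tasks.map(async (task) => task()));
  return settled.map((s, i) =>
    s.status === "fulfilled"
      ? { ok: true, value: s.value }
      : { ok: false, batch: i + 1, error: s.reason }
  );
}
```

With 46 routes and a batch size of 5, `chunk` yields ten batches, the last holding a single page.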

The per-agent guide (the contract)

Every round's prompt enforces the same hard rules and varies only the target of the improvement:

For EACH assigned page:
  1. Read the screenshot PNG (it renders visually) to SEE the current page.
  2. Read the page.tsx source to inventory what ALREADY exists (don't duplicate).
  3. Make exactly ONE small, safe improvement — or set changed=false with a note.

HARD RULES:
  - MUST typecheck + lint clean. Self-contained.
  - ONLY existing repo components (@/components/ui/*) + lucide-react. NO new deps/files.
  - ONLY edit the page's OWN page.tsx. NEVER edit layout.tsx, components/ui/*, or any
    shared/imported file.
  - Preserve "use client", imports, behavior, and all data/auth/business logic.
  - Derive anything (stats, filters) from data ALREADY in component state — no new
    network calls, no API changes.
  - If nothing safe and non-duplicative remains, set changed=false. Honest skips are
    correct and expected as rounds increase.
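Several of these rules can also be checked mechanically once the agents return. The sketch below assumes you pass in the changed paths (for example from `git diff --name-only`), the page files assigned this round, and the agents' structured results; `findContractViolations` and the example paths in the usage note are hypothetical.

```javascript
// Compare what the agents reported against what actually changed on disk.
function findContractViolations(changedFiles, assignedPageFiles, results) {
  const allowed = new Set(assignedPageFiles);
  const changed = new Set(changedFiles);
  const violations = [];

  // Rule: only the assigned page.tsx files may be edited.
  for (const file of changedFiles) {
    if (!allowed.has(file)) {
      violations.push({ file, reason: "edited a file outside the assigned pages" });
    }
  }

  for (const r of results) {
    if (r.changed && !changed.has(r.file)) {
      violations.push({ file: r.file, reason: "reported changed=true but the file is untouched" });
    }
    if (!r.changed && changed.has(r.file)) {
      violations.push({ file: r.file, reason: "reported changed=false but the file was modified" });
    }
    // Rule: an honest skip carries a note explaining why.
    if (!r.changed && !r.note) {
      violations.push({ file: r.file, reason: "skipped without a note" });
    }
  }
  return violations;
}
```

For instance, a round in which an agent touched `src/app/admin/layout.tsx` would surface as an "outside the assigned pages" violation before the gate even runs.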

5. The improvement ladder (what to target each round)

The loop gets deeper each round. Start with structure/affordances; end with validation and edge cases. This is the exact ladder that worked:

| Round | Theme | Examples |
|---|---|---|
| 1 | Structure & affordances | empty states, aria-labels on icon buttons, gradient icon-chip headers, a search box, Export-CSV |
| 2 | Interaction & data control | filter chips + count badges, sortable columns / sort dropdowns, "load more" pagination, refresh buttons, tab metric summaries, keyboard shortcuts, unsaved-changes guard |
| 3 | Persistence & convenience | persist tab/filter/sort to localStorage, copy-to-clipboard with check state, relative "x ago" timestamps, confirm dialogs on destructive actions, "showing X of Y" counts |
| 4 | Robustness | error states with a Retry button (where a failed fetch silently showed a misleading empty state), sticky table headers, truncation tooltips, "last updated" indicator, totals footer rows |
| 5 | Forms & validation | disable submit until required fields are valid + inline hints, email/date validation, loading spinners on submits, toLocaleString number formatting, hover affordances |
| 6+ | (diminishing) | mostly honest skips; pivot to a focused deep-build on one page instead |
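Since only the target changes between rounds, the ladder can be kept as data and turned into the one line of the prompt that varies. A small sketch, with the themes copied from the table above and the example lists shortened; `LADDER` and `roundTarget` are hypothetical names.

```javascript
// Ladder rows 1-5 from the table above (example lists shortened).
const LADDER = [
  { theme: "Structure & affordances", examples: "empty states, aria-labels on icon buttons, a search box" },
  { theme: "Interaction & data control", examples: "filter chips, sortable columns, refresh buttons" },
  { theme: "Persistence & convenience", examples: "persist filters to localStorage, copy-to-clipboard, confirm dialogs" },
  { theme: "Robustness", examples: "error states with a Retry button, sticky table headers, truncation tooltips" },
  { theme: "Forms & validation", examples: "disable submit until valid, inline hints, loading spinners" },
];

// Build the round-specific line that gets appended to the fixed hard rules.
function roundTarget(round) {
  if (!Number.isInteger(round) || round < 1) {
    throw new RangeError("round must be a positive integer");
  }
  const step = LADDER[round - 1];
  return step
    ? `ROUND ${round} TARGET: ${step.theme} (e.g. ${step.examples}).`
    : `ROUND ${round} TARGET: none; expect mostly honest skips and consider a focused deep-build instead.`;
}
```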

6. The gate (non-negotiable)

After every round's edits, before committing:

npx tsc --noEmit                                  # must be 0 errors
npx eslint "src/app/admin/**/*.tsx" -f json | jq '[.[].messages[] | select(.severity == 2)] | length'  # must print 0 (severity 2 = error)

If the parallel edits introduced an error, fix it before committing. The gate is what makes fanning out to autonomous agents safe.
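If the orchestrator is already a Node script, the same check can be made on the parsed ESLint output. A minimal sketch, assuming `results` is the array produced by `eslint -f json` (one entry per file, each with a `messages` list); `countEslintErrors` and `gatePasses` are hypothetical helper names.

```javascript
// Count severity-2 messages (errors); severity 1 is a warning and is ignored.
function countEslintErrors(results) {
  return results.reduce(
    (total, file) => total + file.messages.filter((m) => m.severity === 2).length,
    0
  );
}

// The gate passes only when tsc exited cleanly AND ESLint reported no errors.
function gatePasses(tscExitCode, eslintResults) {
  return tscExitCode === 0 && countEslintErrors(eslintResults) === 0;
}
```

Warnings are deliberately left out of the count, matching the "eslint errors = 0" wording of the gate.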


7. Commit & deploy cadence

  • One commit per round, with a body that lists the per-page improvements (so each round is reviewable and revertible).
  • Deploy once at the end of a round (or a few rounds) — not per page. Stacking many rounds before deploying creates a "committed but invisible" gap; deploy before that grows too large.
  • Verify the deploy: hit a gated route (expect 401), a public route (expect 200), and screenshot one improved page to confirm it shipped.
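The per-round commit body can be generated straight from the agents' structured results. A sketch, assuming results shaped like the agent schema (`route`, `changed`, `improvement`); `buildCommitMessage` is a hypothetical helper, and the subject line reuses the commit message format from section 9.

```javascript
// One commit per round: subject line, a tally, then one bullet per improved page.
function buildCommitMessage(round, results) {
  const improved = results.filter((r) => r.changed);
  const skipped = results.length - improved.length;
  const subject = `feat(admin): improvement loop round ${round}`;
  const tally = `${improved.length} improved, ${skipped} skipped`;
  const bullets = improved.map((r) => `- ${r.route}: ${r.improvement}`);
  return [subject, "", tally, ...bullets].join("\n");
}
```

Keeping the tally in the body also leaves a record of the skip rate in `git log`, which is the signal used to decide when to stop.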

8. When to stop (convergence)

The skip rate is the signal. From this run:

| Round | Improved | Skipped |
|---|---|---|
| 1 | 44 | 2 |
| 2 | 44 | 2 |
| 3 | 44 | 2 |
| 4 | 40 | 6 |
| 5 | 22 | 24 |

When skips cross ~50%, the broad "one small thing per page" sweep is tapped out. Stop the broad loop and pivot to a focused deep-build on a single high-value page (e.g. "turn the dashboard into a real analytics cockpit with date ranges, period-over-period deltas, and drill-downs"). That's where the remaining ROI lives.
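The stop rule is simple enough to compute from the per-round counts. A sketch using the figures from this run; `skipRate` and `shouldStop` are hypothetical names, and the 0.5 threshold is the approximate cut-off stated above rather than a tuned value.

```javascript
// Per-round counts from this run (46 pages each round).
const rounds = [
  { round: 1, improved: 44, skipped: 2 },
  { round: 2, improved: 44, skipped: 2 },
  { round: 3, improved: 44, skipped: 2 },
  { round: 4, improved: 40, skipped: 6 },
  { round: 5, improved: 22, skipped: 24 },
];

// Fraction of pages the agents honestly skipped in a round.
function skipRate({ improved, skipped }) {
  return skipped / (improved + skipped);
}

// Stop the broad sweep once skips dominate.
function shouldStop(round, threshold = 0.5) {
  return skipRate(round) > threshold;
}

const firstStop = rounds.find((r) => shouldStop(r));
console.log(firstStop.round); // 5 (24 of 46 pages skipped, about 52%)
```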


9. How to re-run it

# 1. (Re)capture screenshots of every admin page
npm run admin:shots                 # → out/admin-shots/*.png   (prod by default)
SHOT_BASE=http://localhost:3000 npm run admin:shots   # local, for uncommitted changes

# 2. Run the fan-out improvement workflow (orchestrator), reusing the round guide above,
#    bumping the "ladder" target for the new round.

# 3. Gate, fix, commit, deploy:
npx tsc --noEmit && npx eslint "src/app/admin/**/*.tsx"
git add src/app/admin && git commit -m "feat(admin): improvement loop round N"
vercel --prod

10. Why it works

  • Screenshots make the agents see — they catch the visual gaps (empty white space, mismatched headers, missing states) that pure code review skips.
  • Reading the code makes them not repeat — the screenshot can lag a round behind; the source is current, so "what's already there" is always accurate.
  • The gate makes parallel autonomy safe — 10 agents editing 44 files is fine when every batch must pass tsc + lint before it's committed.
  • Honest skips make it converge — instead of degrading into marginal busywork, it declares pages done and tells you when to switch strategies.

Generated from the Squiggles admin self-improvement loop (5 rounds, ~194 improvements, 46 pages). Reusable for any auth-gated, multi-page UI.
