
research-synthesis-loop


Research-Synthesis Loop

A loop that turns "an agent ran three searches and wrote a confident-sounding answer" into "a synthesis where every claim resolves to read text and a completeness critic can name nothing material that's missing." Each round it fans out multi-modal searchers that each attack a different angle, synthesizes the read text into cited claims, then runs a completeness critic that GENERATES the next round's work by naming the gaps — and repeats until the critic signs off twice or the budget is spent.

The whole design fights two failure modes that make agentic research worthless — a single searcher's framing only surfaces what it already thought to ask (the unknown-unknowns problem), and a loop that only collects supporting sources feels done while being wrong (confirmation bias). The multi-angle sweep beats the first; a critic that demands a steel-manned opposing view and grounds every claim to read text beats the second.

It was run to build a 4,000-word brief on "is the four-day work week net-positive for output?": 9 → 31 → 47 → 54 sources across four rounds, gap-count 6 → 4 → 1 → 0, every one of 71 final claims resolving to a fetched, read source.


1. The philosophy

  • The completeness critic is the ENGINE, not a final rubber-stamp. A critic run once at the end just blesses whatever exists. Run it every round and make its output the next round's queries. "What modality didn't you search? Which claim is unverified? Whose counter-argument is missing?" — those questions are the only thing that surfaces the unknown unknowns a single searcher's framing never reaches. The gap list is the work-list.
  • Multi-modal sweep beats a deeper single search. Five searchers with the same query return five overlapping result sets — you found one thing five times. Give each a different angle — by-entity, by-time-window, by-source-type, by-counter-argument — and they hit disjoint corners of the space. Diversity of framing, not depth of one frame, is what fills the picture.
  • Every claim is gated to read text — a synthesis is only as trustworthy as its weakest citation. A claim citing a source the agent never fetched is a hallucinated citation, and one is enough to make the whole document untrustworthy. The gate verifies each citation resolves to gathered text. Unsourced sentence ⇒ not a claim, it's an opinion; cut it or go find the source.
  • Actively gather COUNTER-evidence. A loop that only collects sources agreeing with the forming thesis converges fast and confidently to a wrong answer. One searcher's entire job each round is to find the strongest opposing case, and the critic must reject a synthesis that hasn't steel-manned the other side.
  • A synthesis RESOLVES contradictions; it does not concatenate. "Source A says X, source B says not-X" appended back to back is a list, not a synthesis. The synthesizer must adjudicate — reconcile, weight by evidence quality, or explicitly flag the dispute as unresolved with why. An append is the critic's cue to reject.
  • Dedup sources by URL/DOI. The same study reaches you as a press release, a blog summary, and the preprint. Counting it three times manufactures false consensus. Key the source store on a normalized URL/DOI; the same canonical source counts once.
  • ONE focused expansion per round — the critic's TOP gap. Chasing all six gaps at once makes the round's progress unattributable and the budget unpredictable. Take the single highest-priority material gap, fan out searchers against that gap, and let the round-over-round gap-count be your honest convergence signal.
  • A hard budget, and the critic must triage material vs cosmetic. "One more source" is infinite. The loop stops on a token/source budget OR on the critic signing off twice — and the critic only counts a gap as material if filling it would change a claim, not just add one more confirming citation.

2. The loop (one round)

┌──────────────────────────────────────────────────────────────────────┐
│  ROUND INPUT: targeted queries (round 1 = the question, decomposed;  │
│               round N = the critic's TOP gap from round N-1)          │
│                                                                       │
│  1. FAN-OUT GATHER   parallel() searchers, ONE angle each, blind:    │
│       by-entity · by-time-window · by-source-type · by-counter-arg    │
│       → fetch + READ each hit; store text keyed by canonical URL/DOI  │
│  2. DEDUP            collapse by normalized URL/DOI → unique sources   │
│  3. SYNTHESIZE       resolve contradictions into claims, each tagged   │
│                      {claim, sourceIds[]} — NO append-lists           │
│  4. GROUND-GATE      every claim's sourceIds MUST resolve to STORED    │
│                      read text → drop/flag hallucinated citations      │
│  5. COMPLETENESS     critic agent → structured gaps {kind, query,     │
│     CRITIC           priority, material:bool}; signs off iff none      │
│  6. RECORD           append round row; pick TOP material gap as next   │
│                      round's input; tick the budget                   │
└─────────────────────────────────────────────┬────────────────────────┘
        critic: 0 material gaps ─ yes ─► cleanSignoffs++  else =0
                       │                              │
       cleanSignoffs >= 2  OR  budget hit ── no ──────┘
                       │ stop                feed TOP gap in as next round

The critic closes the loop and opens the next one: its top material gap is literally round N+1's target query. Because the gather fans out, a round takes as long as its slowest searcher, not the sum of all searchers.


3. Component A — the multi-angle gather fan-out

Each searcher is a separate agent with a different angle prompt, blind to the others' raw results (it gets the question + the round's target gap, not the other searchers' hits). Run them with parallel() — they only read and write to a shared store keyed by canonical URL.

```js
const ANGLES = {
  byEntity:      `Enumerate the KEY ENTITIES in the question (people, orgs, studies, products,
                  places) and search each by NAME. Find the primary source for each entity,
                  not commentary about it. Return the canonical/original where one exists.`,
  byTimeWindow:  `Search the TIMELINE: the seminal/origin source, the current state-of-the-art,
                  and anything from the last 18 months that supersedes older claims. An answer
                  built on 2019 data when 2025 data reverses it is wrong, not just stale.`,
  bySourceType:  `Deliberately vary SOURCE TYPE: peer-reviewed > official/primary > reputable
                  press > practitioner write-up. Do NOT return four blog posts. A claim backed
                  only by secondary coverage is weaker than one backed by the primary source.`,
  byCounterArg:  `Find the STRONGEST case AGAINST the forming thesis. Search for "criticism of",
                  "limitations of", "failed to replicate", dissenting experts, contrary data.
                  Your job is the steel-man, not the strawman. Return the best opposing source.`,
};

const SEARCH_SCHEMA = {
  hits: [{ url:'string', title:'string', sourceType:'peer|primary|press|practitioner',
           snippet:'string', whyRelevant:'string' }],
};

// Each searcher gets the question + this round's target gap; only the angle differs.
const hits = (await parallel(Object.entries(ANGLES).map(([angle, brief]) => () =>
  agent(
    `You are the ${angle.toUpperCase()} searcher. Attack ONLY through this angle:\n${brief}\n\n` +
    `QUESTION:\n${QUESTION}\n\nTHIS ROUND'S TARGET GAP:\n${targetGap}\n\n` +
    `Run searches, then FETCH and READ each promising hit. Return {url,title,sourceType,` +
    `snippet,whyRelevant} ONLY for sources whose text you actually retrieved. ` +
    `If your angle finds nothing new, return hits: []. Empty is honest; padding is not.`,
    { label: `gather:${angle}`, schema: SEARCH_SCHEMA, tools: [webSearch, fetchUrl] },
  )
))).flatMap(r => r.hits);
```

Why one angle per searcher, not one mega-query: a single "research this question" prompt anchors on the most obvious framing and returns a self-similar result set — it never thinks to look for the dissent or the primary source unless told to. Splitting the search budget across fixed angles forces coverage of corners the model would skip; the counter-arg searcher has to find the opposing case because that is its only job.
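The parallel() helper is used above but never shown. A minimal sketch of what it might look like, assuming it takes an array of zero-argument async functions and resolves to their results in input order. The fallback to { hits: [] } for a failed searcher is an assumption of this sketch, not something the source states, and the stub searchers and example.org URLs are invented for illustration.

```js
// Hypothetical stand-in for the parallel() helper used above.
// Assumption: a failed searcher contributes { hits: [] } instead of aborting the round.
async function parallel(thunks) {
  const settled = await Promise.allSettled(thunks.map(thunk => thunk()));
  return settled.map(s => (s.status === 'fulfilled' ? s.value : { hits: [] }));
}

// Stub searchers: each resolves after a delay, so the round lasts about as long
// as the slowest one (30 ms here), not the sum of all delays.
const stubSearcher = (hits, ms) => () =>
  new Promise(resolve => setTimeout(() => resolve({ hits }), ms));
const failingSearcher = () => Promise.reject(new Error('search backend down'));

const gathered = parallel([
  stubSearcher([{ url: 'https://example.org/a' }], 30),
  stubSearcher([{ url: 'https://example.org/b' }], 10),
  failingSearcher,
]).then(results => results.flatMap(r => r.hits));

gathered.then(hits => console.log(hits.map(h => h.url)));
```

Whether one broken searcher should empty-hand the round or fail it loudly is a design choice; a stricter variant could rethrow when every searcher fails.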


4. Component B — dedup + the source store (the grounding substrate)

Every fetched source is stored with its read text, keyed by a canonical URL/DOI so the same study arriving via three routes collapses to one entry. The store is what the gate checks claims against — a claim can only cite a sourceId that lives here with text.

```js
// The store IS the ground truth. A claim may only cite a sourceId present here with body text.
const store = new Map();   // canonicalKey -> { id, url, title, sourceType, text }

function canonicalKey(url) {
  const doi = /\b10\.\d{4,9}\/[^\s"&?]+/i.exec(url)?.[0];     // a DOI is the strongest key
  if (doi) return `doi:${doi.toLowerCase()}`;
  const u = new URL(url);
  u.hash = '';                                                // strip fragment
  ['utm_source','utm_medium','utm_campaign','ref','fbclid'].forEach(p => u.searchParams.delete(p));
  u.searchParams.sort();                                      // stable order for the params that remain
  // Keep non-tracking params in the key: ?id=2 and ?id=3 are different pages.
  return `url:${u.host.replace(/^www\./, '')}${u.pathname.replace(/\/$/, '')}${u.search}`;
}

async function ingest(hit) {
  const key = canonicalKey(hit.url);
  if (store.has(key)) return null;                            // same canonical source → count once, not as new
  let text;
  try { text = await fetchUrl(hit.url); } catch { return null; }   // FETCH + read; a failed fetch is not a source
  if (!text || text.length < 200) return null;                // a 404/paywall/JS-shell is NOT a source
  if (store.has(key)) return null;                            // a parallel ingest stored it while we fetched
  const id = `S${store.size + 1}`;
  store.set(key, { id, url: hit.url, title: hit.title, sourceType: hit.sourceType, text });
  return id;
}

// Only genuinely new sources return an id, so this length is the round's "New" count.
const newSourceIds = (await Promise.all(hits.map(ingest))).filter(Boolean);
```

The if (store.has(key)) … count once line is what kills manufactured consensus: when the preprint, the press release, and the blog summary of the same study carry the same DOI in their URLs (or collapse to the same canonical URL), they become one source, not three votes. Because canonicalKey only inspects the URL, routes that share neither will not collapse until the DOI is extracted from the page itself. The text.length < 200 check is the other half: a hit you couldn't actually read (paywall, 404, JS shell) is not a source and must never back a claim.
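To make the collapse concrete, here is a small self-contained check. It uses a variant of canonicalKey that keeps sorted non-tracking query parameters in the key (that detail is an assumption of this sketch), and the example.org URLs and the DOI are invented for illustration.

```js
// Variant of canonicalKey; keeping sorted non-tracking query params is an assumption.
function canonicalKey(url) {
  const doi = /\b10\.\d{4,9}\/[^\s"&?]+/i.exec(url)?.[0];
  if (doi) return `doi:${doi.toLowerCase()}`;
  const u = new URL(url);
  u.hash = '';
  ['utm_source', 'utm_medium', 'utm_campaign', 'ref', 'fbclid'].forEach(p => u.searchParams.delete(p));
  u.searchParams.sort();
  return `url:${u.host.replace(/^www\./, '')}${u.pathname.replace(/\/$/, '')}${u.search}`;
}

// Hypothetical URLs: three routes to one article.
const routes = [
  'https://www.example.org/reports/four-day-week/',
  'https://example.org/reports/four-day-week?utm_source=newsletter',
  'https://example.org/reports/four-day-week#findings',
];
const keys = new Set(routes.map(canonicalKey));
console.log(keys.size, [...keys][0]);   // 1 'url:example.org/reports/four-day-week'

// A DOI anywhere in the URL wins over host and path, and is lower-cased.
console.log(canonicalKey('https://doi.org/10.1234/ABC.567'));   // doi:10.1234/abc.567
```

Three raw URLs yield one key, so the store counts the article once.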


5. Component C — synthesize, then the completeness critic (the heart)

The synthesizer reads the store text and emits claims, each tagged with the sourceIds it rests on, resolving contradictions rather than appending them. Then the completeness critic — the engine — names what's missing and returns a structured, actionable gap list.

```js
const SYNTH_SCHEMA = {
  claims: [{ claim:'string', sourceIds:['string'], confidence:'high|med|low' }],
  contradictions: [{ topic:'string', resolution:'string', sourceIds:['string'] }],
};

const synth = await agent(
  `Synthesize an answer to:\n${QUESTION}\n\nUsing ONLY these sources (id → text):\n${storeDigest(store)}\n\n` +
  `Emit {claim, sourceIds, confidence} for each claim. EVERY claim MUST cite sourceIds that ` +
  `appear above; never invent an id. Where sources DISAGREE, do NOT append both: RESOLVE it ` +
  `(weight by source quality / recency) or record it under contradictions with WHY it's unresolved. ` +
  `A concatenated list of quotes is a failure, not a synthesis.`,
  { label: 'synthesize', schema: SYNTH_SCHEMA },
);

// ── GROUND-GATE: every claim cites at least one id, and every id resolves to stored read text (see §6) ──
const liveIds = new Set([...store.values()].map(s => s.id));
for (const c of synth.claims) {
  if (c.sourceIds.length === 0) throw new Error(`UNSOURCED CLAIM: "${c.claim.slice(0,60)}…"`);
  const bad = c.sourceIds.filter(id => !liveIds.has(id));
  if (bad.length) throw new Error(`HALLUCINATED CITATION in "${c.claim.slice(0,60)}…": ${bad}`);
}

const CRITIC_SCHEMA = {
  gaps: [{ kind:'modality|unverified-claim|unread-source|missing-counterarg|unresolved-contradiction',
           description:'string', query:'string', priority:'number', material:'boolean' }],
  signoff: 'boolean',   // true ⇔ zero MATERIAL gaps remain
};

const critic = await agent(
  `You are the COMPLETENESS CRITIC. Here is the question, the current synthesis, and the list ` +
  `of sources actually read:\n${QUESTION}\n\nSYNTHESIS:\n${JSON.stringify(synth)}\n\n` +
  `SOURCES READ:\n${sourceList(store)}\n\nFind what is MISSING. For each gap, ask:\n` +
  `  • MODALITY: a search angle never run (an entity, a time window, a source type)?\n` +
  `  • UNVERIFIED-CLAIM: a claim resting on one weak/secondary source, or none?\n` +
  `  • UNREAD-SOURCE: a key primary source referenced but never fetched?\n` +
  `  • MISSING-COUNTERARG: is the opposing case steel-manned, or absent/strawmanned?\n` +
  `  • UNRESOLVED-CONTRADICTION: a disagreement appended instead of adjudicated?\n` +
  `For each, give a concrete next QUERY that would close it, a priority, and material:true ONLY ` +
  `if closing it would CHANGE a claim (not merely add one more confirming citation). ` +
  `signoff=true IFF zero MATERIAL gaps remain. Cosmetic gaps do NOT block signoff.`,
  { label: 'critic', schema: CRITIC_SCHEMA },
);
```

Why the critic must distinguish material from cosmetic, and steel-man the opposition: the loop's two ways to fail forever are (a) chasing cosmetic "add one more source" gaps to infinity, and (b) declaring victory on a one-sided case. Forcing material:true to mean "would change a claim" gives the budget a real exit; forcing a missing-counterarg check makes the critic demand the dissent before it signs off. A synthesis the critic can't break is one where the unknown-unknowns have been hunted, not just the known ones answered.
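The prompts above call storeDigest(store) and sourceList(store) without defining them. One possible shape for both, offered as a sketch only: the output layout, the maxChars truncation, and the demo entry are all assumptions, not the author's implementation.

```js
// Hypothetical helpers; the source calls storeDigest() and sourceList() without defining them.
// Assumption: each source's text is truncated to maxChars so the synthesis prompt stays bounded.
function storeDigest(store, maxChars = 4000) {
  return [...store.values()]
    .map(s => `[${s.id}] ${s.title} (${s.sourceType})\n${s.text.slice(0, maxChars)}`)
    .join('\n\n---\n\n');
}

// One line per source for the critic: enough to judge coverage without the full text.
function sourceList(store) {
  return [...store.values()]
    .map(s => `${s.id} | ${s.sourceType} | ${s.title} | ${s.url}`)
    .join('\n');
}

// Invented demo entry.
const demoStore = new Map([
  ['url:example.org/a', { id: 'S1', url: 'https://example.org/a', title: 'Trial report',
                          sourceType: 'primary', text: 'x'.repeat(5000) }],
]);
console.log(sourceList(demoStore));   // S1 | primary | Trial report | https://example.org/a
```

Truncation trades prompt size against coverage: a claim can only be grounded in the part of a source the synthesizer was actually shown, so maxChars should be chosen with that in mind.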


6. The gate (non-negotiable)

A synthesis ships only if BOTH hold, every round:

  1. Claim grounding. Every sourceId on every claim resolves to a source in the store with read text (§4). A citation pointing at an id that was never fetched is a hallucinated citation: the loop throws, not warns. Verify it mechanically; do not trust the model's word that it read something.

```js
// THE GATE, mechanically: no claim survives that cites text we don't physically hold.
const grounded = synth.claims.every(c =>
  c.sourceIds.length > 0 && c.sourceIds.every(id => liveIds.has(id)));
const noHallucinatedCites = grounded;                    // hard fail if false
```

  2. Critic sign-off. The completeness critic returns signoff: true (zero material gaps). A synthesis the critic can still break didn't converge; it's a draft, not a result.

The rule: a claim that doesn't resolve to read text didn't happen; a synthesis the critic can still break didn't converge. The grounding gate is what makes an autonomous research run trustworthy enough to ship unread — every sentence is anchored to text you can re-open.

The canonical research bug is the claim that reads perfectly and cites a source that says something else (or nothing, as with a fabricated DOI). A type checker such as tsc has no opinion on prose; only mechanically checking citation → stored text catches it. A claim that doesn't pass the grounding gate is not in the document.
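The loop in §7 calls assertGrounded(synth, store) without showing its body. A minimal sketch of what such a function could look like, assuming the claim and store shapes from §4 and §5; the error wording and the demo data are illustrative. Note that it checks only that each cited id resolves to held text, not that the text supports the claim.

```js
// Hypothetical assertGrounded(), matching the call in the loop; it throws rather than warns.
function assertGrounded(synth, store) {
  // Only sources that actually hold text count as live.
  const liveIds = new Set(
    [...store.values()]
      .filter(s => typeof s.text === 'string' && s.text.length > 0)
      .map(s => s.id));
  for (const c of synth.claims) {
    if (c.sourceIds.length === 0) {
      throw new Error(`UNSOURCED CLAIM: "${c.claim.slice(0, 60)}"`);
    }
    const missing = c.sourceIds.filter(id => !liveIds.has(id));
    if (missing.length > 0) {
      throw new Error(`HALLUCINATED CITATION in "${c.claim.slice(0, 60)}": ${missing.join(', ')}`);
    }
  }
}

// Invented demo data: one stored source, one grounded claim, one phantom citation.
const store = new Map([['url:example.org/a', { id: 'S1', text: 'full text of the source' }]]);
assertGrounded({ claims: [{ claim: 'Grounded claim', sourceIds: ['S1'] }] }, store);   // passes silently

let message = '';
try {
  assertGrounded({ claims: [{ claim: 'Phantom claim', sourceIds: ['S1', 'S7'] }] }, store);
} catch (err) {
  message = err.message;
}
console.log(message);   // HALLUCINATED CITATION in "Phantom claim": S7
```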


7. The gap loop (re-run until the critic is satisfied)

Research isn't "search, then write." It's: take the critic's top material gap, fan out searchers against that gap, re-synthesize, re-critique — repeat until the critic signs off twice or the budget is spent. One focused expansion per round keeps progress attributable.

```js
let target = decompose(QUESTION);          // round 1 = the question, broken into angle seeds
let round = 0, cleanSignoffs = 0, budget = 250_000, spent = 0, log = [];

while (cleanSignoffs < 2 && spent < budget) {
  round++;
  const hits   = await gather(target);                 // §3 parallel multi-angle
  const ids    = await ingestAll(hits);                // §4 dedup + store
  const synth  = await synthesize(store);              // §5 resolve, not append
  assertGrounded(synth, store);                        // §6 gate (1): throws on hallucinated cite
  const critic = await critique(QUESTION, synth, store);// §5 the engine
  spent = tokenMeter.total();

  const material = critic.gaps.filter(g => g.material).sort((a,b) => b.priority - a.priority);
  log.push({ round, sources: store.size, newSources: ids.length,
             claims: synth.claims.length, openGaps: material.length, signoff: critic.signoff });

  cleanSignoffs = critic.signoff ? cleanSignoffs + 1 : 0;   // §6 gate (2); reset on any material gap
  if (critic.signoff) continue;                             // re-confirm the dry round before stopping

  if (material.length === 0) continue;                      // signoff false but no material gap → re-critique
  target = material[0].query;                               // ONE expansion: the TOP material gap
}
```

Two lines carry the convergence design:

  • cleanSignoffs = critic.signoff ? +1 : 0 — one clean round can be the critic having a generous moment. Require two consecutive sign-offs (reset on any material gap) before you stop, exactly as a bug hunt requires two dry rounds.
  • target = material[0].query — the single top material gap, not all of them. Progress is attributable to one expansion per round, and the gap-count trend (§8) is an honest signal instead of noise from six simultaneous changes.
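The two convergence lines can be pulled into a pure function so the bookkeeping is testable without running any agents. This is a sketch under assumptions: nextState is a hypothetical name, and counting a sign-off only when the critic's own gap list contains no material gap is a stricter rule than the loop above applies. The critic outputs are invented.

```js
// Hypothetical pure helper: one round's bookkeeping, separated from the I/O.
// Assumption: a sign-off only counts when the critic's own gap list agrees with it.
function nextState(state, critic) {
  const material = critic.gaps
    .filter(g => g.material)
    .sort((a, b) => b.priority - a.priority);   // highest priority first
  const clean = critic.signoff && material.length === 0;
  return {
    cleanSignoffs: clean ? state.cleanSignoffs + 1 : 0,              // reset on any material gap
    target: material.length > 0 ? material[0].query : state.target,  // ONE expansion: the top gap
    openGaps: material.length,
  };
}

// Invented critic outputs for three rounds.
let state = { cleanSignoffs: 0, target: 'initial question' };
const rounds = [
  { signoff: false, gaps: [{ query: 'find dissenting studies', priority: 2, material: true },
                           { query: 'locate primary trial data', priority: 5, material: true }] },
  { signoff: true,  gaps: [{ query: 'one more confirming cite', priority: 1, material: false }] },
  { signoff: true,  gaps: [] },
];
const trace = rounds.map(critic => (state = nextState(state, critic)));
console.log(trace.map(s => s.cleanSignoffs));   // [ 0, 1, 2 ]
console.log(trace[0].target);                   // locate primary trial data
```

The cosmetic gap in the second round does not block the sign-off, and the target only moves when a material gap exists.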

8. When to stop (convergence)

Two signals must both turn: open material gaps → 0 (the critic ran out of things to ask) and new sources per round → small (the angles ran dry). From the four-day-week run:

| Round | Sources | New | Claims | Open material gaps | Signoff |
|---|---|---|---|---|---|
| 1 | 9 | 9 | 18 | 6 | no: no counter-arg, 2019 data, 3 claims single-sourced |
| 2 | 31 | 22 | 44 | 4 | no: counter-arg added, but a contradiction left appended |
| 3 | 47 | 16 | 63 | 1 | no: one claim still rested on a press summary, not the study |
| 4 | 54 | 7 | 71 | 0 | yes (×1) |
| 5 | 56 | 2 | 71 | 0 | yes (×2 → stop) |

Read it: Sources keeps rising but New decays as dedup saturates and angles dry up; Open gaps is the truth and trends to 0; round 5 adds two sources, changes no claim, and the critic signs off a second time — stop. Had you stopped at round 4's first sign-off you'd have been right by luck, but the second round is cheap insurance against the critic's generous moment.

If open gaps stall above 0 and new sources keep climbing, you're not converging — the question is too broad (decompose it into sub-questions, each its own loop) or a gap is genuinely unanswerable from available sources (record it as an explicit open question in the document with why, and exclude it from the gap count — an honest "unknown" is convergence; a silently dropped gap is not).
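The stall condition can be checked mechanically against the round log. A sketch only: isStalling is a hypothetical helper, and the window and minNew thresholds are assumed values to tune per task. The first log mirrors the four-day-week run; the second is invented for contrast.

```js
// Hypothetical stall check over the round log; window and minNew are assumed thresholds.
function isStalling(log, window = 3, minNew = 5) {
  if (log.length < window) return false;                       // too early to judge
  const recent = log.slice(-window);
  const last = recent[recent.length - 1];
  const stillOpen = last.openGaps > 0;                          // gaps remain
  const gapsNotFalling = last.openGaps >= recent[0].openGaps;   // no progress across the window
  const stillFinding = recent.every(r => r.newSources >= minNew);
  return stillOpen && gapsNotFalling && stillFinding;
}

// The four-day-week run: gaps fall every round, so it is converging.
const converging = [
  { round: 1, newSources: 9,  openGaps: 6 },
  { round: 2, newSources: 22, openGaps: 4 },
  { round: 3, newSources: 16, openGaps: 1 },
  { round: 4, newSources: 7,  openGaps: 0 },
  { round: 5, newSources: 2,  openGaps: 0 },
];
// An invented run for contrast: sources keep arriving but the gaps never close.
const stalled = [
  { round: 1, newSources: 12, openGaps: 5 },
  { round: 2, newSources: 15, openGaps: 5 },
  { round: 3, newSources: 11, openGaps: 6 },
];
console.log(isStalling(converging), isStalling(stalled));   // false true
```

A true result is the cue to decompose the question or to record the stuck gap as an explicit open question.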


9. How to re-run it

```bash
# 1. State the question precisely. Vague question → vague gaps. Decompose into angle seeds.
#    (round-1 target = the question; rounds 2+ target = the critic's top MATERIAL gap.)

# 2. Run the loop: parallel multi-angle gather (§3) → dedup+store (§4) → synthesize (§5)
#    → GROUND-GATE (§6) → completeness critic (§5). Persist the source store + gap log to disk.
node research-loop.mjs            # logs: round / sources / new / claims / open-gaps / signoff

# 3. The gate runs EVERY round, mechanically: a hallucinated citation throws, not warns.
node verify-citations.mjs synth.json store.json   # exit 1 if any sourceId ∉ store

# 4. Stop when signoff is true TWICE in a row, or the token budget is hit. Emit the rounds
#    table + the list of explicit open questions (gaps that were unanswerable; no silent drops).
```

To resume a run later, reload the source store from disk — the loop picks up with all prior read text intact, dedups new hits against it, and the critic re-evaluates against the full corpus instead of re-fetching everything. The store is the checkpoint.
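The source does not show how the store is written to disk. One way to checkpoint it, as a sketch: a Map does not survive JSON.stringify directly, so it is saved as an array of [key, source] pairs. The serializeStore and deserializeStore names and the demo entry are hypothetical.

```js
// Hypothetical checkpoint helpers; the source says the store is persisted but not how.
// A Map serializes to "{}" under JSON.stringify, so save its entries as [key, source] pairs.
function serializeStore(store) {
  return JSON.stringify([...store.entries()]);
}

function deserializeStore(json) {
  return new Map(JSON.parse(json));   // the Map constructor accepts an array of pairs
}

// Invented demo entry.
const store = new Map([
  ['doi:10.1234/abc.567', { id: 'S1', url: 'https://doi.org/10.1234/abc.567', title: 'Trial report',
                            sourceType: 'peer', text: 'full text as fetched' }],
]);
const restored = deserializeStore(serializeStore(store));
console.log(restored.get('doi:10.1234/abc.567').id, restored.size);   // S1 1
```

In Node the string could be written with fs.writeFileSync and read back with fs.readFileSync. Because the restored Map has the same size as before, an id scheme based on store.size + 1 continues numbering where the previous run left off, as long as no entries were deleted.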


10. Gotchas (each one cost a real debugging cycle)

| Symptom | Cause | Fix |
|---|---|---|
| A claim cites source S7 that was never fetched | Model wrote a plausible-looking citation; nobody checked it resolves to stored text | The grounding gate: verify every sourceId ∈ store with text, throw on miss (§6) |
| Loop runs forever, "one more source" | No budget, and the critic counts cosmetic gaps (one more confirming cite) as blocking | Hard token budget and material:true ⇔ "would change a claim" (§5/§7) |
| "Synthesis" is a list of quotes, contradictions appended back-to-back | Synthesizer concatenated instead of adjudicating | Require it to RESOLVE (weight/reconcile) or flag-with-why; an append is a critic rejection (§5) |
| Confident answer that's one-sided / wrong | Echo-chamber gather: only supporting sources collected, critic never demanded dissent | A dedicated counter-arg searcher and the critic's missing-counterarg check (steel-man) (§3/§5) |
| Same study counted three times → false consensus | Deduped by raw URL; press release + preprint + blog have different URLs | Canonical key by DOI then normalized URL; same canonical source counts once (§4) |
| Critic rubber-stamps the first draft | Critic run once at the end as a final check, not as the loop's engine | Run it every round and feed its gap list back in as the next queries (§1/§7) |
| Stops after one clean round, misses a gap | One sign-off can be a generous moment | Require two consecutive sign-offs (reset on any material gap) (§8) |
| Five searchers return the same result set | Identical queries: N searches found one thing N times | One angle per searcher (entity/time/source-type/counter-arg), blind to each other (§3) |
| A claim rests on a paywalled/404 page the agent "read" | Hit stored without verifying body text actually came back | Reject hits with text.length < 200; no readable text ⇒ not a source (§4) |

11. Why it works

  • The completeness critic as engine makes it hunt unknown-unknowns — by generating each round's queries from "what's missing," it surfaces the modality, the counter-argument, the unread primary source that the original framing never thought to ask for.
  • Multi-angle gather makes searches cover the space — entity, time, source-type and counter-arg searchers hit disjoint corners, so N searchers find N things instead of one thing N times.
  • The grounding gate makes the synthesis trustworthy — every claim mechanically resolves to text physically in the store, so a hallucinated citation throws and a paywalled non-source can't back an assertion.
  • Counter-evidence gathering + contradiction resolution make it correct, not just confident — the steel-manned opposing case and adjudicated disputes kill the echo-chamber that makes a one-sided loop feel done while being wrong.
  • Two-signoff convergence + a hard budget + explicit open questions make it honest — it stops at "the critic can name nothing material," it can't spin on cosmetic gaps, and it tells you exactly what stayed unknown instead of papering over it.

Generated from a 4-round, 54-source brief ("is the four-day work week net-positive for output?"): multi-angle gather, claim→source grounding gate, a completeness critic that drove gaps 6 → 0, stop on two consecutive sign-offs. Reusable for any deep-research or comprehensive-document task where one pass always misses something.

Install this skill directly: skilldb add agentic-loops-skills
