Chapter 18: Information Provenance and Batch Processing
Imagine two workers in a factory that processes intelligence reports.
The first worker is the Historian. Every time a piece of information arrives — a report, a reading, a claim — the Historian stamps it with a tag: who said it, when, and where the original document lives. If someone later asks "Why did we conclude the bridge is unsafe?", the Historian pulls the chain of evidence: "Inspector A filed report #42 on March 3rd, citing stress-test data from Lab B." No guessing. No "I think someone said something." Just facts linked to sources.
The second worker is the Assembly Line operator. When 200 inspection reports arrive at once, you do not hand them to one person and wait. You divide them into batches of 10, run each batch through a processing station, and track which batches succeeded and which failed. If batch #7 hits an error, the other 19 batches keep moving. You fix batch #7 later.
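The Assembly Line pattern is straightforward to sketch in code. The function below is a minimal illustration, not a fixed API: `process_in_batches` and its `handler` parameter are hypothetical names chosen for this example. The key property is that a failing batch is recorded and skipped, while the remaining batches keep moving.

```python
from typing import Any, Callable

def process_in_batches(items: list, batch_size: int,
                       handler: Callable[[list], Any]) -> dict:
    """Run items through handler in fixed-size batches.

    A failure in one batch is recorded under its batch number
    and skipped; every other batch keeps moving.
    """
    results = {"succeeded": {}, "failed": {}}
    batches = [items[i:i + batch_size]
               for i in range(0, len(items), batch_size)]
    for n, batch in enumerate(batches, start=1):
        try:
            results["succeeded"][n] = handler(batch)
        except Exception as err:  # isolate the failure to this batch
            results["failed"][n] = str(err)
    return results
```

With 200 items and a batch size of 10, an error in batch #7 leaves 19 successful batches and one entry in `failed` to retry later.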
Now combine the two. The Assembly Line processes reports in bulk. The Historian stamps every conclusion with its source material. When two reports contradict each other — Inspector A says the bridge is safe, Inspector B says it is not — the Historian flags the conflict and pulls up both original reports so a human can decide.
This is source tracking and bulk processing. Source tracking answers "where did this information come from?" Bulk processing answers "how do we handle hundreds of items without one failure taking everything down?" Together, they give you reliable high-volume work with an audit trail you can trust.
Why You Need to Know Where Information Came From
Your helper gathers information, consults references, and produces conclusions. But not all sources are equal.
Suppose the helper says: "The east wing of the building has a water leak on the third floor." Where did that claim come from? Three possibilities:
- The helper read an inspector's report that documented the leak with photographs and a date. This is a supported claim — there is a source to back it up.
- The helper inferred it from a related maintenance request about plumbing work on the third floor. This is an inferred claim — plausible but not directly verified.
- The helper made it up. No inspector's report exists and no maintenance request mentions a leak. This is an unsupported claim — a fabrication dressed up as analysis.
Without source tracking, all three look identical. The helper states them with the same confidence. The person who reads the report has no way to tell which findings are real and which are fabricated.
Source tracking solves this by maintaining a chain from every claim back to its original material. Supported claims link to verified references. Inferred claims link to indirect evidence. Unsupported claims have no links at all — and get flagged automatically.
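The three kinds of claims can be distinguished mechanically once every claim carries its evidence links. Here is a minimal sketch, assuming a hypothetical `Claim` record with separate lists for direct and indirect evidence; the field names and status labels are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    sources: list = field(default_factory=list)        # direct evidence
    inferred_from: list = field(default_factory=list)  # indirect evidence

    @property
    def status(self) -> str:
        if self.sources:
            return "supported"    # links to verified references
        if self.inferred_from:
            return "inferred"     # plausible, not directly verified
        return "unsupported"      # no links at all: flag for review

leak = Claim("Water leak on the third floor, east wing",
             sources=["inspection-report-42"])
guess = Claim("Water leak on the third floor, east wing")
```

The two claims read identically as text; only the evidence links separate `leak` (supported) from `guess` (unsupported, flagged).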
Now multiply this problem by 200 reports. A bulk review that processes an entire backlog without source tracking produces a summary that looks thorough but might be 15% fabrication. With source tracking, you know exactly which findings to trust, which to investigate, and which to discard.
Suppose you are running a bulk review of customer records, with the helper processing records in batches of 10. Here is how the two patterns work together when sources conflict.
The source-tracking bulk processing steps, in order:
- Divide items into batches
- Process each batch with source tracking
- Detect conflicts between sources
- Resolve using the recorded evidence
- Record audit trail for each decision
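The steps above can be sketched as a single pipeline. This is a simplified illustration with hypothetical record fields (`id`, `verdict`, `source`); in this sketch, conflicts are only detected between reports that land in the same batch, which is a deliberate simplification.

```python
from itertools import groupby

def review(records: list, batch_size: int = 10):
    """Process records in batches, record each verdict's source,
    and surface conflicts instead of silently picking a side."""
    audit, conflicts = [], []
    batches = [records[i:i + batch_size]
               for i in range(0, len(records), batch_size)]
    for batch in batches:
        # group the batch's reports by the record they describe
        for rec_id, group in groupby(sorted(batch, key=lambda r: r["id"]),
                                     key=lambda r: r["id"]):
            reports = list(group)
            entry = {"id": rec_id,
                     "verdicts": sorted({r["verdict"] for r in reports}),
                     "sources": [r["source"] for r in reports]}
            audit.append(entry)           # audit trail for every decision
            if len(entry["verdicts"]) > 1:
                conflicts.append(entry)   # sources disagree: human decides
    return audit, conflicts
```

When Inspector A and Inspector B file opposite verdicts on the same bridge, the entry lands in `conflicts` with both sources attached, so a reviewer can pull up both original reports.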
Key Insight
Source tracking is not optional. It is not a "nice to have" that you add later when you have time. It is the difference between a report you can trust and one you cannot.
Without source tracking, you cannot trace mistakes. When the helper produces a wrong answer, you have no way to follow the chain back to the source that misled it. Was the reference outdated? Did a tool return bad information? Did the helper fabricate something? You are guessing.
Without source tracking, you cannot audit. When a stakeholder asks "why did the system flag this as critical?", you need a chain of evidence — not "the helper said so." Regulations, compliance reviews, and post-incident investigations all demand traceability.
Without source tracking, you cannot trust the output. A report that says "185 out of 200 records reviewed successfully" means nothing if you cannot verify which claims are supported by real evidence and which are fabricated. Source tracking turns opaque output into transparent, verifiable analysis.
The Historian does not slow down the Assembly Line. The Historian makes the Assembly Line's output worth using.
What's Next
You now have the complete production patterns toolkit: MCP for external tool access, structured output for reliable data, validation-retry for error correction, and provenance with batch processing for trust and scale. These four chapters added the missing pieces for production-grade agent systems.
In Chapter 19, you will bring every component together — all 18 chapters' worth of infrastructure — into a single, functioning multi-agent coding platform. The capstone assembly.