Human History According to AI
A machine-generated chronicle of 5,226 years of human civilization
A Python daemon and Claude API pipeline generating a structured JSON corpus of all 5,226 years of recorded human history — with sources, confidence levels, and documented gaps.
“History is written by the victors — but a structured knowledge corpus must account for everyone else.”
The Challenge
Structured, machine-readable historical knowledge at scale does not exist. Encyclopedic resources are written for human readers and resist programmatic querying. No standardized format captures the confidence levels, causal relationships, and documented gaps that rigorous historical reasoning requires. Building a 5,226-year corpus manually would take decades; doing it carelessly with AI would produce confident-sounding noise.
The Approach
Designed the ICCRA schema — a JSON format requiring source citations, confidence levels (confirmed, probable, approximate, traditional, legendary), causal relationships, and explicit geographic gap declarations for every year. Built a Python async daemon that calls Claude Sonnet 4.6 via the Anthropic API, validates output against the schema, tracks progress in an append-only ledger, and recovers from failures without data loss. Achieved a 99% cost reduction through direct API integration with batch processing, making a 5,226-year generation run economically viable.
What Is This?
Human History According to AI is an autonomous research daemon that generates a structured, machine-readable chronicle of human civilization — year by year, from 2025 CE back to approximately 3200 BCE. Each of the 5,226 years receives its own JSON file, populated by Claude Sonnet 4.6 through the Anthropic API.
This is not a narrative history. It is a knowledge corpus designed for graph databases, timelines, adversarial review, and further AI reasoning. Every event cites its sources, declares its confidence level, and surfaces disconfirming evidence where it exists.
Why Build This?
Historical knowledge is abundant but unstructured. Existing resources — encyclopedias, academic papers, Wikipedia — are written for human readers: narrative, discursive, and difficult to query at scale. This project asks a different question: what does a structured, machine-readable substrate of human history look like?
The answer is the ICCRA schema — a JSON format that captures events, causal relationships, geographic coverage, confidence levels, and explicit declarations of what we don't know. The geographic gaps field is not an afterthought: it is a deliberate acknowledgment that the documentary record is not evenly distributed across the world's populations.
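A single year file under this schema might look like the following sketch. The field names here are illustrative assumptions for the sake of example, not the published ICCRA specification:

```python
import json

# Hypothetical ICCRA-style year record; field names are illustrative,
# not the authoritative schema.
year_record = {
    "year": -3200,
    "events": [
        {
            "summary": "Early cuneiform record-keeping develops in Sumer",
            "confidence": "approximate",   # one of the five declared tiers
            "sources": ["archaeological: Uruk IV tablets"],
            "causes": [],                  # causal links to other events
        }
    ],
    # Required field: where the documentary record is thin or absent.
    "geographic_gaps": [
        "Sub-Saharan Africa outside the Nile valley",
        "Pre-contact Americas",
    ],
}

# Each year serializes to its own JSON file.
serialized = json.dumps(year_record, indent=2)
```

Because every record carries `confidence`, `sources`, and `geographic_gaps`, downstream consumers can validate presence of these fields mechanically before ingestion.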
Architecture
The system is a Python async daemon that orchestrates API calls, tracks progress in an append-only ledger file, validates output against the ICCRA schema, and recovers gracefully from failures. A Next.js 16 frontend provides an interactive timeline visualization of the generated data.
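The append-only ledger pattern described above can be sketched as follows. File name and record shape are assumptions for illustration; the real daemon's internals are not shown here:

```python
import json
import os

LEDGER_PATH = "progress.ledger.jsonl"  # hypothetical file name

def completed_years(path=LEDGER_PATH):
    """Read the append-only ledger and return the set of finished years."""
    done = set()
    if os.path.exists(path):
        with open(path) as fh:
            for line in fh:
                entry = json.loads(line)
                if entry.get("status") == "ok":
                    done.add(entry["year"])
    return done

def record_completion(year, path=LEDGER_PATH):
    """Append one record per finished year; never rewrite earlier lines."""
    with open(path, "a") as fh:
        fh.write(json.dumps({"year": year, "status": "ok"}) + "\n")

# Resumable loop: reverse chronological, skipping anything already done.
done = completed_years()
for year in range(2025, 2020, -1):  # truncated range for illustration
    if year in done:
        continue
    # ... call the model, validate against the schema, write the JSON file ...
    record_completion(year)
```

Because the ledger is append-only, a crashed or interrupted run resumes by replaying the file; no in-place state mutation means no partially written progress to repair.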
The project achieved a 99% cost reduction by migrating from a third-party orchestration layer to direct Anthropic API calls with batch processing — a change that made sustained, long-running generation economically viable.
Confidence Levels
Every historical claim in the corpus is tagged with one of five confidence levels:
- Confirmed: Primary sources, physical evidence, multiple independent attestations.
- Probable: Strong circumstantial or secondary evidence.
- Approximate: General scholarly consensus, imprecise dating.
- Traditional: Preserved in cultural memory but not independently verified.
- Legendary: Mythological or folkloric — included for completeness, clearly flagged.
This tiered system means the corpus can be queried by epistemic quality, not just by date or region.
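Querying by epistemic quality then reduces to an ordered filter over the five tiers. This sketch assumes event records carry a `confidence` field as described above:

```python
# Hypothetical event records tagged with the five confidence tiers.
events = [
    {"summary": "Moon landing", "year": 1969, "confidence": "confirmed"},
    {"summary": "Founding of Rome by Romulus", "year": -753,
     "confidence": "legendary"},
    {"summary": "Reign of an early dynastic king", "year": -2900,
     "confidence": "traditional"},
]

# Rank the tiers so queries can express "at least this reliable".
TIER_RANK = {"confirmed": 4, "probable": 3, "approximate": 2,
             "traditional": 1, "legendary": 0}

def at_least(events, tier):
    """Return events whose confidence meets or exceeds the given tier."""
    floor = TIER_RANK[tier]
    return [e for e in events if TIER_RANK[e["confidence"]] >= floor]

print([e["summary"] for e in at_least(events, "probable")])
# A "probable"-or-better query keeps only the confirmed moon landing here.
```

The same ranking composes with date or region filters, so a consumer can ask, for example, for only confirmed-or-probable events in a given millennium.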
Geographic Gaps
One of the schema's most important fields is the explicit declaration of geographic coverage gaps. For any given year, the daemon is required to state which regions and populations are under-documented — not because the history didn't happen, but because the surviving record doesn't capture it. This is an acknowledgment built into the structure of the data rather than hidden in caveats.
Progress
As of April 2026, the daemon has completed 1,160 of 5,226 years — roughly 22% of the full corpus. Years are processed in reverse chronological order, so recent history (where sources are most abundant and verifiable) came first. The generated data is available on GitHub and structured for direct ingestion into graph databases or timeline tools.
The project is open-source. Contributions, schema critiques, and adversarial review of generated content are welcome.
Lessons Learned
- AI-generated historical content requires explicit confidence signaling built into the data schema. Without declared uncertainty levels, outputs read as authoritative when they should not be.
- Geographic coverage gaps are a feature, not a bug. Making them a required schema field forces honest accounting of what the documentary record does and does not capture.
- An append-only ledger is the right pattern for long-running AI generation jobs — it makes the process resumable, auditable, and cost-predictable without complex state management.
- A 99% cost reduction from direct API access versus third-party orchestration layers changes what is economically viable in AI research projects. Infrastructure choices have research-scope consequences.
- Reverse chronological processing is the right order — start with the years where you can verify quality before committing to the ancient record where verification is harder.