
Human History According to AI

A machine-generated chronicle of 5,226 years of human civilization

A Python daemon and Claude API pipeline generating a structured JSON corpus of all 5,226 years of recorded human history — with sources, confidence levels, and documented gaps.

History is written by the victors — but a structured knowledge corpus must account for everyone else.
Project design note, LEDGER.md
Role: Author & Architect
Duration: April 2026–Ongoing
Status: In Progress

The Challenge

Structured, machine-readable historical knowledge at scale does not exist. Encyclopedic resources are written for human readers and resist programmatic querying. No standardized format captures the confidence levels, causal relationships, and documented gaps that rigorous historical reasoning requires. Building a 5,226-year corpus manually would take decades; doing it carelessly with AI would produce confident-sounding noise.

The Approach

Designed the ICCRA schema — a JSON format requiring source citations, confidence levels (confirmed, probable, approximate, traditional, legendary), causal relationships, and explicit geographic gap declarations for every year. Built a Python async daemon that calls Claude Sonnet 4.6 via the Anthropic API, validates output against the schema, tracks progress in an append-only ledger, and recovers from failures without data loss. Achieved a 99% cost reduction through direct API integration with batch processing, making a 5,226-year generation run economically viable.

Outcomes

  • Scope: 5,226 years, 3200 BCE to 2025 CE, the full span of the documented historical record
  • Completed: 1,160 years (22% of the corpus) generated as of April 2026, processed in reverse chronological order
  • Cost reduction: 99%, achieved by migrating from third-party orchestration to direct Anthropic API calls with batch processing
  • Output format: ICCRA schema, structured JSON per year: events, sources, confidence levels, causal links, geographic gaps

What Is This?

Human History According to AI is an autonomous research daemon that generates a structured, machine-readable chronicle of human civilization — year by year, from 2025 CE back to approximately 3200 BCE. Each of the 5,226 years receives its own JSON file, populated by Claude Sonnet 4.6 through the Anthropic API.

This is not a narrative history. It is a knowledge corpus designed for graph databases, timelines, adversarial review, and further AI reasoning. Every event cites its sources, declares its confidence level, and surfaces disconfirming evidence where it exists.

Why Build This?

Historical knowledge is abundant but unstructured. Existing resources — encyclopedias, academic papers, Wikipedia — are written for human readers: narrative, discursive, and difficult to query at scale. This project asks a different question: what does a structured, machine-readable substrate of human history look like?

The answer is the ICCRA schema — a JSON format that captures events, causal relationships, geographic coverage, confidence levels, and explicit declarations of what we don't know. The geographic gaps field is not an afterthought: it is a deliberate acknowledgment that the documentary record is not evenly distributed across the world's populations.
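A single year's record under such a schema might look like the following sketch. The field names and the example event are illustrative assumptions for demonstration, not the published ICCRA spec:

```python
# Illustrative sketch of a per-year ICCRA-style record.
# Field names are assumptions; the actual schema may differ.
year_1258 = {
    "year": 1258,
    "events": [
        {
            "id": "sack-of-baghdad",
            "summary": "Mongol forces under Hulagu Khan capture and sack Baghdad.",
            "confidence": "confirmed",
            "sources": ["Contemporary Persian and Arabic chronicles"],
            "causes": [],   # links to event ids in earlier years
            "effects": [],  # links to event ids in later years
        }
    ],
    "geographic_gaps": [
        {
            "region": "Sub-Saharan Africa (interior)",
            "note": "Few surviving written sources for this year.",
        }
    ],
}

# Confidence must be one of the schema's five declared tiers.
assert year_1258["events"][0]["confidence"] in {
    "confirmed", "probable", "approximate", "traditional", "legendary"
}
```

Note that `geographic_gaps` sits at the same level as `events`: the absence of documentation is first-class data, not a footnote.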

Architecture

The system is a Python async daemon that orchestrates API calls, tracks progress in an append-only ledger file, validates output against the ICCRA schema, and recovers gracefully from failures. A Next.js 16 frontend provides an interactive timeline visualization of the generated data.
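The append-only ledger is what makes the daemon resumable: on startup it replays the ledger to see what is already done, then skips ahead. A minimal sketch of that pattern, with hypothetical names and the real Claude API call stubbed out:

```python
import asyncio
import json
from pathlib import Path

LEDGER = Path("ledger.jsonl")  # hypothetical filename; append-only JSON Lines

def completed_years(ledger: Path = LEDGER) -> set[int]:
    """Replay the ledger to find which years are already done."""
    if not ledger.exists():
        return set()
    return {
        json.loads(line)["year"]
        for line in ledger.read_text().splitlines()
        if line.strip()
    }

def record_year(year: int, status: str, ledger: Path = LEDGER) -> None:
    """Append one immutable entry; existing lines are never rewritten."""
    with ledger.open("a") as f:
        f.write(json.dumps({"year": year, "status": status}) + "\n")

async def generate_year(year: int) -> dict:
    # Placeholder for the real API call plus schema validation.
    await asyncio.sleep(0)
    return {"year": year, "events": []}

async def run(years: list[int], ledger: Path = LEDGER) -> None:
    done = completed_years(ledger)
    for year in years:
        if year in done:
            continue  # resumable: the ledger already records this year
        await generate_year(year)
        record_year(year, "ok", ledger)
```

Because entries are only ever appended, a crash mid-run loses at most the year in flight, and the ledger doubles as an audit trail of exactly what was generated and when.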

The project achieved a 99% cost reduction by migrating from a third-party orchestration layer to direct Anthropic API calls with batch processing — a change that made sustained, long-running generation economically viable.
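Batch processing amortizes per-request overhead and typically qualifies for discounted batch pricing. A simple chunking helper (illustrative, not the project's actual code) groups years into fixed-size submissions, where each group would become one request set to the provider's batch endpoint:

```python
from typing import Iterable, Iterator

def batches(years: Iterable[int], size: int = 100) -> Iterator[list[int]]:
    """Yield fixed-size groups of years, one group per batch submission."""
    batch: list[int] = []
    for year in years:
        batch.append(year)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial group
```

The batch size of 100 is an arbitrary placeholder; the right value depends on the provider's batch limits and how much work you are willing to lose if a submission fails.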

Confidence Levels

Every historical claim in the corpus is tagged with one of five confidence levels:

  • Confirmed: Primary sources, physical evidence, multiple independent attestations.
  • Probable: Strong circumstantial or secondary evidence.
  • Approximate: General scholarly consensus, imprecise dating.
  • Traditional: Preserved in cultural memory but not independently verified.
  • Legendary: Mythological or folkloric — included for completeness, clearly flagged.

This tiered system means the corpus can be queried by epistemic quality, not just by date or region.
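With the tiers encoded as an ordered scale, epistemic filtering becomes a short query. A sketch, with assumed field names, that keeps only claims at or above a chosen tier:

```python
# The five tiers ordered strongest-evidence first (assumed encoding).
TIERS = ["confirmed", "probable", "approximate", "traditional", "legendary"]
RANK = {tier: i for i, tier in enumerate(TIERS)}

def at_least(events: list[dict], tier: str) -> list[dict]:
    """Keep events whose confidence is at least as strong as `tier`."""
    cutoff = RANK[tier]
    return [e for e in events if RANK[e["confidence"]] <= cutoff]

events = [
    {"summary": "Battle recorded in two chronicles", "confidence": "confirmed"},
    {"summary": "Founding date per local tradition", "confidence": "traditional"},
]
solid = at_least(events, "probable")  # drops the traditional-only claim
```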

Geographic Gaps

One of the schema's most important fields is the explicit declaration of geographic coverage gaps. For any given year, the daemon is required to state which regions and populations are under-documented — not because the history didn't happen, but because the surviving record doesn't capture it. This is an acknowledgment built into the structure of the data rather than hidden in caveats.
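Because the gap declaration is a required field, schema validation can reject any record that omits it. A minimal check, with field names assumed for illustration:

```python
def validate_gaps(record: dict) -> list[str]:
    """Return validation errors; an empty list means the record passes.

    The geographic_gaps field must be present and must be a list, even
    when it is empty: an empty list is itself an explicit claim that no
    coverage gaps were identified, which is different from silence.
    """
    errors: list[str] = []
    if "geographic_gaps" not in record:
        errors.append("missing required field: geographic_gaps")
    elif not isinstance(record["geographic_gaps"], list):
        errors.append("geographic_gaps must be a list of declarations")
    return errors
```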

Progress

As of April 2026, the daemon has completed 1,160 of 5,226 years — roughly 22% of the full corpus. Years are processed in reverse chronological order, so recent history (where sources are most abundant and verifiable) came first. The generated data is available on GitHub and structured for direct ingestion into graph databases or timeline tools.
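One indexing scheme consistent with the 5,226-year count is a signed-integer year axis that includes a year-zero slot (an assumption about the project's convention, not a documented detail): counting down from 2025 through -3200 inclusive yields exactly 5,226 entries.

```python
def processing_order(start: int = 2025, end: int = -3200) -> list[int]:
    """Years in reverse chronological order, newest first.

    BCE years are negative; a year-0 slot is included, which is one
    indexing convention that yields the stated 5,226-year total.
    """
    return list(range(start, end - 1, -1))

order = processing_order()
```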


The project is open-source. Contributions, schema critiques, and adversarial review of generated content are welcome.

Technology Stack

Python · Anthropic API · Next.js · Docker · GitHub Actions

Lessons Learned

  • AI-generated historical content requires explicit confidence signaling built into the data schema. Without declared uncertainty levels, outputs read as authoritative when they should not be.
  • Geographic coverage gaps are a feature, not a bug. Making them a required schema field forces honest accounting of what the documentary record does and does not capture.
  • An append-only ledger is the right pattern for long-running AI generation jobs — it makes the process resumable, auditable, and cost-predictable without complex state management.
  • 99% cost reduction through direct API access vs. third-party orchestration layers changes what is economically viable in AI research projects. Infrastructure choices have research-scope consequences.
  • Reverse chronological processing is the right order — start with the years where you can verify quality before committing to the ancient record where verification is harder.
