Building an Institutional Knowledge Graph with Neo4j: A Practitioner's Guide for Small Organisations

Magnús Smári Smárason · March 23, 2026
Tags: knowledge-graph, neo4j, institutional-architecture, AI-assisted-development, SHACL, JSON-LD, higher-education, graph-database

A complete methodology for building institutional knowledge graphs using Neo4j, JSON-LD, and SHACL validation — deployed at a small university with 1,039 nodes across 16 domains.


TL;DR / Executive Summary

Small and medium-sized organisations run on invisible architecture: governance, systems, policies, staff, and interdependencies that exist mostly in people’s heads and scattered documents. The result is fragility when staff leave, endless cross-domain questions that no one can answer quickly, and AI tools that remain blind to the institution they serve.

This guide presents a complete, reproducible methodology for solving that problem with a production institutional knowledge graph. Built and deployed at the University of Akureyri (northern Iceland, ~2,800 students, ~300 staff) by a single practitioner, the graph now contains 1,039 nodes across 16 domains, 1,648 relationships of 30 types, and is kept live and accurate through a bidirectional pipeline where Neo4j is the single source of truth.

Key innovations that make it practical for small teams:

  • Domain-file decomposition (16 independent JSON-LD files with cross-file referential integrity)
  • SHACL validation + automated quality gates that drive ownership, policy, and staffing gaps to zero
  • Bidirectional pipeline (scrape → merge → validate → ingest → export) with perfect round-trip fidelity to Git
  • Layered AI-assisted construction (Claude Code inside a structured orchestration framework) that accelerates code while keeping human domain expertise in control
  • Closed feedback loop via two production GraphRAG chatbots evaluated on a 410-question golden dataset

The full paper includes architecture diagrams, every script size, exact quality metrics, Cypher query library, chatbot evaluation results, and an 11-step reproducibility guide so any small organisation or higher-ed institution can copy the approach in weeks, not years.

Abstract

Small and medium-sized organisations face a structural knowledge problem: institutional knowledge about systems, governance, staff, policies, and dependencies lives in people's heads, scattered documents, and undocumented assumptions. This paper presents a complete, reproducible methodology for building an institutional knowledge graph using Neo4j, JSON-LD, and SHACL validation. The approach was developed and deployed at a small university (~2,800 students, ~300 staff) in northern Iceland, producing a graph of over 1,000 nodes across 16 domains with 1,648 relationships and 30 relationship types — maintained by a single practitioner. The pipeline was constructed using AI-assisted development (Claude Code within a structured orchestration framework), following a layered domain-by-domain strategy with validation gates between layers — blending a deterministic, repeatable tech stack with generative AI that accelerates code production without participating in runtime execution. The graph powers two production chatbots evaluated against a 410-question golden dataset with LLM-as-judge scoring. The paper describes architecture, pipeline design, data modelling, validation, a bidirectional pipeline where Neo4j serves as the single source of truth, the AI-assisted construction methodology, and a systematic evaluation framework. Every design pattern is presented with sufficient detail for practitioners at other organisations to reproduce the approach.

Keywords: knowledge graph, Neo4j, institutional architecture, JSON-LD, SHACL, organisational modelling, graph database, higher education, AI-assisted development, Claude Code

1. Introduction

1.1 The Institutional Knowledge Problem

Every organisation has an architecture — not just an IT architecture, but a living mesh of governance structures, academic programmes, policy frameworks, vendor relationships, identity systems, and human expertise. In small organisations, this architecture is often undocumented. The institution functions because a handful of people carry the picture in their heads.

This creates three problems:

  1. Fragility. When key people leave, institutional knowledge leaves with them.
  2. Opacity. No one can answer cross-domain questions like "which systems break if our identity provider goes down?" or "which policies apply to this faculty?" without consulting multiple people.
  3. AI readiness. Modern AI systems — chatbots, retrieval-augmented generation, decision support — require structured context about the organisation they serve. Without it, they are general-purpose tools with no institutional awareness.

This paper presents a complete methodology for addressing these problems with a graph database. It reflects what proved effective after extensive experimentation with Neo4j for mapping complex organisational architectures and interdependencies: a combination of graph modelling, semantic web standards, and automated pipeline design that produces a living, queryable, validated representation of an entire institution.

Organisational relationships are inherently graph-shaped — hierarchies, dependencies, memberships, and integrations — and a property graph model captures these naturally. While many aspects can be modelled in relational systems, graph traversal makes multi-hop queries (e.g. dependency chains) significantly simpler and more expressive.

1.2 Scope and Contributions

The methodology was developed at the University of Akureyri (UNAK), a small public university in northern Iceland. The resulting knowledge graph contains:

  • 1,039 nodes across 16 domain files (729 clean nodes after deduplication in production)
  • 1,648 relationships spanning 30 relationship types
  • 16 domain-specific source files in JSON-LD format
  • 16 SHACL shape definitions for structural validation
  • A bidirectional pipeline where Neo4j is the single source of truth and domain files are derived exports

The contributions of this paper are:

  1. A domain-file decomposition pattern for managing large knowledge graphs as modular, independently editable files
  2. A bidirectional pipeline architecture (scrape → merge → validate → ingest → export) that keeps version-controlled files synchronised with a live graph database
  3. A validation gate pattern combining referential integrity checks with SHACL shape validation
  4. An AI-assisted construction methodology that blends deterministic pipeline infrastructure with generative AI agents for layered, iterative graph construction
  5. A chatbot evaluation framework with golden datasets and LLM-as-judge scoring that creates a closed feedback loop between chatbot performance and graph quality
  6. Practical guidance on ID schemes, relationship modelling, observation patterns, and provenance tracking
  7. A quality assurance methodology that systematically closes ownership, policy, and staffing gaps

2. Related Work

Institutional knowledge management in higher education has traditionally relied on enterprise architecture frameworks (TOGAF, ArchiMate) or purpose-built information systems. These approaches tend toward heavyweight tooling and consultant-driven processes that are poorly suited to small organisations with limited IT staffing.

Graph-based approaches to organisational modelling have gained traction with the maturation of property graph databases. Neo4j's Cypher query language and MERGE-based idempotent ingestion make it particularly suitable for iteratively building institutional graphs. The W3C's JSON-LD and SHACL standards provide a standards-compliant foundation for data modelling and validation without requiring full RDF/SPARQL infrastructure.

This work differs from existing approaches in three ways: (1) it targets small organisations where a single practitioner must build and maintain the entire graph, (2) it uses AI-assisted development to accelerate pipeline construction while relying on human domain expertise for knowledge curation, and (3) it implements a bidirectional pipeline where the graph database — not the source files — is the authoritative source of truth.

3. Architecture

3.1 Data Model

The knowledge graph uses JSON-LD as its serialisation format, leveraging schema.org vocabulary extended with institution-specific types. Every entity is a node with a stable identifier, a type, and a set of properties. Relationships are encoded as JSON-LD references with optional inline properties.

Node Model

All nodes carry the Entity label in Neo4j plus type-specific labels derived from @type in JSON-LD. Every node receives four provenance fields at ingest time:

| Field | Purpose |
|---|---|
| `source_system` | Origin of the data (e.g., `seed`, `scraper`, `manual`) |
| `source_record_id` | Original identifier in the source system |
| `observed_at` | Timestamp when the data was last known to be accurate |
| `ingested_at` | Timestamp when the data entered Neo4j |

This provenance model ensures every fact in the graph is traceable to its origin and can be assessed for freshness.

ID Scheme

Stable identifiers are the backbone of the graph. The scheme uses namespaced prefixes to avoid collisions:

| Category | Pattern | Example |
|---|---|---|
| Entities | `unak:<slug>` | `unak:Canvas`, `unak:FacultyEducation` |
| Staff | `unak:staff:<slug>` | `unak:staff:firstname-lastname` |
| Relationships | `rel:<TYPE>:<from>:<to>:<source>` | `rel:INTEGRATES_WITH:unak:Canvas:unak:Panopto:seed` |
| Observations | `obs:<metric>:<entity>:<timestamp>` | `obs:uptime90d:unak:Canvas:2026-02-23T00:00:00Z` |

Once assigned, an ID never changes. If a role holder changes, the node stays — only the properties update. This stability is what makes cross-file references work and what makes the pipeline safely re-runnable.
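As a minimal sketch, the ID patterns in the table could be generated with small helper functions like the ones below. The function names and the slug rule (strip accents, lowercase, hyphenate) are assumptions for illustration and are not taken from the original pipeline.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Assumed slug rule: strip accents, lowercase, join words with hyphens."""
    # NFKD splits accented letters into a base letter plus a combining mark;
    # the ASCII encode then drops the marks. Letters with no decomposition
    # (e.g. ð, þ) are dropped entirely, so a real rule would need a
    # transliteration map.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def staff_id(full_name: str) -> str:
    return f"unak:staff:{slugify(full_name)}"


def relationship_id(rel_type: str, from_id: str, to_id: str, source: str) -> str:
    return f"rel:{rel_type}:{from_id}:{to_id}:{source}"


def observation_id(metric: str, entity_id: str, timestamp: str) -> str:
    return f"obs:{metric}:{entity_id}:{timestamp}"


print(staff_id("Firstname Lastname"))
print(relationship_id("INTEGRATES_WITH", "unak:Canvas", "unak:Panopto", "seed"))
```

Because every ID is a pure function of its inputs, regenerating it on a later run yields the same string, which is what keeps cross-file references stable.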

3.2 Domain-File Decomposition

The single most important architectural decision was splitting the graph into 16 domain-specific JSON-LD files rather than maintaining a monolithic data file. Each file carries its own full @context block and can be edited, validated, and reasoned about independently.

| # | Domain | Nodes | Contents |
|---|---|---|---|
| 01 | Organisational Structure | 33 | Institutional hierarchy: schools, faculties, centres, offices |
| 02 | IT Stack | 59 | All managed systems: LMS, portals, collaboration tools, AI systems |
| 03 | Identity & Auth | 5 | Identity providers, authentication federation |
| 04 | External Vendors | 64 | Cloud providers, AI vendors, library database vendors |
| 05 | Observations | 13 | Uptime metrics, student enrolment data |
| 06 | Governance | 26 | Leadership, committees, council, student association, legal framework |
| 07 | Academic Programmes | 48 | Graduate and undergraduate programmes across faculties |
| 08 | Policies & Legal | 26 | Institutional policies, AI policy, national legislation |
| 09 | Partnerships & Networks | 39 | Exchange programmes, research networks, partner institutions |
| 10 | Facilities & Services | 17 | Campus buildings, library, student services |
| 11 | Staff Directory | ~291 | All staff with role, contact, and department assignment |
| 12 | Extended Governance | 82 | Additional committees, working groups, quality bodies |
| 13 | Library Databases | 97 | Electronic research databases with vendor and access classification |
| 14 | Internal Handbooks (Bridge) | 5 | Bridge nodes to RAG content in vector storage |
| 15 | Public FAQ (Bridge) | 3 | Bridge nodes to FAQ content in vector storage |
| 16 | Handbook Entities | 20 | Entities extracted from internal handbook content |

This decomposition yields four critical benefits:

Domain expertise stays local. The person who understands IT systems edits file 02. The person who understands governance edits file 06. Neither needs to understand the other 15 files.

Merge conflicts become manageable. Two people editing different domain files never conflict. Even within the same file, JSON-LD's flat graph structure means additions rarely collide.

Validation catches cross-domain errors. When files are merged, referential integrity checking verifies that every @id reference resolves to a node that exists — across all 16 files. A reference from file 07 (Academic Programmes) to a faculty defined in file 01 (Organisational Structure) is validated at merge time.

The monolith is a derived artefact. The merged output is generated by concatenating all 16 files' @graph arrays. It is never edited directly. The source of truth is always the domain files (or, in the current architecture, the Neo4j graph from which they are exported).

Domains are both a modelling and organisational construct. Each domain maps to a source file and reflects a distinct area of institutional knowledge (e.g. IT systems, governance, academic programmes).

In the current implementation, domain classification is derived deterministically during ingestion using ID prefixes, categories, and routing rules. All entities now carry a domain property, enabling explicit cross-domain analysis (e.g. measuring relationships between governance, systems, and organisational units). This allows the graph to be modular at the file level while remaining fully connected and queryable as a unified structure.

3.3 The Bidirectional Pipeline

The pipeline has evolved from a linear flow (files → Neo4j) to a bidirectional architecture where Neo4j is the single source of truth and domain files are version-controlled exports:

                    ┌──────────────────────────┐
                    │   Neo4j (Source of Truth)  │
                    └──────┬───────────┬────────┘
                           │           │
                    Export ↓           ↑ Ingest
                           │           │
                    ┌──────┴───────────┴────────┐
                    │   16 Domain JSON-LD Files   │
                    │   (Version-Controlled)       │
                    └──────┬───────────┬────────┘
                           │           │
                    Merge  ↓           ↑ Scrape
                           │           │
                    ┌──────┴───┐  ┌────┴────────┐
                    │ Merged   │  │  Web Sources │
                    │ Artefact │  │  (unak.is)   │
                    └──────┬───┘  └─────────────┘
                           │
                    Validate ↓
                           │
                    ┌──────┴──────────────────┐
                    │ SHACL + Referential      │
                    │ Integrity Gate           │
                    └─────────────────────────┘

The export pipeline (export_neo4j.py) reads all nodes and relationships from Neo4j and routes them deterministically back into the 16 domain files using ID prefix, source_system, and category as routing keys. This means:

  • Round-trip fidelity: Export → ingest → re-export produces identical files
  • Version control on graph state: Every export creates a Git-committable snapshot of the entire graph
  • Offline editing: Domain files can be edited locally, validated, and re-ingested
  • Disaster recovery: The full graph state is always recoverable from version-controlled files

The one-command orchestrator supports all pipeline directions:

./update.sh              # merge + validate only
./update.sh --scrape     # scrape web sources + merge + validate
./update.sh --ingest     # merge + validate + ingest into production
./update.sh --export     # export Neo4j → domain files, then merge + validate
./update.sh --quality    # merge + validate + quality inspection

3.4 Thirty Relationship Types

The graph models 30 relationship types, each carrying provenance fields plus optional domain-specific properties:

| Relationship | Semantics | Example |
|---|---|---|
| `CONTAINS` | Hierarchy | School contains Faculty |
| `MANAGES` | Operational responsibility | IT Department manages LMS |
| `HOSTED_BY` | Infrastructure hosting | LMS hosted by SaaS Vendor |
| `INTEGRATES_WITH` | System integration | LMS integrates with Video Platform (protocol: LTI 1.3) |
| `DEPENDS_ON` | System dependency | Portal depends on Identity Provider (risk_impact: 5) |
| `AUTHENTICATES_VIA` | Auth dependency | LMS authenticates via Azure AD (risk_impact: 4) |
| `WORKS_FOR` | Employment | Staff member works for Institution |
| `MEMBER_OF` | Membership | Staff member is member of Committee |
| `PARTNERS_WITH` | Partnership | Institution partners with Partner University |
| `REGULATED_BY` | Legal compliance | Institution regulated by Higher Education Act |
| `GOVERNED_BY` | Governance | Faculty governed by Dean |
| `SUBJECT_TO` | Policy application | Department subject to Quality Policy |
| `IMPLEMENTS` | Policy implementation | Office implements Data Protection Policy |
| `DEVELOPED_BY` | Software development | System developed by Staff Member |
| `OWNS` | System ownership | Department owns Internal System |
| `PART_OF` | Compositional membership | Sub-unit part of Larger Unit |
| ... | ... | _plus 14 additional types_ |

Some relationships carry domain-specific properties. INTEGRATES_WITH edges include the integration protocol (e.g., LTI 1.3). DEPENDS_ON and AUTHENTICATES_VIA edges carry a risk_impact score from 1 to 5. This enables queries like: _"Show all systems with a critical dependency (risk_impact ≥ 4) on the identity provider"_ — answered instantly by graph traversal.
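The critical-dependency question above could be expressed roughly as follows. This is a sketch under assumptions: the node property holding the stable identifier is assumed to be called `id`, the function name `critical_dependents` is hypothetical, and `driver` is expected to be a `neo4j.Driver` created by the caller with the official Python driver.

```python
CRITICAL_DEPENDENCIES = """
MATCH (s:Entity)-[r:DEPENDS_ON|AUTHENTICATES_VIA]->(idp:Entity {id: $idp_id})
WHERE r.risk_impact >= $min_impact
RETURN s.id AS system, type(r) AS relationship, r.risk_impact AS risk_impact
ORDER BY r.risk_impact DESC, s.id
"""


def critical_dependents(driver, idp_id, min_impact=4):
    """Return systems whose dependency on `idp_id` has risk_impact >= min_impact.

    `driver` is expected to be a neo4j.Driver, e.g. obtained from
    neo4j.GraphDatabase.driver(uri, auth=(user, password)).
    """
    with driver.session() as session:
        # Values are passed as query parameters rather than formatted into the
        # Cypher text, so the query plan can be cached and input stays inert.
        result = session.run(
            CRITICAL_DEPENDENCIES, idp_id=idp_id, min_impact=min_impact
        )
        return [record.data() for record in result]
```

Keeping the threshold as a parameter means the same query serves both "critical only" (4 and above) and broader impact reviews.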

All relationships carry a confidence score (default 1.0 for curated data), enabling future differentiation between verified and inferred knowledge. Systems are also annotated with a criticality field, allowing the graph to support risk-aware queries once values are enriched.

3.5 The Observation Pattern

Time-varying metrics are not stored as node properties. They are materialised as separate Observation nodes linked via OBSERVED_FOR:

(obs:Observation {metric: "uptime90d", value: 99.7, observed_at: "2026-02-23"})
  -[:OBSERVED_FOR]->
(lms:Entity {name: "Canvas LMS"})

This preserves history. Re-ingesting with a new observed_at timestamp creates a new observation node; the old one persists. Over time, this builds a time series without overwriting anything and without requiring a separate time-series database.

The same pattern works for enrolment data, budget metrics, or any institutional measure that changes over time.
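A small in-memory sketch shows why keying observations by metric, entity, and timestamp preserves history. The dictionary stands in for Neo4j's MERGE-on-ID behaviour, and the JSON-LD property names (`observedFor`, `metric`, `value`) are assumptions; the second reading in the usage lines is a made-up illustrative value.

```python
def make_observation(metric, entity_id, value, observed_at):
    """Build an observation node whose @id embeds the timestamp."""
    return {
        "@id": f"obs:{metric}:{entity_id}:{observed_at}",
        "@type": "Observation",
        "metric": metric,
        "value": value,
        "observed_at": observed_at,
        "observedFor": {"@id": entity_id},
    }


def upsert(store, node):
    # Same @id overwrites in place (idempotent); a new @id adds a new node.
    store[node["@id"]] = node


def series(store, metric, entity_id):
    """Return (observed_at, value) pairs for one metric, oldest first."""
    rows = [
        n for n in store.values()
        if n["metric"] == metric and n["observedFor"]["@id"] == entity_id
    ]
    return sorted((n["observed_at"], n["value"]) for n in rows)


store = {}
upsert(store, make_observation("uptime90d", "unak:Canvas", 99.7, "2026-02-23T00:00:00Z"))
upsert(store, make_observation("uptime90d", "unak:Canvas", 99.7, "2026-02-23T00:00:00Z"))  # re-run
upsert(store, make_observation("uptime90d", "unak:Canvas", 99.9, "2026-05-24T00:00:00Z"))  # illustrative
print(series(store, "uptime90d", "unak:Canvas"))
```

Re-ingesting the same reading changes nothing, while a reading with a new timestamp extends the series.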

4. Pipeline Implementation

4.1 Scraping (Automated Data Acquisition)

Three automated tools extract data from publicly available web sources: two scrapers and one discovery audit.

Staff Directory Scraper (530 lines of Python) — A three-pass approach:

  • Pass 1: Fetches the full staff directory listing using BeautifulSoup. Extracts name, job title, email, phone, and photo URL.
  • Pass 2: Visits each staff member's detail page using Playwright (headless browser) to extract department assignment from a specific CSS selector. This information is not available in the directory listing and requires JavaScript rendering.
  • Pass 3: Scrapes faculty-specific staff pages and matches unmapped staff by email address, catching anyone not resolved in Pass 2.

The three-pass approach achieves 94.2% department mapping coverage (274 out of 291 staff members). Pass 2 takes approximately 6 minutes for ~291 pages due to rate-limiting. Flags allow skipping individual passes for faster iteration (--skip-details, --skip-faculty).
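The email-matching step of Pass 3 and the coverage figure can be sketched as below. The data shapes (a list of staff dicts and a mapping from department ID to the emails found on its faculty page) and the function names are assumptions for illustration, not the scraper's actual interfaces.

```python
def match_by_email(unmapped, faculty_pages):
    """Resolve departments for unmapped staff by case-insensitive email match.

    unmapped:      list of dicts with "id" and "email"
    faculty_pages: dict mapping department id -> iterable of emails
    """
    email_to_dept = {
        email.lower(): dept
        for dept, emails in faculty_pages.items()
        for email in emails
    }
    resolved = {}
    for person in unmapped:
        dept = email_to_dept.get(person["email"].lower())
        if dept is not None:
            resolved[person["id"]] = dept
    return resolved


def coverage(mapped, total):
    """Percentage of staff with a department, rounded to one decimal."""
    return round(100 * mapped / total, 1)


print(coverage(274, 291))  # 94.2
```

Matching on email rather than on name avoids false positives from staff who share a name or whose names are spelled differently across pages.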

Library Database Scraper (360 lines) — Fetches the electronic research database catalogue. Classifies each database by vendor and access type (subscription, open access, national access). Generates 97 nodes in approximately 10 seconds.

IT Service Audit (435 lines) — Crawls institutional web properties and identifies third-party services from HTML source analysis (external scripts, iframes, meta tags), inline script pattern matching, and page keyword detection. This is a discovery tool — findings require manual review before adding to the graph. The most recent audit added 32 nodes (19 IT systems, 13 vendors).

4.2 Merge

The merge script (47 lines of Python) reads all 16 domain files, concatenates their @graph arrays while preserving the shared @context block, and writes a single output file. Execution takes less than a second. The merged artefact is used by downstream validation and ingestion steps.
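A merge step of this kind could look roughly like the following. This is not the author's 47-line script: the `.jsonld` extension, the function names, and the choice to keep the first file's `@context` are assumptions.

```python
import json
from pathlib import Path


def merge_graphs(documents):
    """Concatenate @graph arrays, keeping the @context of the first document."""
    merged = {"@context": documents[0]["@context"], "@graph": []}
    for doc in documents:
        merged["@graph"].extend(doc.get("@graph", []))
    return merged


def merge_files(domain_dir, output_path):
    """Merge every *.jsonld file in domain_dir (sorted by name) into one file."""
    paths = sorted(Path(domain_dir).glob("*.jsonld"))
    documents = [json.loads(p.read_text(encoding="utf-8")) for p in paths]
    merged = merge_graphs(documents)
    Path(output_path).write_text(
        json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(merged["@graph"])
```

Sorting the paths makes the node order in the merged artefact depend only on file names, so repeated merges of unchanged inputs produce the same output.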

4.3 Validation Gate

Two validation passes run in sequence before any data reaches Neo4j:

Referential integrity: Every @id reference in the merged graph is checked against the set of defined node IDs. If file 07 references a faculty defined in file 01 and that faculty has been renamed or removed, this check catches it. This is the most common error when editing the graph manually.

SHACL validation: The merged JSON-LD is converted to an RDF triple store using rdflib, then validated against 16 shape definitions written in Turtle (W3C SHACL standard). Shapes enforce structural contracts:

unak:ServiceShape a sh:NodeShape ;
    sh:targetClass schema:Service ;
    sh:property [
        sh:path schema:name ;
        sh:minCount 1 ;
    ] ;
    sh:property [
        sh:path unak:category ;
        sh:minCount 1 ;
    ] .

Every node type has a corresponding shape. If validation fails, the pipeline halts. No corrupted data reaches Neo4j.
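The referential integrity pass in this section can be sketched in plain Python. The sketch assumes that every top-level item in the merged `@graph` is a node definition and that any nested object carrying an `@id` is a reference (possibly with inline properties); the function names are hypothetical.

```python
def iter_refs(value):
    """Yield every @id found in nested dicts or lists under a property value."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
        for nested in value.values():
            yield from iter_refs(nested)
    elif isinstance(value, list):
        for item in value:
            yield from iter_refs(item)


def dangling_references(graph):
    """Return (node id, property, missing target) for each unresolved reference."""
    defined = {node["@id"] for node in graph}
    missing = []
    for node in graph:
        for key, value in node.items():
            for ref in iter_refs(value):
                if ref not in defined:
                    missing.append((node["@id"], key, ref))
    return missing


def validation_gate(graph):
    """Halt the pipeline by raising if any reference does not resolve."""
    problems = dangling_references(graph)
    if problems:
        raise ValueError(f"{len(problems)} unresolved reference(s): {problems}")
```

Raising an exception rather than logging a warning is what turns the check into a gate: a non-zero exit stops the orchestrator before ingestion runs.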

4.4 Ingestion

A two-pass ingestion into Neo4j using the official neo4j Python driver:

  • Pass 1: MERGE all nodes. Each node gets the Entity label plus type-specific labels. Properties are set from the JSON-LD, plus the four provenance fields.
  • Pass 2: MERGE all relationships (30 types) with properties.

The MERGE keyword in Cypher is what makes the entire pipeline idempotent. Running the pipeline ten times produces the same graph state: nodes matched by @id get updated, new nodes get created, and nothing gets duplicated. A pipeline that must be safely re-runnable depends on this property.
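One way to build the Pass 1 statement for a single node is sketched below. The property name `id`, the rule of taking the local part of `@type` as the label, and the function name are assumptions. Cypher does not allow labels to be passed as parameters, so the sketch validates them before placing them in the query text.

```python
import re

LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def node_merge_statement(node, provenance):
    """Return (cypher, params) that MERGEs one JSON-LD node by its @id."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # "schema:Service" -> "Service"; reject anything that is not a plain name.
    labels = [t.rsplit(":", 1)[-1] for t in types]
    for label in labels:
        if not LABEL_RE.match(label):
            raise ValueError(f"unsafe label: {label!r}")
    # Only scalar properties are stored on the node; nested references
    # become relationships in the second pass.
    props = {
        key: value
        for key, value in node.items()
        if not key.startswith("@") and isinstance(value, (str, int, float, bool))
    }
    props.update(provenance)
    query = "MERGE (n:Entity {id: $id}) SET n += $props"
    if labels:
        query += " SET n:" + ":".join(labels)
    return query, {"id": node["@id"], "props": props}
```

Because MERGE matches on the `id` property alone, re-running the statement updates the existing node instead of creating a second one; a uniqueness constraint on that property would additionally guard against duplicates from other writers.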

4.5 Export (Reverse Pipeline)

The export script (387 lines) reads all nodes and relationships from Neo4j and writes them back into the 16 domain JSON-LD files. Routing is deterministic:

  1. ID prefix routing: Staff nodes (unak:staff:*) → file 11, observations (obs:*) → file 05
  2. Source system routing: Nodes tagged with a specific source system route to the corresponding file
  3. Category routing: Remaining nodes route by their category property to the appropriate domain file

The export round-trips cleanly: export → ingest → re-export produces identical output. This was verified as part of the pipeline validation.
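The three routing rules can be sketched as an ordered chain of checks. The two prefix rules come from the list above; the contents of `SOURCE_ROUTES` and `CATEGORY_ROUTES` are hypothetical placeholders, as is the stable-serialisation helper, which shows one way byte-identical re-exports could be achieved.

```python
import json

# Hypothetical routing tables: the real keys are not given in the text.
SOURCE_ROUTES = {"library_scraper": "13"}
CATEGORY_ROUTES = {"lms": "02", "committee": "06"}


def route(node, source_routes=SOURCE_ROUTES, category_routes=CATEGORY_ROUTES):
    """Return the two-digit domain file number for an exported node."""
    node_id = node["id"]
    # 1. ID prefix routing
    if node_id.startswith("unak:staff:"):
        return "11"
    if node_id.startswith("obs:"):
        return "05"
    # 2. Source system routing
    source = node.get("source_system")
    if source in source_routes:
        return source_routes[source]
    # 3. Category routing
    category = node.get("category")
    if category in category_routes:
        return category_routes[category]
    raise ValueError(f"no route for {node_id}")


def serialise(nodes):
    """Sort nodes by id and keys alphabetically so an unchanged graph
    always serialises to the same text."""
    ordered = sorted(nodes, key=lambda n: n["id"])
    return json.dumps(ordered, ensure_ascii=False, indent=2, sort_keys=True)
```

Failing loudly on an unroutable node, rather than falling back to a default file, keeps a new category from silently landing in the wrong domain.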

5. AI-Assisted Construction Methodology

The entire knowledge graph pipeline (approximately 4,800 lines of Python, 200 lines of SHACL, and 120 lines of Bash) was constructed using an AI-assisted development methodology. In practice, this significantly accelerated pipeline construction. The approach is not "AI wrote the code"; it is a structured collaboration pattern between a human domain expert (the orchestrator) and an AI coding agent operating within a purpose-built framework.

5.1 The Toolchain

The primary development tool was Claude Code, Anthropic's CLI-based coding agent. Claude Code operates directly in the terminal — reading files, writing code, running commands, and iterating on results within the developer's actual working environment. Unlike chat-based AI assistants, it has full context of the codebase and can execute multi-step tasks autonomously.

The critical accelerator was not the AI agent alone, but wrapping it in a structured orchestration framework. In this case, the framework was PAI (Personal AI Infrastructure, an open-source project by Daniel Miessler), but the methodology is reproducible with any equivalent system that provides:

  • Structured task decomposition — an algorithmic cycle (observe → think → plan → build → execute → verify → learn) with explicit completion criteria that must be satisfied before a task is considered done
  • Custom skills — domain-specific prompt modules for research, code review, quality assurance, and specialised analysis
  • Sub-agent spawning — the ability to launch specialised agents for parallel work on independent subtasks
  • Persistent context — memory across sessions, project-specific instructions, and layered rules that encode domain knowledge and quality standards

The key insight is that an unstructured conversation with an AI produces scattered results. A structured framework that enforces observe-think-plan-build-verify cycles produces consistent, validated output. Any orchestration layer that provides these four capabilities would serve the same function.

5.2 The Layered Construction Strategy

The knowledge graph was not built all at once. It was constructed through a deliberate layered strategy, where each domain was added incrementally, validated, and stabilised before the next layer was introduced:

Layer 1: Organisational Structure (33 nodes)
    ↓ validate, ingest, verify
Layer 2: IT Stack + Identity (64 nodes)
    ↓ validate, ingest, verify
Layer 3: Governance + Policies (52 nodes)
    ↓ validate, ingest, verify cross-domain references
Layer 4: Academic Programmes + Partnerships (87 nodes)
    ↓ validate, ingest, verify
Layer 5: Facilities + Services (17 nodes)
    ↓ validate, ingest, verify
Layer 6: Staff Directory (291 nodes, scraped)
    ↓ validate, ingest, verify department mapping
Layer 7: Library Databases (97 nodes, scraped)
    ↓ validate, ingest, verify vendor relationships
Layer 8: Extended Governance + Handbook Entities (102 nodes)
    ↓ validate, ingest, verify
Layer 9: Quality gap closure (iterative rounds to 0/0/0)
    ↓ export, validate round-trip fidelity
Layer 10: Bidirectional pipeline (export capability)

Each layer followed the same cycle:

  1. Domain analysis — The orchestrator (human) identifies what entities belong in this domain, what relationships matter, and what level of granularity serves downstream use cases
  2. Data extraction — Either manual curation (governance, policies) or AI-assisted scraper construction (staff, library databases)
  3. Schema evolution — New node types get SHACL shapes; new relationship types get added to the mapping
  4. Validation gate — The full pipeline runs: merge all files, validate referential integrity across all domains, validate SHACL compliance
  5. Cross-domain verification — References between the new layer and existing layers are checked (e.g., academic programmes referencing faculties defined in Layer 1)

This layered approach meant that at every stage, the graph was in a valid, consistent state. No layer was added that broke a previous layer. The validation gate enforced this structurally, not just as a convention.

5.3 Blending Deterministic and Generative

A critical design principle was the strict separation between deterministic infrastructure and generative AI work:

| Component | Nature | Role of AI |
|---|---|---|
| JSON-LD schema | Deterministic | AI writes initial schema; human validates and evolves it |
| SHACL shapes | Deterministic | AI generates shape definitions; human reviews against domain knowledge |
| Pipeline scripts | Deterministic | AI writes Python; human reviews logic and edge cases |
| Neo4j MERGE queries | Deterministic | AI generates Cypher; idempotency is verified by re-running |
| Validation | Deterministic | Automated; no AI in the loop at validation time |
| Domain knowledge curation | Generative | AI assists with research and extraction; human makes all modelling decisions |
| Scraper construction | Generative → Deterministic | AI builds scraper code; output is deterministic once written |
| Quality gap analysis | Generative | AI identifies patterns; human decides which gaps are real |

The key insight: the pipeline itself is entirely deterministic. Once written, merge.py, validate.py, ingest_neo4j.py, and export_neo4j.py produce identical results regardless of whether an AI or a human runs them. AI was used to _construct_ the deterministic infrastructure, not to _operate_ it. This means the pipeline is auditable, reproducible, and does not depend on AI availability at runtime.

5.4 What AI Was Good At

AI-assisted development excelled at:

  • Boilerplate generation. The initial structure of each pipeline script — argument parsing, Neo4j driver setup, JSON-LD parsing, error handling — was generated rapidly and correctly.
  • Pattern replication. Once the first scraper existed (staff directory), the second (library databases) and third (IT audit) followed the established pattern with minimal instruction.
  • Cross-domain reasoning. The AI agent could read all 16 domain files simultaneously and identify missing cross-references, inconsistent naming, or orphaned nodes faster than manual review.
  • SHACL shape generation. Given examples of existing shapes and a new node type, the agent produced correct SHACL definitions consistently.
  • Quality analysis. The quality inspection framework, with its five modules and fuzzy duplicate detection, was built in a single session. The AI was particularly effective at writing the Levenshtein-based fuzzy matching and generating structured reports.
  • Iterative refinement. The three-pass staff scraper evolved through multiple iterations — the AI proposed each enhancement (detail-page scraping, faculty-page matching) based on analysing the coverage gaps from previous runs.

5.5 What AI Was Not Good At

AI-assisted development struggled with or was deliberately excluded from:

  • Domain modelling decisions. Which entities matter? What granularity? Where to draw boundaries between domains? These required institutional knowledge that no AI possesses. The orchestrator made every modelling decision.
  • Relationship semantics. Whether a system "depends on" vs. "integrates with" another system — and the operational difference between those two relationships — required understanding the institution's actual infrastructure, not just its documentation.
  • Data correctness validation. The AI could check structural validity (SHACL, referential integrity) but could not verify that "the IT department manages Canvas" is true. That required human institutional knowledge.
  • Political sensitivity. Governance structures, committee memberships, and policy ownership have political dimensions. The orchestrator decided what to include, how to describe it, and what level of detail was appropriate.

5.6 Completion Criteria as Quality Metrics

The most valuable aspect of the structured orchestration approach was its concept of Ideal State Criteria (ISC) — concrete, testable conditions defined before work begins and verified with evidence when work ends. This mapped directly onto the knowledge graph's quality metrics:

| ISC Criterion | Quality Metric | Final State |
|---|---|---|
| "Every IT system has an owning department" | Ownership gaps | 0 (down from 109) |
| "Every organisational unit subject to a policy" | Policy gaps | 0 (down from ~20) |
| "Every staff member mapped to a department" | Staff mapping gaps | 94.2% coverage |
| "Zero unresolved cross-file references" | Referential integrity | PASS |
| "All node types have SHACL shapes" | Schema coverage | 16/16 shapes |

Each domain layer was treated as a task with its own ISC set: define criteria, build, validate against criteria, iterate until all pass. The orchestration framework enforced this discipline — no layer was declared complete without evidence for every criterion. Custom skills within the framework handled specific tasks: web research for discovering institutional data, browser automation for verifying scraper output against live pages, and Neo4j-specific skills for Cypher queries and graph validation.

5.7 Reproducibility of the AI-Assisted Approach

For practitioners wanting to replicate this methodology:

  1. Any capable coding agent works. The methodology is not specific to Claude Code. Any AI coding agent that can read files, write code, and execute commands in a terminal can follow the same layered construction pattern.
  2. Structure the agent's work. An unstructured conversation with an AI produces scattered results. Any framework that enforces observe-plan-build-verify cycles — whether PAI, a custom system, or even a disciplined checklist — produces consistent, validated output.
  3. The human is the domain expert. The AI accelerates code production and pattern recognition. The human provides institutional knowledge, makes modelling decisions, and validates correctness against reality. This division of labour is not optional.
  4. Deterministic output is essential. The pipeline must produce identical results regardless of who or what runs it. AI assists construction; it does not participate in execution.

6. Quality Assurance

6.1 Quality Inspection Framework

A modular quality inspection system runs automated checks against the graph data:

| Module | Purpose |
| --- | --- |
| **Inefficiency** | Dead-end nodes, hubs, sparse nodes, disconnected entities |
| **Redundancy** | Fuzzy duplicate detection, cross-file overlaps, redundant relationships |
| **Friction** | Policy gaps, ownership gaps, staff coverage gaps |
| **Cross-validation** | Staff delta, IT coverage, empty departments, stale sources |
| **AI Analysis** | LLM-powered analysis of findings (optional, requires API key) |

6.2 Gap Closure Methodology

The quality inspection modules produce concrete gap lists that are systematically closed in iterative rounds:

Ownership gaps: Every IT system, service, and organisational unit must have at least one managing or owning entity. The IT department was established as the default owner for all IT systems without explicit ownership. Starting from 109 ownership gaps, systematic assignment reduced this to zero.

Subsequent health dashboard auditing identified 27 additional systems, discovered through an IT audit, that have no explicit owner. These are now tracked as remediation items.

Policy gaps: Every organisational unit must be subject to at least one institutional policy. Faculties were linked to relevant legislation; offices were linked to operational policies. Starting from approximately 20 policy gaps, systematic linking reduced this to zero.

Staff mapping gaps: Every staff member should be linked to a department. The three-pass scraper resolved most cases; remaining gaps were manually mapped using faculty staff pages. Starting from 18 unmapped staff, coverage reached 94.2% (remaining cases identified but not yet fully modelled).

Duplicate resolution: Fuzzy string matching identified 43 potential duplicates. Three were confirmed true duplicates (same entity with different IDs, namespace collisions, name variants) and merged. The remaining 40 were confirmed as false positives — a common occurrence with Icelandic patronymic naming patterns where unrelated individuals share similar names.

Remaining duplicate name groups represent legitimate vendor/product overlaps and are disambiguated using display names rather than merged, preserving semantic distinctions while avoiding ambiguity in user-facing contexts.
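
The article does not say which matching algorithm the redundancy module uses. As a stand-in, the sketch below uses `difflib.SequenceMatcher` from the Python standard library to flag candidate pairs for human review. The 0.85 threshold, the sample IDs, and the sample names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def candidate_duplicates(names_by_id: dict, threshold: float = 0.85) -> list:
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(names_by_id.items(), 2):
        ratio = SequenceMatcher(None, normalise(name_a), normalise(name_b)).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, round(ratio, 2)))
    # Most similar pairs first, so reviewers see the likeliest duplicates at the top.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical sample: two similar patronymic names and two unrelated systems.
sample_names = {
    "ex:staff:1": "Jón Jónsson",
    "ex:staff:2": "Jón Jónasson",
    "ex:sys:1": "Microsoft 365",
    "ex:sys:2": "Canvas LMS",
}
candidates = candidate_duplicates(sample_names)
```

The two staff names are flagged even though they may belong to different people, which is exactly the false-positive pattern described above. String similarity can only nominate candidates; the merge decision stays with a human.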

6.3 Current Quality State

| Metric | Value |
| --- | --- |
| Policy gaps | 0 |
| Ownership gaps | 0 |
| Staff mapping gaps | 0 (94.2% coverage) |
| SHACL validation | PASS |
| Referential integrity | PASS |
| Confirmed duplicates | 0 remaining |
| Dead-end nodes | 105 (structurally expected: vendors and programmes with only inbound edges) |

6.4 Continuous Quality Monitoring

A health dashboard now runs after each pipeline execution, auditing structural and semantic integrity across the graph. This includes checks for orphan nodes, missing provenance, domain coverage, duplicate entities, and ownership gaps.

This shifts validation from a one-time gate to continuous monitoring — the graph is treated as a living system with measurable quality metrics rather than a static artefact.

The system now distinguishes between resolved gaps and identified gaps. For example, 27 IT systems currently lack explicit ownership — these are not hidden but surfaced by the health dashboard as actionable items. A graph that exposes its own incompleteness is more valuable than one that appears artificially complete.

Health Dashboard Snapshot (March 2026)
| Metric | Value |
| --- | --- |
| Nodes | 729 |
| Relationships | 1,602 |
| Orphan nodes | 0 |
| Missing domain | 0 |
| Missing provenance | 0 |
| Cross-domain edges | 2,738 |
| Duplicate name groups | 4 (disambiguated) |
| Systems without owner | 27 (identified for remediation) |

7. Application

The knowledge graph is integrated into Borg, an institutional AI platform built for the university. Borg is a broader system encompassing AI assistants, knowledge management, and institutional tooling — a full treatment of Borg is beyond the scope of this paper, but its knowledge graph features illustrate how a well-structured graph becomes immediately useful once embedded in a platform.

7.1 Interactive Graph Visualisation

Borg exposes the knowledge graph through two interactive force-directed graph diagrams:

  • Public Explorer — Available to all authenticated users. Presents the graph organised by seven domain categories with colour-coded nodes, search, and click-to-explore navigation. Users can visually traverse relationships, filter by domain, and drill into entity detail pages showing all connected nodes, relationship types, and properties. This is the primary way non-technical staff interact with the institutional knowledge graph.
  • Admin Graph — Available to administrators. The same interactive visualisation plus full CRUD operations: create, edit, and delete entities and relationships directly from the graph interface. Changes take immediate effect in Neo4j (the single source of truth) and are exportable to version-controlled domain files.

Both diagrams render the live Neo4j graph in real time — they are not static snapshots but direct queries against the production database.

7.2 Cypher Query Library for Testing and Analysis

A library of Cypher queries was developed alongside the graph for testing, validation, and operational analysis. These queries serve as both a verification tool during development and a practical analysis resource for ongoing operations:

// Blast radius: what breaks if Azure AD goes down?
MATCH (aad:Entity {id: "unak:AzureAD"})<-[r:AUTHENTICATES_VIA|DEPENDS_ON]-(dep:Entity)
WHERE r.risk_impact >= 4
RETURN dep.name, type(r), r.risk_impact ORDER BY r.risk_impact DESC

// Vendor concentration: which vendors host the most systems?
MATCH (v:Entity)<-[:HOSTED_BY]-(s:Entity)
RETURN v.name, count(s) AS systems ORDER BY systems DESC

// Orphan check: zero-relationship nodes (should be 0)
MATCH (n:Entity) WHERE NOT (n)--() RETURN count(n) AS orphans

// Staff coverage: unmapped staff members
MATCH (p:Entity)-[:WORKS_FOR]->(u:Entity {id: "unak:UNAK"})
WHERE NOT (p)-[:MEMBER_OF]->() AND p.id STARTS WITH "unak:staff:"
RETURN p.name, p.jobTitle_is

These queries were used iteratively during construction to verify each layer. The orphan check, for example, runs after every ingestion to enforce the zero-orphan invariant. The blast radius and vendor concentration queries feed directly into IT operations discussions.

A standalone Cypher audit dump (generate_cypher.py) exports the entire graph as MERGE statements, providing a portable, human-readable representation of the full graph state that can be reviewed in code review, diffed between versions, or used to recreate the graph from scratch without the Python pipeline.

7.3 Graph-Aware Chatbot

The admin interface includes a dedicated chatbot with direct access to the knowledge graph via Neo4j tool-use (detailed in Section 8). This chatbot operates as a working tool for graph maintenance — administrators can ask natural-language questions about the graph ("which IT systems have no owner?", "show me all committees and their members") and receive answers grounded in live graph traversal, not static documentation.

7.4 Dependency and Risk Analysis

The graph enables operational questions that previously required consulting multiple people:

  • Blast radius analysis: "What systems are affected if the identity provider goes down?" — traverse AUTHENTICATES_VIA and DEPENDS_ON edges with risk_impact ≥ 4.
  • Vendor concentration: "Which vendors are we most dependent on?" — count HOSTED_BY and PROVIDES edges per vendor.
  • Integration mapping: "What's the integration surface of our LMS?" — traverse INTEGRATES_WITH edges with protocol properties.

One immediate outcome was identifying critical identity dependencies that were not fully documented. The graph revealed that multiple core systems depended on Azure AD with high risk impact, prompting a discussion on redundancy and incident planning.
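
To show the logic of a blast radius traversal without a running database, here is a minimal in-memory sketch. The edge list, the `ex:` identifiers, and the choice to follow dependencies transitively are assumptions for the example; in production the same question is answered with Cypher against Neo4j.

```python
from collections import deque

# Hypothetical edge list: (dependent, relationship type, dependency, risk_impact).
EDGES = [
    ("ex:LMS", "AUTHENTICATES_VIA", "ex:IdP", 5),
    ("ex:Email", "AUTHENTICATES_VIA", "ex:IdP", 4),
    ("ex:Wiki", "DEPENDS_ON", "ex:IdP", 2),
    ("ex:Grading", "DEPENDS_ON", "ex:LMS", 4),
    ("ex:LMS", "HOSTED_BY", "ex:Vendor", 5),
]

DEPENDENCY_TYPES = {"AUTHENTICATES_VIA", "DEPENDS_ON"}


def blast_radius(root: str, edges=EDGES, min_impact: int = 4) -> list:
    """Return every node that transitively depends on root via high-impact edges."""
    # Index dependents by the node they depend on, keeping only qualifying edges.
    dependents = {}
    for source, rel_type, target, impact in edges:
        if rel_type in DEPENDENCY_TYPES and impact >= min_impact:
            dependents.setdefault(target, []).append(source)

    # Breadth-first walk outward from the failing node.
    affected, seen, queue = [], {root}, deque([root])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:
                seen.add(dep)
                affected.append(dep)
                queue.append(dep)
    return affected


affected_by_idp = blast_radius("ex:IdP")
```

In this sample the wiki is excluded because its dependency has a risk impact below the threshold, while the grading system is included because it depends on the LMS, which in turn depends on the identity provider.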

7.5 Governance and Compliance Mapping

The graph links every organisational unit to its governing bodies, applicable policies, and relevant legislation. This supports:

  • Compliance auditing (which policies apply to which units)
  • Committee membership tracking (who sits on which governance body)
  • Regulatory mapping (which national laws regulate which institutional functions)

7.6 Bridge to Unstructured Content

Bridge nodes connect the structured graph to unstructured content stored in a vector database (handbook text, FAQ content). This is the foundation for GraphRAG — using graph structure to improve retrieval-augmented generation. The graph provides _what_ things are and how they relate; the vector store provides the full narrative text.

The knowledge graph does not replace retrieval-augmented generation systems. Instead, it complements them: the graph provides structured relationships and constraints, while vector search provides unstructured narrative context. Together, they enable both precise queries (e.g. dependency chains) and explanatory responses grounded in documentation.

8. Chatbot Integration and Evaluation

The knowledge graph's primary consumer-facing application is a pair of production chatbots that combine structured graph data with unstructured content retrieval. Building the chatbots surfaced a question that every knowledge graph project must eventually answer: _how do you know the graph is actually useful?_ The answer is systematic evaluation.

8.1 Two Chatbot Architectures

The institution runs two chatbots with fundamentally different retrieval strategies, both drawing on the knowledge graph:

GraphRAG Agent (UNAK-spjall) — An agentic chatbot powered by Gemini with tool-use access to Neo4j. Rather than embedding graph data into a prompt, the agent decides at inference time which graph tools to invoke:

| Tool | Function |
| --- | --- |
| `resolveEntities` | Entity lookup by name (fuzzy matching) |
| `searchDocuments` | Full-text + semantic search with category filtering |
| `getEntityDetails` | Relationship traversal for a specific entity |
| `exploreRelationships` | Multi-hop graph traversal |
| `getOrganizationStructure` | Organisational hierarchy queries |

This agentic approach means the chatbot can compose multi-step queries: resolve an entity by name, traverse its relationships, then search for related documents — all within a single conversational turn. The graph provides structured facts; the vector store (pgvector with 1,536-dimensional Gemini embeddings) provides narrative context from handbooks, FAQs, and policy documents.

RAG Agent (Gervigreindur) — An AI literacy tutor that uses semantic search over chunked handbook content in pgvector. It does not query Neo4j directly but benefits from knowledge graph data that has been exported to the vector store as structured markdown chunks. This represents a simpler but effective pattern: the graph as a content source for vector retrieval.

8.2 Golden Evaluation Dataset

To measure whether the knowledge graph actually improves chatbot responses, a systematic evaluation framework was built around a golden dataset of 410 questions:

| Dimension | Distribution |
| --- | --- |
| **Question type** | 45% factual, 25% procedural, 15% synthesis, 10% edge case, 5% out-of-scope |
| **Language** | 50% English, 40% Icelandic, 10% bilingual |
| **Coverage** | 15 user stories spanning all 16 knowledge domains |

The dataset includes golden questions for both chatbots:

  • UNAK-spjall: 30 supplemental prompts targeting graph-dependent knowledge (governance, dependencies, cross-domain relationships)
  • Gervigreindur: 40 golden prompts covering AI literacy, institutional procedures, and student services

8.3 LLM-as-Judge Evaluation Framework

Responses are scored by an LLM judge (Claude) across five dimensions:

| Dimension | Weight | What It Measures |
| --- | --- | --- |
| Correctness | 30% | Factual accuracy against known ground truth |
| Completeness | 25% | Whether the response addresses all aspects of the question |
| Faithfulness | 20% | Whether claims are grounded in retrieved context (no hallucination) |
| Language | 15% | Appropriate language use (Icelandic/English matching, terminology) |
| Referral | 10% | Appropriate escalation to human support when the question exceeds scope |

Scoring thresholds:

| Score | Interpretation |
| --- | --- |
| ≥ 0.85 | Target: production-ready |
| ≥ 0.75 | Pass: acceptable for deployment |
| < 0.72 | Block: deployment gate fails |
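
A minimal sketch of how the weights and thresholds could combine into a deployment gate follows. It assumes each dimension is scored on a 0 to 1 scale and that the composite is a plain weighted sum; the article does not specify either detail. It also does not say how scores between 0.72 and 0.75 are handled, so the `review` outcome is an assumption of this sketch.

```python
# Weights taken from the dimension table; the 0-1 score scale is an assumption.
WEIGHTS = {
    "correctness": 0.30,
    "completeness": 0.25,
    "faithfulness": 0.20,
    "language": 0.15,
    "referral": 0.10,
}


def composite(scores: dict) -> float:
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)


def gate(score: float) -> str:
    """Map a composite score onto the published thresholds."""
    if score >= 0.85:
        return "target"
    if score >= 0.75:
        return "pass"
    if score < 0.72:
        return "block"
    # Scores in [0.72, 0.75) are not covered by the thresholds;
    # routing them to manual review is an assumption of this sketch.
    return "review"


example = composite({
    "correctness": 0.9,
    "completeness": 0.8,
    "faithfulness": 0.9,
    "language": 0.7,
    "referral": 0.5,
})
decision = gate(example)
```

Because the weights sum to 1, a response that scores perfectly on every dimension yields a composite of 1.0, and a weak referral score alone can cost at most 0.10.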

8.4 Baseline Results and Findings

Initial evaluation (March 2026) produced baseline scores:

| Chatbot | Composite Score | Key Finding |
| --- | --- | --- |
| UNAK-spjall | 2.09/3.00 (70.4%) | Critical UTF-8 encoding bug depressed all Icelandic-language responses; graph traversal otherwise effective |
| Gervigreindur | 2.62/3.00 (90.0%) | Strong performance; minor mismatch between pedagogy-focused responses and judge weight distribution |

The evaluation immediately surfaced actionable issues:

  • A character encoding bug in the GraphRAG agent's response formatting was silently corrupting Icelandic text — invisible without systematic evaluation
  • Nine domain-specific knowledge gaps were identified where graph coverage was incomplete, feeding directly back into the quality assurance pipeline (Section 6)
  • The out-of-scope question category revealed that the referral dimension needed explicit training — the chatbots were attempting to answer questions they should have escalated

While LLM-as-judge provides scalable evaluation across large datasets, it is complemented by manual inspection of failure cases and targeted user feedback during development.

8.5 The Feedback Loop: Evaluation → Graph → Evaluation

The most valuable outcome of systematic evaluation was establishing a closed feedback loop between chatbot performance and graph quality:

Golden Questions → Chatbot Response → LLM Judge Score
       ↑                                      │
       │                                      ↓
  Graph enrichment ← Gap analysis ← Low-scoring domains

When the judge scores a response low on completeness for a governance question, root cause analysis traces back to either (a) missing graph nodes, (b) missing relationships, or (c) missing vector content. Each maps to a specific fix: add nodes to the domain file, add edges, or enrich the handbook content. After the fix, the golden question is re-evaluated to verify improvement.

This means the evaluation dataset is not just a quality gate — it is a requirements specification for the knowledge graph itself. Questions the chatbot cannot answer well become ISC criteria for graph enrichment.

9. Design Decisions and Rationale

9.1 JSON-LD as Source Format

JSON-LD provides linked data semantics (stable URIs, type systems, vocabulary reuse from schema.org) with the developer experience of plain JSON. Domain experts can read and edit JSON-LD in any text editor without learning RDF serialisation formats or SPARQL.

The @context block maps human-readable property names to schema.org URIs. The @graph array holds all nodes. Each node has an @id and @type. This is the entire model.
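
For readers who have not seen the format, the sketch below builds a two-node document with that shape and serialises it. The `ex:` namespace, the IDs, and the choice of properties are illustrative assumptions, not the university's actual domain files.

```python
import json

# Illustrative only: the "ex" namespace, IDs, and property selection are assumptions.
document = {
    "@context": {
        "schema": "https://schema.org/",
        "ex": "https://example.org/id/",
        "name": "schema:name",
        # Declaring "@type": "@id" marks the value as a reference to another node.
        "parentOrganization": {"@id": "schema:parentOrganization", "@type": "@id"},
    },
    "@graph": [
        {
            "@id": "ex:University",
            "@type": "schema:CollegeOrUniversity",
            "name": "Example University",
        },
        {
            "@id": "ex:IT",
            "@type": "schema:Organization",
            "name": "IT Department",
            "parentOrganization": "ex:University",
        },
    ],
}

node_ids = {node["@id"] for node in document["@graph"]}
print(json.dumps(document, indent=2, ensure_ascii=False))
```

Passing `ensure_ascii=False` keeps non-ASCII characters such as Icelandic letters readable in the serialised file instead of escaping them.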

9.2 SHACL over Ad-Hoc Validation

Custom Python validators would have been faster to write initially. SHACL shapes are declarative, standards-compliant, separate from pipeline code, and extensible by anyone who can write Turtle. A new shape can be added without touching Python.

9.3 Neo4j MERGE, Not CREATE

Idempotent ingestion via MERGE means the pipeline is always safe to re-run. This eliminates an entire class of operational anxiety. Any domain file edit — from a single property change to a full domain rewrite — can be ingested without concern for duplication.
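
The following sketch illustrates why MERGE makes re-runs safe. It builds a parameterised Cypher statement that matches on the stable `id` only, then uses an in-memory dict as a stand-in for the database to show that applying it twice leaves a single node. The function names and the stand-in store are hypothetical; the `Entity` label and `id` property follow the queries shown earlier in the article.

```python
def merge_node_statement(node: dict) -> tuple:
    """Build a parameterised Cypher MERGE for one node.

    MERGE matches on the stable id only; all other properties are applied
    with SET, so re-running the statement updates rather than duplicates.
    """
    props = {k: v for k, v in node.items() if k != "id"}
    query = "MERGE (n:Entity {id: $id}) SET n += $props"
    return query, {"id": node["id"], "props": props}


def apply_merge(store: dict, params: dict) -> None:
    """In-memory stand-in for the database, mimicking MERGE ... SET n += $props."""
    store.setdefault(params["id"], {}).update(params["props"])


store = {}
query, params = merge_node_statement({"id": "ex:IT", "name": "IT Department"})

# Run the same ingestion twice; the second run is a no-op.
apply_merge(store, params)
apply_merge(store, params)
```

Keeping non-key properties out of the MERGE pattern matters: if `name` were part of the match, a renamed department would be created as a second node instead of updating the existing one.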

9.4 Neo4j as Single Source of Truth

The initial architecture treated domain files as the source of truth with Neo4j as a downstream consumer. This was reversed: Neo4j is now authoritative, and domain files are derived exports. This change was motivated by:

  • Real-time editing: Changes made directly in Neo4j (via the admin dashboard or Cypher) take immediate effect
  • Consistency: A single authoritative source eliminates sync ambiguity
  • Export fidelity: The export script produces deterministic, round-trippable domain files for version control and disaster recovery

9.5 Provenance on Everything

Every node and relationship carries source_system, source_record_id, observed_at, and ingested_at. For institutional infrastructure this is essential: the question "where did this data come from and when was it last verified?" must always be answerable.
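
One possible way to apply and audit these four fields is sketched below. The field names come from the text; the helper functions, the ISO 8601 timestamp format, and the sample values are assumptions.

```python
from datetime import datetime, timezone

PROVENANCE_FIELDS = ("source_system", "source_record_id", "observed_at", "ingested_at")


def with_provenance(record: dict, source_system: str, source_record_id: str,
                    observed_at: datetime) -> dict:
    """Return a copy of record carrying the four provenance fields."""
    stamped = dict(record)
    stamped.update(
        source_system=source_system,
        source_record_id=source_record_id,
        observed_at=observed_at.isoformat(),
        # Ingestion time is set at the moment the record is stamped, in UTC.
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )
    return stamped


def missing_provenance(records: list) -> list:
    """IDs of records lacking any provenance field (the target is an empty list)."""
    return [r["id"] for r in records if any(not r.get(f) for f in PROVENANCE_FIELDS)]


stamped = with_provenance(
    {"id": "ex:IT", "name": "IT Department"},
    source_system="staff-directory",  # hypothetical source name
    source_record_id="42",
    observed_at=datetime(2026, 3, 1, tzinfo=timezone.utc),
)
```

Separating `observed_at` from `ingested_at` preserves the difference between when a fact was true at the source and when the pipeline recorded it, which is what makes stale-source checks possible.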

9.6 Observations as Nodes, Not Properties

Time-varying data stored as separate Observation nodes preserves history across pipeline runs. This is simpler than maintaining a separate time-series database and supports the same analytical queries.

10. Lessons for Practitioners

These lessons are distilled from building and operating the graph over approximately eight weeks:

1. Start with structure, not scale. Thirty-three organisational nodes with clean relationships are more valuable than 10,000 poorly connected records. The graph's value comes from the connections, not the node count.

2. The domain-file decomposition pattern works. Splitting a graph into domain-specific source files with cross-file referential integrity checking is the single best decision for maintainability. When something breaks, you know exactly which file to inspect.

3. Validate before you ingest. SHACL validation as a gate before Neo4j ingestion catches errors that would silently corrupt the graph. Every domain file addition goes through the same gate. No exceptions.

4. Scrape what's public, curate what's not. Staff directories and library databases are public and scrapable. Governance structures, policy details, and system dependencies require human curation. Recognise which is which and do not attempt to automate the curation.

5. Idempotency is essential. If the pipeline is not safe to re-run, operators will be afraid to run it. MERGE-based ingestion means the full pipeline can execute on any change, at any time, with confidence.

6. AI accelerates the code, not the knowledge. A structured AI agent operating within an orchestration framework (in this case, Claude Code with PAI*) can produce the pipeline scripts, scrapers, validation logic, and quality inspection modules at remarkable speed. But the knowledge — which entities to model, what relationships matter, what level of granularity serves downstream use cases — requires understanding the institution. The division of labour is clear: AI writes the deterministic infrastructure, the human makes every modelling decision. Neither can do the other's job effectively.

* PAI: Personal AI Infrastructure, an open-source orchestration framework.

7. Small institutions have an advantage. At a university of ~300 staff, the entire institution can be modelled in approximately 1,000 nodes. Every node can be personally validated. Every relationship can be checked against lived experience. This is a feature of small scale, not a limitation.

8. Make the graph bidirectional early. A unidirectional pipeline (files → database) creates a sync problem as soon as anyone edits the database directly. Building export capability from the start keeps files and database in agreement and provides version-controlled disaster recovery for free.

9. Make gaps visible, then close them systematically. Ownership, policy, and staffing gaps are not cosmetic — they represent blind spots in the institutional model. Track them as metrics and drive them to zero through iterative rounds.

Ongoing maintenance currently requires a few hours per week, primarily for reviewing scraper output, resolving validation issues, and incorporating organisational changes.

10. Neo4j is exceptionally well-suited to this problem. Organisational architectures are inherently graph-shaped. The property graph model captures hierarchies, dependencies, memberships, and integrations naturally. Cypher queries for multi-hop traversal (dependency chains, blast radius analysis) are trivial to write and fast to execute. For anyone considering a graph database for organisational mapping, Neo4j is an excellent choice.

10.1 Limitations and Trade-offs

This approach is not universally optimal.

High-churn environments require continuous maintenance; without clear ownership, the graph can degrade quickly as organisational reality changes.

Some domains — particularly narrative-heavy documentation — are better served by vector retrieval than structured modelling. The knowledge graph is most valuable where relationships and dependencies are first-class concerns.

Over-modelling is a real risk. Not every relationship needs to be explicitly represented, and excessive granularity can reduce clarity rather than improve it.

Finally, governance and organisational reality do not always map cleanly to formal structures. Ownership, responsibility, and decision-making authority may be ambiguous or politically sensitive.

The key shift is treating the knowledge graph not as a static artefact, but as a continuously audited system with explicit quality metrics and feedback loops.

11. Reproducibility Guide

For practitioners wanting to replicate this approach at their own organisation:

Step 1: Define Domains

Identify 8–20 knowledge domains that cover the institution. Common starting points:

  • Organisational structure (hierarchy, units, roles)
  • IT systems and infrastructure
  • Governance and committees
  • Academic programmes
  • Policies and legal framework
  • Staff directory
  • Vendor/partner relationships
  • Facilities and services

Step 2: Choose ID and Vocabulary

  • Use a namespace prefix (e.g., yourorg:) for all entity IDs
  • Extend schema.org vocabulary with institution-specific types
  • Establish an ID scheme that separates entity namespaces (staff, governance, general)

Step 3: Build the Pipeline

The minimum viable pipeline consists of four scripts:

| Script | Purpose | Approximate Size |
| --- | --- | --- |
| `merge.py` | Concatenate domain files | ~50 lines |
| `validate.py` | Referential integrity + SHACL | ~130 lines |
| `ingest.py` | MERGE into Neo4j | ~300 lines |
| `export.py` | Export Neo4j → domain files | ~400 lines |
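
To give a feel for the first two steps, here is a rough sketch of merging and a referential integrity check. It is not the author's `merge.py` or `validate.py`: the function names, the in-memory stand-ins for domain files, and the `managedBy` property are assumptions, and SHACL validation is left out.

```python
def merge_domains(domains: dict) -> list:
    """Concatenate the @graph arrays of all domain documents into one node list."""
    merged = []
    for doc in domains.values():
        merged.extend(doc.get("@graph", []))
    return merged


def unresolved_references(nodes: list, ref_keys: tuple) -> list:
    """Return (node id, property, target) for every reference to an undefined ID."""
    known = {n["@id"] for n in nodes}
    problems = []
    for node in nodes:
        for key in ref_keys:
            value = node.get(key, [])
            # A reference property may hold one target or a list of targets.
            targets = value if isinstance(value, list) else [value]
            problems += [(node["@id"], key, t) for t in targets if t not in known]
    return problems


# Hypothetical domain documents, keyed by the file each would be loaded from.
domains = {
    "organisation.jsonld": {"@graph": [{"@id": "ex:IT"}]},
    "systems.jsonld": {
        "@graph": [
            {"@id": "ex:LMS", "managedBy": "ex:IT"},
            {"@id": "ex:Wiki", "managedBy": "ex:Library"},
        ]
    },
}
nodes = merge_domains(domains)
problems = unresolved_references(nodes, ref_keys=("managedBy",))
```

The check only becomes meaningful after merging: `ex:LMS` points at a node defined in a different file, which resolves, while `ex:Wiki` points at `ex:Library`, which no file defines and is therefore reported.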

Step 4: Start Small, Validate Often

Begin with organisational structure (one file, ~30 nodes). Add domains iteratively. Run validation after every change. Do not wait until the graph is "complete" to start validating — completeness is a moving target; correctness is not.

Step 5: Add Scrapers Incrementally

Identify publicly available data sources (staff directories, catalogues, service listings). Build scrapers as needed. Each scraper writes to a single domain file.

Step 6: Establish Quality Metrics

Define what "complete" means for your graph:

  • Every system has an owner
  • Every unit has applicable policies
  • Every staff member is mapped to a department
  • Zero unresolved cross-file references

Track the number of violations of each criterion as a metric and drive it to zero.
