Building an Institutional Knowledge Graph with Neo4j: A Practitioner's Guide for Small Organisations

A methodology for modelling complex organisational architectures using graph databases, domain-driven file decomposition, and bidirectional pipeline design — demonstrated at a small Arctic university.

Magnús Smári Smárason · March 23, 2026

knowledge-graph neo4j institutional-architecture AI-assisted-development SHACL JSON-LD higher-education graph-database


1,039 Nodes · 16 Domains · 30 Relationship Types · 94.2% Staff Coverage

TL;DR / Executive Summary

Small and medium-sized organisations run on invisible architecture: governance, systems, policies, staff, and interdependencies that exist mostly in people’s heads and scattered documents. The result is fragility when staff leave, endless cross-domain questions that no one can answer quickly, and AI tools that remain blind to the institution they serve.

This guide presents a complete, reproducible methodology for solving that problem with a production institutional knowledge graph. Built and deployed at the University of Akureyri (northern Iceland, ~2,800 students, ~300 staff) by a single practitioner, the graph now contains 1,039 nodes across 16 domains, 1,648 relationships of 30 types, and is kept live and accurate through a bidirectional pipeline where Neo4j is the single source of truth.

Key innovations that make it practical for small teams:

Domain-file decomposition (16 independent JSON-LD files with cross-file referential integrity)
SHACL validation + automated quality gates that drive ownership, policy, and staffing gaps to zero
Bidirectional pipeline (scrape → merge → validate → ingest → export) with perfect round-trip fidelity to Git
Layered AI-assisted construction (Claude Code inside a structured orchestration framework) that accelerates code while keeping human domain expertise in control
Closed feedback loop via two production GraphRAG chatbots evaluated on a 410-question golden dataset

The full paper includes architecture diagrams, every script size, exact quality metrics, Cypher query library, chatbot evaluation results, and an 11-step reproducibility guide so any small organisation or higher-ed institution can copy the approach in weeks, not years.

Abstract

Small and medium-sized organisations face a structural knowledge problem: institutional knowledge about systems, governance, staff, policies, and dependencies lives in people's heads, scattered documents, and undocumented assumptions. This paper presents a complete, reproducible methodology for building an institutional knowledge graph using Neo4j, JSON-LD, and SHACL validation. The approach was developed and deployed at a small university (~2,800 students, ~300 staff) in northern Iceland, producing a graph of over 1,000 nodes across 16 domains with 1,648 relationships and 30 relationship types — maintained by a single practitioner. The pipeline was constructed using AI-assisted development (Claude Code within a structured orchestration framework), following a layered domain-by-domain strategy with validation gates between layers — blending a deterministic, repeatable tech stack with generative AI that accelerates code production without participating in runtime execution. The graph powers two production chatbots evaluated against a 410-question golden dataset with LLM-as-judge scoring. The paper describes architecture, pipeline design, data modelling, validation, a bidirectional pipeline where Neo4j serves as the single source of truth, the AI-assisted construction methodology, and a systematic evaluation framework. Every design pattern is presented with sufficient detail for practitioners at other organisations to reproduce the approach.

Keywords: knowledge graph, Neo4j, institutional architecture, JSON-LD, SHACL, organisational modelling, graph database, higher education, AI-assisted development, Claude Code

1. Introduction

1.1 The Institutional Knowledge Problem

Every organisation has an architecture — not just an IT architecture, but a living mesh of governance structures, academic programmes, policy frameworks, vendor relationships, identity systems, and human expertise. In small organisations, this architecture is often undocumented. The institution functions because a handful of people carry the picture in their heads.

This creates three problems:

Fragility. When key people leave, institutional knowledge leaves with them.
Opacity. No one can answer cross-domain questions like "which systems break if our identity provider goes down?" or "which policies apply to this faculty?" without consulting multiple people.
AI readiness. Modern AI systems — chatbots, retrieval-augmented generation, decision support — require structured context about the organisation they serve. Without it, they are general-purpose tools with no institutional awareness.

This paper presents a complete methodology for addressing these problems with a graph database. It reflects what proved effective in this implementation after extensive experimentation with Neo4j for mapping complex organisational architectures and interdependencies: a combination of graph modelling, semantic web standards, and automated pipeline design that produces a living, queryable, validated representation of an entire institution.

Organisational relationships are inherently graph-shaped — hierarchies, dependencies, memberships, and integrations — and a property graph model captures these naturally. While many aspects can be modelled in relational systems, graph traversal makes multi-hop queries (e.g. dependency chains) significantly simpler and more expressive.

1.2 Scope and Contributions

The methodology was developed at the University of Akureyri (UNAK), a small public university in northern Iceland. The resulting knowledge graph contains:

1,039 nodes across 16 domain files (729 clean nodes after deduplication in production)
1,648 relationships spanning 30 relationship types
16 domain-specific source files in JSON-LD format
16 SHACL shape definitions for structural validation
A bidirectional pipeline where Neo4j is the single source of truth and domain files are derived exports

The contributions of this paper are:

A domain-file decomposition pattern for managing large knowledge graphs as modular, independently editable files
A bidirectional pipeline architecture (scrape → merge → validate → ingest → export) that keeps version-controlled files synchronised with a live graph database
A validation gate pattern combining referential integrity checks with SHACL shape validation
An AI-assisted construction methodology that blends deterministic pipeline infrastructure with generative AI agents for layered, iterative graph construction
A chatbot evaluation framework with golden datasets and LLM-as-judge scoring that creates a closed feedback loop between chatbot performance and graph quality
Practical guidance on ID schemes, relationship modelling, observation patterns, and provenance tracking
A quality assurance methodology that systematically closes ownership, policy, and staffing gaps

2. Related Work

Institutional knowledge management in higher education has traditionally relied on enterprise architecture frameworks (TOGAF, ArchiMate) or purpose-built information systems. These approaches tend toward heavyweight tooling and consultant-driven processes that are poorly suited to small organisations with limited IT staffing.

Graph-based approaches to organisational modelling have gained traction with the maturation of property graph databases. Neo4j's Cypher query language and MERGE-based idempotent ingestion make it particularly suitable for iteratively building institutional graphs. The W3C's JSON-LD and SHACL standards provide a standards-compliant foundation for data modelling and validation without requiring full RDF/SPARQL infrastructure.

This work differs from existing approaches in three ways: (1) it targets small organisations where a single practitioner must build and maintain the entire graph, (2) it uses AI-assisted development to accelerate pipeline construction while relying on human domain expertise for knowledge curation, and (3) it implements a bidirectional pipeline where the graph database — not the source files — is the authoritative source of truth.

3. Architecture

3.1 Data Model

The knowledge graph uses JSON-LD as its serialisation format, leveraging schema.org vocabulary extended with institution-specific types. Every entity is a node with a stable identifier, a type, and a set of properties. Relationships are encoded as JSON-LD references with optional inline properties.

Node Model

All nodes carry the Entity label in Neo4j plus type-specific labels derived from @type in JSON-LD. Every node receives four provenance fields at ingest time:

Field	Purpose
`source_system`	Origin of the data (e.g., `seed`, `scraper`, `manual`)
`source_record_id`	Original identifier in the source system
`observed_at`	Timestamp when the data was last known to be accurate
`ingested_at`	Timestamp when the data entered Neo4j

This provenance model ensures every fact in the graph is traceable to its origin and can be assessed for freshness.

ID Scheme

Stable identifiers are the backbone of the graph. The scheme uses namespaced prefixes to avoid collisions:

Category	Pattern	Example
Entities	`unak:<slug>`	`unak:Canvas`, `unak:FacultyEducation`
Staff	`unak:staff:<slug>`	`unak:staff:firstname-lastname`
Relationships	`rel:<TYPE>:<from>:<to>:<source>`	`rel:INTEGRATES_WITH:unak:Canvas:unak:Panopto:seed`
Observations	`obs:<metric>:<entity>:<timestamp>`	`obs:uptime90d:unak:Canvas:2026-02-23T00:00:00Z`

Once assigned, an ID never changes. If a role holder changes, the node stays — only the properties update. This stability is what makes cross-file references work and what makes the pipeline safely re-runnable.
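The ID patterns in the table above can be sketched as a handful of pure functions. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and only the string patterns themselves come from the table.

```python
def entity_id(slug: str) -> str:
    """Entity pattern: unak:<slug>."""
    return f"unak:{slug}"


def staff_id(slug: str) -> str:
    """Staff pattern: unak:staff:<slug>."""
    return f"unak:staff:{slug}"


def relationship_id(rel_type: str, from_id: str, to_id: str, source: str) -> str:
    """Relationship pattern: rel:<TYPE>:<from>:<to>:<source>."""
    return f"rel:{rel_type}:{from_id}:{to_id}:{source}"


def observation_id(metric: str, entity: str, timestamp: str) -> str:
    """Observation pattern: obs:<metric>:<entity>:<timestamp>."""
    return f"obs:{metric}:{entity}:{timestamp}"


# Reproduces the relationship example from the table.
print(relationship_id("INTEGRATES_WITH", entity_id("Canvas"), entity_id("Panopto"), "seed"))
```

Because each ID is a pure function of its inputs, re-running the pipeline on the same data yields the same identifiers, which is what lets a later MERGE match existing nodes instead of creating new ones.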

3.2 Domain-File Decomposition

The single most important architectural decision was splitting the graph into 16 domain-specific JSON-LD files rather than maintaining a monolithic data file. Each file carries its own full @context block and can be edited, validated, and reasoned about independently.

#	Domain	Nodes	Contents
01	Organisational Structure	33	Institutional hierarchy — schools, faculties, centres, offices
02	IT Stack	59	All managed systems — LMS, portals, collaboration tools, AI systems
03	Identity & Auth	5	Identity providers, authentication federation
04	External Vendors	64	Cloud providers, AI vendors, library database vendors
05	Observations	13	Uptime metrics, student enrolment data
06	Governance	26	Leadership, committees, council, student association, legal framework
07	Academic Programmes	48	Graduate and undergraduate programmes across faculties
08	Policies & Legal	26	Institutional policies, AI policy, national legislation
09	Partnerships & Networks	39	Exchange programmes, research networks, partner institutions
10	Facilities & Services	17	Campus buildings, library, student services
11	Staff Directory	~291	All staff with role, contact, and department assignment
12	Extended Governance	82	Additional committees, working groups, quality bodies
13	Library Databases	97	Electronic research databases with vendor and access classification
14	Internal Handbooks (Bridge)	5	Bridge nodes to RAG content in vector storage
15	Public FAQ (Bridge)	3	Bridge nodes to FAQ content in vector storage
16	Handbook Entities	20	Entities extracted from internal handbook content

This decomposition yields four critical benefits:

Domain expertise stays local. The person who understands IT systems edits file 02. The person who understands governance edits file 06. Neither needs to understand the other 15 files.

Merge conflicts become manageable. Two people editing different domain files never conflict. Even within the same file, JSON-LD's flat graph structure means additions rarely collide.

Validation catches cross-domain errors. When files are merged, referential integrity checking verifies that every @id reference resolves to a node that exists — across all 16 files. A reference from file 07 (Academic Programmes) to a faculty defined in file 01 (Organisational Structure) is validated at merge time.

The monolith is a derived artefact. The merged output is generated by concatenating all 16 files' @graph arrays. It is never edited directly. Edits go to the domain files or, in the current architecture, to the Neo4j graph from which those files are exported.

Domains are both a modelling and organisational construct. Each domain maps to a source file and reflects a distinct area of institutional knowledge (e.g. IT systems, governance, academic programmes).

In the current implementation, domain classification is derived deterministically during ingestion using ID prefixes, categories, and routing rules. All entities now carry a domain property, enabling explicit cross-domain analysis (e.g. measuring relationships between governance, systems, and organisational units). This allows the graph to be modular at the file level while remaining fully connected and queryable as a unified structure.

3.3 The Bidirectional Pipeline

The pipeline has evolved from a linear flow (files → Neo4j) to a bidirectional architecture where Neo4j is the single source of truth and domain files are version-controlled exports:

                    ┌──────────────────────────┐
                    │   Neo4j (Source of Truth)  │
                    └──────┬───────────┬────────┘
                           │           │
                    Export ↓           ↑ Ingest
                           │           │
                    ┌──────┴───────────┴────────┐
                    │   16 Domain JSON-LD Files   │
                    │   (Version-Controlled)       │
                    └──────┬───────────┬────────┘
                           │           │
                    Merge  ↓           ↑ Scrape
                           │           │
                    ┌──────┴───┐  ┌────┴────────┐
                    │ Merged   │  │  Web Sources │
                    │ Artefact │  │  (unak.is)   │
                    └──────┬───┘  └─────────────┘
                           │
                    Validate ↓
                           │
                    ┌──────┴──────────────────┐
                    │ SHACL + Referential      │
                    │ Integrity Gate           │
                    └─────────────────────────┘

The export pipeline (export_neo4j.py) reads all nodes and relationships from Neo4j and routes them deterministically back into the 16 domain files using ID prefix, source_system, and category as routing keys. This means:

Round-trip fidelity: Export → ingest → re-export produces identical files
Version control on graph state: Every export creates a Git-committable snapshot of the entire graph
Offline editing: Domain files can be edited locally, validated, and re-ingested
Disaster recovery: The full graph state is always recoverable from version-controlled files

The one-command orchestrator supports all pipeline directions:

./update.sh              # merge + validate only
./update.sh --scrape     # scrape web sources + merge + validate
./update.sh --ingest     # merge + validate + ingest into production
./update.sh --export     # export Neo4j → domain files, then merge + validate
./update.sh --quality    # merge + validate + quality inspection

3.4 Thirty Relationship Types

The graph models 30 relationship types, each carrying provenance fields plus optional domain-specific properties:

Relationship	Semantics	Example
`CONTAINS`	Hierarchy	School contains Faculty
`MANAGES`	Operational responsibility	IT Department manages LMS
`HOSTED_BY`	Infrastructure hosting	LMS hosted by SaaS Vendor
`INTEGRATES_WITH`	System integration	LMS integrates with Video Platform (protocol: LTI 1.3)
`DEPENDS_ON`	System dependency	Portal depends on Identity Provider (risk_impact: 5)
`AUTHENTICATES_VIA`	Auth dependency	LMS authenticates via Azure AD (risk_impact: 4)
`WORKS_FOR`	Employment	Staff member works for Institution
`MEMBER_OF`	Membership	Staff member is member of Committee
`PARTNERS_WITH`	Partnership	Institution partners with Partner University
`REGULATED_BY`	Legal compliance	Institution regulated by Higher Education Act
`GOVERNED_BY`	Governance	Faculty governed by Dean
`SUBJECT_TO`	Policy application	Department subject to Quality Policy
`IMPLEMENTS`	Policy implementation	Office implements Data Protection Policy
`DEVELOPED_BY`	Software development	System developed by Staff Member
`OWNS`	System ownership	Department owns Internal System
`PART_OF`	Compositional membership	Sub-unit part of Larger Unit
...	...	_plus 14 additional types_

Some relationships carry domain-specific properties. INTEGRATES_WITH edges include the integration protocol (e.g., LTI 1.3). DEPENDS_ON and AUTHENTICATES_VIA edges carry a risk_impact score from 1 to 5. This enables queries such as _"Show all systems with a critical dependency (risk_impact ≥ 4) on the identity provider"_, which a single graph traversal can answer.
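The critical-dependency question could be expressed roughly as follows. This is a sketch under stated assumptions: it assumes the node identifier is stored in a property named `id` (the paper does not name the property), and the helper name `critical_dependents` is hypothetical. The `session` argument is expected to behave like a session from the official neo4j Python driver.

```python
# Cypher for "which systems have a critical dependency on a given target?"
# ASSUMPTION: nodes store their stable identifier in a property called `id`.
CRITICAL_DEPENDENTS = """
MATCH (s:Entity)-[r:DEPENDS_ON|AUTHENTICATES_VIA]->(t:Entity {id: $target})
WHERE r.risk_impact >= $min_risk
RETURN s.id AS system, type(r) AS relationship, r.risk_impact AS risk_impact
ORDER BY risk_impact DESC
"""


def critical_dependents(session, target_id, min_risk=4):
    """Return one dict per system that critically depends on `target_id`."""
    result = session.run(CRITICAL_DEPENDENTS, target=target_id, min_risk=min_risk)
    return [record.data() for record in result]
```

Passing the target and threshold as query parameters, rather than formatting them into the string, keeps the query text constant and avoids injection issues.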

All relationships carry a confidence score (default 1.0 for curated data), enabling future differentiation between verified and inferred knowledge. Systems are also annotated with a criticality field, allowing the graph to support risk-aware queries once values are enriched.

3.5 The Observation Pattern

Time-varying metrics are not stored as node properties. They are materialised as separate Observation nodes linked via OBSERVED_FOR:

(obs:Observation {metric: "uptime90d", value: 99.7, observed_at: "2026-02-23"})
  -[:OBSERVED_FOR]->
(lms:Entity {name: "Canvas LMS"})

This preserves history. Re-ingesting with a new observed_at timestamp creates a new observation node; the old one persists. Over time, this builds a time series without overwriting anything and without requiring a separate time-series database.

The same pattern works for enrolment data, budget metrics, or any institutional measure that changes over time.
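A minimal sketch of the observation pattern in plain Python follows. The JSON-LD key used for the link (`OBSERVED_FOR`) and both function names are assumptions made for illustration, and the second uptime value is invented sample data.

```python
def make_observation(metric, entity_id, value, observed_at):
    """Build an Observation node whose ID embeds the timestamp.

    A new timestamp yields a new ID, so re-ingesting adds a node
    instead of overwriting the previous measurement.
    """
    return {
        "@id": f"obs:{metric}:{entity_id}:{observed_at}",
        "@type": "Observation",
        "metric": metric,
        "value": value,
        "observed_at": observed_at,
        "OBSERVED_FOR": {"@id": entity_id},  # ASSUMPTION: key name for the link
    }


def time_series(observations, metric, entity_id):
    """Return (observed_at, value) pairs for one metric of one entity, oldest first."""
    rows = [
        (o["observed_at"], o["value"])
        for o in observations
        if o["metric"] == metric and o["OBSERVED_FOR"]["@id"] == entity_id
    ]
    # ISO 8601 timestamps in one consistent format sort chronologically as strings.
    return sorted(rows)


history = [
    make_observation("uptime90d", "unak:Canvas", 99.9, "2026-03-23T00:00:00Z"),  # invented value
    make_observation("uptime90d", "unak:Canvas", 99.7, "2026-02-23T00:00:00Z"),
]
print(time_series(history, "uptime90d", "unak:Canvas"))
```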

4. Pipeline Implementation

4.1 Scraping (Automated Data Acquisition)

Three automated tools (two scrapers and a discovery audit) extract data from publicly available web sources:

Staff Directory Scraper (530 lines of Python) — A three-pass approach:

Pass 1: Fetches the full staff directory listing using BeautifulSoup. Extracts name, job title, email, phone, and photo URL.
Pass 2: Visits each staff member's detail page using Playwright (headless browser) to extract department assignment from a specific CSS selector. This information is not available in the directory listing and requires JavaScript rendering.
Pass 3: Scrapes faculty-specific staff pages and matches unmapped staff by email address, catching anyone not resolved in Pass 2.

The three-pass approach achieves 94.2% department mapping coverage (274 out of 291 staff members). Pass 2 takes approximately 6 minutes for ~291 pages due to rate-limiting. Flags allow skipping individual passes for faster iteration (--skip-details, --skip-faculty).
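The email-matching step of Pass 3 and the coverage figure can be sketched as below. The data shapes (a list of staff dicts and a mapping from department ID to the emails found on that faculty page) and the function names are assumptions for illustration; the scraping itself is omitted.

```python
def match_by_email(unmapped, faculty_pages):
    """Assign departments to unmapped staff by case-insensitive email match.

    unmapped:      list of dicts, each with an 'email' key
    faculty_pages: dict of department ID -> iterable of emails seen on that page
    Returns (resolved, still_unmapped).
    """
    email_to_dept = {}
    for dept, emails in faculty_pages.items():
        for email in emails:
            # First page that lists an email wins; later duplicates are ignored.
            email_to_dept.setdefault(email.strip().lower(), dept)

    resolved, still_unmapped = [], []
    for person in unmapped:
        key = (person.get("email") or "").strip().lower()
        dept = email_to_dept.get(key)
        if dept:
            resolved.append({**person, "department": dept})
        else:
            still_unmapped.append(person)
    return resolved, still_unmapped


def coverage(mapped, total):
    """Percentage of staff mapped to a department, to one decimal place."""
    return round(100 * mapped / total, 1)


print(coverage(274, 291))  # the figures reported above
```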

Library Database Scraper (360 lines) — Fetches the electronic research database catalogue. Classifies each database by vendor and access type (subscription, open access, national access). Generates 97 nodes in approximately 10 seconds.

IT Service Audit (435 lines) — Crawls institutional web properties and identifies third-party services from HTML source analysis (external scripts, iframes, meta tags), inline script pattern matching, and page keyword detection. This is a discovery tool — findings require manual review before adding to the graph. The most recent audit added 32 nodes (19 IT systems, 13 vendors).

4.2 Merge

The merge script (47 lines of Python) reads all 16 domain files, concatenates their @graph arrays while preserving the shared @context block, and writes a single output file. Execution takes less than a second. The merged artefact is used by downstream validation and ingestion steps.
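The merge step is small enough to sketch in full. This is not the author's script: the `.jsonld` extension, the function name, and the choice to reuse the first file's @context are assumptions.

```python
import json
from pathlib import Path


def merge_domain_files(domain_dir, output_path):
    """Concatenate the @graph arrays of all domain files into one document."""
    files = sorted(Path(domain_dir).glob("*.jsonld"))  # ASSUMPTION: file extension
    if not files:
        raise FileNotFoundError(f"no domain files found in {domain_dir}")

    context = None
    graph = []
    for path in files:
        doc = json.loads(path.read_text(encoding="utf-8"))
        if context is None:
            # ASSUMPTION: every file carries the same @context, so the first is reused.
            context = doc.get("@context")
        graph.extend(doc.get("@graph", []))

    merged = {"@context": context, "@graph": graph}
    Path(output_path).write_text(
        json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return merged
```

Sorting the file list keeps node order stable between runs, so the merged artefact produces clean diffs.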

4.3 Validation Gate

Two validation passes run in sequence before any data reaches Neo4j:

Referential integrity: Every @id reference in the merged graph is checked against the set of defined node IDs. If file 07 references a faculty defined in file 01 and that faculty has been renamed or removed, this check catches it. This is the most common error when editing the graph manually.

SHACL validation: The merged JSON-LD is converted to an in-memory RDF graph using rdflib, then validated against 16 shape definitions written in Turtle (W3C SHACL standard). Shapes enforce structural contracts:

unak:ServiceShape a sh:NodeShape ;
    sh:targetClass schema:Service ;
    sh:property [
        sh:path schema:name ;
        sh:minCount 1 ;
    ] ;
    sh:property [
        sh:path unak:category ;
        sh:minCount 1 ;
    ] .

Every node type has a corresponding shape. If validation fails, the pipeline halts. No corrupted data reaches Neo4j.
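The referential integrity pass described above can be sketched with the standard library alone. The function names are hypothetical, and the sketch assumes that every nested object containing an @id is a reference to a node that should be defined somewhere in the merged graph.

```python
def collect_references(value, found):
    """Recursively collect @id values from reference objects inside a property value."""
    if isinstance(value, dict):
        if "@id" in value:
            found.add(value["@id"])
        for nested in value.values():
            collect_references(nested, found)
    elif isinstance(value, list):
        for item in value:
            collect_references(item, found)


def dangling_references(graph):
    """Return (node_id, missing_id) pairs for references that resolve to no node."""
    defined = {node["@id"] for node in graph if "@id" in node}
    problems = []
    for node in graph:
        refs = set()
        for key, value in node.items():
            if key != "@id":  # the node's own identifier is a definition, not a reference
                collect_references(value, refs)
        for missing in sorted(refs - defined):
            problems.append((node.get("@id"), missing))
    return problems
```

A pipeline gate would halt when this list is non-empty, for example after a faculty ID in file 01 is renamed without updating the programmes in file 07 that point to it.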

4.4 Ingestion

Ingestion into Neo4j runs in two passes using the official neo4j Python driver:

Pass 1: MERGE all nodes. Each node gets the Entity label plus type-specific labels. Properties are set from the JSON-LD, plus the four provenance fields.
Pass 2: MERGE all relationships (30 types) with properties.

The MERGE keyword in Cypher is what makes the entire pipeline idempotent. Running the pipeline ten times produces the same graph state: nodes matched by @id are updated, new nodes are created, and nothing is duplicated. A pipeline that must be safely re-runnable depends on this property.
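A two-pass MERGE ingestion might look roughly like the sketch below. Several details are assumptions rather than the author's implementation: the identifier property name `id`, the shape of the `relationships` dicts, and the omission of type-specific labels. The `session` argument is expected to behave like a session from the official neo4j Python driver.

```python
import re

# ASSUMPTION: nodes are matched on a property named `id` holding the JSON-LD @id.
NODE_MERGE = """
MERGE (n:Entity {id: $id})
SET n += $props, n.ingested_at = datetime()
"""

# Relationship types cannot be passed as query parameters here, so the type is
# formatted into the query text after validation.
REL_MERGE_TEMPLATE = """
MATCH (a:Entity {{id: $from_id}}), (b:Entity {{id: $to_id}})
MERGE (a)-[r:{rel_type}]->(b)
SET r += $props, r.ingested_at = datetime()
"""

_REL_TYPE = re.compile(r"^[A-Z][A-Z0-9_]*$")
_SCALARS = (str, int, float, bool)


def ingest(session, nodes, relationships):
    """Pass 1 merges nodes; pass 2 merges relationships between them."""
    for node in nodes:
        # Keep only flat scalar properties; references become relationships in pass 2.
        props = {
            k: v for k, v in node.items()
            if not k.startswith("@") and isinstance(v, _SCALARS)
        }
        session.run(NODE_MERGE, id=node["@id"], props=props)

    for rel in relationships:
        if not _REL_TYPE.match(rel["type"]):
            raise ValueError(f"unsafe relationship type: {rel['type']!r}")
        query = REL_MERGE_TEMPLATE.format(rel_type=rel["type"])
        session.run(query, from_id=rel["from"], to_id=rel["to"], props=rel.get("props", {}))
```

In this sketch, running `ingest` twice with the same input issues the same statements and leaves the same nodes and edges in place; only the `ingested_at` timestamp is refreshed.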

4.5 Export (Reverse Pipeline)

The export script (387 lines) reads all nodes and relationships from Neo4j and writes them back into the 16 domain JSON-LD files. Routing is deterministic:

ID prefix routing: Staff nodes (unak:staff:*) → file 11, observations (obs:*) → file 05
Source system routing: Nodes tagged with a specific source system route to the corresponding file
Category routing: Remaining nodes route by their category property to the appropriate domain file

The export round-trips cleanly: export → ingest → re-export produces identical output. This was verified as part of the pipeline validation.
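The three routing rules can be sketched as an ordered lookup. Only the two prefix routes (staff to file 11, observations to file 05) come from the text; the source-system and category tables below contain hypothetical keys chosen for illustration.

```python
# Routes stated in the text.
PREFIX_ROUTES = [("unak:staff:", "11"), ("obs:", "05")]

# HYPOTHETICAL keys: the paper does not list the actual values.
SOURCE_ROUTES = {"library_scraper": "13"}
CATEGORY_ROUTES = {"it_system": "02", "vendor": "04", "governance": "06"}


def route(node):
    """Return the two-digit domain file number for a node, or raise if none applies."""
    node_id = node["@id"]
    for prefix, file_no in PREFIX_ROUTES:
        if node_id.startswith(prefix):
            return file_no
    source = node.get("source_system")
    if source in SOURCE_ROUTES:
        return SOURCE_ROUTES[source]
    category = node.get("category")
    if category in CATEGORY_ROUTES:
        return CATEGORY_ROUTES[category]
    raise ValueError(f"no route for node {node_id}")


def export_buckets(nodes):
    """Group nodes by target file, sorted by @id so repeated exports are identical."""
    buckets = {}
    for node in nodes:
        buckets.setdefault(route(node), []).append(node)
    return {
        file_no: sorted(group, key=lambda n: n["@id"])
        for file_no, group in sorted(buckets.items())
    }
```

Failing loudly on an unroutable node, rather than falling back to a default file, is one way to keep the routing deterministic as new categories appear.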

5. AI-Assisted Construction Methodology

The entire knowledge graph pipeline (approximately 4,800 lines of Python, 200 lines of SHACL, and 120 lines of Bash) was constructed using an AI-assisted development methodology. A practical observation from this work is that AI-assisted development significantly accelerates pipeline construction. The approach is not "AI wrote the code"; it is a structured collaboration pattern between a human domain expert (the orchestrator) and an AI coding agent operating within a purpose-built framework.

5.1 The Toolchain

The primary development tool was Claude Code, Anthropic's CLI-based coding agent. Claude Code operates directly in the terminal — reading files, writing code, running commands, and iterating on results within the developer's actual working environment. Unlike chat-based AI assistants, it has full context of the codebase and can execute multi-step tasks autonomously.

The critical accelerator was not the AI agent alone, but wrapping it in a structured orchestration framework. In this case, the framework was PAI (Personal AI Infrastructure, an open-source project by Daniel Miessler), but the methodology is reproducible with any equivalent system that provides:

Structured task decomposition — an algorithmic cycle (observe → think → plan → build → execute → verify → learn) with explicit completion criteria that must be satisfied before a task is considered done
Custom skills — domain-specific prompt modules for research, code review, quality assurance, and specialised analysis
Sub-agent spawning — the ability to launch specialised agents for parallel work on independent subtasks
Persistent context — memory across sessions, project-specific instructions, and layered rules that encode domain knowledge and quality standards

The key insight is that an unstructured conversation with an AI produces scattered results. A structured framework that enforces observe-think-plan-build-verify cycles produces consistent, validated output. Any orchestration layer that provides these four capabilities would serve the same function.

5.2 The Layered Construction Strategy

The knowledge graph was not built all at once. It was constructed through a deliberate layered strategy, where each domain was added incrementally, validated, and stabilised before the next layer was introduced:

Layer 1: Organisational Structure (33 nodes)
    ↓ validate, ingest, verify
Layer 2: IT Stack + Identity (64 nodes)
    ↓ validate, ingest, verify
Layer 3: Governance + Policies (52 nodes)
    ↓ validate, ingest, verify cross-domain references
Layer 4: Academic Programmes + Partnerships (87 nodes)
    ↓ validate, ingest, verify
Layer 5: Facilities + Services (17 nodes)
    ↓ validate, ingest, verify
Layer 6: Staff Directory (291 nodes, scraped)
    ↓ validate, ingest, verify department mapping
Layer 7: Library Databases (97 nodes, scraped)
    ↓ validate, ingest, verify vendor relationships
Layer 8: Extended Governance + Handbook Entities (102 nodes)
    ↓ validate, ingest, verify
Layer 9: Quality gap closure (iterative rounds to 0/0/0)
    ↓ export, validate round-trip fidelity
Layer 10: Bidirectional pipeline (export capability)

Each layer followed the same cycle:

Domain analysis — The orchestrator (human) identifies what entities belong in this domain, what relationships matter, and what level of granularity serves downstream use cases
Data extraction — Either manual curation (governance, policies) or AI-assisted scraper construction (staff, library databases)
Schema evolution — New node types get SHACL shapes; new relationship types get added to the mapping
Validation gate — The full pipeline runs: merge all files, validate referential integrity across all domains, validate SHACL compliance
Cross-domain verification — References between the new layer and existing layers are checked (e.g., academic programmes referencing faculties defined in Layer 1)

This layered approach meant that at every stage, the graph was in a valid, consistent state. No layer was added that broke a previous layer. The validation gate enforced this structurally, not just as a convention.

5.3 Blending Deterministic and Generative

A critical design principle was the strict separation between deterministic infrastructure and generative AI work:

Component	Nature	Role of AI
JSON-LD schema	Deterministic	AI writes initial schema; human validates and evolves it
SHACL shapes	Deterministic	AI generates shape definitions; human reviews against domain knowledge
Pipeline scripts	Deterministic	AI writes Python; human reviews logic and edge cases
Neo4j MERGE queries	Deterministic	AI generates Cypher; idempotency is verified by re-running
Validation	Deterministic	Automated — no AI in the loop at validation time
Domain knowledge curation	Generative	AI assists with research and extraction; human makes all modelling decisions
Scraper construction	Generative → Deterministic	AI builds scraper code; output is deterministic once written
Quality gap analysis	Generative	AI identifies patterns; human decides which gaps are real

The key insight: the pipeline itself is entirely deterministic. Once written, merge.py, validate.py, ingest_neo4j.py, and export_neo4j.py produce identical results regardless of whether an AI or a human runs them. AI was used to _construct_ the deterministic infrastructure, not to _operate_ it. This means the pipeline is auditable, reproducible, and does not depend on AI availability at runtime.

5.4 What AI Was Good At

AI-assisted development excelled at:

Boilerplate generation. The initial structure of each pipeline script — argument parsing, Neo4j driver setup, JSON-LD parsing, error handling — was generated rapidly and correctly.
Pattern replication. Once the first scraper existed (staff directory), the second (library databases) and third (IT audit) followed the established pattern with minimal instruction.
Cross-domain reasoning. The AI agent could read all 16 domain files simultaneously and identify missing cross-references, inconsistent naming, or orphaned nodes faster than manual review.
SHACL shape generation. Given examples of existing shapes and a new node type, the agent produced correct SHACL definitions consistently.
Quality analysis. The quality inspection framework, with its five modules and fuzzy duplicate detection, was built in a single session. The AI was particularly effective at writing the Levenshtein-based fuzzy matching and generating structured reports.
Iterative refinement. The three-pass staff scraper evolved through multiple iterations — the AI proposed each enhancement (detail-page scraping, faculty-page matching) based on analysing the coverage gaps from previous runs.

5.5 What AI Was Not Good At

AI-assisted development struggled with or was deliberately excluded from:

Domain modelling decisions. Which entities matter? What granularity? Where to draw boundaries between domains? These required institutional knowledge that no AI possesses. The orchestrator made every modelling decision.
Relationship semantics. Whether a system "depends on" vs. "integrates with" another system — and the operational difference between those two relationships — required understanding the institution's actual infrastructure, not just its documentation.
Data correctness validation. The AI could check structural validity (SHACL, referential integrity) but could not verify that "the IT department manages Canvas" is true. That required human institutional knowledge.
Political sensitivity. Governance structures, committee memberships, and policy ownership have political dimensions. The orchestrator decided what to include, how to describe it, and what level of detail was appropriate.

5.6 Completion Criteria as Quality Metrics

The most valuable aspect of the structured orchestration approach was its concept of Ideal State Criteria (ISC) — concrete, testable conditions defined before work begins and verified with evidence when work ends. This mapped directly onto the knowledge graph's quality metrics:

ISC Criterion	Quality Metric	Final State
"Every IT system has an owning department"	Ownership gaps	0 (down from 109)
"Every organisational unit subject to a policy"	Policy gaps	0 (down from ~20)
"Every staff member mapped to a department"	Staff mapping gaps	94.2% coverage
"Zero unresolved cross-file references"	Referential integrity	PASS
"All node types have SHACL shapes"	Schema coverage	16/16 shapes

Each domain layer was treated as a task with its own ISC set: define criteria, build, validate against criteria, iterate until all pass. The orchestration framework enforced this discipline — no layer was declared complete without evidence for every criterion. Custom skills within the framework handled specific tasks: web research for discovering institutional data, browser automation for verifying scraper output against live pages, and Neo4j-specific skills for Cypher queries and graph validation.

5.7 Reproducibility of the AI-Assisted Approach

For practitioners wanting to replicate this methodology:

Any capable coding agent works. The methodology is not specific to Claude Code. Any AI coding agent that can read files, write code, and execute commands in a terminal can follow the same layered construction pattern.
Structure the agent's work. An unstructured conversation with an AI produces scattered results. Any framework that enforces observe-plan-build-verify cycles — whether PAI, a custom system, or even a disciplined checklist — produces consistent, validated output.
The human is the domain expert. The AI accelerates code production and pattern recognition. The human provides institutional knowledge, makes modelling decisions, and validates correctness against reality. This division of labour is not optional.
Deterministic output is essential. The pipeline must produce identical results regardless of who or what runs it. AI assists construction; it does not participate in execution.

6. Quality Assurance

6.1 Quality Inspection Framework

A modular quality inspection system runs automated checks against the graph data:

Module	Purpose
Inefficiency	Dead-end nodes, hubs, sparse nodes, disconnected entities
Redundancy	Fuzzy duplicate detection, cross-file overlaps, redundant relationships
Friction	Policy gaps, ownership gaps, staff coverage gaps
Cross-validation	Staff delta, IT coverage, empty departments, stale sources
AI Analysis	LLM-powered analysis of findings (optional, requires API key)
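
As a rough illustration of the kind of check the Inefficiency module performs, the sketch below classifies orphan and dead-end nodes from an in-memory edge list. The function name, the tuple layout, the `ex:` identifiers and the `MANAGES` relationship type are assumptions made for this example, not the project's actual code.

```python
from collections import defaultdict

def classify_nodes(node_ids, edges):
    """Return (orphans, dead_ends) for a directed edge list.

    edges: iterable of (source_id, relationship_type, target_id) tuples.
    An orphan has no edges at all; a dead end has inbound edges only.
    """
    outbound = defaultdict(int)
    inbound = defaultdict(int)
    for source, _rel, target in edges:
        outbound[source] += 1
        inbound[target] += 1
    orphans = sorted(n for n in node_ids if not outbound[n] and not inbound[n])
    dead_ends = sorted(n for n in node_ids if inbound[n] and not outbound[n])
    return orphans, dead_ends

# Hypothetical sample data
nodes = ["ex:LMS", "ex:VendorA", "ex:ITDept", "ex:Forgotten"]
edges = [
    ("ex:LMS", "HOSTED_BY", "ex:VendorA"),
    ("ex:ITDept", "MANAGES", "ex:LMS"),
]
orphans, dead_ends = classify_nodes(nodes, edges)
print(orphans)    # ['ex:Forgotten']
print(dead_ends)  # ['ex:VendorA']
```

A vendor that only receives `HOSTED_BY` edges shows up as a dead end here, which matches the "structurally expected" reading of dead-end nodes rather than indicating a defect.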

6.2 Gap Closure Methodology

The quality inspection modules produce concrete gap lists that are systematically closed in iterative rounds:

Ownership gaps: Every IT system, service, and organisational unit must have at least one managing or owning entity. The IT department was established as the default owner for all IT systems without explicit ownership. Starting from 109 ownership gaps, systematic assignment reduced this to zero.

Subsequent health dashboard auditing identified 27 additional IT-audit-discovered systems without explicit ownership, which are now tracked as remediation items.

Policy gaps: Every organisational unit must be subject to at least one institutional policy. Faculties were linked to relevant legislation; offices were linked to operational policies. Starting from approximately 20 policy gaps, systematic linking reduced this to zero.

Staff mapping gaps: Every staff member should be linked to a department. The three-pass scraper resolved most cases; remaining gaps were manually mapped using faculty staff pages. Starting from 18 unmapped staff, coverage reached 94.2% (remaining cases identified but not yet fully modelled).

Duplicate resolution: Fuzzy string matching identified 43 potential duplicates. Three were confirmed true duplicates (same entity with different IDs, namespace collisions, name variants) and merged. The remaining 40 were confirmed as false positives — a common occurrence with Icelandic patronymic naming patterns where unrelated individuals share similar names.

Remaining duplicate name groups represent legitimate vendor/product overlaps and are disambiguated using display names rather than merged, preserving semantic distinctions while avoiding ambiguity in user-facing contexts.
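
The candidate-generation step can be sketched with the standard library alone. The example below uses `difflib.SequenceMatcher`; the article does not say which matcher or threshold the pipeline uses, so both are assumptions, and the names are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations

def candidate_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, ratio) for pairs whose similarity >= threshold.

    The output is a list of candidates for human review, not a merge list.
    """
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs

# Invented names: two different people whose patronymics differ by one letter
names = ["Anna Jónsdóttir", "Anna Jónasdóttir", "Microsoft Teams"]
for a, b, ratio in candidate_duplicates(names):
    print(f"review: {a!r} vs {b!r} (similarity {ratio})")
```

The two people in the sample are flagged even though they are distinct individuals, which is why every candidate is confirmed by hand before merging.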

6.3 Current Quality State

Metric	Value
Policy gaps	0
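Ownership gaps	0 (27 audit-discovered systems tracked separately for remediation)
Staff mapping gaps	94.2% coverage (remaining cases identified, not yet fully modelled)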
SHACL validation	PASS
Referential integrity	PASS
Confirmed duplicates	0 remaining
Dead-end nodes	105 (structurally expected — vendors, programmes with only inbound edges)

6.4 Continuous Quality Monitoring

A health dashboard now runs after each pipeline execution, auditing structural and semantic integrity across the graph. This includes checks for orphan nodes, missing provenance, domain coverage, duplicate entities, and ownership gaps.

This shifts validation from a one-time gate to continuous monitoring — the graph is treated as a living system with measurable quality metrics rather than a static artefact.

The system now distinguishes between resolved gaps and identified gaps. For example, 27 IT systems currently lack explicit ownership — these are not hidden but surfaced by the health dashboard as actionable items. A graph that exposes its own incompleteness is more valuable than one that appears artificially complete.

Health Dashboard Snapshot (March 2026)
Metric	Value
Nodes	729
Relationships	1,602
Orphan nodes	0
Missing domain	0
Missing provenance	0
Cross-domain edges	2,738
Duplicate name groups	4 (disambiguated)
Systems without owner	27 (identified for remediation)

7. Application

The knowledge graph is integrated into Borg, an institutional AI platform built for the university. Borg is a broader system encompassing AI assistants, knowledge management, and institutional tooling — a full treatment of Borg is beyond the scope of this paper, but its knowledge graph features illustrate how a well-structured graph becomes immediately useful once embedded in a platform.

7.1 Interactive Graph Visualisation

Borg exposes the knowledge graph through two interactive force-directed graph diagrams:

Public Explorer — Available to all authenticated users. Presents the graph organised by seven domain categories with colour-coded nodes, search, and click-to-explore navigation. Users can visually traverse relationships, filter by domain, and drill into entity detail pages showing all connected nodes, relationship types, and properties. This is the primary way non-technical staff interact with the institutional knowledge graph.
Admin Graph — Available to administrators. The same interactive visualisation plus full CRUD operations: create, edit, and delete entities and relationships directly from the graph interface. Changes take immediate effect in Neo4j (the single source of truth) and are exportable to version-controlled domain files.

Both diagrams render the live Neo4j graph in real time — they are not static snapshots but direct queries against the production database.

7.2 Cypher Query Library for Testing and Analysis

A library of Cypher queries was developed alongside the graph for testing, validation, and operational analysis. These queries serve as both a verification tool during development and a practical analysis resource for ongoing operations:

// Blast radius: what breaks if Azure AD goes down?
MATCH (aad:Entity {id: "unak:AzureAD"})<-[r:AUTHENTICATES_VIA|DEPENDS_ON]-(dep:Entity)
WHERE r.risk_impact >= 4
RETURN dep.name, type(r), r.risk_impact ORDER BY r.risk_impact DESC

// Vendor concentration: which vendors host the most systems?
MATCH (v:Entity)<-[:HOSTED_BY]-(s:Entity)
RETURN v.name, count(s) AS systems ORDER BY systems DESC

// Orphan check: zero-relationship nodes (should be 0)
MATCH (n:Entity) WHERE NOT (n)--() RETURN count(n) AS orphans

// Staff coverage: unmapped staff members
MATCH (p:Entity)-[:WORKS_FOR]->(u:Entity {id: "unak:UNAK"})
WHERE NOT (p)-[:MEMBER_OF]->() AND p.id STARTS WITH "unak:staff:"
RETURN p.name, p.jobTitle_is

These queries were used iteratively during construction to verify each layer. The orphan check, for example, runs after every ingestion to enforce the zero-orphan invariant. The blast radius and vendor concentration queries feed directly into IT operations discussions.

A standalone Cypher audit dump (generate_cypher.py) exports the entire graph as MERGE statements, providing a portable, human-readable representation of the full graph state that can be reviewed in code review, diffed between versions, or used to recreate the graph from scratch without the Python pipeline.

7.3 Graph-Aware Chatbot

The admin interface includes a dedicated chatbot with direct access to the knowledge graph via Neo4j tool-use (detailed in Section 8). This chatbot operates as a working tool for graph maintenance — administrators can ask natural-language questions about the graph ("which IT systems have no owner?", "show me all committees and their members") and receive answers grounded in live graph traversal, not static documentation.

7.4 Dependency and Risk Analysis

The graph enables operational questions that previously required consulting multiple people:

Blast radius analysis: "What systems are affected if the identity provider goes down?" — traverse AUTHENTICATES_VIA and DEPENDS_ON edges with risk_impact ≥ 4.
Vendor concentration: "Which vendors are we most dependent on?" — count HOSTED_BY and PROVIDES edges per vendor.
Integration mapping: "What's the integration surface of our LMS?" — traverse INTEGRATES_WITH edges with protocol properties.

One immediate outcome was identifying critical identity dependencies that were not fully documented. The graph revealed that multiple core systems depended on Azure AD with high risk impact, prompting a discussion on redundancy and incident planning.
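
To make the blast radius idea concrete outside the database, here is a minimal sketch that walks dependency edges in plain Python. It follows dependents transitively, which goes one step beyond the single-hop Cypher query shown in Section 7.2; that extension, the tuple layout and the `ex:` identifiers are assumptions for illustration.

```python
from collections import deque

DEPENDENCY_TYPES = {"AUTHENTICATES_VIA", "DEPENDS_ON"}

def blast_radius(edges, failed_id, min_impact=4):
    """Return IDs that directly or transitively depend on failed_id.

    edges: iterable of (source_id, rel_type, target_id, risk_impact),
    read as "source depends on target".
    """
    dependents_of = {}
    for source, rel_type, target, impact in edges:
        if rel_type in DEPENDENCY_TYPES and impact >= min_impact:
            dependents_of.setdefault(target, []).append(source)

    affected, queue = set(), deque([failed_id])
    while queue:
        current = queue.popleft()
        for dependent in dependents_of.get(current, []):
            if dependent not in affected and dependent != failed_id:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)

# Hypothetical sample data
edges = [
    ("ex:LMS", "AUTHENTICATES_VIA", "ex:IdP", 5),
    ("ex:Email", "AUTHENTICATES_VIA", "ex:IdP", 4),
    ("ex:Wiki", "AUTHENTICATES_VIA", "ex:IdP", 2),  # below the impact threshold
    ("ex:Grades", "DEPENDS_ON", "ex:LMS", 4),       # two hops from the IdP
    ("ex:LMS", "HOSTED_BY", "ex:VendorA", 5),       # not a dependency edge
]
print(blast_radius(edges, "ex:IdP"))  # ['ex:Email', 'ex:Grades', 'ex:LMS']
```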

7.5 Governance and Compliance Mapping

The graph links every organisational unit to its governing bodies, applicable policies, and relevant legislation. This supports:

Compliance auditing (which policies apply to which units)
Committee membership tracking (who sits on which governance body)
Regulatory mapping (which national laws regulate which institutional functions)

7.6 Bridge to Unstructured Content

Bridge nodes connect the structured graph to unstructured content stored in a vector database (handbook text, FAQ content). This is the foundation for GraphRAG — using graph structure to improve retrieval-augmented generation. The graph provides _what_ things are and how they relate; the vector store provides the full narrative text.

The knowledge graph does not replace retrieval-augmented generation systems. Instead, it complements them: the graph provides structured relationships and constraints, while vector search provides unstructured narrative context. Together, they enable both precise queries (e.g. dependency chains) and explanatory responses grounded in documentation.

8. Chatbot Integration and Evaluation

The knowledge graph's primary consumer-facing application is a pair of production chatbots that combine structured graph data with unstructured content retrieval. Building the chatbots surfaced a question that every knowledge graph project must eventually answer: _how do you know the graph is actually useful?_ The answer is systematic evaluation.

8.1 Two Chatbot Architectures

The institution runs two chatbots with fundamentally different retrieval strategies, both drawing on the knowledge graph:

GraphRAG Agent (UNAK-spjall) — An agentic chatbot powered by Gemini with tool-use access to Neo4j. Rather than embedding graph data into a prompt, the agent decides at inference time which graph tools to invoke:

Tool	Function
`resolveEntities`	Entity lookup by name (fuzzy matching)
`searchDocuments`	Full-text + semantic search with category filtering
`getEntityDetails`	Relationship traversal for a specific entity
`exploreRelationships`	Multi-hop graph traversal
`getOrganizationStructure`	Organisational hierarchy queries

This agentic approach means the chatbot can compose multi-step queries: resolve an entity by name, traverse its relationships, then search for related documents — all within a single conversational turn. The graph provides structured facts; the vector store (pgvector with 1,536-dimensional Gemini embeddings) provides narrative context from handbooks, FAQs, and policy documents.

RAG Agent (Gervigreindur) — An AI literacy tutor that uses semantic search over chunked handbook content in pgvector. It does not query Neo4j directly but benefits from knowledge graph data that has been exported to the vector store as structured markdown chunks. This represents a simpler but effective pattern: the graph as a content source for vector retrieval.

8.2 Golden Evaluation Dataset

To measure whether the knowledge graph actually improves chatbot responses, a systematic evaluation framework was built around a golden dataset of 410 questions:

Dimension	Distribution
Question type	45% factual, 25% procedural, 15% synthesis, 10% edge case, 5% out-of-scope
Language	50% English, 40% Icelandic, 10% bilingual
Coverage	15 user stories spanning all 16 knowledge domains

The dataset includes golden questions for both chatbots:

UNAK-spjall: 30 supplemental prompts targeting graph-dependent knowledge (governance, dependencies, cross-domain relationships)
Gervigreindur: 40 golden prompts covering AI literacy, institutional procedures, and student services

8.3 LLM-as-Judge Evaluation Framework

Responses are scored by an LLM judge (Claude) across five dimensions:

Dimension	Weight	What It Measures
Correctness	30%	Factual accuracy against known ground truth
Completeness	25%	Whether the response addresses all aspects of the question
Faithfulness	20%	Whether claims are grounded in retrieved context (no hallucination)
Language	15%	Appropriate language use (Icelandic/English matching, terminology)
Referral	10%	Appropriate escalation to human support when the question exceeds scope

Scoring thresholds:

Score	Interpretation
≥ 0.85	Target — production-ready
≥ 0.75	Pass — acceptable for deployment
< 0.72	Block — deployment gate fails
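
The weighting and gating above can be expressed in a few lines. This is a sketch under stated assumptions: per-dimension scores are normalised to the range 0 to 1, and scores that fall between the block and pass thresholds, which the table does not describe, are labelled "review". Function and parameter names are illustrative.

```python
WEIGHTS = {
    "correctness": 0.30,
    "completeness": 0.25,
    "faithfulness": 0.20,
    "language": 0.15,
    "referral": 0.10,
}

def composite_score(scores):
    """Weighted sum of per-dimension scores, each assumed to lie in [0, 1]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

def gate(score, target=0.85, passing=0.75, block_below=0.72):
    """Map a composite score onto the thresholds from the table."""
    if score >= target:
        return "target"
    if score >= passing:
        return "pass"
    if score < block_below:
        return "block"
    return "review"  # assumption: the table leaves this band undefined

# Invented judge output for a single response
scores = {"correctness": 0.9, "completeness": 0.8, "faithfulness": 0.9,
          "language": 0.7, "referral": 0.5}
total = composite_score(scores)
print(round(total, 3), gate(total))  # 0.805 pass
```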

8.4 Baseline Results and Findings

Initial evaluation (March 2026) produced baseline scores:

Chatbot	Composite Score	Key Finding
UNAK-spjall	2.09/3.00 (70.4%)	Critical UTF-8 encoding bug depressed all Icelandic-language responses; graph traversal otherwise effective
Gervigreindur	2.62/3.00 (90.0%)	Strong performance; minor mismatch between pedagogy-focused responses and judge weight distribution

The evaluation immediately surfaced actionable issues:

A character encoding bug in the GraphRAG agent's response formatting was silently corrupting Icelandic text — invisible without systematic evaluation
Nine domain-specific knowledge gaps were identified where graph coverage was incomplete, feeding directly back into the quality assurance pipeline (Section 6)
The out-of-scope question category revealed that the referral dimension needed explicit training — the chatbots were attempting to answer questions they should have escalated

While LLM-as-judge provides scalable evaluation across large datasets, it is complemented by manual inspection of failure cases and targeted user feedback during development.

8.5 The Feedback Loop: Evaluation → Graph → Evaluation

The most valuable outcome of systematic evaluation was establishing a closed feedback loop between chatbot performance and graph quality:

Golden Questions → Chatbot Response → LLM Judge Score
       ↑                                      │
       │                                      ↓
  Graph enrichment ← Gap analysis ← Low-scoring domains

When the judge scores a response low on completeness for a governance question, root cause analysis traces back to either (a) missing graph nodes, (b) missing relationships, or (c) missing vector content. Each maps to a specific fix: add nodes to the domain file, add edges, or enrich the handbook content. After the fix, the golden question is re-evaluated to verify improvement.

This means the evaluation dataset is not just a quality gate — it is a requirements specification for the knowledge graph itself. Questions the chatbot cannot answer well become ISC criteria for graph enrichment.

9. Design Decisions and Rationale

9.1 JSON-LD as Source Format

JSON-LD provides linked data semantics (stable URIs, type systems, vocabulary reuse from schema.org) with the developer experience of plain JSON. Domain experts can read and edit JSON-LD in any text editor without learning RDF serialisation formats or SPARQL.

The @context block maps human-readable property names to schema.org URIs. The @graph array holds all nodes. Each node has an @id and @type. This is the entire model.
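
A minimal domain file in this shape might look like the following, written here as a Python dictionary so it can be loaded and indexed directly. The `ex:` namespace, the node names and the choice of `parentOrganization` are assumptions for illustration; the article does not reproduce an actual domain file in this section.

```python
import json

document = {
    "@context": {
        "schema": "https://schema.org/",
        "ex": "https://example.org/id/",
        "name": "schema:name",
        # "@type": "@id" tells JSON-LD that the value is a reference, not a string
        "parentOrganization": {"@id": "schema:parentOrganization", "@type": "@id"},
    },
    "@graph": [
        {"@id": "ex:University", "@type": "schema:CollegeOrUniversity",
         "name": "Example University"},
        {"@id": "ex:ITDept", "@type": "schema:Organization",
         "name": "IT Department", "parentOrganization": "ex:University"},
    ],
}

def index_by_id(doc):
    """Build an {@id: node} lookup from the @graph array."""
    return {node["@id"]: node for node in doc["@graph"]}

# Round-trip through JSON to show the file is plain JSON on disk
nodes = index_by_id(json.loads(json.dumps(document)))
print(nodes["ex:ITDept"]["parentOrganization"])  # ex:University
```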

9.2 SHACL over Ad-Hoc Validation

Custom Python validators would have been faster to write initially, but SHACL shapes are declarative, standards-compliant, separate from pipeline code, and extensible by anyone who can write Turtle. A new shape can be added without touching Python.

9.3 Neo4j MERGE, Not CREATE

Idempotent ingestion via MERGE means the pipeline is always safe to re-run. This eliminates an entire class of operational anxiety. Any domain file edit — from a single property change to a full domain rewrite — can be ingested without concern for duplication.
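
The following sketch shows one way to build idempotent, parameterised MERGE statements in Python. The `:Entity` label and `id` key follow the queries in Section 7.2; the function names and the relationship-type validation rule are assumptions, not the project's `ingest.py`.

```python
import re

# Relationship types are interpolated into the query text, so they are
# checked against a strict pattern first (an assumed convention).
_REL_TYPE = re.compile(r"^[A-Z][A-Z0-9_]*$")

def merge_node(node_id, props):
    """Return (cypher, params) that upserts one :Entity node keyed on id."""
    cypher = "MERGE (n:Entity {id: $id}) SET n += $props"
    return cypher, {"id": node_id, "props": dict(props)}

def merge_relationship(source_id, rel_type, target_id, props=None):
    """Return (cypher, params) that upserts one relationship between two nodes."""
    if not _REL_TYPE.match(rel_type):
        raise ValueError(f"unexpected relationship type: {rel_type!r}")
    cypher = (
        "MATCH (a:Entity {id: $source}), (b:Entity {id: $target}) "
        f"MERGE (a)-[r:{rel_type}]->(b) SET r += $props"
    )
    return cypher, {"source": source_id, "target": target_id, "props": dict(props or {})}

cypher, params = merge_node("ex:LMS", {"name": "Learning Management System"})
print(cypher)
print(merge_relationship("ex:LMS", "HOSTED_BY", "ex:VendorA")[0])
```

Because both statements match on `id` before writing, running them twice leaves the graph unchanged, which is the property that makes re-running the pipeline safe. With the official Python driver, each pair would typically be passed to `session.run(cypher, params)`.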

9.4 Neo4j as Single Source of Truth

The initial architecture treated domain files as the source of truth with Neo4j as a downstream consumer. This was reversed: Neo4j is now authoritative, and domain files are derived exports. This change was motivated by:

Real-time editing: Changes made directly in Neo4j (via the admin dashboard or Cypher) take immediate effect
Consistency: A single authoritative source eliminates sync ambiguity
Export fidelity: The export script produces deterministic, round-trippable domain files for version control and disaster recovery

9.5 Provenance on Everything

Every node and relationship carries source_system, source_record_id, observed_at, and ingested_at. This is essential for institutional infrastructure: the question "where did this data come from and when was it last verified?" must always be answerable.
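
A small helper can stamp these four fields consistently and report when any are missing. The field names come from the text above; the helper names, the ISO 8601 timestamp format and the sample values are assumptions.

```python
from datetime import datetime, timezone

PROVENANCE_KEYS = ("source_system", "source_record_id", "observed_at", "ingested_at")

def with_provenance(props, source_system, source_record_id, observed_at, ingested_at=None):
    """Return a copy of props with the four provenance fields set."""
    ingested_at = ingested_at or datetime.now(timezone.utc)
    stamped = dict(props)
    stamped.update(
        source_system=source_system,
        source_record_id=source_record_id,
        observed_at=observed_at.isoformat(),
        ingested_at=ingested_at.isoformat(),
    )
    return stamped

def missing_provenance(props):
    """List the provenance keys that are absent or empty."""
    return [key for key in PROVENANCE_KEYS if not props.get(key)]

# Hypothetical record
observed = datetime(2026, 3, 1, 9, 0, tzinfo=timezone.utc)
node = with_provenance({"name": "IT Department"}, "staff-directory", "dept-42", observed)
print(missing_provenance(node))                 # []
print(missing_provenance({"name": "Unknown"}))  # all four keys
```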

9.6 Observations as Nodes, Not Properties

Time-varying data stored as separate Observation nodes preserves history across pipeline runs. This is simpler than maintaining a separate time-series database and supports the same analytical queries.

10. Lessons for Practitioners

These lessons are distilled from building and operating the graph over approximately eight weeks:

1. Start with structure, not scale. Thirty-three organisational nodes with clean relationships are more valuable than 10,000 poorly connected records. The graph's value comes from the connections, not the node count.

2. The domain-file decomposition pattern works. Splitting a graph into domain-specific source files with cross-file referential integrity checking is the single best decision for maintainability. When something breaks, you know exactly which file to inspect.

3. Validate before you ingest. SHACL validation as a gate before Neo4j ingestion catches errors that would silently corrupt the graph. Every domain file addition goes through the same gate. No exceptions.

4. Scrape what's public, curate what's not. Staff directories and library databases are public and scrapable. Governance structures, policy details, and system dependencies require human curation. Recognise which is which and do not attempt to automate the curation.

5. Idempotency is essential. If the pipeline is not safe to re-run, operators will be afraid to run it. MERGE-based ingestion means the full pipeline can execute on any change, at any time, with confidence.

6. AI accelerates the code, not the knowledge. A structured AI agent operating within an orchestration framework (in this case, Claude Code with PAI*) can produce the pipeline scripts, scrapers, validation logic, and quality inspection modules at remarkable speed. But the knowledge — which entities to model, what relationships matter, what level of granularity serves downstream use cases — requires understanding the institution. The division of labour is clear: AI writes the deterministic infrastructure, the human makes every modelling decision. Neither can do the other's job effectively.

* Personal AI Infrastructure (PAI), an open-source orchestration framework.

7. Small institutions have an advantage. At a university of ~300 staff, the entire institution can be modelled in approximately 1,000 nodes. Every node can be personally validated. Every relationship can be checked against lived experience. This is a feature of small scale, not a limitation.

8. Make the graph bidirectional early. A unidirectional pipeline (files → database) creates a sync problem as soon as anyone edits the database directly. Building export capability from the start keeps files and database in agreement and provides version-controlled disaster recovery for free.

9. Make gaps visible, then close them systematically. Ownership, policy, and staffing gaps are not cosmetic — they represent blind spots in the institutional model. Track them as metrics and drive them to zero through iterative rounds.

Ongoing maintenance currently takes a few hours per week, primarily for reviewing scraper output, resolving validation issues, and incorporating organisational changes.

10. Neo4j is exceptionally well-suited to this problem. Organisational architectures are inherently graph-shaped. The property graph model captures hierarchies, dependencies, memberships, and integrations naturally. Cypher queries for multi-hop traversal (dependency chains, blast radius analysis) are trivial to write and fast to execute. For anyone considering a graph database for organisational mapping, Neo4j is an excellent choice.

10.1 Limitations and Trade-offs

This approach is not universally optimal.

High-churn environments require continuous maintenance; without clear ownership, the graph can degrade quickly as organisational reality changes.

Some domains — particularly narrative-heavy documentation — are better served by vector retrieval than structured modelling. The knowledge graph is most valuable where relationships and dependencies are first-class concerns.

Over-modelling is a real risk. Not every relationship needs to be explicitly represented, and excessive granularity can reduce clarity rather than improve it.

Finally, governance and organisational reality do not always map cleanly to formal structures. Ownership, responsibility, and decision-making authority may be ambiguous or politically sensitive.

The key shift is treating the knowledge graph not as a static artefact, but as a continuously audited system with explicit quality metrics and feedback loops.

11. Reproducibility Guide

For practitioners wanting to replicate this approach at their own organisation:

Step 1: Define Domains

Identify 8–20 knowledge domains that cover the institution. Common starting points:

Organisational structure (hierarchy, units, roles)
IT systems and infrastructure
Governance and committees
Academic programmes
Policies and legal framework
Staff directory
Vendor/partner relationships
Facilities and services

Step 2: Choose ID and Vocabulary

Use a namespace prefix (e.g., yourorg:) for all entity IDs
Extend schema.org vocabulary with institution-specific types
Establish an ID scheme that separates entity namespaces (staff, governance, general)
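
One possible ID helper is sketched below. The `prefix:namespace:slug` layout mirrors the `unak:staff:` pattern seen in the staff coverage query, but the slug rules (lower-case, hyphen-separated, Icelandic letters transliterated) are assumptions and may differ from the scheme actually used.

```python
import re
import unicodedata

# Letters that Unicode decomposition does not reduce to ASCII
_SPECIAL = str.maketrans({"ð": "d", "Ð": "D", "þ": "th", "Þ": "Th", "æ": "ae", "Æ": "Ae"})

def slugify(name):
    """Reduce a display name to a lower-case ASCII slug."""
    text = name.translate(_SPECIAL)
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()

def make_id(prefix, name, namespace=None):
    """Build prefix:slug, or prefix:namespace:slug when a namespace is given."""
    parts = [prefix] + ([namespace] if namespace else []) + [slugify(name)]
    return ":".join(parts)

print(make_id("yourorg", "Jón Þór Ægisson", namespace="staff"))  # yourorg:staff:jon-thor-aegisson
print(make_id("yourorg", "IT Department"))                       # yourorg:it-department
```

Keeping staff in their own namespace means a person and a unit with the same name cannot collide on ID.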

Step 3: Build the Pipeline

The minimum viable pipeline consists of four scripts:

Script	Purpose	Approximate Size
`merge.py`	Concatenate domain files	~50 lines
`validate.py`	Referential integrity + SHACL	~130 lines
`ingest.py`	MERGE into Neo4j	~300 lines
`export.py`	Export Neo4j → domain files	~400 lines
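
The first two scripts can start out very small. The sketch below merges already-parsed domain documents and lists cross-file references that point at unknown IDs. Two simplifications are assumed: any string value starting with the namespace prefix is treated as a reference, and duplicate IDs across files are treated as an error. SHACL validation is not shown.

```python
def merge_domains(domain_docs):
    """Concatenate the @graph arrays of several JSON-LD documents."""
    merged, seen = [], set()
    for doc in domain_docs:
        for node in doc.get("@graph", []):
            if node["@id"] in seen:
                raise ValueError(f"duplicate @id across files: {node['@id']}")
            seen.add(node["@id"])
            merged.append(node)
    return merged

def unresolved_references(nodes, prefix="yourorg:"):
    """Return (node_id, property, missing_target) for every dangling reference."""
    known = {node["@id"] for node in nodes}
    dangling = []
    for node in nodes:
        for key, value in node.items():
            if key.startswith("@"):
                continue
            for item in value if isinstance(value, list) else [value]:
                ref = item.get("@id") if isinstance(item, dict) else item
                if isinstance(ref, str) and ref.startswith(prefix) and ref not in known:
                    dangling.append((node["@id"], key, ref))
    return dangling

# Two hypothetical domain files, already loaded with json.load
org = {"@graph": [{"@id": "yourorg:ITDept", "@type": "Organization", "name": "IT"}]}
systems = {"@graph": [
    {"@id": "yourorg:LMS", "@type": "System", "managedBy": "yourorg:ITDept"},
    {"@id": "yourorg:Wiki", "@type": "System", "managedBy": {"@id": "yourorg:WebTeam"}},
]}
merged = merge_domains([org, systems])
print(unresolved_references(merged))  # [('yourorg:Wiki', 'managedBy', 'yourorg:WebTeam')]
```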

Step 4: Start Small, Validate Often

Begin with organisational structure (one file, ~30 nodes). Add domains iteratively. Run validation after every change. Do not wait until the graph is "complete" to start validating — completeness is a moving target; correctness is not.

Step 5: Add Scrapers Incrementally

Identify publicly available data sources (staff directories, catalogues, service listings). Build scrapers as needed. Each scraper writes to a single domain file.

Step 6: Establish Quality Metrics

Define what "complete" means for your graph:

Every system has an owner
Every unit has applicable policies
Every staff member is mapped to a department
Zero unresolved cross-file references

Track the corresponding gap counts and drive them to zero.
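
These definitions translate directly into countable gaps. The sketch below computes three of them from a node-type map and an edge list. `MEMBER_OF` appears in the staff coverage query in Section 7.2; the `OWNS` and `SUBJECT_TO` relationship names and the `System`, `Unit` and `Person` type names are placeholders chosen for this example.

```python
def gap_report(node_types, edges):
    """Return gap lists keyed by metric name.

    node_types: {node_id: type_name}
    edges: iterable of (source_id, rel_type, target_id)
    """
    owned = {target for _s, rel, target in edges if rel == "OWNS"}
    governed = {source for source, rel, _t in edges if rel == "SUBJECT_TO"}
    mapped = {source for source, rel, _t in edges if rel == "MEMBER_OF"}

    def missing(type_name, covered):
        return sorted(n for n, t in node_types.items() if t == type_name and n not in covered)

    return {
        "ownership_gaps": missing("System", owned),
        "policy_gaps": missing("Unit", governed),
        "staff_mapping_gaps": missing("Person", mapped),
    }

# Hypothetical sample data
node_types = {
    "ex:LMS": "System", "ex:Wiki": "System",
    "ex:ITDept": "Unit",
    "ex:staff:anna": "Person",
}
edges = [
    ("ex:ITDept", "OWNS", "ex:LMS"),
    ("ex:ITDept", "SUBJECT_TO", "ex:SecurityPolicy"),
]
report = gap_report(node_types, edges)
print({name: len(items) for name, items in report.items()})
```

Returning the offending IDs rather than only the counts gives each iterative round a concrete worklist.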
