Most "AI for business" tools treat your organisation as a pile of documents. You search, they retrieve, you read. That works for finding things you already know exist. It falls apart the moment you ask a question that crosses files, people, dates, and obligations — which is most of the questions that actually matter inside a regulated firm. The knowledge graph at the centre of the Intelligence Brain exists because retrieval-only systems can't answer those questions. This is the part of the architecture I get asked about most, so here's how it actually works.
Why a graph, and not just a vector store
Vector search is good at one thing: finding text that is semantically similar to your query. That's useful, but it has no notion of structure. It doesn't know that a particular email was sent by a partner, in response to a client, about a matter that has a fee agreement attached, which references a regulation that changed last quarter. To a vector store, those are just chunks with cosine similarity scores.
A knowledge graph encodes the relationships explicitly. Nodes are the things your firm cares about — people, clients, matters, documents, obligations, deadlines, regulators, jurisdictions, transactions. Edges are the relationships between them — authored-by, responds-to, governs, supersedes, refers-to. When you ask a question, the system can traverse the graph to assemble context that no vector search would find, because the answer isn't in any single chunk of text. It's in the shape of the connections.
The practical upshot: when a partner asks "what's the current position on the Murphy file and who agreed to it", the graph knows that "current" means the most recent authoritative document, "position" is a specific node type, and "who agreed" requires walking from the document node to the email thread node to the sender. Pure vector retrieval will hand you five plausible-looking paragraphs and let you sort it out yourself.
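To make that concrete, here's a toy version of that walk. The node IDs, edge labels, and data are invented for this sketch, and the production graph is obviously not a Python dict, but the traversal has exactly this shape:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    type: str                      # e.g. "Matter", "Document", "EmailThread", "Person"
    props: dict = field(default_factory=dict)

nodes = {
    "matter:murphy": Node("matter:murphy", "Matter", {"name": "Murphy"}),
    "doc:advice-v2": Node("doc:advice-v2", "Document", {"kind": "position", "date": "2025-04-02"}),
    "thread:123":    Node("thread:123", "EmailThread"),
    "person:obrien": Node("person:obrien", "Person", {"name": "D. O'Brien"}),
}

# Edges as (source, label, target) triples.
edges = [
    ("doc:advice-v2", "about",       "matter:murphy"),
    ("thread:123",    "responds_to", "doc:advice-v2"),
    ("thread:123",    "sent_by",     "person:obrien"),
]

def neighbours(node_id, label, reverse=False):
    """Follow edges with a given label, forwards or backwards."""
    return [s if reverse else t
            for (s, l, t) in edges
            if l == label and (t if reverse else s) == node_id]

# matter -> position document -> reply thread -> sender
position = neighbours("matter:murphy", "about", reverse=True)[0]
thread   = neighbours(position, "responds_to", reverse=True)[0]
agreed   = neighbours(thread, "sent_by")[0]
print(nodes[agreed].props["name"])   # -> D. O'Brien
```

The point is that "who agreed" is a path, not a chunk: no amount of similarity scoring over paragraphs produces that answer.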
The schema problem: generic vs. firm-specific
The temptation when building a knowledge graph AI system is to start with a generic ontology — Person, Document, Event — and call it done. That's fine for a demo. It collapses the moment you hit a real Irish accountancy practice or solicitors' firm, because the things they actually care about don't map cleanly onto generic types.
An Irish solicitors' firm has matters, files, undertakings, ledger entries, Law Society obligations, conflict registers, and AML reviews. An accountancy practice has engagements, returns, ROS submissions, CRO filings, audit trails, and partner sign-offs. These are not "Documents" with metadata — they are first-class entities with their own lifecycle, their own validation rules, and their own regulatory weight.
So the Intelligence Brain ships with a base schema and then specialises it per vertical during onboarding. We extract the firm's actual entity types from their existing systems — practice management, document management, email, accounting — and let those types become nodes. The generic layer underneath gives us reasoning primitives. The firm-specific layer on top gives us answers that match how the firm actually thinks.
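As a rough illustration of that layering (all class and field names here are invented, not the shipped schema), the generic layer supplies the reasoning primitives and each vertical subclasses them with its own fields and validation:

```python
from dataclasses import dataclass

@dataclass
class Entity:                       # generic reasoning primitive
    id: str

@dataclass
class Document(Entity):             # generic layer: enough for traversal and provenance
    title: str = ""

@dataclass
class Undertaking(Document):        # solicitors' vertical: first-class, with its own lifecycle
    given_to: str = ""
    discharge_due: str = ""         # ISO date; empty means nobody recorded a deadline

@dataclass
class ROSSubmission(Document):      # accountancy vertical
    tax_year: int = 0
    signed_off_by: str = ""

def validate(e: Entity) -> list[str]:
    """Validation rules travel with the vertical type, not the generic layer."""
    problems = []
    if isinstance(e, Undertaking) and not e.discharge_due:
        problems.append(f"{e.id}: undertaking has no discharge deadline")
    if isinstance(e, ROSSubmission) and not e.signed_off_by:
        problems.append(f"{e.id}: ROS submission lacks partner sign-off")
    return problems

print(validate(Undertaking(id="u:114", title="Undertaking to Bank of Ireland")))
```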
How entities and edges are extracted
This is the part that's genuinely hard, and where most "drop your documents in and we'll figure it out" products quietly fail. Building a knowledge graph from an Irish firm's actual data requires three passes:
- Structured extraction from systems that already have schemas: practice management databases, accounting ledgers, CRM, time recording. This gives you the spine: clients, matters, engagements, partners, fee earners, dates. It's deterministic and you should treat it as ground truth.
- Semi-structured extraction from emails and document metadata — sender, recipient, thread, subject, attachments, signing fields, version history. Mostly mechanical, with some heuristics for thread reconstruction and reply-chain stitching.
- Unstructured extraction from document bodies and email content. This is where a language model earns its keep. The model proposes entity mentions and candidate edges; a verifier checks them against the structured spine; anything that contradicts the spine is rejected or flagged.
The third pass is constrained by the first two, and that constraint is what makes the graph trustworthy. If the practice management system says a matter belongs to one partner and the model "extracts" a different partner from a stray email, the system doesn't just average them. It logs the conflict for a human to look at. Most of the time the email reference is informal — a partner copying a colleague — and the structured record is correct. Occasionally it's the other way round and you've found a real data quality issue in the firm's records. Both outcomes are useful.
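In miniature, with invented identifiers, the verification policy looks like this. The real verifier covers many predicates and edge types, but the rule is the same: the structured spine wins, and contradictions are queued, never averaged.

```python
# Ground truth from structured systems (pass one).
spine = {("matter:murphy", "responsible_partner"): "person:obrien"}

# Candidate edges proposed by the language model (pass three).
proposed = [
    ("matter:murphy", "responsible_partner", "person:walsh"),   # from a stray email
    ("matter:murphy", "refers_to", "doc:fee-agreement"),        # spine is silent on this
]

accepted, conflicts = [], []
for src, label, dst in proposed:
    truth = spine.get((src, label))
    if truth is not None and truth != dst:
        conflicts.append((src, label, dst, truth))   # flag for review, don't merge
    else:
        accepted.append((src, label, dst))

for src, label, dst, truth in conflicts:
    print(f"conflict: model extracted {src} -[{label}]-> {dst}, "
          f"spine says {truth}; queued for human review")
```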
Temporal correctness and the supersession problem
A live graph is not a snapshot. Documents are revised. Advice changes. Regulations update. A solicitor's note from eighteen months ago might technically still be on file but completely superseded by what was agreed last week. If your AI cheerfully retrieves the old note and presents it as current, you have a malpractice problem, not a productivity tool.
So every node and edge in the Intelligence Brain graph is bitemporal. We track valid time — when this fact was true in the world — and system time — when we learned about it. A query at time T returns the graph as it was understood at time T, not the graph as it is now. This matters for two reasons. First, when the firm needs to reconstruct what was known when a decision was made — which auditors and regulators ask about constantly — the graph can answer truthfully. Second, the model's retrieval is constrained to the temporal slice that's relevant to the question, which dramatically reduces hallucinated "current" answers based on stale documents.
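A minimal bitemporal sketch, assuming invented field names. Each fact carries both clocks, and the as-of query filters on both, so "what was true on this date" and "what did we believe on this date" are different, answerable questions:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date            # when this became true in the world
    valid_to: Optional[date]    # None = still true
    recorded_at: date           # when the system learned it

facts = [
    Fact("matter:murphy", "responsible_partner", "person:walsh",
         date(2023, 1, 10), date(2024, 9, 1), date(2023, 1, 10)),
    # Handover happened on 1 Sept but was only recorded on the 3rd.
    Fact("matter:murphy", "responsible_partner", "person:obrien",
         date(2024, 9, 1), None, date(2024, 9, 3)),
]

def as_of(subject, predicate, valid_at, known_at):
    """The graph as true at valid_at, as understood at known_at.
    (A real store also versions corrections to valid_to; elided here.)"""
    return [f for f in facts
            if f.subject == subject and f.predicate == predicate
            and f.recorded_at <= known_at
            and f.valid_from <= valid_at
            and (f.valid_to is None or valid_at < f.valid_to)]

# Current understanding of the current world: O'Brien.
print(as_of("matter:murphy", "responsible_partner",
            date(2024, 10, 1), date(2024, 10, 1))[0].value)
# What we believed in mid-August, with what we knew then: Walsh.
print(as_of("matter:murphy", "responsible_partner",
            date(2024, 8, 15), date(2024, 8, 15))[0].value)
```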
Supersession is encoded as edges: supersedes, amends, revokes. When a new engagement letter is signed, the graph doesn't delete the old one. It adds the new node and draws the supersession edge, and queries for "current engagement terms" follow that edge. The old document is still there for audit. It's just no longer authoritative.
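In sketch form, with invented document IDs, "current" just means "follow supersession edges until nothing newer exists":

```python
supersedes = [  # (newer, older): the edge drawn when a replacement is signed
    ("doc:engagement-2024", "doc:engagement-2022"),
    ("doc:engagement-2022", "doc:engagement-2019"),
]

def current_version(doc_id):
    """Walk forward along supersession edges to the authoritative node."""
    newer_of = {old: new for (new, old) in supersedes}
    while doc_id in newer_of:
        doc_id = newer_of[doc_id]
    return doc_id

# The 2019 letter is still in the graph for audit; it's just not the answer.
print(current_version("doc:engagement-2019"))   # -> doc:engagement-2024
```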
Querying the graph: hybrid retrieval in practice
In production, almost no useful question is answered by graph traversal alone or vector search alone. The architecture is hybrid. A typical query path looks like this:
- Parse the question to identify entity references — names, matters, dates, obligations.
- Resolve those references to graph nodes using a combination of exact match, alias tables, and embedding similarity.
- Traverse the graph from those anchor nodes to assemble a relevant subgraph — typically two to three hops, bounded by relevance scoring so you don't pull in the whole firm.
- For each document or text node in that subgraph, pull the relevant chunks via vector search, scoped to that node only.
- Hand the model a context window that contains the structured subgraph plus the scoped text, with explicit edge labels and timestamps.
That last point is what makes the difference. The model isn't just seeing text: it's seeing text annotated with "this is the current engagement letter, signed by these parties, on this date, superseding this earlier version". The approach works because the structure is in the prompt, not just in the retrieval. The model can reason about authority, recency, and provenance because we've told it explicitly which is which.
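Here's the assembly step in miniature. The data and serialisation format below are invented, and the production format is richer, but the principle is visible: edge labels and timestamps sit in the prompt next to the scoped text.

```python
# A retrieved subgraph: (source, label, target, timestamp) per edge.
subgraph_edges = [
    ("doc:engagement-2024", "supersedes", "doc:engagement-2022", "2024-03-14"),
    ("doc:engagement-2024", "signed_by",  "person:obrien",       "2024-03-14"),
    ("doc:engagement-2024", "governs",    "matter:murphy",       "2024-03-14"),
]

# Stand-in for per-node scoped vector search results.
scoped_chunks = {
    "doc:engagement-2024": ["Fees are billed monthly in arrears at the agreed blended rate..."],
}

def build_context(edges, chunks):
    lines = ["# Graph context (edge @ date):"]
    for src, label, dst, ts in edges:
        lines.append(f"{src} -[{label} @ {ts}]-> {dst}")
    lines.append("# Scoped document text:")
    for node_id, texts in chunks.items():
        lines.extend(f"[{node_id}] {t}" for t in texts)
    return "\n".join(lines)

print(build_context(subgraph_edges, scoped_chunks))
```

The model sees which engagement letter is authoritative, who signed it, and when, before it reads a single paragraph of body text.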
What this gives you that pure RAG doesn't
The Intelligence Brain graph gives a regulated firm three things that retrieval-only systems can't:
- Provenance you can defend. Every answer is traceable to specific nodes and edges, with timestamps. When a partner asks "where did this come from", the system shows the exact path. Auditors love this. So do insurers.
- Cross-cutting questions. "Which clients have outstanding undertakings older than six months on matters where the responsible partner has changed?" That's a graph query; no vector store can answer it. A sketch in code follows this list.
- Institutional memory that survives staff turnover. When a fee earner leaves, their understanding of who-knows-what is gone. The graph keeps it. New staff can ask the questions the old staff would have answered, and get answers grounded in the firm's actual history.
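The cross-cutting query above, sketched with invented data and field names. In production this runs as a graph query rather than a list comprehension, but the logic is identical: join undertaking age against partner-change history, which no similarity search over text chunks can express.

```python
from datetime import date, timedelta

undertakings = [  # (client, matter, given_on, discharged)
    ("client:acme",  "matter:m1", date(2024, 1, 15), False),
    ("client:byrne", "matter:m2", date(2025, 3, 1),  False),
]
partner_history = {  # matter -> responsible partners over time
    "matter:m1": ["person:walsh", "person:obrien"],   # changed hands
    "matter:m2": ["person:obrien"],
}

cutoff = date.today() - timedelta(days=183)   # roughly six months
hits = [client
        for (client, matter, given_on, discharged) in undertakings
        if not discharged
        and given_on < cutoff
        and len(partner_history[matter]) > 1]
print(hits)   # -> ['client:acme']
```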
If you want the broader picture of how this fits with the on-premise deployment model, ingestion pipeline, and per-vertical packaging, the product overview for the Intelligence Brain covers the full architecture, and the top-level Intelligence Brain page explains the positioning and which verticals are live.
Where to start this week
If you're a partner or operations lead at an Irish firm thinking about this seriously, don't start with the AI. Start with the graph. Spend an afternoon listing the entity types your firm actually cares about — not the ones your software vendor named, the ones your people use in conversation. Sketch the edges. Ask which of those entities live in structured systems today and which exist only in emails and people's heads. That sketch is the spec. Once you have it, the question of which AI sits on top is a much smaller decision than you think — and a much better one than buying a generic tool and hoping the structure emerges by itself.
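For what it's worth, that sketch needs no tooling. Written down as plain data, with every name below an example rather than a prescription, an afternoon's output might look like this:

```python
firm_schema = {
    "entities": ["Client", "Matter", "Undertaking", "FeeAgreement",
                 "Partner", "AMLReview", "LedgerEntry"],
    "edges": [
        ("Matter",       "belongs_to",      "Client"),
        ("Undertaking",  "given_on",        "Matter"),
        ("FeeAgreement", "governs",         "Matter"),
        ("Partner",      "responsible_for", "Matter"),
        ("AMLReview",    "covers",          "Client"),
    ],
    # Which of these live in structured systems today,
    # and which exist only in inboxes and people's heads?
    "structured_today": ["Client", "Matter", "LedgerEntry"],
    "tacit_today":      ["Undertaking", "AMLReview"],
}
```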