Intelligence Brain · public-sector

The intelligence brain for Irish semi-states


Irish semi-states sit in an awkward middle. They're not core civil service, so they don't get covered by central digital strategy in the same way. They're not private firms, so they can't move on commercial timelines. They handle data that's politically sensitive, often regulated under multiple regimes at once — GDPR, sector-specific Acts, EU directives, the AI Act now landing on top — and they're being asked, like everyone else, to "do something with AI". Most of what's on offer to them is the wrong shape: SaaS that ships data to a US cloud, consultancy decks with no running code, or pilots that never make it past a single team. This article is about what an on-premise intelligence layer actually looks like for a body like Bord na Móna, ESB, Coillte, the HSE's non-clinical estate, or any of the commercial and non-commercial semi-states — and what I'd build first if I were sitting inside one.

Why semi-states are a different problem to central government

A government department has, broadly, one accounting officer, one Oireachtas committee, and one set of corporate services. A semi-state has a board, a parent department, often a regulator, and frequently commercial counterparties who expect contractual confidentiality. The data estate reflects that. You'll find an SAP or Oracle ERP doing finance, a separate HR system, asset management software specific to the sector (GIS for land, SCADA for energy, fleet systems for transport), a document management system that's usually SharePoint or Objective, and a long tail of Access databases and Excel files that run things nobody admits run on Excel.

The intelligence problem isn't "build a chatbot". It's that institutional knowledge is spread across those systems and across the heads of people who've been there twenty-five years and are retiring. When a new analyst joins, they spend six months learning where things are. When a parliamentary question lands, three people spend a day pulling the answer together. When the regulator asks for a methodology, somebody has to reconstruct it from emails. That's the problem worth solving, and it's not solved by sending the corpus to OpenAI.

Why on-premise is not optional here

I'll be blunt about this because the cloud-first crowd won't. For a semi-state, sending operational data to a hyperscaler LLM is a governance problem before it's a technical one. You have:

  • Data that's commercially sensitive to counterparties who didn't consent to its processing by a third party
  • Personal data under GDPR where the lawful basis for transfer outside the EEA is, charitably, contested
  • Records under the National Archives Act and FOI obligations that need an auditable chain of custody
  • Sector-specific obligations — CRU, ComReg, EPA, NTA depending on the body — that increasingly include AI-specific scrutiny under the AI Act's high-risk categories

You can paper over some of this with EU-region commitments and DPAs, but you can't paper over the fact that your prompts, your retrieved context, and your model outputs are now sitting in someone else's logs. On-premise — meaning a model and inference stack running on hardware the body controls, in a datacentre or colocation it has a contract for — removes that question entirely. It also removes the per-token billing that makes serious internal use uneconomic at scale.

The hardware cost is real but it's a capex line, not a recurring opex surprise. A pair of GPU servers will run a 70B-class model with comfortable headroom for a few hundred concurrent users, and the same hardware does embeddings, reranking, and document processing without an extra invoice each time someone asks a question.

What the architecture actually looks like

The intelligence brain for the public sector is, at its core, four layers stacked sensibly. I'll describe them the way I'd describe them to a head of IT, not the way a vendor would.

Layer one: ingestion and normalisation. Connectors into the systems that already exist — SharePoint, file shares, the ERP's reporting views, the document management system, email archives where appropriate, structured data warehouses. The job here is unglamorous: pull documents, extract text properly (PDFs in semi-states are often scanned, so OCR matters), preserve metadata, and respect existing access control lists. If a document is restricted to the legal team in SharePoint, it stays restricted in the brain. This is the bit most pilots get wrong because it's tedious.
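To make the ACL point concrete, here is a minimal sketch in Python of what an ingestion record might carry. All names here are illustrative assumptions, not a real connector's schema; the point is that the source system's permissions travel with the document and are enforced at query time.

```python
from dataclasses import dataclass, field

@dataclass
class IngestedDocument:
    """One normalised record per source document (illustrative fields)."""
    source_system: str      # e.g. "sharepoint", "fileshare", "erp"
    source_path: str        # original location, preserved for citations
    text: str               # extracted plain text (empty until OCR runs)
    needs_ocr: bool         # scanned PDFs get routed through OCR first
    allowed_groups: list = field(default_factory=list)  # copied from the source ACL

def visible_to(doc: IngestedDocument, user_groups: set) -> bool:
    # The brain enforces the ACL it inherited: if SharePoint restricted
    # the document to the legal team, only legal's groups pass this check.
    return bool(set(doc.allowed_groups) & user_groups)

doc = IngestedDocument(
    source_system="sharepoint",
    source_path="/sites/legal/contracts/framework-2021.pdf",
    text="",
    needs_ocr=True,
    allowed_groups=["legal-team"],
)
print(visible_to(doc, {"legal-team"}))    # True
print(visible_to(doc, {"finance-team"}))  # False
```

The filter runs before retrieval, so a restricted document never even appears in another user's search results, let alone in a model's context window.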

Layer two: indexing and retrieval. A vector store for semantic search, a keyword index for exact matches and acronyms (semi-states run on acronyms), and a graph layer for entity relationships — who reports to whom, which asset belongs to which division, which contract references which framework. Hybrid retrieval, with reranking. Nothing exotic, but tuned for the corpus, which is mostly long-form internal documents rather than web pages.
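One common way to fuse the vector and keyword rankings is reciprocal rank fusion (RRF); the sketch below assumes that approach, though any reranker would slot in here. The document IDs are made up for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked result lists (e.g. vector search and keyword
    search) into one ranking. k=60 is the conventional damping constant."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc_board_minutes_2021", "doc_levy_method", "doc_asset_reg"]
keyword_hits = ["doc_levy_method", "doc_cru_letter", "doc_board_minutes_2021"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused[0])  # doc_levy_method — ranked well by both retrievers
```

A document that both retrievers like wins, which is exactly the behaviour you want on an acronym-heavy corpus: the keyword index catches the exact term, the vector index catches the paraphrase, and the fusion rewards agreement.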

Layer three: inference. A locally hosted model, served behind an internal API. The model size depends on the workload — for most semi-state use cases a well-tuned 30-70B model is more than enough, and you don't need to chase the frontier. What you need is consistent latency, a known cost profile, and the ability to fine-tune or adapt without exfiltrating training data.
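Many local serving stacks (vLLM, llama.cpp's server, and others) expose an OpenAI-compatible chat endpoint, so the internal API can be a plain HTTP call. Below is a sketch of assembling such a request; the endpoint URL and model name are assumptions, and the key property is that the retrieved context travels in the prompt to a machine you control, never off-site.

```python
import json

INTERNAL_API = "http://llm.internal:8000/v1/chat/completions"  # assumed endpoint

def build_request(question: str, context_chunks: list) -> dict:
    """Assemble a chat request for a locally served model. Grounding
    context retrieved in layer two is inlined into the prompt."""
    context = "\n\n".join(context_chunks)
    return {
        "model": "local-70b",  # whatever open-weights model is deployed
        "messages": [
            {"role": "system",
             "content": "Answer only from the provided internal context. "
                        "Cite source documents."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        "temperature": 0.1,  # low temperature for consistent drafting
    }

payload = build_request(
    "How is the levy calculated?",
    ["[levy_method_2019.pdf] The levy is calculated on the basis of ..."],
)
print(json.dumps(payload, indent=2)[:80])
```

Because the serving API is a commodity shape, the model behind it can be swapped or fine-tuned later without rewriting any of the applications that call it.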

Layer four: the audit and policy plane. Every query, every retrieved chunk, every model output, logged and queryable. Role-based access tied to the existing identity provider (Active Directory or Entra in most cases). Policy rules that can redact, refuse, or escalate based on content type. This is the layer that lets you answer the DPC, the C&AG, or an Oireachtas committee when they ask how the system is being used.
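A sketch of what one audit record and one policy rule might look like. The PPSN-redaction rule is a deliberately toy example, and the field names are assumptions; a real policy plane would carry many rules and tie the decision back to the identity provider.

```python
import json
import re
import time
import uuid

def audit_record(user, query, retrieved_ids, output, decision):
    """One append-only log line per interaction: enough to answer
    'what did user X ask and see' without reconstruction from emails."""
    return json.dumps({
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "user": user,
        "query": query,
        "retrieved": retrieved_ids,
        "output": output,
        "policy_decision": decision,  # "allow" | "redact" | "refuse" | "escalate"
    })

def policy_check(text: str) -> str:
    # Toy rule: anything resembling an Irish PPSN (7 digits + 1-2 letters)
    # gets redacted before the output leaves the system.
    if re.search(r"\b\d{7}[A-Z]{1,2}\b", text):
        return "redact"
    return "allow"

print(policy_check("Employee PPSN 1234567A on file"))  # redact
print(policy_check("Framework agreement terms"))       # allow
```

The decision itself is logged alongside the query and output, so "why did the system refuse this" is answerable from the same record as "what did it say".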

Use cases that actually pay back

I'd avoid the "transform everything" framing. The use cases that earn their keep in a semi-state in the first year are narrow and boring, and that's the point.

Parliamentary questions and FOI responses. Every PQ and FOI request triggers a search across years of correspondence, board papers, and operational records. A retrieval system that can pull every relevant document and draft a defensible first answer, with citations to source, compresses a process that currently consumes senior staff time. The drafter still drafts. They just don't spend the morning hunting.

Procurement and contract intelligence. Semi-states run a lot of framework agreements and call-offs. Knowing what's been bought, from whom, on what terms, against which framework, is currently a question you ask the procurement officer who's been there longest. An indexed contract corpus with structured extraction makes that a query.
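What "structured extraction makes that a query" means in practice is a schema like the sketch below. The fields are illustrative assumptions; a real schema would follow the body's own procurement rules and the OGP framework conventions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContractRecord:
    """Illustrative fields extracted from each contract in the corpus."""
    supplier: str
    framework_ref: Optional[str]  # which framework the call-off sits under
    value_eur: float
    start_date: str               # ISO dates for easy filtering
    end_date: str

def calloffs_under(records, framework_ref):
    # "What have we bought against this framework?" becomes a filter,
    # not a question for the longest-serving procurement officer.
    return [r for r in records if r.framework_ref == framework_ref]

records = [
    ContractRecord("Acme Ltd", "OGP-ICT-2022", 120_000.0, "2023-01-01", "2025-12-31"),
    ContractRecord("Beta Eng", None, 45_000.0, "2024-06-01", "2025-06-01"),
]
print(calloffs_under(records, "OGP-ICT-2022"))  # the Acme call-off only
```

The extraction itself can be done by the local model against each contract document, with the source path kept on the record so every answer cites the PDF it came from.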

Technical and regulatory knowledge. Engineers in an energy or transport semi-state work against standards documents, internal procedures, and historical incident reports. Surfacing the right precedent quickly — "have we seen this fault pattern before, what did we do, what did the regulator say" — is genuinely valuable and genuinely safe to do internally.

Onboarding and continuity. When the person who knew how the levy was calculated retires, what they knew shouldn't retire with them. A brain that's been ingesting their documented work for years is the closest thing to institutional memory you can build.

The AI Act, the DPC, and the boring governance work

The EU AI Act treats certain public-sector uses as high-risk, with conformity assessment, logging, and human oversight obligations. The Data Protection Commission has been clear that GDPR applies to AI processing the same way it applies to anything else. The Department of Public Expenditure has issued guidance on public-sector AI use. None of this is a blocker. All of it requires the audit plane I mentioned above to actually work, not be a tickbox.

Practically, that means: a documented data protection impact assessment for each significant use case, not a generic one for "AI"; a model card or equivalent for whatever you're running, even if it's an open-weights model; logging that survives a regulator's request for the prompts and outputs of a specific user on a specific day; and a human-in-the-loop policy for any output that affects a citizen, an employee, or a counterparty. This is doable. It's not doable if your AI lives in someone else's API.
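To show why the logging requirement is not onerous once the audit plane exists, here is a sketch of the regulator's question as a single query. SQLite and the column names are assumptions for illustration; the usernames and prompts are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE audit_log (
    ts TEXT, username TEXT, prompt TEXT, output TEXT)""")
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
    [("2025-03-14T09:02:11", "a.byrne", "draft PQ reply on peat levy", "..."),
     ("2025-03-14T16:40:03", "a.byrne", "summarise CRU letter", "..."),
     ("2025-03-15T10:00:00", "c.walsh", "contract terms query", "...")])

# "Every prompt and output for this user on this day" is one
# parameterised query, not a reconstruction exercise.
rows = conn.execute(
    "SELECT ts, prompt, output FROM audit_log "
    "WHERE username = ? AND ts LIKE ?",
    ("a.byrne", "2025-03-14%")).fetchall()
print(len(rows))  # 2
```

Whether the store is SQLite, Postgres, or something heavier matters far less than the discipline of writing every interaction to it from day one.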

Where to start this week

If you're inside a semi-state and you've been asked to "have a view on AI", don't write a strategy document. Pick one corpus — the contracts, the board papers from the last five years, the engineering standards library, whatever's both contained and painful — and stand up a retrieval system against it on hardware you control, with one team as the pilot users. Six weeks of that will tell you more than six months of consultancy. If it's useful, you scale it; if it isn't, you've spent a small amount of capex and learned something specific. That's how the intelligence brain gets built in practice — one defensible corpus at a time, on infrastructure you own, with the audit trail in place from day one. Start narrow, prove it, then widen.

Book a 30-minute assessment

Direct with Michael. No charge. No pitch deck.

Pick a slot →