Irish universities and Institutes of Technology sit on more useful unstructured data than most banks. Module descriptors, exam papers going back decades, research outputs, QQI submissions, ethics applications, supervisor feedback, programme review documents, ERASMUS paperwork, HEA returns. The problem is that none of it talks to each other, and the people who need answers — module coordinators, registrars, research office staff, quality officers — end up emailing PDFs around or rebuilding the same spreadsheet every August. An on-premise intelligence layer changes that, but only if it's built with the constraints of the sector in mind: GDPR, research confidentiality, student data, and the hard reality that an Irish HEI is not a tech company and shouldn't be asked to behave like one.
Why generic cloud AI is a poor fit for third-level
I've spent enough time inside large Irish organisations to know what happens when you tell a DPO that student transcripts, supervisor notes, or ethics application drafts are going to a US cloud endpoint. The conversation ends. And rightly so. Section 38 of the Data Protection Act 2018, the HEA's data governance expectations, and the practical reality of Schrems II mean that university AI in Ireland needs to default to data-stays-on-premise, not data-leaves-and-we-promise-it's-fine.
The second issue is the shape of the data. A university's most valuable corpus is not its website. It's the locked-down stuff: the previous five programme reviews, the external examiner reports, the research ethics committee minutes, the Bologna descriptors, the learning outcome maps. A general-purpose chatbot trained on the open web will hallucinate confidently about your specific programme and cite nothing. That's worse than useless in a quality assurance context — it's a liability.
The third issue is the cost model. Per-seat SaaS AI pricing assumes a commercial customer with predictable usage. A university has 200 academics who might use a tool heavily during programme review season and barely at all in July. Fixed on-premise infrastructure with unlimited internal use is a much better fit for how third-level actually works.
What the intelligence brain actually does in an HEI
The Intelligence Brain is an on-premise retrieval and reasoning layer. In plain terms: it ingests your documents, indexes them with embeddings, stores those embeddings in a vector database that lives on your infrastructure, and lets staff ask questions in natural language. The model that does the reasoning runs locally — typically a quantised open-weights model on a GPU box sitting in your own server room or in an Irish-hosted private tenancy.
For an Irish higher education institution, the practical applications fall into four buckets:
- Programme and module management. "Show me every module across the Faculty of Science where the learning outcomes mention sustainability, and tell me which ones map to SDG indicators." That query takes a coordinator two days manually. The brain answers it in seconds, with citations.
- Quality assurance and accreditation. Cyclical programme review, professional body accreditation (Engineers Ireland, CORU, the Teaching Council), and QQI re-engagement all require evidence retrieval across years of documents. A retrieval layer turns a four-month exercise into a four-week one.
- Research office support. Grant proposals reuse boilerplate, prior art, and CV material. A brain that has indexed every successful proposal from the last decade gives researchers a head start without exposing anything to the public internet.
- Student-facing services. Carefully scoped — never giving regulatory advice, never replacing a human — but useful for routing queries about regulations, exam appeals procedures, or where to find a specific form.
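The "query with citations" shape in the first bucket can be sketched in a few lines. This is a minimal illustration using plain keyword matching over an in-memory list — a real deployment would use embeddings and a vector store, and the `Module` and `search_modules` names are hypothetical, not any real API:

```python
from dataclasses import dataclass, field

# Illustrative sketch: filter modules by faculty and outcome keyword,
# returning a citation (the module code) with every hit. A production
# system would do this with embedding similarity plus metadata filters.

@dataclass
class Module:
    code: str
    faculty: str
    learning_outcomes: str
    sdg_indicators: list = field(default_factory=list)

def search_modules(index, faculty, keyword):
    """Return modules in a faculty whose outcomes mention a keyword,
    each hit carrying a citation and an SDG-mapping flag."""
    hits = []
    for m in index:
        if m.faculty == faculty and keyword.lower() in m.learning_outcomes.lower():
            hits.append({"citation": m.code, "maps_to_sdg": bool(m.sdg_indicators)})
    return hits

index = [
    Module("BIO201", "Science", "Assess sustainability of marine ecosystems", ["SDG14"]),
    Module("CHM105", "Science", "Apply basic laboratory safety procedures"),
    Module("HIS110", "Arts", "Discuss sustainability in medieval land use"),
]

results = search_modules(index, "Science", "sustainability")
# One hit: BIO201, with its SDG mapping flagged.
```

The point is the output shape: every answer is traceable to a specific document, which is what makes it usable in a QA context.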
The on-premise architecture, in concrete terms
Here's roughly what an HEI deployment looks like. A single GPU server, sized to the institution — for a mid-sized IoT, a machine with one or two professional-grade GPUs and 128–256 GB of RAM is usually plenty. On top of that you run an inference server (vLLM or similar), a vector store (Qdrant, Weaviate, or pgvector if you want to keep things in Postgres), and an ingestion pipeline that handles the messy reality of academic documents: scanned PDFs from 2003, Word files with broken styles, Excel timetables, OneDrive shares, and the occasional ancient WordPerfect file someone refuses to retire.
The ingestion pipeline does OCR where needed, chunks documents semantically rather than by fixed token count, generates embeddings using a model that runs locally, and writes the vectors plus metadata into the store. Metadata matters more than people realise — faculty, programme code, document type, year, sensitivity classification, retention category. Without that metadata, retrieval becomes a lottery.
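A sketch of that ingestion step, with the metadata requirement enforced up front. Paragraph splitting stands in for semantic chunking, and `embed` is a placeholder for a locally-run embedding model — both are simplifying assumptions:

```python
# Sketch of ingestion: chunk a document, embed each chunk locally, and
# refuse to index anything missing the metadata fields listed above.

def embed(text):
    # Placeholder: a real pipeline calls a local embedding model here.
    return [float(len(text))]

def ingest(doc_text, metadata):
    """Split a document into chunks, embed each, and return records
    ready to write to the vector store. Rejects incomplete metadata."""
    required = {"faculty", "programme_code", "doc_type", "year",
                "sensitivity", "retention_category"}
    missing = required - metadata.keys()
    if missing:
        raise ValueError(f"missing metadata: {sorted(missing)}")
    records = []
    for i, chunk in enumerate(p for p in doc_text.split("\n\n") if p.strip()):
        records.append({"chunk_id": i, "vector": embed(chunk),
                        "text": chunk, **metadata})
    return records

meta = {"faculty": "Science", "programme_code": "SC101",
        "doc_type": "programme_review", "year": 2019,
        "sensitivity": "internal", "retention_category": "QA-7yr"}
records = ingest("Section one.\n\nSection two.", meta)
```

Failing loudly on missing metadata is the design choice that keeps retrieval from becoming the lottery described above.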
For reasoning, you want a model in the 30B–70B parameter range, quantised to 4-bit or 8-bit. That gives you reasoning quality good enough for genuine academic work without needing the kind of GPU cluster only the hyperscalers can afford. Latency on a properly sized box is two to five seconds for a typical query, which is fine for the use cases that matter.
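The sizing claim is easy to sanity-check with back-of-envelope arithmetic: quantised weights take roughly params × bits/8 bytes, and the 20% overhead figure for KV cache and activations is an assumption that varies with context length and batch size:

```python
# Rough VRAM estimate for a quantised model: weight bytes plus a
# flat overhead allowance (assumed 20%) for KV cache and activations.

def vram_gb(params_billion, bits, overhead=0.20):
    weights_gb = params_billion * (bits / 8)  # 1e9 params * bytes / 1e9
    return round(weights_gb * (1 + overhead), 1)

seventy_b_4bit = vram_gb(70, 4)  # ~42 GB: two 24 GB cards or one 48 GB card
thirty_b_8bit = vram_gb(30, 8)   # ~36 GB
```

Which is why one or two professional-grade GPUs genuinely do cover the 30B–70B range once quantised.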
Authentication ties into your existing identity provider — typically Microsoft Entra ID, given how much of Irish higher ed runs on Microsoft 365. Role-based access control means a finance officer doesn't see ethics committee minutes, and a postgraduate researcher doesn't see HR records. This is non-negotiable in a university context.
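The role-based filter can be sketched as a mapping from role to permitted document types, applied before anything reaches the model. The role names and document types here are illustrative, and in practice the role would come from Entra ID group claims rather than a hard-coded table:

```python
# Sketch of RBAC at retrieval time: documents whose type is outside
# the role's allow-list are dropped before the model ever sees them.

ROLE_ACCESS = {
    "finance_officer": {"finance_report", "procurement"},
    "quality_officer": {"programme_review", "external_examiner_report"},
    "pg_researcher": {"published_paper", "grant_report"},
}

def filter_for_role(role, retrieved_docs):
    allowed = ROLE_ACCESS.get(role, set())  # unknown roles see nothing
    return [d for d in retrieved_docs if d["doc_type"] in allowed]

docs = [
    {"id": 1, "doc_type": "ethics_minutes"},
    {"id": 2, "doc_type": "finance_report"},
]
visible = filter_for_role("finance_officer", docs)
# The finance officer sees the finance report, never the ethics minutes.
```

Defaulting unknown roles to an empty allow-list is the safe failure mode in a university context.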
Handling research data, ethics, and the GDPR question properly
Research data is where most AI projects in third-level fall over. Some of it is fine to index — published papers, public datasets, completed grant reports. Some of it absolutely is not — interview transcripts under informed consent, clinical data, anything covered by a specific data sharing agreement.
The right architecture handles this with classification at ingestion time. Every document gets tagged with a sensitivity level and a permitted-use field. The brain refuses to retrieve documents whose classification is incompatible with the requesting user's role or the query context. This is dull, unglamorous engineering, but it's what makes the difference between a system the DPO will sign off on and one that lives forever in a "pilot" purgatory.
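That dull, unglamorous check can be sketched directly. The sensitivity levels and permitted-use labels below are illustrative assumptions — the shape that matters is the two-part gate: clearance must meet the document's level, and the query context must be on the document's permitted-use list:

```python
# Sketch of sensitivity gating at retrieval time: both conditions must
# hold or the document is never returned, regardless of relevance.

LEVELS = {"public": 0, "internal": 1, "restricted": 2, "prohibited": 3}

def may_retrieve(doc, user_clearance, query_context):
    if LEVELS[doc["sensitivity"]] > LEVELS[user_clearance]:
        return False
    return query_context in doc["permitted_use"]

# Consent-limited interview data vs. a published paper.
transcript = {"sensitivity": "restricted", "permitted_use": ["ethics_audit"]}
paper = {"sensitivity": "public", "permitted_use": ["ethics_audit", "grant_prep"]}

ok = may_retrieve(paper, "internal", "grant_prep")            # allowed
blocked = may_retrieve(transcript, "internal", "grant_prep")  # refused
```

The refusal happens at retrieval, not at response generation — the model never receives text it shouldn't have seen.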
For GDPR specifically, the on-premise design solves most of the international transfer problem before it starts. There is no transfer. Subject access requests become easier, not harder, because everything is indexed and searchable. Right-to-erasure is handled by deleting from source, then re-indexing — which is straightforward if your pipeline is built for it.
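The delete-then-re-index flow is simple enough to show in outline. This sketch rebuilds the full index for clarity; a real pipeline would scope the rebuild to the erased subject's documents:

```python
# Sketch of right-to-erasure: remove every document linked to a data
# subject from the source store, then rebuild the index from what
# remains, so no orphaned vectors survive.

def erase_subject(source_docs, subject_id):
    """Return (remaining source documents, rebuilt index entries)."""
    remaining = [d for d in source_docs if d["subject_id"] != subject_id]
    rebuilt = [{"doc_id": d["doc_id"], "vector": d["vector"]} for d in remaining]
    return remaining, rebuilt

source = [
    {"doc_id": "a", "subject_id": "s1", "vector": [0.1]},
    {"doc_id": "b", "subject_id": "s2", "vector": [0.2]},
]
source, index = erase_subject(source, "s1")
```

Re-indexing from source, rather than deleting vectors in place, is what keeps the index and the source store from silently drifting apart.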
Ethics committees are increasingly asking about AI use in research. A locally-hosted brain that produces auditable retrieval logs — every query, every document retrieved, every response generated — gives researchers a defensible answer. Cloud chatbots do not.
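A minimal sketch of what that audit trail looks like — structured, append-only records of each interaction, here as JSON lines. The field names are illustrative:

```python
# Sketch of the auditable retrieval log: every query, the documents
# retrieved for it, and the generated response, appended as JSON lines.

import json
import time

def log_interaction(log, user, query, retrieved_ids, response):
    entry = {
        "ts": time.time(),
        "user": user,
        "query": query,
        "retrieved": retrieved_ids,
        "response": response,
    }
    log.append(json.dumps(entry))  # append-only; never rewritten
    return entry

log = []
log_interaction(log, "qa_officer_7", "external examiner comments 2021",
                ["doc_114", "doc_207"], "Two reports matched ...")
```

An append-only log in a plain, inspectable format is exactly the artefact an ethics committee or auditor can work with.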
Where it pays back, and where it doesn't
I'll be honest about where third-level AI in Ireland delivers and where it doesn't. It pays back hard in document-heavy, cyclical, evidence-gathering work — programme review, accreditation, internal audit, research proposal preparation, HEA returns. These are jobs where a senior academic or administrator currently spends weeks finding and collating things that already exist. A retrieval layer reclaims that time directly.
It pays back moderately in student-facing FAQ and routing — useful, not transformative, and needs careful guardrails so it never strays into giving regulatory or academic advice it shouldn't.
It pays back poorly, in my view, when institutions try to use it for assessment generation or marking. The technology is not there yet for the kind of nuanced, context-sensitive judgement that academic assessment requires, and the reputational risk of getting it wrong is asymmetric. Don't go there yet.
The other thing it does, which is harder to measure but real: it surfaces what the institution actually knows. Most universities are surprised when they first run institutional queries against their own corpus. Modules they forgot existed. Research strengths they hadn't noticed clustering. Policy contradictions between faculties. That diagnostic value alone often justifies the deployment.
IoT-specific considerations
Institutes of Technology — and now the Technological Universities — have a particular profile. Heavy applied-research output, strong industry engagement, programmes that turn over more rapidly than traditional university degrees, and a stronger focus on work-based learning and apprenticeships. The document estate reflects that: more industry partner agreements, more apprenticeship paperwork, more SOLAS and QQI engagement, more rapid programme development cycles.
An IoT AI deployment in Ireland should weight its ingestion and retrieval design toward currency rather than archive depth. The 2008 programme document matters less; the current industry partner agreement matters a lot. Ingestion frequency should be daily for active programme materials, with proper version tracking so you can answer "what did the descriptor say in September when the student enrolled" not just "what does it say today."
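The "what did it say in September" requirement comes down to as-of queries over versioned descriptors. A minimal sketch, assuming each version carries an effective-from date:

```python
# Sketch of version-aware retrieval: return the descriptor version
# that was in force on a given date, not merely the latest one.

from datetime import date

def descriptor_as_of(versions, query_date):
    """Return the latest version effective on or before query_date,
    or None if no version was yet in force."""
    in_force = [v for v in versions if v["effective_from"] <= query_date]
    return max(in_force, key=lambda v: v["effective_from"]) if in_force else None

versions = [
    {"effective_from": date(2022, 9, 1), "text": "Outcome: apply X"},
    {"effective_from": date(2024, 9, 1), "text": "Outcome: apply X and Y"},
]
v = descriptor_as_of(versions, date(2023, 1, 15))
# Returns the September 2022 version — the one in force at enrolment.
```

Without the effective-from field at ingestion, this query is unanswerable, which is why version tracking belongs in the pipeline rather than bolted on later.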
The TU mergers have also created a real document-reconciliation problem. Two or three legacy institutions, three sets of policies, three QA frameworks, gradually being harmonised. A retrieval layer is genuinely useful here — not as a replacement for the harmonisation work, but as a way of seeing where the contradictions are before someone trips over them.
If you want to see how the architecture is laid out for the sector specifically, I've written it up at the intelligence brain for education, and the broader platform is described at the intelligence brain overview.
Where to start this week
Pick one cyclical, document-heavy process — programme review is the obvious candidate — and ask your QA office how many person-weeks went into it last cycle. Then identify the document corpus it draws on: programme documents, external examiner reports, student feedback, module descriptors. That's your pilot scope. Don't try to ingest the whole institution on day one. Get one faculty's programme review documents into a properly secured retrieval layer first, and let the time saved in the next cycle make the case for the rest.