Tax research is the part of an accountant's day that quietly eats hours. A client emails about a share buyback, a CGT clearance question, or whether a particular reorganisation qualifies for Section 615 relief, and the partner ends up flicking between the Revenue website, the TCA 1997, a Tax and Duty Manual that was updated last quarter, and an old file note from 2019 that may or may not still apply. The work is real and it's billable, but most of the elapsed time is navigation, not thinking. An Irish-Revenue-aware brain — a private retrieval system that knows the legislation, the manuals, the eBriefs, and your own firm's prior advice — collapses that navigation cost. This article walks through how I'd build one, what the failure modes are, and where the engineering actually matters.
What "Revenue-aware" actually means
A general-purpose LLM will happily answer Irish tax questions. It will also confidently cite a section number that doesn't exist, quote a manual paragraph that was rewritten two updates ago, or apply UK rules to an Irish scenario because the training data leans that way. None of that is acceptable in a practice that signs returns.
Revenue-aware means three things, in order of difficulty. First, the system retrieves from a curated, versioned corpus — TCA 1997, VATCA 2010, CATCA 2003, SDCA 1999, the Tax and Duty Manuals, eBriefs, Revenue's published precedents, and the Statutory Instruments — rather than relying on parametric memory. Second, every answer cites a specific paragraph in a specific document at a specific version, so the user can verify in one click. Third, the system knows what it doesn't know: when a query touches an area that the corpus doesn't cover (a recent Finance Act change not yet absorbed into the manuals, say), it says so rather than improvising.
That third property is the one that separates a usable tool from a liability. A confident wrong answer to "is this distribution within the scope of Section 130?" is worse than no answer at all.
Building the corpus and keeping it current
The corpus is the product. The model is interchangeable; the corpus is what makes the brain Irish-Revenue-aware rather than vaguely tax-aware.
I build it in three layers. The legislative layer is the consolidated TCA and the other principal Acts, ingested as structured XML where available and parsed into section-subsection-paragraph units. Each unit carries a version stamp tied to the most recent Finance Act amendment. When a Finance Act passes, a diff job flags every affected section and re-indexes only those, rather than rebuilding from scratch.
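A minimal sketch of what that might look like, assuming the legislative text has already been parsed into units; the names LegislativeUnit and flag_amended_units, and the version strings, are illustrative rather than a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class LegislativeUnit:
    """One retrievable unit of legislation: a subsection or paragraph."""
    act: str                 # e.g. "TCA 1997"
    section: str             # e.g. "615"
    subsection: str          # e.g. "(2)(a)"
    text: str
    version: str             # most recent amending Act, e.g. "FA 2024"
    needs_reindex: bool = False

def flag_amended_units(corpus: list[LegislativeUnit],
                       amended: set[tuple[str, str]],
                       new_version: str) -> list[LegislativeUnit]:
    """Mark only the units touched by a Finance Act for re-embedding.

    `amended` is a set of (act, section) pairs produced by diffing the old
    and new consolidated texts; everything else keeps its existing
    embedding and version stamp.
    """
    touched = []
    for unit in corpus:
        if (unit.act, unit.section) in amended:
            unit.version = new_version
            unit.needs_reindex = True
            touched.append(unit)
    return touched
```

The point of the boolean flag is that the embedding job downstream only ever sees the touched units, which keeps a Finance Act update to minutes rather than a full rebuild.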
The administrative layer is the Tax and Duty Manuals. Revenue updates these continuously, and the update notes at the top of each manual matter — they tell you what changed and when. I scrape the manuals on a schedule, hash each page, and only re-embed pages whose hash has changed. The eBriefs sit alongside, indexed by date and topic.
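The hash-and-compare step is simple enough to sketch with the standard library alone; the state file name and page identifiers here are assumptions, not a prescribed layout:

```python
import hashlib
import json
from pathlib import Path

HASH_STORE = Path("manual_hashes.json")   # hypothetical local state file

def pages_to_reembed(pages: dict[str, str]) -> list[str]:
    """Return the manual page IDs whose content hash has changed.

    `pages` maps a page identifier (e.g. its URL path) to the scraped text.
    Unchanged pages keep their existing embeddings.
    """
    previous = json.loads(HASH_STORE.read_text()) if HASH_STORE.exists() else {}
    changed = []
    current = {}
    for page_id, text in pages.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        current[page_id] = digest
        if previous.get(page_id) != digest:
            changed.append(page_id)
    HASH_STORE.write_text(json.dumps(current, indent=2))
    return changed
```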
The third layer is the firm's own institutional knowledge: prior advice notes, file memos, technical briefings, internal training material. This is the layer that turns a generic Irish tax assistant into your firm's brain. It's also the layer with the most sensitive data, which is why on-premises or tenant-isolated deployment isn't optional for a serious accounting practice. I've written more about that architecture under the accounting brain.
Chunking, embedding, and the retrieval problem
Naive RAG fails on tax content for a specific reason: tax law is hierarchically referential. Section 615 means nothing without Section 616. A manual paragraph saying "see paragraph 4.2 above" is useless if the chunk only contains paragraph 4.2 itself. Standard fixed-window chunking destroys this.
The chunking strategy I use is structural, not character-count-based. Each chunk is a logical unit — a subsection, a manual paragraph, an example — and carries metadata pointing to its parent (the section it belongs to), its siblings (cross-references), and its ancestors (the chapter and Part). When the retriever pulls a chunk, it also pulls the immediate parent and any explicitly cross-referenced sections. The cost is a roughly three-to-five times larger context payload per query; the benefit is that the model actually has the surrounding law it needs to reason.
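As a rough sketch of the retrieval-time expansion, assuming chunks are keyed by a stable identifier; the Chunk fields and the expand function are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str                     # e.g. "tca1997/s615/(2)"
    text: str
    parent_id: str | None = None      # the containing section
    cross_refs: list[str] = field(default_factory=list)  # explicit "see s.616" links

def expand(hit_ids: list[str], index: dict[str, Chunk]) -> list[Chunk]:
    """Expand retriever hits with their parent and cross-referenced chunks,
    deduplicated and in hit order, so the model sees the surrounding law."""
    wanted: list[str] = []
    for cid in hit_ids:
        chunk = index[cid]
        for ref in [cid, chunk.parent_id, *chunk.cross_refs]:
            if ref and ref in index and ref not in wanted:
                wanted.append(ref)
    return [index[cid] for cid in wanted]
```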
For embeddings, a general multilingual model is adequate but not great. The vocabulary of Irish tax — "specified relevant person", "qualifying company", "associated company" — is precise and consistent, and a model fine-tuned on tax corpora retrieves noticeably better. Hybrid retrieval (dense embeddings plus BM25 over the legislative text) is the right default. Pure semantic search will miss exact section number queries; pure keyword will miss conceptual ones.
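One simple way to combine the two retrievers is reciprocal rank fusion over their ranked results, which needs nothing but the two orderings and so is agnostic to the embedding model and BM25 implementation behind them. A minimal sketch:

```python
def reciprocal_rank_fusion(dense_ranked: list[str],
                           bm25_ranked: list[str],
                           k: int = 60) -> list[str]:
    """Merge a dense-embedding ranking and a BM25 ranking into one list.

    A chunk ranked highly by either retriever surfaces near the top of the
    fused ranking; k damps the influence of very low ranks.
    """
    scores: dict[str, float] = {}
    for ranking in (dense_ranked, bm25_ranked):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```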
The query layer: classifying before retrieving
Not every tax question is the same shape. "What rate of CGT applies to a disposal of development land?" is a lookup. "Does this share-for-share exchange qualify for Section 586 treatment?" is a multi-step analysis with conditional retrieval. "Draft a Revenue submission for a Section 811C disclosure" is a generation task with a specific structure.
I classify the incoming query before touching the retriever. A small classifier — it can be a fine-tuned small model or even a well-prompted call to the main model — routes the query into one of a handful of buckets: legislative lookup, manual interpretation, computational (rate, threshold, deadline), procedural (how do I file X), or advisory analysis. Each bucket has its own retrieval strategy and prompt scaffold.
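In outline, the routing layer can be as small as an enum and a dispatch table; the bucket names follow the list above, and the classify and strategy callables are placeholders for whatever classifier and retrievers the deployment actually uses:

```python
from enum import Enum
from typing import Callable

class Bucket(Enum):
    LEGISLATIVE_LOOKUP = "legislative_lookup"
    MANUAL_INTERPRETATION = "manual_interpretation"
    COMPUTATIONAL = "computational"      # rates, thresholds, deadlines
    PROCEDURAL = "procedural"            # how do I file X
    ADVISORY = "advisory"

def route(query: str,
          classify: Callable[[str], Bucket],
          strategies: dict[Bucket, Callable[[str], list[str]]]) -> list[str]:
    """Classify the query first, then hand it to that bucket's retriever.

    `classify` wraps whatever classifier is used (a fine-tuned small model
    or a prompted call to the main model); `strategies` maps each bucket
    to its own retrieval function and prompt scaffold.
    """
    bucket = classify(query)
    return strategies[bucket](query)
```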
For an advisory analysis, the system retrieves in stages: first the relevant section, then the manual guidance on that section, then any eBriefs, then the firm's prior advice on similar fact patterns. Each stage's results inform the next query. This is slower — typically several seconds rather than under one — but the quality difference on substantive questions is large enough that I default to it for anything classified as advisory.
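A sketch of the staged flow, under the assumption that each stage exposes a simple search function and that folding the top hit identifiers into the next query is how the conditioning is done; both the stage names and that conditioning step are illustrative:

```python
from typing import Callable

def staged_advisory_retrieval(query: str,
                              retrievers: dict[str, Callable[[str], list[str]]]
                              ) -> dict[str, list[str]]:
    """Retrieve in stages for an advisory query.

    Each stage's hits are folded into the next stage's query so later stages
    are conditioned on what the earlier ones found.
    """
    context: dict[str, list[str]] = {}
    running_query = query
    for stage in ("legislation", "manuals", "ebriefs", "firm_advice"):
        hits = retrievers[stage](running_query)
        context[stage] = hits
        # Append the top hit identifiers so the next stage can narrow its search.
        running_query = query + " " + " ".join(hits[:3])
    return context
```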
Citations, confidence, and the audit trail
Every output the brain produces has to be traceable to a primary source. The implementation: each retrieved chunk carries a stable identifier (document, version, paragraph), and the generation prompt instructs the model to cite the identifier inline against every factual claim. A post-processor then verifies that every cited identifier actually appears in the retrieved context — if the model invents a citation, the post-processor flags it and the response is regenerated or rejected.
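The verification step itself is mechanical. A minimal sketch, assuming citations appear inline in a bracketed format like `[cite:tca1997/s615/(2)@FA2024]`; that format is an assumption, not a standard:

```python
import re

CITATION_PATTERN = re.compile(r"\[cite:([^\]]+)\]")   # assumed inline citation format

def invalid_citations(response: str, retrieved_ids: set[str]) -> list[str]:
    """Return any cited identifiers that were not in the retrieved context.

    A non-empty result means the model invented a citation, and the response
    should be regenerated or rejected.
    """
    cited = CITATION_PATTERN.findall(response)
    return [c.strip() for c in cited if c.strip() not in retrieved_ids]
```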
Confidence is harder. I don't trust model-reported confidence; it correlates poorly with actual correctness. What works better is retrieval-based confidence: how many high-similarity chunks supported the answer, whether they agreed with each other, and whether the answer required the model to fill gaps the retriever didn't cover. A response built from three tightly-clustered, high-similarity legislative chunks is more trustworthy than one stitched together from six weakly-related fragments. I expose this to the user as a simple signal — green, amber, or a refusal to answer — rather than a percentage.
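A simplified version of that signal, looking only at how many well-supported chunks back the answer; the thresholds are illustrative and would need calibrating against a reviewed sample of real queries, and a fuller version would also check whether the supporting chunks agree with each other:

```python
def confidence_signal(similarities: list[float],
                      strong: float = 0.80,
                      weak: float = 0.60) -> str:
    """Collapse retrieval evidence into a user-facing signal.

    `similarities` are the retriever scores of the chunks actually cited
    in the answer.
    """
    strong_hits = [s for s in similarities if s >= strong]
    if len(strong_hits) >= 3:
        return "green"
    if any(s >= weak for s in similarities):
        return "amber"
    return "refuse"   # not enough support: decline to answer rather than guess
```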
The audit trail is the part that matters for regulated practice. Every query, every retrieved chunk with its version, every generated response, every user action on that response (accepted, edited, rejected) is logged. When a partner reviews a junior's work six months later, or when the firm faces a professional indemnity question, the trail is there. This is the kind of thing the broader Intelligence Brain architecture handles by default rather than as a bolt-on.
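The record itself is nothing exotic; an append-only line per interaction is enough. A sketch, with field names and the JSONL file as assumptions:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AuditRecord:
    """One line of the audit trail: enough to reconstruct the answer later."""
    timestamp: str               # ISO 8601, supplied by the caller
    user: str
    query: str
    retrieved: list[dict]        # each: {"doc": ..., "version": ..., "paragraph": ...}
    response: str
    user_action: str             # "accepted" | "edited" | "rejected"

def log_interaction(record: AuditRecord, path: str = "audit.jsonl") -> None:
    """Append the record as one JSON line; the file name is illustrative."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```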
Where it fails, and what to do about it
Three failure modes show up reliably. The first is currency lag — a Finance Act change hits the legislation before the manuals are updated, and the brain gives an answer based on superseded guidance. The mitigation is to flag every legislative section that has been amended more recently than its corresponding manual, so the user sees a warning. It doesn't solve the problem, but it stops silent failure.
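The check reduces to comparing two dates per section, provided someone maintains the mapping from sections to the manual pages that cover them; that mapping, and the function below, are illustrative:

```python
from datetime import date

def currency_warnings(sections: dict[str, date],
                      manuals: dict[str, date]) -> list[str]:
    """Flag sections amended more recently than their corresponding manual.

    `sections` maps a section reference to the date of its latest amendment;
    `manuals` maps the same reference to the last-updated date of the manual
    page that covers it.
    """
    warnings = []
    for ref, amended_on in sections.items():
        manual_updated = manuals.get(ref)
        if manual_updated is None or manual_updated < amended_on:
            warnings.append(
                f"{ref}: amended {amended_on.isoformat()}, "
                f"manual guidance may be out of date"
            )
    return warnings
```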
The second is fact-pattern matching. Tax advice depends on facts the user often hasn't supplied. A good brain asks back: "Is the company resident? Is the disposal at arm's length? Is there a connected-party relationship?" rather than answering on assumptions. Building this in means resisting the temptation to optimise for fast responses; for advisory queries, a clarifying question is the correct first output.
The third is anti-avoidance. Section 811C and the general anti-avoidance rule don't appear in the section-by-section logic of a transaction. A brain that answers "yes, this qualifies for Section 615 relief" without flagging that GAAR could still apply is producing technically correct but professionally inadequate advice. I hard-code an anti-avoidance check into any advisory response involving reorganisations, group structures, or transactions where the tax outcome is the apparent point of the structure.
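The check can be as blunt as a topic-triggered caveat appended to the generated advice; the topic labels and wording below are illustrative, not the actual rule set:

```python
GAAR_TRIGGER_TOPICS = {"reorganisation", "group_structure", "tax_motivated_structure"}

GAAR_NOTE = (
    "Note: even where the specific relief conditions are met, the general "
    "anti-avoidance rule (s.811C TCA 1997) may still apply to the arrangement "
    "as a whole. This has not been assessed here."
)

def apply_gaar_check(response: str, topics: set[str]) -> str:
    """Append a hard-coded anti-avoidance caveat to advisory output whose
    classified topics overlap the trigger list."""
    if topics & GAAR_TRIGGER_TOPICS:
        return response.rstrip() + "\n\n" + GAAR_NOTE
    return response
```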
Where to start this week
If you're a partner thinking about this, don't start with the model. Start with the corpus. Pull together your firm's last two years of advisory memos, file notes, and technical briefings, and look at what they cite. That tells you which sections of the TCA and which manuals your practice actually leans on — usually a much smaller set than you'd guess. Build the first version of the brain around that core, get partners using it on real queries for a fortnight, and only then expand the corpus. The fastest path to a useful system is a narrow one that works, not a broad one that nearly works.