How We Built AI Accounting Agents That Actually Work

Accounting Technology · By Michael Cutajar · May 4, 2026

When we started building AI agents for accounting firms, we made every mistake in the book.

We tried letting GPT-4 do the VAT computation. It was right 94% of the time. That sounds impressive until you realise that a 6% error rate on a 200-transaction VAT return means 12 wrong line items. That's not a rounding error. That's a penalty notice.
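The arithmetic behind that figure, as a quick sketch. The function names are ours, and the second function assumes each line item fails independently, which is a simplification rather than something we measured. Under that assumption, the chance of a 200-line return coming out with zero errors is well under one in ten thousand.

```python
def expected_wrong_items(accuracy: float, line_items: int) -> int:
    """Expected number of wrong line items at a given per-item accuracy."""
    return round((1 - accuracy) * line_items)


def probability_clean_return(accuracy: float, line_items: int) -> float:
    """Chance that every line item is right.

    Assumption: errors are independent across line items.
    """
    return accuracy ** line_items


print(expected_wrong_items(0.94, 200))  # 12
print(probability_clean_return(0.94, 200) < 1e-4)  # True
```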

We tried fine-tuning models on tax legislation. The model learned patterns but didn't understand rules. It could produce something that looked like a correct VAT return, but it was generating plausible output, not computing actual liability. When the rates changed mid-year, the model kept using the old rates until we retrained it.

We tried building our own rules engine from scratch. We got Malta working in three months. Then we started on Germany and realised the data model was wrong. Italy broke it further. By the time we had three countries working, we were spending more time maintaining the engine than building agents.

That's when we found the architecture that actually works.

Separate what AI is good at from what it isn't

The breakthrough was simple: stop asking AI to do everything.

AI is exceptional at reading documents, understanding context, and classifying transactions. Give it a bank statement line that says "AMZN Mktp UK*2R4X7" and it knows that's an Amazon purchase, probably office supplies or software, probably business. Give it an invoice in German and it extracts the supplier, amount, VAT, and line items. This is what LLMs were built for.

AI is terrible at arithmetic that has to be exactly right every time. Tax computation is rule application, not pattern recognition. The rules are defined in legislation. They change when the law changes. They have edge cases that depend on specific facts. And the output has legal consequences — a wrong number means a wrong filing.
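To make "rule application" concrete, here is a minimal sketch of a rate that changes mid-year. The schedule, the rates, and the rounding rule are all hypothetical and chosen for illustration; they are not any jurisdiction's real figures, and this is not Accora's implementation. The point is that a rate change becomes one new row of data, not a retraining run.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class RatePeriod:
    effective_from: date
    rate: Decimal


# Hypothetical schedule for illustration only; not real rates.
STANDARD_RATE_SCHEDULE = [
    RatePeriod(date(2025, 1, 1), Decimal("0.18")),
    RatePeriod(date(2025, 7, 1), Decimal("0.20")),
]


def rate_on(schedule, tx_date):
    """Return the rate in force on tx_date (latest period starting on or before it)."""
    applicable = [p for p in schedule if p.effective_from <= tx_date]
    if not applicable:
        raise ValueError(f"no rate defined for {tx_date}")
    return max(applicable, key=lambda p: p.effective_from).rate


def vat_due(net_amount, tx_date, schedule=STANDARD_RATE_SCHEDULE):
    """VAT on a net amount, rounded to cents.

    Assumption: half-up rounding; real rounding rules vary by jurisdiction.
    """
    vat = net_amount * rate_on(schedule, tx_date)
    return vat.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(vat_due(Decimal("100.00"), date(2025, 6, 30)))  # 18.00
print(vat_due(Decimal("100.00"), date(2025, 7, 1)))   # 20.00
```

Using Decimal instead of floats keeps the result exact to the cent, and the same transaction date always resolves to the same rate.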

So we split the pipeline:

AI classifies. Reads documents. Categorises transactions. Determines VAT treatment. Flags edge cases. This runs on large language models and gets better over time as it learns each client's patterns.

Engines compute. They take the classified data and apply jurisdiction-specific tax rules to produce the numbers. VAT liability. Income tax. Social security. Payroll. This runs on Accora's deterministic engines: hard-coded rules built on legislation, updated by accountants when laws change. No AI in the loop. Same input, same output, every time.

Accountants review. A qualified, warranted accountant checks the output. They catch edge cases the AI missed, verify the computation makes sense, and sign off before anything is filed. They carry professional liability.
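The three stages above can be sketched roughly as follows. Everything here is a simplified stand-in: the keyword rule replaces the LLM so the example runs offline, and the treatment labels, rates, and confidence threshold are hypothetical, not our production values.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    description: str
    net_amount: Decimal


@dataclass(frozen=True)
class Classified:
    transaction: Transaction
    vat_treatment: str   # hypothetical labels: "standard" or "exempt"
    confidence: float    # classifier's self-reported confidence, 0..1


def classify(tx: Transaction) -> Classified:
    """Stage 1 stand-in for the LLM: a keyword rule keeps the sketch runnable."""
    if "AMZN" in tx.description:
        return Classified(tx, "standard", 0.90)
    return Classified(tx, "exempt", 0.50)


# Hypothetical rates for illustration only.
RATES = {"standard": Decimal("0.20"), "exempt": Decimal("0")}


def compute_vat(items):
    """Stage 2: deterministic. Same classified input, same output, no model."""
    total = sum(
        (c.transaction.net_amount * RATES[c.vat_treatment] for c in items),
        Decimal("0"),
    )
    return total.quantize(Decimal("0.01"))


def needs_review(items, threshold=0.80):
    """Stage 3 input: low-confidence classifications the accountant checks first."""
    return [c for c in items if c.confidence < threshold]


def run_pipeline(transactions):
    classified = [classify(t) for t in transactions]
    return {
        "vat_total": compute_vat(classified),
        "flagged": needs_review(classified),
        "status": "awaiting_accountant_sign_off",  # nothing is filed automatically
    }


result = run_pipeline([
    Transaction("AMZN Mktp UK*2R4X7", Decimal("50.00")),
    Transaction("UNKNOWN VENDOR", Decimal("10.00")),
])
print(result["vat_total"])     # 10.00
print(len(result["flagged"]))  # 1
```

Note that the model's output only ever reaches the engine as a label. The numbers come from the rate table, so a misclassification is a reviewable decision rather than a silent arithmetic error.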

Why Accora

We didn't build the computation layer ourselves. We use Accora.

Here's why: maintaining tax rules across multiple jurisdictions is a full-time job for a team of accountants and engineers. Malta's VAT rules change. Germany's Einkommensteuer thresholds shift. The UK updates its National Insurance bands. Italy restructures its Liquidazione IVA. Every change needs to be reflected in the engine, tested, and deployed — without breaking anything else.

Accora does this across 30+ jurisdictions. They maintain the deterministic engines, update them when laws change, and provide API access so our agents can call them. We didn't want to be in the business of maintaining tax legislation — we wanted to build great agents. Accora lets us focus on what we're good at.

The integration is clean. Our agent classifies a set of transactions, sends them to Accora's API, and gets back computed tax obligations. If the client wants a VAT return, our agent calls Accora's computation endpoint. If they want a P&L, same thing. The jurisdiction-specific logic lives in Accora's engine, not in our agent code.
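A rough sketch of what that call pattern looks like from the agent's side. The URL, payload fields, report names, and response shape below are all assumptions made up for illustration; Accora's actual API may look quite different. The transport is passed in as a function so the example runs without a network.

```python
import json
from typing import Callable

# Hypothetical endpoint; not Accora's real URL.
COMPUTE_URL = "https://api.example.invalid/v1/compute"


def build_request(jurisdiction: str, report: str, transactions: list) -> dict:
    """Assemble the payload. The agent passes the jurisdiction as data
    and contains no jurisdiction-specific branching of its own."""
    return {
        "jurisdiction": jurisdiction,
        "report": report,  # hypothetical values: "vat_return", "profit_and_loss"
        "transactions": transactions,
    }


def compute(jurisdiction: str, report: str, transactions: list,
            send: Callable[[str, str], str]) -> dict:
    """Send classified transactions to the engine and parse the JSON reply."""
    body = json.dumps(build_request(jurisdiction, report, transactions))
    return json.loads(send(COMPUTE_URL, body))


def fake_send(url: str, body: str) -> str:
    """Offline stand-in for an HTTP POST; returns a canned response."""
    request = json.loads(body)
    return json.dumps({
        "jurisdiction": request["jurisdiction"],
        "report": request["report"],
        "line_count": len(request["transactions"]),
    })


classified = [
    {"description": "AMZN Mktp UK*2R4X7", "net_amount": "50.00",
     "vat_treatment": "standard"},
]
print(compute("MT", "vat_return", classified, fake_send)["line_count"])  # 1
```

Requesting a P&L instead of a VAT return changes only the report argument, which mirrors the point above: the agent code stays the same across reports and countries.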

The results

This architecture is why our clients see:

100% VAT accuracy. Not because the AI is perfect — it isn't. Because the computation is deterministic and the accountant catches what the AI misses.
85% workload reduction. The AI handles the bulk classification. The engine handles the math. The accountant only reviews — they don't do the work from scratch.
15+ hours saved per week. For a mid-sized firm, that's the equivalent of a full-time hire.

The agents aren't magic. They're well-architected. AI where AI works. Deterministic engines where precision matters. Humans where judgment is needed.

What we learned

Three lessons from building AI accounting agents:

1. Don't let AI do math. It's tempting. It's wrong. Use AI for classification and language tasks. Use rules engines for computation. The error rate difference is the difference between a working product and a liability.

2. Build on infrastructure, don't build infrastructure. We wasted months trying to build our own tax engine before switching to Accora. The same way you'd use Stripe for payments instead of building a payment processor, use accounting infrastructure for tax computation instead of building a tax engine.

3. Keep the accountant in the loop. Every AI accounting product that removes the accountant is making a mistake. The accountant isn't overhead — they're the quality assurance layer that makes the system trustworthy. Our review rate starts at ~40% for new clients and drops to ~5% by month six. The cost is minimal. The trust is everything.

If you want the full architecture diagram, see our technology stack. If you're running an accounting firm and want to see what AI agents can do for your workflows, book a free audit. We'll map your processes and show you what's automatable.
