Source linked

Codex-Driven Loops ermöglichen Selbstverbesserung von Steuerbeamten

Durch die Verschmelzung des Fachwissens der Praktiker mit einer Codex-gesteuerten Iterationsschleife erreichte Tax AI eine Genauigkeit von 97 % und einen Durchsatzanstieg von 50 % bei 7.000 Steuererklärungen.

openaithrive holdingscodexartificial intelligenceautonomous agentsmachine learning

Data entry for medium-to-large complexity tax filings can consume eight hours per return, often involving messy data sources and manual extraction. To tackle this bottleneck, OpenAI and Thrive Holdings co-developed Tax AI, a system designed to automate the preparation of 1040 and 1041 tax returns while measurably improving its own performance through a continuous feedback loop.

From Manual Corrections to Autonomous Improvement

At launch, the system primarily handled simpler tasks like W-2s and 1099s. However, as the tax season progressed, the agent moved into complex territory involving K-1s, rental real estate schedules, and multi-file reconciliations. The real challenge was making complex production failures visible and actionable. In early iterations, corrections were manual; practitioners could fix errors, but the system failed to capture the full context of why a change was made—whether it was an extraction miss, a mapping problem, or workflow noise.

To solve this, the teams implemented a three-pillar architecture: direct practitioner feedback to steer learning, production traces to capture the full path from source material to expert correction, and a Codex-driven iteration loop. This loop allows Codex to investigate production issues, propose changes, and validate them against targeted regression evals, moving the product forward faster than a purely manual engineering cycle.

Quantifiable Gains in Accuracy and Throughput

The results of the pilot, which processed 7,000 tax returns across 30+ accounting firms in the Crete network, demonstrate the power of self-improving agents. Tax AI now drafts returns with up to 97% accuracy and has increased practitioner throughput by approximately 50%.

We can see this progress in the field completion metrics. At launch, only 25% of returns reached the 75% correct field completion threshold. Within just six weeks, 86% of returns hit that mark, with even faster growth observed at the 90% and 100% accuracy levels. By automating the most time-intensive parts of the process, the system saves practitioners about a third of their total preparation time.

This architecture proves that when production traces are structured into findings, agentic capabilities like Codex can transform real-world feedback into a scalable engine for continuous product evolution.


Source: Building self-improving tax agents with Codex
Domain: openai.com

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.