τ³-Bench tests AI agents on document navigation and voice calls

New open benchmark extends agent evaluation to knowledge-intensive document retrieval and full-duplex voice, with GPT-5.2 reaching ~25% on complex policy navigation tasks.
