τ³-Bench benchmark tests AI agents on document navigation and voice calls

New Protocol Enables Portable Testing for AI Agents Across Frameworks

HN: Accounting & AI HN · 6d ago · 1 min read

ECP introduces a vendor-neutral standard for evaluating AI agent outputs and tool calls with audit-visible context, enabling consistent testing across models and CI systems.

Meridian launches Spreadsheet Arena to benchmark LLM spreadsheet generation

Meridian Blog blogscraper · 9d ago · 1 min read

Meridian releases Spreadsheet Arena, an open benchmarking platform co-developed with Cornell, CMU, and Scale AI to evaluate LLM performance in generating spreadsheet workbooks through blind pairwis...

dbt Audit Skill Flags Errors AI Agents Miss in Data Models

HN: Accounting & AI HN · 25d ago · 1 min read

A new open-source tool audits dbt projects to identify data quality issues that AI agents might overlook, improving the reliability of AI-assisted accounting workflows.

OpenSOP brings auditability and control to agent workflows

HN: Accounting & AI HN · 1mo ago · 1 min read

Coba.ai releases OpenSOP, an open-source runtime for defining executable agent processes in YAML with versioned process definitions and typed APIs, addressing agent reliability and auditability con...

Lambda ERP brings chat interface to open-source accounting and inventory mana...

HQ YC: Accounting & AI Startups HN · 2mo ago · 1 min read

A YC-backed developer launches Lambda ERP, an open-source ERP prototype with chat as its primary interface, handling invoicing, double-entry accounting, and analytics through conversational AI.