New benchmark tests AI agents on auditable financial research workflows
Researchers introduce BigFinanceBench, a 928-item benchmark that evaluates financial-research AI agents on auditability and workflow transparency, measuring not just final answers but the deri...