New benchmark tests AI agents on realistic spreadsheet workflows for finance
SpreadsheetBench 2 evaluates AI agents on end-to-end spreadsheet tasks (generation, debugging, and visualization) used in business financial modeling and reporting, moving beyond isolated cell operation...