LLM agents struggle with IPO due diligence, new benchmark reveals
Researcher evaluates Claude and ChatGPT on IPO financial analysis using S-1 filings, finding Finance Agent v2 inadequate for forward-looking IPO tasks; proposes automated rubric generation as impro...