Stanford releases open SEC filings dataset for LLM pretraining
Stanford researchers publish SEFD, an open-source dataset that reconstructs SEC EDGAR filings into layout-faithful MultiMarkdown to address the scarcity of high-quality long-context training data for fina...