Study exposes blind spots in LLM benchmark contamination detection
Academic research identifies distribution shift and scale as failure modes in contamination detection methods for auditing LLM training data, raising questions about the reliability of current asse...