The task focuses on detecting hallucinations and other overgeneration mistakes in the output of instruction-tuned large language models. Mu-SHROOM addresses general-purpose LLMs in 14 languages, and frames the hallucination detection problem as a span labeling task.
Publication
Raul Vazquez, Timothee Mickus, Elaine Zosa, Teemu Vahtola, Jörg Tiedemann, Aman Sinha, Vincent Segonne, Fernando Sanchez-Vega, Alessandro Raganato, Jindřich Libovický, Jussi Karlgren, Shaoxiong Ji, Jindřich Helcl, Liane Guillou, Ona De Gibert, Jaione Bengoetxea, Joseph Attieh, and Marianna Apidianaki. 2025. SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 2472–2497, Vienna, Austria. Association for Computational Linguistics.
Language
Spanish
English
Arabic
German
Farsi
French
Hindi
Italian
Swedish
Chinese
NLP topic
Dataset
Year
2025
Publication link
Ranking metric
IoU
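Because the task is framed as span labeling, IoU (intersection over union) can be read as the overlap between the characters marked as hallucinated by a system and those marked in the reference. The following is a minimal sketch of that idea, not the official scorer: the function name `span_iou`, the half-open `(start, end)` character offsets, and the choice to return 1.0 when both span lists are empty are all assumptions made for illustration.

```python
def span_iou(predicted, reference):
    """Character-level IoU between two lists of (start, end) spans.

    Spans are assumed to be half-open character offsets: start is
    included, end is excluded. This convention is an assumption.
    """
    # Expand each span list into the set of character indices it covers.
    pred_chars = {i for start, end in predicted for i in range(start, end)}
    ref_chars = {i for start, end in reference for i in range(start, end)}

    # Assumption: if neither side marks anything, treat it as perfect agreement.
    if not pred_chars and not ref_chars:
        return 1.0

    return len(pred_chars & ref_chars) / len(pred_chars | ref_chars)


# Hypothetical example: the prediction covers characters 0-4 and 10-13,
# the reference covers characters 3-11; 4 characters overlap out of 14.
score = span_iou([(0, 5), (10, 14)], [(3, 12)])
```

Working on sets of character indices keeps the sketch short and handles overlapping or unsorted spans without extra bookkeeping.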

