Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes

The task focuses on detecting hallucinations and other overgeneration mistakes in the output of instruction-tuned large language models. Mu-SHROOM addresses general-purpose LLMs in 14 languages, and frames the hallucination detection problem as a span labeling task.

Publication
Raul Vazquez, Timothee Mickus, Elaine Zosa, Teemu Vahtola, Jörg Tiedemann, Aman Sinha, Vincent Segonne, Fernando Sanchez-Vega, Alessandro Raganato, Jindřich Libovický, Jussi Karlgren, Shaoxiong Ji, Jindřich Helcl, Liane Guillou, Ona De Gibert, Jaione Bengoetxea, Joseph Attieh, and Marianna Apidianaki. 2025. SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 2472–2497, Vienna, Austria. Association for Computational Linguistics.
Language
Spanish
English
Arabic
German
Farsi
French
Hindi
Italian
Swedish
Chinese
NLP topic
Year
2025
Ranking metric
IoU
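
As an illustration of the ranking metric, the following is a minimal sketch of intersection-over-union (IoU) computed over the characters marked as hallucinated. It is not the official scorer: the function name `span_iou`, the (start, end) offset convention with an exclusive end, and the choice to return 1.0 when both the prediction and the reference are empty are assumptions made for this example.

```python
def span_iou(pred_spans, gold_spans):
    """Character-level IoU between predicted and reference spans.

    Each span is a (start, end) pair of character offsets, end exclusive
    (an assumed convention for this sketch).
    """
    # Expand every span into the set of character indices it covers.
    pred = {i for start, end in pred_spans for i in range(start, end)}
    gold = {i for start, end in gold_spans for i in range(start, end)}

    # Assumption: no predicted and no reference characters counts as a perfect match.
    if not pred and not gold:
        return 1.0

    return len(pred & gold) / len(pred | gold)


# Predicted characters 0-4 and reference characters 3-7 share 2 of 8 characters.
print(span_iou([(0, 5)], [(3, 8)]))  # 0.25
```

Working on sets of character indices means overlapping or adjacent spans within one annotation are merged automatically before the ratio is taken.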

Task results

System IoU
ATLANTIS 0.53
NLP_CIMAT 0.52
NCL-UoR 0.51
CCNU 0.51
AILS-NTUA 0.50

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es stating the result and the DOI of the article, and attach a copy of the article if it is not openly accessible.