Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes

The task focuses on detecting hallucinations and other overgeneration mistakes in the output of instruction-tuned large language models (LLMs). Mu-SHROOM covers general-purpose LLMs in 14 languages and frames hallucination detection as a span-labeling task.
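In span labeling, a system marks the character ranges of a model's output that contain hallucinated content. The sketch below is a minimal illustration under stated assumptions. Spans are taken to be half-open [start, end) character offsets. A prediction is compared with a gold annotation by character-level intersection-over-union (IoU). The function names, the example sentence and its annotation are hypothetical. This is not the official Mu-SHROOM data format or scorer.

```python
# Minimal sketch (not the official Mu-SHROOM scorer): hallucination detection as
# character-level span labeling, compared with intersection-over-union (IoU).
# Assumption: spans are half-open [start, end) character offsets into the model output.


def spans_to_chars(spans: list[tuple[int, int]]) -> set[int]:
    """Expand a list of [start, end) spans into the set of covered character indices."""
    chars: set[int] = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def span_iou(predicted: list[tuple[int, int]], gold: list[tuple[int, int]]) -> float:
    """Character-level IoU between predicted and gold hallucination spans."""
    pred_chars = spans_to_chars(predicted)
    gold_chars = spans_to_chars(gold)
    union = pred_chars | gold_chars
    if not union:
        # Assumption: if neither side marks any character, count it as full agreement.
        return 1.0
    return len(pred_chars & gold_chars) / len(union)


if __name__ == "__main__":
    output = "The Eiffel Tower was completed in 1925 in Berlin."
    # Hypothetical gold annotation: "1925" and "Berlin" are hallucinated.
    gold = [(34, 38), (42, 48)]
    # Hypothetical prediction: one span covering "1925 in Berlin".
    predicted = [(34, 48)]
    for start, end in gold:
        print(repr(output[start:end]))  # prints '1925' then 'Berlin'
    print(f"IoU = {span_iou(predicted, gold):.3f}")  # prints IoU = 0.714
```

Here the prediction covers all 10 gold characters but also flags the 4 characters of " in ", so the IoU is 10/14. Working at the character level means a detector is rewarded for partially overlapping spans instead of needing exact span boundaries.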

Publication

Raul Vazquez, Timothee Mickus, Elaine Zosa, Teemu Vahtola, Jörg Tiedemann, Aman Sinha, Vincent Segonne, Fernando Sanchez-Vega, Alessandro Raganato, Jindřich Libovický, Jussi Karlgren, Shaoxiong Ji, Jindřich Helcl, Liane Guillou, Ona De Gibert, Jaione Bengoetxea, Joseph Attieh, and Marianna Apidianaki. 2025. SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 2472–2497, Vienna, Austria. Association for Computational Linguistics.

Competition

SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

Language

Spanish

English

Arabic

German

Farsi

French

Hindi

Italian

Swedish