Mu-SHROOM-2025-es

The Mu-SHROOM dataset is a collection of prompts, model outputs, logits, and identifiers for openly available LLMs. It covers 10 languages with validation and test data (Modern Standard Arabic, German, English, Spanish, Finnish, French, Hindi, Italian, Swedish, and Mandarin Chinese), 4 test-only (“surprise”) languages (Catalan, Czech, Basque, and Farsi), and unlabeled training data for English, Spanish, French, and Chinese. Supplementary metadata (the raw annotations before post-processing and the Wikipedia URLs used as references), the scripts used to generate model outputs for all 14 languages, and the code for the annotation and submission interfaces are all publicly available.

Language(s)
Spanish
English
Arabic
German
Farsi
French
Hindi
Italian
Swedish
Chinese
Year
2025
Domain
Diverse
Text types
Wikipedia
Annotations
character-level hallucination spans with hard and soft labels, and annotator IDs
Data access
Public

Publication
Raul Vazquez, Timothee Mickus, Elaine Zosa, Teemu Vahtola, Jörg Tiedemann, Aman Sinha, Vincent Segonne, Fernando Sanchez-Vega, Alessandro Raganato, Jindřich Libovický, Jussi Karlgren, Shaoxiong Ji, Jindřich Helcl, Liane Guillou, Ona De Gibert, Jaione Bengoetxea, Joseph Attieh, and Marianna Apidianaki. 2025. SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 2472–2497, Vienna, Austria. Association for Computational Linguistics.
License
CC-BY-4.0
Number of units
200
Test set size
150
Development set size
50
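
The annotations listed above pair soft labels with hard labels over character-level spans. As a rough illustration of how the two might relate, the sketch below turns soft-labeled spans into hard spans by thresholding. The record layout (`start`, `end`, `prob`), the end-exclusive offsets, the function name `soft_to_hard`, and the 0.5 threshold are all assumptions made for this example, not a description of the dataset's actual schema or post-processing.

```python
from typing import Dict, List, Tuple

# Hypothetical soft-label record: a character span [start, end) plus the
# share of annotators who marked that span as hallucinated.
SoftLabel = Dict[str, float]


def soft_to_hard(
    soft_labels: List[SoftLabel], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Keep spans whose probability exceeds the threshold.

    Spans that touch or overlap after filtering are merged into one.
    The threshold value is an assumption chosen for illustration.
    """
    hard: List[Tuple[int, int]] = []
    for label in sorted(soft_labels, key=lambda s: s["start"]):
        if label["prob"] <= threshold:
            continue  # not enough annotator agreement
        start, end = int(label["start"]), int(label["end"])
        if hard and start <= hard[-1][1]:
            # Extend the previous span instead of opening a new one.
            hard[-1] = (hard[-1][0], max(hard[-1][1], end))
        else:
            hard.append((start, end))
    return hard


example = [
    {"start": 0, "end": 4, "prob": 0.33},
    {"start": 10, "end": 15, "prob": 0.67},
    {"start": 15, "end": 20, "prob": 1.0},
]
print(soft_to_hard(example))  # prints [(10, 20)]
```

In this example the first span is dropped because only a third of the hypothetical annotators marked it, and the remaining two adjacent spans are merged into a single hard span.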

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es indicating the result and the DOI of the article, along with a copy of it if it is not published openly.