text generation

Mu-SHROOM-2025-es

The Mu-SHROOM dataset is a collection of prompts, model outputs, logits, and model identifiers for openly available LLMs. It covers 10 languages with validation and test data (Modern Standard Arabic, German, English, Spanish, Finnish, French, Hindi, Italian, Swedish, and Mandarin Chinese) and four test-only (“surprise”) languages (Catalan, Czech, Basque, and Farsi), and it also provides unlabeled training data for English, Spanish, French, and Chinese.

IberAuTexTification

A dataset created for a shared task on detecting machine-generated text and attributing it to the model that produced it, covering the six main languages of the Iberian Peninsula: Catalan, English, Spanish, Basque, Galician, and Portuguese. It includes human-written and machine-generated texts in seven domains: Chat, How-to, News, Literary, Reviews, Tweets, and Wikipedia. The machine-generated texts were produced with six language models: BLOOM-1B1, BLOOM-3B, BLOOM-7B1, Babbage, Curie, and text-davinci-003.

RefutES

The RefutES corpus is a dataset designed for the task of refuting hate speech messages through counter-narratives. It consists of pairs of offensive messages and their responses; the responses are generated to be reasoned, respectful, and non-offensive, and to contain specific, truthful information. The corpus is distributed as CSV files with the following columns: