text generation

The Mu-SHROOM dataset consists of a collection of prompts, model outputs, logits, and identifiers for openly available LLMs. The dataset encompasses 10 languages with validation and test data (Modern Standard Arabic, German, English, Spanish, Finnish, French, Hindi, Italian, Swedish and Mandarin Chinese), 4 test-only (“surprise”) languages (Catalan, Czech, Basque and Farsi), as well as unlabeled training data for English, Spanish, French, and Chinese.

RefutES 2024: Automatic Generation of Counter Speech in Spanish

NLP topic

text generation

Dataset

RefutES

Language

Spanish

Year

2024

IberAuTexTification 2024: MGT Attribution

NLP topic

text generation

Dataset

IberAuTexTification

Language

Spanish

Year

2024

IberAuTexTification 2024: MGT Detection

NLP topic

text generation

Dataset

IberAuTexTification

Language

Spanish

Year

2024

IberAuTexTification

A dataset generated for the shared task focused on detecting machine-generated text and model attribution in the six main languages of the Iberian Peninsula: Catalan, English, Spanish, Basque, Galician, and Portuguese. The dataset includes human-written and machine-generated texts in seven domains: Chat, How-to, News, Literary, Reviews, Tweets, and Wikipedia. The machine-generated texts were produced with six language models: BLOOM-1B1, BLOOM-3B, BLOOM-7B1, Babbage, Curie, and text-davinci-003.

RefutES

The RefutES corpus is a dataset designed for the task of refuting hate speech messages through counter-narratives. It consists of a set of pairs of offensive messages and their respective responses, generated with the aim of being reasoned, respectful, non-offensive, and containing specific and truthful information. The corpus is presented in CSV files with the following columns: