The RefutES corpus is a dataset designed for the task of refuting hate speech messages through counter-narratives. It consists of a set of pairs of offensive messages and their respective responses, generated with the aim of being reasoned, respectful, non-offensive, and containing specific and truthful information. The corpus is presented in CSV files with the following columns:
- id: Unique identifier for each hate speech - counter-narrative message pair.
- Hate-speech: Contains the hate speech message directed at a specific group.
- Reference-counternarrative: Contains the counter-narrative associated with the hate speech message, generated by GPT-4.
- Target: Indicates the group affected by the hate speech message.
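As an illustration of the column layout above, here is a minimal sketch of reading one of the CSV files with Python's standard `csv` module. It assumes comma-separated files whose header row uses the column names exactly as spelled in the list above; the helper name `read_pairs` and the sample row are hypothetical placeholders, not corpus content.

```python
import csv
import io

# Column names as listed in the corpus description (exact header spelling is an assumption).
EXPECTED_COLUMNS = ["id", "Hate-speech", "Reference-counternarrative", "Target"]


def read_pairs(handle):
    """Yield one dict per row after checking that the header has the expected columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        # Keep only the four documented columns, in the documented order.
        yield {name: row[name] for name in EXPECTED_COLUMNS}


# Placeholder row for demonstration only; it does not come from the corpus.
sample = io.StringIO(
    "id,Hate-speech,Reference-counternarrative,Target\n"
    "1,<hate speech text>,<counter-narrative text>,<target group>\n"
)
rows = list(read_pairs(sample))
```

For a real file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to a downloaded copy of the corpus and UTF-8 encoding is assumed.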
Language(s)
Spanish
Dataset description link
Year
2024
Domain
Social
Annotations
Each annotation includes the unique identifier of the hate speech and counter-narrative pair, the hate speech message, the counter-narrative generated by GPT-4, and the group targeted by the message.
Format
csv
Data access
Public
Data link
Publication
Vallecillo-Rodríguez et al. (2024). Overview of RefutES at IberLEF 2024: Automatic Generation of Counter Speech in Spanish. Procesamiento del Lenguaje Natural, Revista, 73: 449-459.
NLP Topic
Number of units
2931
Type of units
Documents
Training set size
2496
Test set size
156
Development set size
279
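The split sizes above can be checked against the reported number of units with a few lines of Python. This is only a sketch: the dictionary keys and variable names are chosen for illustration, while the figures are the ones listed in this entry.

```python
# Figures taken from this entry; names are illustrative.
SPLIT_SIZES = {"train": 2496, "test": 156, "dev": 279}
TOTAL_UNITS = 2931

total = sum(SPLIT_SIZES.values())

# The three splits should account for every unit in the corpus.
assert total == TOTAL_UNITS

# Share of each split as a percentage of the whole corpus, rounded to one decimal.
shares = {name: round(100 * size / total, 1) for name, size in SPLIT_SIZES.items()}
```

The splits add up exactly to the 2931 reported units, with the training set making up roughly 85% of the corpus.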