RefutES | Portal ODESIA

The RefutES corpus is a dataset for the task of refuting hate speech messages with counter-narratives. It consists of pairs of offensive messages and their responses, which were generated to be reasoned, respectful, and non-offensive, and to contain specific, truthful information. The corpus is distributed as CSV files with the following columns:

id: Unique identifier for each pair of hate speech message and counter-narrative.
Hate-speech: Contains the hate speech message directed at a specific group.
Reference-counternarrative: Contains the counter-narrative associated with the hate speech message, generated by GPT-4.
Target: Indicates the group affected by the hate speech message.
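
As a rough illustration of the column layout above, the following Python sketch reads rows with those four columns and counts pairs per target group. It assumes a comma-delimited file whose header matches the column names exactly as listed here; the actual file names, delimiter, and header spelling in the repository may differ. The function name load_refutes and the sample rows are hypothetical placeholders, not real corpus data.

```python
import csv
import io
from collections import Counter

# Column names as listed in this dataset card (assumed to match the CSV header).
EXPECTED_COLUMNS = ["id", "Hate-speech", "Reference-counternarrative", "Target"]


def load_refutes(source):
    """Read rows from an open text stream and check that all expected columns exist."""
    reader = csv.DictReader(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Placeholder rows standing in for a real file; these are NOT taken from the corpus.
sample = io.StringIO(
    "id,Hate-speech,Reference-counternarrative,Target\n"
    "1,<hate speech message>,<counter-narrative>,<group A>\n"
    "2,<hate speech message>,<counter-narrative>,<group B>\n"
    "3,<hate speech message>,<counter-narrative>,<group A>\n"
)

rows = load_refutes(sample)
target_counts = Counter(row["Target"] for row in rows)
print(len(rows), dict(target_counts))
```

To read a downloaded file instead, the same function could be called on a stream opened with open(path, newline="", encoding="utf-8"), where path is whatever local file name was used.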

Language(s)

Spanish

Dataset description link

https://github.com/sinai-uja/RefutES

Year

2024

Domain

Social

Annotations

Each annotation includes the unique identifier of the pair, the hate speech message, the counter-narrative generated by GPT-4, and the group targeted by the message.

Format

csv

Data access

Public

Data link

https://github.com/sinai-uja/RefutES

Publication

Vallecillo-Rodríguez et al. (2024). Overview of RefutES at IberLEF 2024: Automatic Generation of Counter Speech in Spanish. Procesamiento del Lenguaje Natural, 73: 449-459.

Publication link

http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/download/6630/4022

NLP Topic

text generation

Number of units

2931

Type of units

Documents

Training set size

2496

Test set size

156

Development set size

279
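
The three split sizes listed here add up to the reported number of units (2496 + 279 + 156 = 2931), which corresponds to roughly 85% training, 10% development, and 5% test. A minimal Python sketch of that check follows; the dictionary keys are illustrative labels, not names used by the corpus files.

```python
# Split sizes and total as reported in this dataset card.
SPLITS = {"train": 2496, "dev": 279, "test": 156}
TOTAL_UNITS = 2931

total = sum(SPLITS.values())
# The splits should account for every unit in the corpus.
assert total == TOTAL_UNITS

# Percentage share of each split, rounded to one decimal place.
shares = {name: round(100 * size / total, 1) for name, size in SPLITS.items()}
print(shares)
```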
