HOPE-ES 2023 | Portal ODESIA

The corpus is a collection of 2062 Spanish tweets annotated with a binary label indicating whether each tweet contains a hope message. It consists of LGBT-related tweets collected with the Twitter API (June 27, 2021 to July 26, 2021) and a set of tweets collected with the UMUCorpusClassifier tool, which allows defining search criteria such as keywords, accounts, and geolocation. The corpus is an improved and extended version of the SpanishHopeEDI dataset.

Language(s)

Spanish

Year

2023

Domain

Social

Text types

Tweets

Annotations

binary label indicating whether a tweet contains hope speech or not

Data access

Registration

Data link

https://codalab.lisn.upsaclay.fr/competitions/10215

Publication

Salud María Jiménez-Zafra, Miguel Ángel García-Cumbreras, Daniel García-Baena, José Antonio García-Díaz, Bharathi Raja Chakravarthi, Rafael Valencia-García, Luis Alfonso Ureña-López (2023) Overview of HOPE at IberLEF 2023: Multilingual Hope Speech Detection. Procesamiento del Lenguaje Natural, No. 71, September 2023, pp. 371-381.

Publication link

http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6567/3967

NLP Topic

hate detection

Number of units

2062

Type of units

Tweets

Training set size

1312

Test set size

450

Development set size

300
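
The three split sizes listed above should add up to the reported number of units. A minimal sketch of that check in Python follows; the names `SPLIT_SIZES`, `REPORTED_TOTAL` and `split_shares` are illustrative and not part of the dataset or its distribution.

```python
# Split sizes as listed in this record (illustrative names, not an official API).
SPLIT_SIZES = {"train": 1312, "test": 450, "dev": 300}
REPORTED_TOTAL = 2062  # "Number of units" in this record


def split_shares(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the whole corpus as a percentage."""
    total = sum(sizes.values())
    return {name: 100 * count / total for name, count in sizes.items()}


if __name__ == "__main__":
    # The splits should account for every tweet in the corpus.
    assert sum(SPLIT_SIZES.values()) == REPORTED_TOTAL
    for name, share in split_shares(SPLIT_SIZES).items():
        print(f"{name}: {SPLIT_SIZES[name]} tweets ({share:.1f}%)")
```

With these figures the split is roughly 64% training, 22% test, and 15% development.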
