The corpus is a collection of 2062 tweets in Spanish annotated with a binary label that indicates whether each tweet contains a hope message or not. It consists of LGTB-related tweets collected with the Twitter API (June 27, 2021 to July 26, 2021) and a set of tweets collected with the UMUCorpusClassifier tool, which allows defining search criteria such as keywords, accounts and geolocation. The corpus is an improved and extended version of the SpanishHopeEDI dataset.
Language(s)
Spanish
Year
2023
Domain
Social
Text types
Tweets
Annotations
Binary label indicating whether a tweet contains hope speech or not
Data access
Registration
Publication
Salud María Jiménez-Zafra, Miguel Ángel Garcia-Cumbreras, Daniel García-Baena, José Antonio Garcia-Díaz, Bharathi Raja Chakravarthi, Rafael Valencia-García, Luis Alfonso Ureña-López (2023) Overview of HOPE at IberLEF 2023: Multilingual Hope Speech Detection. Procesamiento del Lenguaje Natural, Revista nº 71, septiembre de 2023, pp. 371-381.
NLP Topic
Number of units
2062
Type of units
Tweets
Training set size
1312
Test set size
450
Development set size
300
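The split sizes above add up to the 2062 units listed for the corpus. The following minimal sketch checks that arithmetic and reports each split's share of the total. The names `SPLITS`, `TOTAL_UNITS` and `split_shares` are illustrative only and are not part of the dataset or its distribution.

```python
# Split sizes as listed in this record (names are hypothetical, for illustration).
SPLITS = {"train": 1312, "test": 450, "dev": 300}
TOTAL_UNITS = 2062


def split_shares(splits, total):
    """Return each split's share of the corpus as a percentage (one decimal)."""
    if sum(splits.values()) != total:
        raise ValueError("split sizes do not add up to the total")
    return {name: round(100 * size / total, 1) for name, size in splits.items()}


shares = split_shares(SPLITS, TOTAL_UNITS)
for name, share in shares.items():
    print(f"{name}: {SPLITS[name]} tweets ({share}%)")
```

Under these figures the training set holds roughly 64% of the tweets, the test set roughly 22%, and the development set roughly 15%.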