The HOMO-LAT25 dataset consists of Spanish-language posts and comments from Reddit. They come from several Latin American countries, including Argentina, Chile, Colombia, and Mexico, and also from other countries used for cross-evaluation. Every text contains at least one keyword related to the LGBTQ+ community and is labeled with positive, negative, or neutral polarity.
Language(s)
Spanish (Argentina)
Spanish (Bolivia)
Spanish (Chile)
Spanish (Colombia)
Spanish (Dominican Republic)
Spanish (Mexico)
Spanish (Peru)
Spanish (Uruguay)
Dataset description link
Year
2025
Domain
Social
Annotations
polarity label
Format
csv
Data access
Registration
Publication
Bel-Enguix, G. et al. 2025. Overview of HOMO-LAT at IberLEF 2025: Human-centric polarity detection in Online Messages Oriented to the Latin American-speaking LGBTQ+ populaTion. Procesamiento del Lenguaje Natural, 75, pp. 413-424.
NLP Topic
Number of units
7100
Documents
7100
Size
7100.00MB
Training set size
5700
Test set size
1400
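This entry lists the data as CSV with one polarity label per text and a 5700/1400 train/test split. Below is a minimal Python sketch for sanity-checking a downloaded copy. It assumes the label sits in a column named `polarity` with the values `positive`, `negative`, and `neutral`. The column name, the label spellings, and the helper names `polarity_counts` and `split_shares` are hypothetical, so adjust them to the actual file header.

```python
import csv
from collections import Counter

# Hypothetical header name and label spellings; check them against the real CSV.
LABEL_COLUMN = "polarity"
VALID_LABELS = {"positive", "negative", "neutral"}


def polarity_counts(lines, label_column=LABEL_COLUMN):
    """Count polarity labels in CSV text (an open file or any iterable of lines).

    Raises ValueError on a label outside the three expected classes.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        label = row[label_column].strip().lower()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected polarity label: {label!r}")
        counts[label] += 1
    return counts


def split_shares(train_size, test_size):
    """Return the fraction of all documents that falls in each split."""
    total = train_size + test_size
    return train_size / total, test_size / total


if __name__ == "__main__":
    # Split sizes as listed in this entry: 5700 training, 1400 test (7100 total).
    train_share, test_share = split_shares(5700, 1400)
    print(f"train {train_share:.1%}, test {test_share:.1%}")  # train 80.3%, test 19.7%
```

To check a downloaded split, you could call `with open("train.csv", newline="", encoding="utf-8") as f: print(polarity_counts(f))`, which would report its label distribution. Here `train.csv` is a placeholder, because the real file names depend on the registered download. The stdlib `csv` module is used instead of pandas so the check runs without extra dependencies.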

