The HOMOMEX dataset is designed for detecting and classifying LGBT+phobic hate speech in Mexican Spanish. It is structured into three levels of analysis: detection of LGBT+phobia in tweets and phrases; identification of the type of phobia; and detection of hate speech in song lyrics.
Language(s)
Spanish (Mexico)
Dataset description link
Year
2024
Annotations
Each instance is assigned one label, which depends on the task: either whether the text is LGBT+phobic or, where applicable, the type of phobia it expresses.
Format
CSV
Data access
Registration form
Publication
Gómez-Adorno et al. (2024). Overview of HOMO-MEX at IberLEF 2024: Hate Speech Detection Towards the Mexican Spanish speaking LGBT+ Population. Procesamiento del Lenguaje Natural, 73: 393-405.
NLP Topic
Number of units
18200
Type of units
Tweets
Training set size
14560
Test set size
3640
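The counts listed above can be checked for consistency with a short calculation. The sketch below is illustrative only: it uses the three figures from this card, and the function name `split_summary` is hypothetical. The card does not say how the split was produced or whether these counts cover all three levels of analysis, so the code only confirms that the training and test sizes add up to the stated number of units and reports the resulting proportions.

```python
# Figures copied from the dataset card above.
TOTAL_UNITS = 18200
TRAIN_SIZE = 14560
TEST_SIZE = 3640


def split_summary(total, train, test):
    """Return the train/test fractions after checking the parts sum to the total.

    Hypothetical helper for illustration; not part of the dataset distribution.
    """
    if train + test != total:
        raise ValueError("train and test sizes do not add up to the total")
    return {"train_fraction": train / total, "test_fraction": test / total}


summary = split_summary(TOTAL_UNITS, TRAIN_SIZE, TEST_SIZE)
print(summary)  # {'train_fraction': 0.8, 'test_fraction': 0.2}
```

With the listed figures, the training and test sets account for every unit and correspond to an 80/20 split.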