Corpus of tweets in Mexican Spanish that contain nouns indicative of the LGBT+ community, including slang, slurs and general terminology used to name members of the LGBT+ collective. The downloaded tweets date from 01-01-2012 to 01-10-2022. To compile the corpus, data was collected from Twitter using the API with a geographical filter specific to Mexico, and a lexicon of LGBT+ terms was then used to select tweets from those obtained. Each tweet carries a label indicating whether it is LGBT+-phobic or not, and labels for the type of phobia: Lesbophobia (L), Gayphobia (G), Biphobia (B), Transphobia (T), and/or other LGBT+phobia (O).
Language(s)
Spanish (Mexico)
Dataset description link
Year
2023
Domain
Social
Text types
Tweets
Annotations
A label indicating whether the tweet is LGBT+-phobic or not, and labels for the type of phobia: Lesbophobia (L), Gayphobia (G), Biphobia (B), Transphobia (T), and/or other LGBT+phobia (O).
Annotation guide link
Data access
Registration
Publication
Juan Vásquez, Scott Andersen, Gemma Bel-Enguix, Helena Gómez-Adorno, and Sergio-Luis Ojeda-Trueba (2023) HOMO-MEX: A Mexican Spanish Annotated Corpus for LGBT+phobia Detection on Twitter. In The 7th Workshop on Online Abuse and Harms (WOAH), pages 202–214, Toronto, Canada. Association for Computational Linguistics.
Publication link
NLP Topic
Number of units
11000
Type of units
Tweets
Training set size
7000
Test set size
4000
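Example record structure
The annotation scheme described in this entry (one binary LGBT+-phobic label plus zero or more type labels from L, G, B, T, O) could be held in memory roughly as sketched below. This is only an illustration: the class name AnnotatedTweet, its field names, and the rule that a non-phobic tweet carries no type labels are assumptions, not the corpus's actual file format or the authors' code.

```python
from dataclasses import dataclass

# Type codes as listed in this entry: Lesbophobia, Gayphobia, Biphobia,
# Transphobia, other LGBT+phobia.
PHOBIA_TYPES = ("L", "G", "B", "T", "O")


@dataclass
class AnnotatedTweet:
    """Hypothetical in-memory record; not the corpus's real schema."""
    text: str
    is_phobic: bool
    phobia_types: frozenset = frozenset()

    def __post_init__(self):
        # Accept any iterable of codes and normalise it to a frozenset.
        self.phobia_types = frozenset(self.phobia_types)
        unknown = self.phobia_types - set(PHOBIA_TYPES)
        if unknown:
            raise ValueError(f"unknown phobia type codes: {sorted(unknown)}")
        # Assumption: type labels only apply to tweets marked as phobic.
        if not self.is_phobic and self.phobia_types:
            raise ValueError("non-phobic tweet cannot carry type labels")

    def to_vector(self):
        """Multi-hot encoding of the type labels in L, G, B, T, O order."""
        return [int(code in self.phobia_types) for code in PHOBIA_TYPES]


example = AnnotatedTweet("placeholder text", True, {"G", "T"})
print(example.to_vector())
```

A multi-hot vector is used because the entry says the type labels can co-occur ("and/or"), so a single categorical label would not be enough.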