HOMO-MEX 2023

Corpus of tweets in Mexican Spanish that contain nouns referring to the LGBT+ community, including slang, slurs, and general terminology used to name members of the LGBT+ collective. The downloaded tweets date from 01-01-2012 to 01-10-2022. To compile the corpus, data was collected from Twitter through the API with a geographical filter specific to Mexico, and a lexicon of LGBTQ+ terms was used to select among the retrieved tweets. Each tweet carries a label indicating whether it is LGBT+-phobic or not, plus labels for the type of phobia: Lesbophobia (L), Gayphobia (G), Biphobia (B), Transphobia (T), and/or other LGBT+phobia (O).
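
Because the type labels combine with "and/or", a tweet may carry several of them at once. The following is a minimal sketch of one way to turn such a set of type codes into a fixed-order 0/1 vector for a multi-label classifier. The function name and the input format (an iterable of one-letter codes) are assumptions made for illustration and do not describe the layout of the released files.

```python
# Type codes taken from the corpus description above.
PHOBIA_TYPES = ("L", "G", "B", "T", "O")


def encode_types(codes):
    """Return a 0/1 list ordered as PHOBIA_TYPES.

    `codes` is any iterable of one-letter type codes (hypothetical
    input format; adapt it to the actual annotation files).
    """
    present = set(codes)
    unknown = present - set(PHOBIA_TYPES)
    if unknown:
        raise ValueError(f"unknown type codes: {sorted(unknown)}")
    return [1 if code in present else 0 for code in PHOBIA_TYPES]


# A tweet annotated with both Gayphobia and Transphobia.
example = encode_types(["G", "T"])
```

An empty input yields an all-zero vector, which is one possible way to represent a tweet with no type label.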

Language(s)
Spanish (Mexico)
Year
2023
Domain
Social
Text types
Tweets
Annotations
A label indicating whether the tweet is LGBT+-phobic or not, plus labels for the type of phobia: Lesbophobia (L), Gayphobia (G), Biphobia (B), Transphobia (T), and/or other LGBT+phobia (O).
Data access
Registration

Publication
Juan Vásquez, Scott Andersen, Gemma Bel-Enguix, Helena Gómez-Adorno, and Sergio-Luis Ojeda-Trueba (2023) HOMO-MEX: A Mexican Spanish Annotated Corpus for LGBT+phobia Detection on Twitter. In The 7th Workshop on Online Abuse and Harms (WOAH), pages 202–214, Toronto, Canada. Association for Computational Linguistics.
NLP Topic
Number of units
11000
Type of units
Tweets
Training set size
7000
Test set size
4000

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es indicating the result and the DOI of the article, along with a copy of it if it is not published openly.