EXIST 2024 (Spanish) is a corpus of tweets and memes annotated for sexism: whether the tweet or meme is sexist, the intent shown by the author, and the type of sexism.
Language(s)
Spanish
Dataset description link
Year
2024
Domain
Social
Annotations
Binary label indicating whether a tweet expresses sexism, plus multiclass labels for the type of sexism and the intention of the author
Format
JSON
Annotation guide link
Data access
Registration form
Data link
Publication
Plaza, L. et al. (2024). EXIST 2024: sEXism Identification in Social neTworks and Memes. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2024. Lecture Notes in Computer Science, volume 14612
Publication link
License
CC-BY-4.0
NLP Topic
Number of units
9653
Type of units
Tweets
Training set size
7194
Test set size
1969
Development set size
490
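
As a quick sanity check on the figures above, the following minimal sketch confirms that the three split sizes add up to the reported number of units and derives each split's share. The variable names (`splits`, `total`, `shares`) are illustrative only and are not part of the dataset or its distribution.

```python
# Split sizes copied from this entry; the names below are hypothetical.
splits = {"training": 7194, "test": 1969, "development": 490}

# The splits should account for every unit listed under "Number of units".
total = sum(splits.values())

# Share of each split as a percentage of the whole corpus, to one decimal.
shares = {name: round(100 * size / total, 1) for name, size in splits.items()}

for name, size in splits.items():
    print(f"{name}: {size} tweets ({shares[name]}%)")
print(f"total: {total}")
```

The total matches the 9653 units listed for the corpus, so the three splits partition it with no overlap implied by the counts.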