The DETESTS dataset is designed for detecting stereotypes in text, with a focus on racism and prejudice in social media posts and comments on news articles. It contains Spanish texts, including tweets related to immigration hoaxes and news comments, that have been manually annotated for the presence of explicit and implicit stereotypes. The dataset supports binary classification of texts as containing a stereotype or not, as well as detection of whether a stereotype is explicit or implicit.
Language(s)
Spanish
Dataset description link
Year
2024
Domain
Social
Annotations
Each instance is assigned two labels: one indicating whether it contains a stereotype, and one indicating whether that stereotype is implicit or explicit.
Format
csv
Data access
Register form
Data link
Publication
Schmeisser-Nieto et al. (2024). Overview of DETESTS-Dis at IberLEF 2024: DETEction and classification of racial STereotypes in Spanish - Learning with Disagreement. Procesamiento del Lenguaje Natural, 73: 323-333.
NLP Topic
Number of units
12111
Type of units
Tweets
Training set size
9931
Test set size
2180
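Example
The two-label annotation scheme and the split sizes listed in this entry can be illustrated with a minimal Python sketch. The column names (id, text, stereotype, implicit), the 0/1 label encoding, and the sample rows are assumptions made for illustration only; the actual CSV schema distributed with the dataset may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real column names and label encoding may differ.
SAMPLE_CSV = """id,text,stereotype,implicit
1,texto de ejemplo A,1,0
2,texto de ejemplo B,1,1
3,texto de ejemplo C,0,0
"""


def count_labels(csv_file):
    """Count (stereotype, implicit) label pairs in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(
        (int(row["stereotype"]), int(row["implicit"])) for row in reader
    )


counts = count_labels(io.StringIO(SAMPLE_CSV))

# Split sizes as listed in this entry.
TRAIN_SIZE, TEST_SIZE, TOTAL = 9931, 2180, 12111
assert TRAIN_SIZE + TEST_SIZE == TOTAL

# Share of all units held out for testing (roughly 0.18).
test_fraction = TEST_SIZE / TOTAL
```

With the real files, an open file handle for the downloaded CSV would take the place of the in-memory sample, after adjusting the column names to the actual schema.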