Classification of Spanish tweets containing self-reported COVID-19 symptoms. The annotated data for this task is a manually curated set of tweets originally written in Spanish. The task is a three-way classification problem: participants must distinguish personal symptom mentions (self-reports) from symptoms reported by others (non-personal reports) and from references to external sources (literature/news mentions).
Language(s)
Spanish
Year
2022
Domain
Health
Text types
Tweets
Annotations
self-reports, non-personal reports, and literature/news mentions
NLP Topic
Number of units
20481
Type of units
Tweets
Training set size
10052 tweets
Test set size
6851 tweets
Development set size
3578 tweets
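As a quick sanity check on the figures above, the three split sizes can be summed and compared against the listed number of units. The following is a minimal sketch; the names `SPLIT_SIZES` and `split_fractions` are illustrative assumptions and are not part of the dataset distribution.

```python
# Split sizes copied from the fields above (training, development, test).
# SPLIT_SIZES and split_fractions are hypothetical names used for illustration.
SPLIT_SIZES = {"train": 10052, "dev": 3578, "test": 6851}


def split_fractions(sizes):
    """Return each split's share of the total number of tweets."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

print(f"total: {total} tweets")
for name, frac in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} tweets ({frac:.1%})")
```

The total matches the 20481 units listed, and the training, development, and test sets account for roughly 49%, 17%, and 33% of the tweets respectively.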