Datasets
The list below describes Spanish-language text datasets created for solving NLP tasks. Each one is a collection of texts, generally enriched with annotations.
- TA1C 2024
  Domain: News; Language: Spanish; Published in 2025; Size: 3,500 tweets; Task: clickbait detection
- SatirA 2025
  Domain: Diverse; Language: Spanish; Published in 2025; Size: 8,000; Task: satire detection
- IC-UNED-RC-ES_v1
  Domain: Diverse; Language: Spanish; Published in 2025; Size: 2,630; Task: language comprehension
- Spa-DataBench
  Domain: Diverse; Language: Spanish; Published in 2025; Size: 300; Task: question answering
- CLEAR-2025
  Domain: News; Language: Spanish; Published in 2025; Size: 3,000; Task: text simplification
- semeval-2025-task11-emotions-es
  Domain: Diverse; Languages: Spanish, English; Published in 2025; Size: 3,875; Task: sentiment analysis
- Mu-SHROOM-2025-es
  Domain: Diverse; Languages: Spanish, English, Arabic, German, Farsi, French, Hindi, Italian, Swedish, Chinese; Published in 2025; Size: 200 (Wikipedia); Task: text generation, factuality
- XC-Translate-2025-en-es
  Domain: Diverse; Languages: Spanish, English, Arabic, German, French, Italian, Korean, Chinese; Published in 2025; Size: 6,148 sentence pairs; Task: machine translation
- CheckThat!Numerical-2025-es
  Domain: News; Languages: Spanish, English, Arabic; Published in 2025; Size: 3,689 claims; Task: fake news detection
- MultiParaDetox-2025-es
  Domain: Diverse; Languages: Spanish, French, Hindi, Italian, Ukrainian; Published in 2024; Size: 1,000; Task: text detoxification
- MultiParaDetox
  Domain: Diverse; Languages: Spanish, Ukrainian; Published in 2024; Size: 1,720; Task: text detoxification
- EmoSPeech
  Domain: Diverse; Language: Spanish; Published in 2024; Size: 3,750; Task: sentiment analysis
- FLARES
  Domain: News; Language: Spanish; Published in 2024; Size: 190; Task: information extraction
- IberAuTexTification
  Domain: News, Social, others; Languages: Spanish, English, Portuguese; Published in 2024; Size: 168,128; Task: text generation
- SQUAD-SQAC 2024 ES
  Domain: Diverse; Language: Spanish; Published in 2024; Size: 110 scientific papers; Task: question answering
If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es indicating the result and the DOI of the article, along with a copy of the article if it is not openly available.

